Releases: IBM/unitxt
Unitxt 1.26.10
Security Fixes
- Fix CWE-95 (Eval Injection) in _get_torch_dtype() (#1964)
- Upgrade vulnerable dependencies in TORR benchmark requirements (#1968)
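For readers unfamiliar with CWE-95, here is a minimal sketch of the general mitigation: resolve a dtype name through an allowlist and `getattr` instead of passing the string to `eval()`. This is not the actual patch from #1964. The function name `resolve_dtype`, the `ALLOWED_DTYPES` set, and the `fake_torch` stand-in are all hypothetical and exist only so the sketch runs without torch installed.

```python
from types import SimpleNamespace

# Hypothetical allowlist; the dtypes Unitxt actually accepts are not
# listed in these release notes.
ALLOWED_DTYPES = frozenset({"float16", "bfloat16", "float32"})


def resolve_dtype(name, module):
    """Return module.<name> for an allowlisted dtype name, without eval()."""
    short_name = name.strip()
    # Accept both "float16" and "torch.float16" spellings.
    if short_name.startswith("torch."):
        short_name = short_name[len("torch."):]
    # Anything outside the allowlist is rejected before attribute lookup,
    # so arbitrary expressions are never evaluated.
    if short_name not in ALLOWED_DTYPES:
        raise ValueError(f"Unsupported dtype name: {name!r}")
    return getattr(module, short_name)


# Stand-in for the torch module (assumption, for illustration only).
fake_torch = SimpleNamespace(float16="f16", bfloat16="bf16", float32="f32")
```

With a real installation, `module` would be the imported `torch` module; a string such as `"__import__('os').system('id')"` raises `ValueError` instead of being executed.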
Bug Fixes
- Fix qa evaluation data classification policy (#1962)
- Update inference tests for WatsonX model deprecations and API changes (#1969)
- CI compatibility fixes (HF_TOKEN, arena-hard migration, datasets 4.8.5, huggingface_hub 1.16, numpy 2.0, and pandas 3.0) (#1966, #1971)
What's Changed
- Fix qa evaluation data classification policy by @yoavkatz in #1962
- fix: CI compatibility fixes (HF_TOKEN, arena-hard migration, datasets 4.8.5) by @yoavkatz in #1966
- fix: upgrade vulnerable dependencies in TORR benchmark requirements by @csrajmohan in #1968
- fix: Update inference tests for WatsonX model deprecations and API changes by @yoavkatz in #1969
- Security: Fix CWE-95 (Eval Injection) in _get_torch_dtype() by @yoavkatz in #1964
- fix: Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0 by @yoavkatz in #1971
Full Changelog: 1.26.9...1.26.10
Unitxt 1.26.9
What's Changed
- lazy import of scipy by @assaftibm in #1959
- Fix duplicate-column sorting issue in Text2SQL evaluation utils by @oktie in #1954
- Update version to 1.26.9 by @elronbandel in #1961
Full Changelog: 1.26.8...1.26.9
Unitxt 1.26.8
What's Changed
- add ollama classification engine by @lilacheden in #1955
- make ollamaInferenceEngine handle return_meta_data by @lilacheden in #1956
- lazy import of evaluate by @assaftibm in #1957
- Update version to 1.26.8 by @elronbandel in #1958
Full Changelog: 1.26.7...1.26.8
Unitxt 1.26.7
What's Changed
- Fix examples by @elronbandel in #1907
- Fix inference tests by @elronbandel in #1912
- Minor text2sql metric fixes by @oktie in #1913
- fix mtrag by @dafnapension in #1918
- fix xlam's schema issues by @dafnapension in #1917
- Add ReflectionToolCallingMetricSyntactic for evaluating tool call predictions referenceless by @korenLazar in #1923
- Divide biggen bench into multilingual and not multilingual by @martinscooper in #1922
- remove redundant split from airbench2024 by @dafnapension in #1928
- Revert BigGen Benchmark partition by @martinscooper in #1924
- fixed split names in wiki_bio by @dafnapension in #1925
- Fix erroneous prompts in evaluation tasks (and clean some json-schema-wise) by @dafnapension in #1920
- fix the only 4 erroneous global_mmlu cards that do not pass _source_to_dataset by @dafnapension in #1916
- Normalize llm judge bench target variable by @martinscooper in #1933
- Improved multi turn evaluation to be self contained and use LLM as judge by @yoavkatz in #1929
- Add more RAG judges by @arielge in #1934
- Add ReflectionToolCallingMetric and update related metrics by @korenLazar in #1931
- potential fix for preparation file: prepare/cards/mtrag.py by @dafnapension in #1938
- Lazy load vectara hhem model because it is gated in HF by @yoavkatz in #1946
- Fixed missing sampling_seed in DiverseLabelsSampler by @yoavkatz in #1941
- Correct reflection based tool calling metrics so valid results will be 1 by @yoavkatz in #1940
- Rag metric update again by @dafnapension in #1948
- fix gpt-oss classification inference engines by @lilacheden in #1952
- Update version to 1.26.7 by @elronbandel in #1953
New Contributors
- @korenLazar made their first contribution in #1923
Full Changelog: 1.26.6...1.26.7
Unitxt 1.26.6
What's Changed
- Update pearsonr tests by @elronbandel in #1890
- return source_to_recipe to performance evaluation, once 403 is fixed by bnayahu by @dafnapension in #1891
- remove a card whose preprocess_steps do not match the contents of the loaded dataset by @dafnapension in #1893
- fix an ineffective setting of max size of loader_cache by @dafnapension in #1892
- Fix compatibility with datasets 4.0 by @elronbandel in #1861
- Improve speed in mmlu global by @elronbandel in #1895
- Remove the need for datasets<4.0.0 by @elronbandel in #1897
- Refresh README by @elronbandel in #1898
- Update Readme by @elronbandel in #1899
- Update README by @elronbandel in #1900
- Update README by @elronbandel in #1901
- Fix docs and example of how to use benchmark by @elronbandel in #1903
- Refine condition for avoiding the Benchmark wrapper by @bnayahu in #1904
- Complete transition to datasets 4.0.0 in preparation tests by @dafnapension in #1902
- Make sacrebleu faster and more efficient by @elronbandel in #1906
- Implements LogProbEngine on CrossInference and adds more granite guardian models by @martinscooper in #1905
- Remove IBM GenAI support and moved legacy GenAI metrics to use CrossProviderInferenceEngine by @yoavkatz in #1508
- GPT on rits and minor llm judge criteria changes by @martinscooper in #1909
- The special installation of networkx can be removed as well by @dafnapension in #1908
- Update version to 1.26.6 by @elronbandel in #1911
Full Changelog: 1.26.5...1.26.6
Unitxt 1.26.5
What's Changed
- For load_dataset, use_cache default value is taken from settings by @eladven in #1880
- Support watsonx.ai on-prem credentials by @pratapkishorevarma in #1883
- extend condition to also filter by field exists or not by @dafnapension in #1879
- fix performance test by @dafnapension in #1884
- Add support for inline-defined templates in the UI by @Chemafiz in #1886
- Mitigate HTTP 403 errors in pandas by @bnayahu in #1888
- Biggen benchmark and pearson correlation metric by @martinscooper in #1887
- Update version to 1.26.5 by @elronbandel in #1889
New Contributors
- @pratapkishorevarma made their first contribution in #1883
- @Chemafiz made their first contribution in #1886
Full Changelog: 1.26.4...1.26.5
Unitxt 1.26.4
What's Changed
- Add more Judgebench benchmarks by @martinscooper in #1869
- Make sqlite3 not an optional dependency by @elronbandel in #1871
- Removed legacy topicality, idk, and groundness metrics that worked only on BAM by @yoavkatz in #1875
- Bench and models by @martinscooper in #1872
- Handle a case in ToolCallPostProcessor where prediction is an empty list of tools by @yoavkatz in #1874
- Update version to 1.26.4 by @elronbandel in #1876
Full Changelog: 1.26.3...1.26.4
Unitxt 1.26.3
What's Changed
- LLM Judge: Improve context/prediction fields parsing by @martinscooper in #1856
- Fixed bug in tool inference by @yoavkatz in #1868
- Added a new MetricBasedNer that allows calculating entity similarity using any Unitxt metric by @yoavkatz in #1860
- Update version to 1.26.3 by @elronbandel in #1870
Full Changelog: 1.26.2...1.26.3
Unitxt 1.26.2
What's Changed
- Add tot dataset by @elronbandel in #1865
- Add tokenizer_name to base huggingface inference engines by @elronbandel in #1862
- Add hf to cross provider inference engine by @yoavkatz in #1866
- Update version to 1.26.2 by @elronbandel in #1867
Full Changelog: 1.26.1...1.26.2
Unitxt 1.26.1
Lock datasets dependency to <4.0.0
The latest datasets v4.0.0 release removes support for loading datasets with trust_remote_code=True. This change breaks compatibility with many datasets currently in the Unitxt catalog, as several datasets require this feature to load properly.
This patch restricts the datasets version to below 4.0.0 until we can find or develop replacements for affected datasets.
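In requirements terms, the pin corresponds to a specifier such as `datasets<4.0.0`. As a rough illustration of what that constraint accepts, the sketch below checks a version string against the same bound. The helper name `satisfies_datasets_pin` is hypothetical and not part of Unitxt, and the check only inspects the major version component, which is a simplification of full version-specifier matching.

```python
def satisfies_datasets_pin(version):
    """Return True if a version string such as '3.6.0' is below 4.0.0.

    Simplified check (assumption): only the leading major component is
    compared, which is enough for a '<4.0.0' upper bound.
    """
    major = int(version.split(".")[0])
    return major < 4
```

For example, `satisfies_datasets_pin("3.6.0")` is true while `satisfies_datasets_pin("4.0.0")` is false. To check a live environment, the installed version string could be obtained with `importlib.metadata.version("datasets")`.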