Release list

Unitxt 1.26.10 (Latest)

yoavkatz released this 27 May 10:41

1.26.10

3989798

Security Fixes

Fix CWE-95 (Eval Injection) in _get_torch_dtype() (#1964)
Upgrade vulnerable dependencies in TORR benchmark requirements (#1968)
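CWE-95 covers code that passes untrusted strings to eval(). A common remedy is to check a dtype name against a fixed allowlist instead of evaluating it. The sketch below assumes dtype names arrive as strings such as "float16" or "torch.bfloat16". The function name resolve_dtype_name and its allowlist are hypothetical, not Unitxt's actual _get_torch_dtype() code. The function returns only the attribute name, so it runs without PyTorch installed.

```python
# Hypothetical allowlist of dtype attribute names accepted from configuration.
ALLOWED_DTYPES = frozenset(
    {"float16", "bfloat16", "float32", "float64", "int8", "int32", "int64"}
)


def resolve_dtype_name(spec: str) -> str:
    """Validate a dtype string without calling eval().

    Accepts "float16" or "torch.float16" and returns the bare attribute
    name. Raises ValueError for anything outside the allowlist, so
    arbitrary expressions are rejected instead of executed.
    """
    name = spec.strip()
    if name.startswith("torch."):
        name = name[len("torch."):]
    if name not in ALLOWED_DTYPES:
        raise ValueError(f"Unsupported dtype: {spec!r}")
    return name


# A caller with PyTorch installed could then do:
#     import torch
#     dtype = getattr(torch, resolve_dtype_name("torch.bfloat16"))
```

An allowlist is stricter than calling getattr(torch, name) on arbitrary input, because it also rejects attributes of torch that are not dtypes.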

Bug Fixes

Fix qa evaluation data classification policy (#1962)
Update inference tests for WatsonX model deprecations and API changes (#1969)
CI compatibility fixes (HF_TOKEN, arena-hard migration, datasets 4.8.5) (#1966)
Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0 (#1971)

What's Changed

Fix qa evaluation data classification policy by @yoavkatz in #1962
fix: CI compatibility fixes (HF_TOKEN, arena-hard migration, datasets 4.8.5) by @yoavkatz in #1966
fix: upgrade vulnerable dependencies in TORR benchmark requirements by @csrajmohan in #1968
fix: Update inference tests for WatsonX model deprecations and API changes by @yoavkatz in #1969
Security: Fix CWE-95 (Eval Injection) in _get_torch_dtype() by @yoavkatz in #1964
fix: Compatibility with huggingface_hub 1.16, numpy 2.0, and pandas 3.0 by @yoavkatz in #1971

Full Changelog: 1.26.9...1.26.10

Contributors

csrajmohan and yoavkatz

Unitxt 1.26.9

elronbandel released this 13 Jan 09:31

1.26.9

20b951e

What's Changed

lazy import of scipy by @assaftibm in #1959
Fix duplicate-column sorting issue in Text2SQL evaluation utils by @oktie in #1954
Update version to 1.26.9 by @elronbandel in #1961

Full Changelog: 1.26.8...1.26.9

Contributors

oktie, elronbandel, and assaftibm

Unitxt 1.26.8

elronbandel released this 06 Jan 12:22

1.26.8

ea47e51

What's Changed

add ollama classification engine by @lilacheden in #1955
make ollamaInferenceEngine handle return_meta_data by @lilacheden in #1956
lazy import of evaluate by @assaftibm in #1957
Update version to 1.26.8 by @elronbandel in #1958

Full Changelog: 1.26.7...1.26.8

Contributors

elronbandel, assaftibm, and lilacheden

Unitxt 1.26.7

elronbandel released this 03 Dec 15:49

1.26.7

f24c1be

What's Changed

Fix examples by @elronbandel in #1907
Fix inference tests by @elronbandel in #1912
Minor text2sql metric fixes by @oktie in #1913
fix mtrag by @dafnapension in #1918
fix xlam's schema issues by @dafnapension in #1917
Add ReflectionToolCallingMetricSyntactic for reference-less evaluation of tool call predictions by @korenLazar in #1923
Divide BigGen bench into multilingual and non-multilingual parts by @martinscooper in #1922
remove redundant split from airbench2024 by @dafnapension in #1928
Revert BigGen Benchmark partition by @martinscooper in #1924
Fixed split names in wiki_bio by @dafnapension in #1925
Fix erroneous prompts in evaluation tasks (and clean up some JSON schemas) by @dafnapension in #1920
Fix the 4 global_mmlu cards (the only ones) that do not pass _source_to_dataset by @dafnapension in #1916
Normalize llm judge bench target variable by @martinscooper in #1933
Improved multi turn evaluation to be self contained and use LLM as judge by @yoavkatz in #1929
Add more RAG judges by @arielge in #1934
Add ReflectionToolCallingMetric and update related metrics by @korenLazar in #1931
potential fix for preparation file: prepare/cards/mtrag.py by @dafnapension in #1938
Lazy load vectara hhem model because it is gated in HF by @yoavkatz in #1946
Fixed missing sampling_seed in DiverseLabelsSampler by @yoavkatz in #1941
Correct reflection-based tool calling metrics so that valid results score 1 by @yoavkatz in #1940
Rag metric update again by @dafnapension in #1948
fix gpt-oss classification inference engines by @lilacheden in #1952
Update version to 1.26.7 by @elronbandel in #1953
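A sampler that shuffles without a fixed seed picks different demonstrations on every run, which makes evaluation results hard to reproduce. The sketch below illustrates what a sampling seed buys. It draws label-diverse examples with random.Random(sampling_seed). The helper sample_diverse is hypothetical and does not reproduce the internals of Unitxt's DiverseLabelsSampler.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def sample_diverse(
    examples: List[Tuple[str, str]], k: int, sampling_seed: Optional[int] = None
) -> List[Tuple[str, str]]:
    """Pick up to k (text, label) examples, cycling over labels for diversity.

    With sampling_seed set, the same inputs always give the same sample;
    with None, the result varies from run to run.
    """
    rng = random.Random(sampling_seed)
    # Group examples by label; the caller's list is not modified.
    by_label: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    for bucket in by_label.values():
        rng.shuffle(bucket)
    labels = sorted(by_label)
    rng.shuffle(labels)
    picked: List[Tuple[str, str]] = []
    # Take one example per label per round until k are picked or all run out.
    while len(picked) < k and any(by_label.values()):
        for label in labels:
            if by_label[label] and len(picked) < k:
                picked.append(by_label[label].pop())
    return picked
```

All randomness comes from one seeded random.Random instance rather than the global random module. Other code that uses random therefore cannot change the sample.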

New Contributors

@korenLazar made their first contribution in #1923

Full Changelog: 1.26.6...1.26.7

Contributors

oktie, martinscooper, and 6 other contributors

Unitxt 1.26.6

elronbandel released this 07 Aug 08:44

1.26.6

076649a

What's Changed

Update pearsonr tests by @elronbandel in #1890
Return source_to_recipe to performance evaluation, now that the 403 error has been fixed by bnayahu by @dafnapension in #1891
remove a card whose preprocess_steps do not match the contents of the loaded dataset by @dafnapension in #1893
Fix an ineffective setting of the loader_cache max size by @dafnapension in #1892
Fix compatibility with datasets 4.0 by @elronbandel in #1861
Improve speed in mmlu global by @elronbandel in #1895
Remove the need for datasets<4.0.0 by @elronbandel in #1897
Refresh README by @elronbandel in #1898
Update Readme by @elronbandel in #1899
Update README by @elronbandel in #1900
Update README by @elronbandel in #1901
Fix docs and example of how to use benchmark by @elronbandel in #1903
Refine condition for avoiding the Benchmark wrapper by @bnayahu in #1904
Complete transition to datasets 4.0.0 in preparation tests by @dafnapension in #1902
Make sacrebleu faster and more efficient by @elronbandel in #1906
Implement LogProbEngine on CrossInference and add more granite guardian models by @martinscooper in #1905
Remove IBM GenAI support and move legacy GenAI metrics to use CrossProviderInferenceEngine by @yoavkatz in #1508
GPT on RITS and minor LLM judge criteria changes by @martinscooper in #1909
Remove the special installation of networkx as well by @dafnapension in #1908
Update version to 1.26.6 by @elronbandel in #1911

Full Changelog: 1.26.5...1.26.6

Contributors

martinscooper, bnayahu, and 3 other contributors

Unitxt 1.26.5

elronbandel released this 31 Jul 14:10

1.26.5

8eb8974

What's Changed

load_dataset now takes its use_cache default value from settings by @eladven in #1880
Support watsonx.ai on-prem credentials by @pratapkishorevarma in #1883
Extend condition to also filter by whether a field exists by @dafnapension in #1879
fix performance test by @dafnapension in #1884
Add support for inline-defined templates in the UI by @Chemafiz in #1886
Mitigate HTTP 403 errors in pandas by @bnayahu in #1888
BigGen benchmark and Pearson correlation metric by @martinscooper in #1887
Update version to 1.26.5 by @elronbandel in #1889
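The Pearson correlation metric measures linear agreement between two score lists, for instance model-assigned scores against reference scores. It is computed as r = cov(x, y) / (σx σy). The sketch below computes r in plain Python as a standalone reference. It is not Unitxt's metric implementation, and the inputs in the checks are made up.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A constant score list has zero variance, so r is undefined. The sketch raises an error in that case instead of returning NaN.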

New Contributors

@pratapkishorevarma made their first contribution in #1883
@Chemafiz made their first contribution in #1886

Full Changelog: 1.26.4...1.26.5

Contributors

eladven, martinscooper, and 5 other contributors

Unitxt 1.26.4

elronbandel released this 22 Jul 14:35

1.26.4

83063f9

What's Changed

Add more Judgebench benchmarks by @martinscooper in #1869
Make sqlite3 not an optional dependency by @elronbandel in #1871
Removed legacy topicality, idk, and groundness metrics that worked only on BAM by @yoavkatz in #1875
Bench and models by @martinscooper in #1872
Handle a case in ToolCallPostProcessor where prediction is an empty list of tools by @yoavkatz in #1874
Update version to 1.26.4 by @elronbandel in #1876

Full Changelog: 1.26.3...1.26.4

Contributors

martinscooper, elronbandel, and yoavkatz

Unitxt 1.26.3

elronbandel released this 16 Jul 17:47

1.26.3

728fcc8

What's Changed

LLM Judge: Improve context/prediction fields parsing by @martinscooper in #1856
Fixed bug in tool inference by @yoavkatz in #1868
Added a new MetricBasedNer that calculates entity similarity using any Unitxt metric by @yoavkatz in #1860
Update version to 1.26.3 by @elronbandel in #1870

Full Changelog: 1.26.2...1.26.3

Contributors

martinscooper, elronbandel, and yoavkatz

Unitxt 1.26.2

elronbandel released this 16 Jul 09:44

1.26.2

68aa406

What's Changed

Add tot dataset by @elronbandel in #1865
Add tokenizer_name to base huggingface inference engines by @elronbandel in #1862
Add hf to cross provider inference engine by @yoavkatz in #1866
Update version to 1.26.2 by @elronbandel in #1867

Full Changelog: 1.26.1...1.26.2

Contributors

elronbandel and yoavkatz

Unitxt 1.26.1

elronbandel released this 10 Jul 17:27

1.26.1

b6cc840

Lock datasets dependency to <4.0.0

The datasets v4.0.0 release removes support for loading datasets with trust_remote_code=True. This breaks compatibility with many datasets in the Unitxt catalog that need this feature to load.

This patch restricts the datasets version to below 4.0.0 until we can find or develop replacements for the affected datasets.
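The lock itself is a version pin on the dependency (datasets<4.0.0). Code running where the pin may not hold could check the installed version before loading datasets that rely on trust_remote_code=True. The helper below is a hypothetical sketch that uses only importlib.metadata. It compares only the major version, which is enough for a <4.0.0 bound.

```python
from importlib import metadata
from typing import Optional


def datasets_is_below_4(version: Optional[str] = None) -> bool:
    """Return True if the given (or installed) datasets version is < 4.0.0."""
    if version is None:
        # Raises metadata.PackageNotFoundError if datasets is not installed.
        version = metadata.version("datasets")
    major = int(version.split(".")[0])
    return major < 4
```

For general version comparisons, a full version parser such as packaging.version is safer than splitting strings.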

Releases: IBM/unitxt