AI & ML interests

This organization hosts models tuned for sentence and text embedding generation. They can be used with the sentence-transformers package.
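As a rough illustration of what using one of these models with the sentence-transformers package can look like, the sketch below encodes two sentences and compares them with cosine similarity. The model name `sentence-transformers/all-MiniLM-L6-v2` and the helper names `embed` and `cosine_similarity` are assumptions chosen for the example, not something this page specifies; any embedding model from the organization should be usable in the same way.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed(sentences, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Encode sentences into embeddings (model name is an assumed example)."""
    # Imported lazily so cosine_similarity works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # downloads the model on first use
    return model.encode(sentences)  # one embedding vector per input sentence


if __name__ == "__main__":
    sentences = [
        "A man is playing a guitar.",
        "Someone is performing music on a guitar.",
    ]
    embeddings = embed(sentences)
    # Higher values suggest the two sentences are closer in meaning.
    print(cosine_similarity(embeddings[0], embeddings[1]))
```

Running the script requires the sentence-transformers package and network access to fetch the model; the similarity helper itself has no external dependencies.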

sentence-transformers collections (5)

Embedding Model Datasets
A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers
MS MARCO Mined Triplets
These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets.
NanoBEIR 🍺 with BM25 Rankings
NanoBEIR by Zeta Alpha, extended with BM25 scores. Used in the Sentence Transformers CrossEncoderNanoBEIREvaluator prior to ST version 5.2.
Parallel Sentences Datasets
All of these datasets have "english" and "non_english" columns. They can be used to make embedding models multilingual.