AI Tools.

sentence similarity models

77 models · ranked by HuggingFace downloads

all-MiniLM-L6-v2

Distilled MiniLM model (a compact BERT-style transformer) that encodes sentences into 384-dimensional vectors for measuring semantic similarity. Trained on over a billion sentence pairs spanning scientific papers, web QA, NLI datasets, and community forums. At 22M parameters and 6 transformer layers, it is fast enough for CPU inference while remaining competitive on standard sentence similarity benchmarks.

239,973,503 ↓ · 4,754 ♡
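
The entry above says the model turns each sentence into a 384-dimensional vector and that similarity is measured between those vectors. A minimal pure-Python sketch of those two steps, assuming the per-token vectors have already been produced by the transformer: the all-* sentence-transformers models mean-pool token embeddings (skipping padding), and sentence vectors are then compared with cosine similarity. The 3-dimensional toy vectors are made up for illustration; real vectors would come from the model itself (for example via `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(...)` in the sentence-transformers library) and would have 384 dimensions.

```python
import math
from typing import List, Sequence

Vector = List[float]

def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [x / count for x in total]

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dim token vectors; the last token of sentence A is padding (mask 0).
sent_a = mean_pool([[1.0, 0.0, 1.0], [1.0, 2.0, 1.0], [9.0, 9.0, 9.0]], [1, 1, 0])
sent_b = mean_pool([[2.0, 2.0, 2.0]], [1])
print(sent_a)                                                    # [1.0, 1.0, 1.0]
print(round(cosine_similarity(sent_a, sent_b), 6))               # 1.0
print(round(cosine_similarity(sent_a, [1.0, 0.0, 0.0]), 4))      # 0.5774
```

Cosine similarity ignores vector length, so two sentences pointing in the same direction score 1.0 even if their pooled vectors differ in magnitude.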

paraphrase-multilingual-MiniLM-L12-v2

Multilingual sentence embedding model covering 50+ languages, built on a 12-layer distilled MiniLM architecture. Produces 384-dimensional vectors designed for semantic similarity and paraphrase detection across language boundaries. It is the multilingual version of paraphrase-MiniLM-L12-v2, trained on parallel (translated) sentence data so that semantically equivalent sentences map to nearby vectors even when expressed in different languages.

44,875,889 ↓ · 1,218 ♡
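
Because equivalent sentences in different languages land near each other, cross-lingual search reduces to nearest-neighbour ranking in the shared vector space. A small sketch, assuming the embeddings were already computed with the model; the sentences are real but their 3-dimensional vectors are invented placeholders, not model outputs.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k corpus entries with the highest cosine similarity to the query."""
    scored = [(text, cosine(query, vec)) for text, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dim embeddings; the real model returns 384-dim vectors.
corpus = {
    "Der Hund schläft.": [0.9, 0.1, 0.0],      # German: "The dog is sleeping."
    "Le chat mange.": [0.1, 0.9, 0.1],         # French: "The cat is eating."
    "El perro duerme.": [0.85, 0.15, 0.05],    # Spanish: "The dog is sleeping."
}
query_vec = [1.0, 0.0, 0.0]  # stand-in for the English query "The dog is sleeping."

for text, score in top_k(query_vec, corpus):
    print(f"{score:.3f}  {text}")
```

Both "dog is sleeping" translations outrank the unrelated French sentence, which is the behaviour cross-lingual alignment is meant to produce.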

all-mpnet-base-v2

Sentence embedding model based on the MPNet architecture, producing 768-dimensional vectors. Trained on over a billion sentence pairs from MS MARCO, NLI datasets, and community QA forums, it is a common choice among English embedding models when accuracy matters more than inference speed. MPNet's pre-training objective combines masked and permuted language modeling to learn stronger representations.

36,513,639 ↓ · 1,287 ♡

bge-m3

BAAI's BGE-M3 embedding model supports over 100 languages with a unified architecture capable of dense, sparse (lexical), and late-interaction (ColBERT-style) retrieval modes from a single checkpoint. Built on XLM-RoBERTa with large-scale multilingual training, it targets multilingual and cross-lingual retrieval where a single model must handle diverse language inputs.

20,983,869 ↓ · 2,977 ♡
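
The three retrieval modes differ mainly in how a query and a document are scored. A simplified pure-Python sketch of the three scoring functions, under these assumptions: dense scoring is the inner product of sentence vectors; sparse scoring sums the products of query and document weights over tokens that appear in both; late interaction takes, for each query-token vector, its best match among the document-token vectors and averages those maxima (the original ColBERT formulation sums them instead). All weights and vectors are hypothetical toy values; the real model produces them from its own output heads, and BAAI's FlagEmbedding tooling is the practical way to obtain them.

```python
from typing import Dict, List, Sequence

def dense_score(q: Sequence[float], d: Sequence[float]) -> float:
    """Inner product of two vectors (equals cosine when both are L2-normalized)."""
    return sum(x * y for x, y in zip(q, d))

def sparse_score(q_weights: Dict[str, float], d_weights: Dict[str, float]) -> float:
    """Lexical matching: sum of weight products over tokens present in both texts."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)

def late_interaction_score(q_tokens: List[Sequence[float]], d_tokens: List[Sequence[float]]) -> float:
    """MaxSim: best-matching document token per query token, averaged over query tokens."""
    maxima = [max(dense_score(qt, dt) for dt in d_tokens) for qt in q_tokens]
    return sum(maxima) / len(maxima)

# Hypothetical toy inputs chosen so the results are exact in floating point.
print(dense_score([0.5, 0.5], [1.0, 0.5]))                                          # 0.75
print(sparse_score({"cat": 0.5, "sat": 0.25}, {"cat": 0.5, "mat": 0.25}))           # 0.25
print(late_interaction_score([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]))   # 0.75
```

Only "cat" is shared in the sparse example, so only its weights contribute; the unmatched tokens "sat" and "mat" add nothing.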

nomic-embed-text-v1.5

Nomic Embed Text v1.5 is a matryoshka-capable English embedding model from Nomic AI, built on a custom nomic-BERT architecture trained with contrastive learning on large-scale text pairs. Matryoshka Representation Learning allows truncating embeddings to shorter dimensions (e.g. 64, 128, 256) without retraining, enabling flexible precision-cost tradeoffs. The model is transformers.js-compatible for browser-side inference.

15,328,805 ↓ · 812 ♡
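
Matryoshka truncation amounts to keeping a prefix of the vector and re-normalizing it so cosine or dot-product scores stay comparable. A sketch assuming an already-computed embedding; the vector below is a made-up toy value, and the model card may recommend extra preprocessing before truncation, so check it before relying on this exact recipe.

```python
import math
from typing import List, Sequence

def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize the result.

    Matryoshka-trained models concentrate the most useful information in the
    leading dimensions, so a prefix of the vector is itself a usable embedding.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

full = [3.0, 4.0, 0.1, -0.2, 0.05, 0.0, 0.3, -0.1]  # hypothetical 8-dim stand-in for a full-size vector
short = truncate_embedding(full, 2)
print(short)  # [0.6, 0.8]
```

Only vectors truncated to the same length should be compared. As a storage example, cutting a 768-dimensional float32 vector to 256 dimensions reduces it from 3,072 to 1,024 bytes.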

multilingual-e5-small

Multilingual-E5-Small is a compact multilingual embedding model from Microsoft Research that supports 100+ languages with a BERT-style backbone; it is smaller and faster than the E5-large variant. It uses the same 'query: '/'passage: ' input prefixes as E5-large for asymmetric retrieval. It is MIT-licensed, with ONNX and OpenVINO exports available.

7,279,183 ↓ · 313 ♡
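
Asymmetric retrieval means the two sides of a search are encoded differently: queries and passages receive different text prefixes before encoding. A minimal helper, assuming the prefix strings are exactly 'query: ' and 'passage: ' (with a trailing space) as described on the E5 model cards; the example sentences are invented.

```python
from typing import List

def with_e5_prefix(texts: List[str], role: str) -> List[str]:
    """Prepend the E5-style role prefix: 'query: ' for searches, 'passage: ' for documents."""
    if role not in ("query", "passage"):
        raise ValueError("role must be 'query' or 'passage'")
    return [f"{role}: {t}" for t in texts]

queries = with_e5_prefix(["how do vaccines work"], "query")
passages = with_e5_prefix(["Vaccines train the immune system to recognize a pathogen."], "passage")
print(queries[0])   # query: how do vaccines work
print(passages[0])  # passage: Vaccines train the immune system to recognize a pathogen.
```

The prefixed strings are what would be passed to the encoder; the E5 model cards note that leaving the prefixes off can degrade retrieval quality.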

nomic-embed-text-v1

Nomic Embed Text v1 is the original version of Nomic AI's English text embedding model based on nomic-BERT, preceding the v1.5 matryoshka update. It produces 768-dimensional embeddings via contrastive learning and is fully open — model weights, training code, and data are publicly available. Apache 2.0 licensed.

7,231,198 ↓ · 566 ♡

paraphrase-multilingual-mpnet-base-v2

Multilingual embedding model from the sentence-transformers library, producing 768-dimensional vectors across 50+ languages. It is the multilingual version of paraphrase-mpnet-base-v2, distilled on parallel data into an XLM-RoBERTa student, and gives higher-quality multilingual embeddings than the lighter MiniLM multilingual variant. A reasonable upgrade when the 384-dimensional paraphrase-multilingual-MiniLM-L12-v2 is not accurate enough.

5,228,289 ↓ · 459 ♡

multilingual-e5-base

multilingual-e5-base is a multilingual text embedding model from Microsoft using an XLM-RoBERTa backbone, trained with E5's text-pair ranking objective across 94 languages. It produces 768-dimensional sentence embeddings for semantic search, clustering, and cross-lingual retrieval. The base variant balances embedding quality and inference cost between the small and large tiers.

3,852,160 ↓ · 354 ♡

paraphrase-MiniLM-L6-v2

paraphrase-MiniLM-L6-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

3,267,143 ↓ · 147 ♡

all-MiniLM-L12-v2

all-MiniLM-L12-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,851,436 ↓ · 307 ♡

multi-qa-mpnet-base-dot-v1

multi-qa-mpnet-base-dot-v1 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,589,845 ↓ · 191 ♡
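
The `-dot-` in this model's name marks it as tuned for dot-product scoring, whereas `-cos-` variants in the same family (such as multi-qa-MiniLM-L6-cos-v1, listed further down) are meant for cosine similarity. The distinction matters because, for unnormalized vectors, the two scores can rank documents differently. A toy illustration with made-up 2-dimensional vectors:

```python
import math
from typing import Callable, Dict, List, Sequence

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def rank(query: Sequence[float], docs: Dict[str, Sequence[float]],
         score: Callable[[Sequence[float], Sequence[float]], float]) -> List[str]:
    """Document ids sorted from best to worst under the given scoring function."""
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)

query = [1.0, 0.0]
docs = {
    "short_doc": [0.9, 0.1],  # points almost the same way as the query, small norm
    "long_doc": [3.0, 3.0],   # 45 degrees off, but a much larger norm
}
print(rank(query, docs, dot))     # ['long_doc', 'short_doc']
print(rank(query, docs, cosine))  # ['short_doc', 'long_doc']
```

With a dot-product model, vector length carries part of the relevance signal, so normalizing its outputs before scoring turns the score into cosine and may change the ranking.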

gte-multilingual-base

gte-multilingual-base is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,455,281 ↓ · 358 ♡

all-distilroberta-v1

all-distilroberta-v1 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,347,583 ↓ · 42 ♡

Qwen3-VL-Embedding-2B

Qwen3-VL-Embedding-2B is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,294,366 ↓ · 396 ♡

nomic-embed-text-v2-moe

nomic-embed-text-v2-moe is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,046,558 ↓ · 475 ♡

e5-large

e5-large is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

2,025,877 ↓ · 80 ♡

finance-embeddings-investopedia

finance-embeddings-investopedia is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,962,544 ↓ · 64 ♡

paraphrase-mpnet-base-v2

paraphrase-mpnet-base-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,898,412 ↓ · 48 ♡

e5-base-v2

e5-base-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,833,694 ↓ · 155 ♡

ko-sroberta-multitask

ko-sroberta-multitask is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,730,600 ↓ · 146 ♡

Qwen3-VL-Embedding-8B

Qwen3-VL-Embedding-8B is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,469,719 ↓ · 398 ♡

gte-large

gte-large is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,413,293 ↓ · 302 ♡

embeddinggemma-300m

embeddinggemma-300m is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,380,020 ↓ · 1,629 ♡

stella_en_400M_v5

stella_en_400M_v5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,350,128 ↓ · 232 ♡

e5-large-v2

e5-large-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,319,750 ↓ · 279 ♡

stsb-bert-tiny-safetensors

stsb-bert-tiny-safetensors is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,232,531 ↓ · 4 ♡

e5-small-v2

e5-small-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,210,916 ↓ · 116 ♡

distiluse-base-multilingual-cased-v1

distiluse-base-multilingual-cased-v1 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,156,862 ↓ · 131 ♡

paraphrase-MiniLM-L12-v2

paraphrase-MiniLM-L12-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,070,427 ↓ · 7 ♡

gte-large-en-v1.5

gte-large-en-v1.5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,044,204 ↓ · 234 ♡

msmarco-MiniLM-L12-cos-v5

msmarco-MiniLM-L12-cos-v5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

1,024,490 ↓ · 10 ♡

rubert-tiny2

rubert-tiny2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

959,577 ↓ · 169 ♡

embedic-base

embedic-base is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

954,604 ↓ · 2 ♡

all-MiniLM-L6-v2-onnx

all-MiniLM-L6-v2-onnx is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

904,432 ↓ · 6 ♡

multi-qa-MiniLM-L6-cos-v1

multi-qa-MiniLM-L6-cos-v1 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

890,701 ↓ · 137 ♡

KR-SBERT-V40K-klueNLI-augSTS

KR-SBERT-V40K-klueNLI-augSTS is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

882,011 ↓ · 83 ♡

gte-small

gte-small is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

829,237 ↓ · 186 ♡

snowflake-arctic-embed-l-v2.0

snowflake-arctic-embed-l-v2.0 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

821,732 ↓ · 242 ♡

LaBSE

LaBSE is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

818,139 ↓ · 336 ♡

all-roberta-large-v1

all-roberta-large-v1 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

805,922 ↓ · 66 ♡

text2vec-base-chinese

text2vec-base-chinese is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

778,999 ↓ · 791 ♡

paraphrase-MiniLM-L3-v2

paraphrase-MiniLM-L3-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

740,030 ↓ · 29 ♡

pubmedbert-base-embeddings

pubmedbert-base-embeddings is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

634,031 ↓ · 185 ♡

distiluse-base-multilingual-cased-v2

distiluse-base-multilingual-cased-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

633,836 ↓ · 208 ♡

S-PubMedBert-MS-MARCO

S-PubMedBert-MS-MARCO is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

576,812 ↓ · 42 ♡

gte-base-en-v1.5

gte-base-en-v1.5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

569,243 ↓ · 70 ♡

bge-small-en-v1.5-onnx-Q

bge-small-en-v1.5-onnx-Q is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

564,135 ↓ · 2 ♡

gte-Qwen2-7B-instruct

gte-Qwen2-7B-instruct is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

534,731 ↓ · 481 ♡

bm25

bm25 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

518,906 ↓ · 30 ♡
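
Unlike the neural models on this page, BM25 is a classical lexical ranking function rather than a learned embedding: it scores a document by term overlap with the query, weighting rare terms more heavily and dampening repeated terms and long documents. The listed repository's exact implementation is not described here, so the following is a sketch of the common Okapi BM25 formula with Lucene-style IDF and the commonly used values k1 = 1.2 and b = 0.75 (assumptions); tokenization is naive lowercase whitespace splitting.

```python
import math
from collections import Counter
from typing import List

def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # k1 saturates repeated terms; b scales the penalty for above-average length.
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "the cat chased the dog around the cat tree",
]
for doc, s in zip(docs, bm25_scores("cat mat", docs)):
    print(f"{s:.3f}  {doc}")
```

The first document wins because it matches both query terms, including the rarer "mat"; the second scores zero because exact matching treats "cats" and "cat" as different tokens.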

snowflake-arctic-embed-m

snowflake-arctic-embed-m is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

516,367 ↓ · 165 ♡

msmarco-bert-base-dot-v5

msmarco-bert-base-dot-v5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

463,495 ↓ · 21 ♡

vietnamese-bi-encoder

vietnamese-bi-encoder is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

454,614 ↓ · 71 ♡

bge-micro-v2

bge-micro-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

452,098 ↓ · 61 ♡

paraphrase-albert-small-v2

paraphrase-albert-small-v2 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

438,065 ↓ · 11 ♡

LLM2Vec-Meta-Llama-3-8B-Instruct-mntp

LLM2Vec-Meta-Llama-3-8B-Instruct-mntp is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

431,736 ↓ · 21 ♡

S-PubMedBert-MedQuAD

S-PubMedBert-MedQuAD is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

430,032 ↓ · 8 ♡

nomic-embed-code

nomic-embed-code is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

420,371 ↓ · 120 ♡

msmarco-MiniLM-L6-v3

msmarco-MiniLM-L6-v3 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

397,777 ↓ · 15 ♡

snowflake-arctic-embed-m-v1.5

snowflake-arctic-embed-m-v1.5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

375,997 ↓ · 71 ♡

ruri-v3-310m

ruri-v3-310m is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

375,663 ↓ · 71 ♡

GIST-Embedding-v0

GIST-Embedding-v0 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

369,391 ↓ · 30 ♡

bengali-sentence-similarity-sbert

bengali-sentence-similarity-sbert is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

368,159 ↓ · 6 ♡

gte-Qwen2-1.5B-instruct

gte-Qwen2-1.5B-instruct is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

361,114 ↓ · 230 ♡

USER-bge-m3

USER-bge-m3 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

357,933 ↓ · 76 ♡

e5-base

e5-base is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

335,115 ↓ · 25 ♡

stsb-roberta-base

stsb-roberta-base is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

333,440 ↓ · 1 ♡

gte-base-en-v1.5

gte-base-en-v1.5 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

325,152 ↓ · 0 ♡

all_miniLM_L6_v2_with_attentions

all_miniLM_L6_v2_with_attentions is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

318,866 ↓ · 13 ♡

serafim-335m-portuguese-pt-sentence-encoder-ir

serafim-335m-portuguese-pt-sentence-encoder-ir is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

300,007 ↓ · 0 ♡

gte-base

gte-base is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

295,682 ↓ · 131 ♡

klue-sroberta-base-continue-learning-by-mnr

klue-sroberta-base-continue-learning-by-mnr is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

295,226 ↓ · 31 ♡

Vietnamese_Embedding

Vietnamese_Embedding is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

294,221 ↓ · 61 ♡

langcache-embed-v1

langcache-embed-v1 is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

293,682 ↓ · 14 ♡

distilbert-multilingual-nli-stsb-quora-ranking

distilbert-multilingual-nli-stsb-quora-ranking is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

288,656 ↓ · 10 ♡

sup-SimCSE-VietNamese-phobert-base

sup-SimCSE-VietNamese-phobert-base is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

288,018 ↓ · 29 ♡

instructor-large

instructor-large is an open-source sentence-similarity model available on HuggingFace. Details are sourced from the public model registry.

287,158 ↓ · 524 ♡