Distilled BERT model that encodes sentences into 384-dimensional vectors for measuring semantic similarity. Trained on over a billion sentence pairs spanning scientific papers, web QA, NLI datasets, and community forums. At 22M parameters and 6 transformer layers, it is fast enough for CPU inference while remaining competitive on standard sentence similarity benchmarks.

243,930,327 4,980
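Embeddings from a model like this are usually compared with cosine similarity. Below is a minimal pure-Python sketch that assumes the embeddings are already computed. The short hand-written vectors are illustrative stand-ins for real 384-dimensional model outputs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for real 384-dimensional embeddings.
query = [0.1, 0.3, 0.5, 0.1]
similar = [0.2, 0.3, 0.4, 0.1]
unrelated = [-0.4, 0.1, -0.2, 0.6]

print(cosine_similarity(query, similar))    # high: the vectors point the same way
print(cosine_similarity(query, unrelated))  # near zero (slightly negative): unrelated directions
```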

ms-marco-MiniLM-L6-v2

text-ranking

Cross-encoder reranker trained on the MS MARCO passage retrieval dataset, designed to score query-document pairs jointly rather than encoding them independently. Distilled from a 12-layer cross-encoder into 6 layers to reduce latency while retaining re-ranking accuracy. Used as a second-stage ranker on top of fast first-stage retrieval (BM25 or bi-encoder).

78,976,309 267
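The two-stage pattern can be sketched as follows. This is a hypothetical structure, not the model's API. `term_overlap` is a crude stand-in for BM25, and `toy_cross_encoder` is a placeholder for the spot where a real cross-encoder's joint (query, passage) score would go.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(query: str, corpus: list[str], first_stage: Scorer,
                         cross_encoder: Scorer, k_candidates: int = 100,
                         k_final: int = 10) -> list[str]:
    """Cheap first-stage scoring narrows the corpus; a joint pairwise scorer reorders the shortlist."""
    # Stage 1: score every document cheaply and keep the top candidates.
    candidates = sorted(corpus, key=lambda doc: first_stage(query, doc), reverse=True)[:k_candidates]
    # Stage 2: score each (query, document) pair jointly; only the shortlist pays this cost.
    reranked = sorted(candidates, key=lambda doc: cross_encoder(query, doc), reverse=True)
    return reranked[:k_final]


def term_overlap(query: str, doc: str) -> float:
    """Crude stand-in for BM25: number of distinct shared lowercase terms."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_cross_encoder(query: str, doc: str) -> float:
    """Hypothetical placeholder for a real cross-encoder relevance score."""
    return term_overlap(query, doc) / len(doc.split())


corpus = ["cats sleep a lot", "dogs and cats play", "the stock market fell", "cats"]
print(retrieve_then_rerank("do cats sleep", corpus, term_overlap, toy_cross_encoder,
                           k_candidates=3, k_final=2))
```

A small `k_candidates` bounds the number of expensive cross-encoder calls. That is why the joint scorer runs only on the shortlist.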

bge-small-en-v1.5

feature-extraction

Small English dense embedding model from BAAI's BGE (BAAI General Embedding) series, producing 384-dimensional vectors and released under the MIT license. Trained with a retrieval-focused strategy targeting MTEB retrieval benchmarks, it scores competitively for its parameter count. Suited for embedding workflows where throughput and cost matter more than peak accuracy.

60,148,419 493

bert-base-uncased

fill-mask

Google's original BERT base model in uncased form, pre-trained on BookCorpus and English Wikipedia via masked language modeling. Input text is lowercased, and accent markers are stripped, before WordPiece tokenization, so the model is insensitive to capitalization. It remains a standard fine-tuning base for classification, NER, and extractive QA, though newer encoders outperform it on most benchmarks.

57,757,042 2,686
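Masked language modeling hides a subset of input tokens and trains the model to recover them. The sketch below follows the commonly described BERT recipe: select about 15% of positions, then replace 80% of those with `[MASK]`, replace 10% with a random token, and leave 10% unchanged. Two simplifications apply. Per-position sampling follows common reimplementations rather than Google's original code, and the whitespace "tokens" stand in for WordPiece.

```python
import random
from typing import Optional

MASK = "[MASK]"


def mask_tokens(tokens: list[str], vocab: list[str], mask_prob: float = 0.15,
                seed: int = 0) -> tuple[list[str], list[Optional[str]]]:
    """BERT-style masking: each selected position gets [MASK] (80%), a random token (10%), or stays as-is (10%)."""
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: list[Optional[str]] = [None] * len(tokens)  # None = not part of the MLM loss
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise the original token is left in place
    return inputs, labels


# Lowercased input, as bert-base-uncased would see it; whitespace split stands in for WordPiece.
tokens = "the quick brown fox jumps over the lazy dog".split()
inputs, labels = mask_tokens(tokens, vocab=sorted(set(tokens)), seed=1)
print(inputs)
print(labels)
```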

paraphrase-multilingual-MiniLM-L12-v2

sentence-similarity

Multilingual sentence embedding model covering 50+ languages, built on a 12-layer distilled MiniLM architecture. Produces 384-dimensional vectors designed for semantic similarity and paraphrase detection across language boundaries. Trained on multilingual paraphrase data to align semantically equivalent sentences even when expressed in different languages.

51,516,901 1,278

electra-base-discriminator

ELECTRA base discriminator from Google, pre-trained with replaced token detection rather than masked language modeling. A small generator proposes replacement tokens, and this model learns to identify which tokens were swapped. Because the task provides a training signal at every token position, not only at masked ones, pre-training is more compute-efficient than BERT's. Intended as a fine-tuning base for classification and token-level tasks.

41,397,308 127
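A short sketch can illustrate the discriminator's training targets. It simplifies by assuming the corrupted sequence is already aligned token-for-token with the original, and the example sentence is illustrative. Every position receives a label, and that is where the extra training signal comes from.

```python
def replaced_token_labels(original: list[str], corrupted: list[str]) -> list[int]:
    """Discriminator targets: 1 where the generator's output differs from the original token, else 0.

    If the generator happens to sample the original token, that position counts as
    original (label 0), so labels come from comparing tokens, not from mask positions.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must be aligned token-for-token")
    return [int(o != c) for o, c in zip(original, corrupted)]


original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]  # generator swapped "cooked" for "ate"
print(replaced_token_labels(original, corrupted))  # [0, 0, 1, 0, 0]
```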

all-mpnet-base-v2

sentence-similarity

Sentence embedding model based on the MPNet architecture, producing 768-dimensional vectors. Trained on over a billion sentence pairs from MS MARCO, NLI datasets, and community QA forums, it is a frequent choice among English embedding models when accuracy matters more than inference speed. MPNet's pre-training combines masked and permuted language modeling, which yields stronger representations.

34,593,691 1,311

bge-m3

sentence-similarity

BAAI's BGE-M3 embedding model supporting over 100 languages with a unified architecture capable of dense, sparse (lexical), and late-interaction (ColBERT-style) retrieval modes from a single checkpoint. Built on XLM-RoBERTa with large-scale multilingual training, it targets multilingual and cross-lingual retrieval where a single model must handle diverse language inputs.

31,091,007 3,131
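The sparse mode, and the combination of all three modes, can be sketched as follows. The sketch assumes the per-token weights and the dense and multi-vector scores have already been computed. The lexical score sums weight products over shared tokens, and the final ranking score is a weighted sum of the three modes. The mode weights `(0.4, 0.2, 0.4)` and the token weights are illustrative assumptions, not values from this listing.

```python
def lexical_score(query_weights: dict[str, float], doc_weights: dict[str, float]) -> float:
    """Sparse (lexical) match: sum of weight products over tokens present in both texts."""
    shared = query_weights.keys() & doc_weights.keys()
    return sum(query_weights[t] * doc_weights[t] for t in shared)


def hybrid_score(dense: float, sparse: float, multi_vector: float,
                 weights: tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    """Weighted sum of dense, sparse, and multi-vector (late-interaction) scores."""
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi_vector


# Toy per-token weights; the real model derives these from its hidden states.
query = {"solar": 0.8, "panel": 0.5, "cost": 0.3}
doc = {"solar": 0.6, "panel": 0.4, "installation": 0.7}
sparse = lexical_score(query, doc)  # 0.8*0.6 + 0.5*0.4, about 0.68
print(hybrid_score(dense=0.7, sparse=sparse, multi_vector=0.6))
```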

Qwen3-0.6B

text-generation

Qwen3-0.6B is the 0.6-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 series, fine-tuned from the Qwen3-0.6B-Base for conversational and task-following use. It targets deployment in environments where even a 1B model is too large — edge hardware, mobile devices, or ultra-low-latency services. Apache 2.0 licensed.

27,358,157 1,351

clip-vit-base-patch32

zero-shot-image-classification

OpenAI's CLIP model using a ViT-B/32 image encoder, the smaller of the two widely deployed CLIP variants. Trained contrastively on 400 million image-text pairs, it aligns image and text representations in a shared embedding space for zero-shot classification and retrieval. The B/32 variant gives up some accuracy relative to ViT-L/14 in exchange for faster inference.

23,240,836 963
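Zero-shot classification with CLIP compares one image embedding against text embeddings of prompts such as "a photo of a cat". It then applies a softmax over scaled cosine similarities. The pure-Python sketch below assumes the embeddings already exist. The 3-dimensional vectors are toy stand-ins, and the fixed `logit_scale` of 100 is an illustrative value for the temperature CLIP learns during training.

```python
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb: list[float], text_embs: list[list[float]],
                    logit_scale: float = 100.0) -> list[float]:
    """Softmax over scaled cosine similarities between one image and each text prompt."""
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    peak = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = [0.9, 0.1, 0.0]                                      # toy image embedding
texts = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]  # toy prompt embeddings
probs = zero_shot_probs(image, texts)
print(prompts[probs.index(max(probs))])  # a photo of a cat
```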

xlm-roberta-base

fill-mask

XLM-RoBERTa base from Facebook AI, pre-trained on 2.5TB of filtered CommonCrawl text across 100 languages using the RoBERTa training procedure. Enables cross-lingual transfer: a model fine-tuned on labeled English data can be applied to other languages without labeled data in those languages. The standard starting point for multilingual classification and token-level tasks.

20,744,002 852

nomic-embed-text-v1.5

sentence-similarity

Nomic Embed Text v1.5 is a matryoshka-capable English embedding model from Nomic AI, built on a custom nomic-BERT architecture trained with contrastive learning on large-scale text pairs. Matryoshka Representation Learning allows truncating embeddings to shorter dimensions (e.g. 64, 128, 256) without retraining, enabling flexible precision-cost tradeoffs. The model is transformers.js-compatible for browser-side inference.

18,375,459 852
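Matryoshka truncation keeps a prefix of the embedding and re-normalizes it. For a 768-dimensional float32 vector, cutting to 256 dimensions shrinks storage from 3,072 to 1,024 bytes per vector. The sketch below shows only the generic prefix-and-renormalize step. Nomic's own usage instructions may add steps, such as a normalization layer before truncation, so treat this as an illustration rather than the model's reference pipeline.

```python
import math


def truncate_embedding(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and the full embedding size")
    prefix = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


full = [3.0, 4.0, 12.0, 0.5]  # toy stand-in for a 768-dimensional embedding
print(truncate_embedding(full, 2))  # [0.6, 0.8]
```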

Kokoro-82M

text-to-speech

Kokoro-82M is a compact 82-million-parameter text-to-speech model fine-tuned from StyleTTS2, targeting natural-sounding English speech synthesis at a size small enough to run on a CPU or a modest GPU. Released under Apache 2.0 with a Hugging Face DOI, it gained attention as a high-quality open TTS model at a much smaller scale than most alternatives. It supports multiple English voice styles.

16,925,704 6,372

clap-htsat-fused

audio-classification

LAION's CLAP (Contrastive Language-Audio Pretraining) model using the HTSAT (Hierarchical Token-Semantic Audio Transformer) audio encoder with feature fusion for variable-length audio input. The audio encoder is trained alongside a text encoder to align audio and text in a shared embedding space. Analogous to CLIP for images, it enables zero-shot audio classification and retrieval using natural language descriptions without task-specific labeled audio data.

16,636,514 106

Qwen3-4B

text-generation

Qwen3-4B is Alibaba's 4B-parameter model from the Qwen3 series, which introduced a hybrid thinking mode allowing the model to switch between fast direct answering and extended chain-of-thought reasoning. Compact enough to run on consumer hardware, it outperforms many 7B predecessors on reasoning benchmarks. Apache 2.0 licensed.

16,075,125 640

bge-reranker-v2-m3

text-classification

BGE-Reranker-v2-M3 is BAAI's multilingual cross-encoder reranker built on XLM-RoBERTa, designed for re-ranking retrieved passages in multilingual RAG or search pipelines. It jointly encodes query-passage pairs to produce relevance scores, providing higher accuracy than bi-encoder similarity for the same candidate set. Apache 2.0 licensed with text-embeddings-inference support.

15,789,545 1,047

colbertv2.0

ColBERTv2 is a late-interaction retrieval model from Stanford that encodes queries and passages as per-token embeddings rather than a single vector, allowing MaxSim matching at retrieval time. This token-level interaction yields higher accuracy than bi-encoders on many retrieval benchmarks while remaining more efficient than cross-encoders. The model is MIT licensed and implemented in PyTorch with ONNX support.

15,023,380 362
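MaxSim scoring fits in a few lines. For each query token embedding, take the highest dot product against any passage token embedding, then sum over query tokens. The pure-Python sketch below assumes the per-token embeddings are already computed and L2-normalized, so dot products equal cosine similarities. The 2-dimensional vectors are toy values.

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: list[list[float]], doc_tokens: list[list[float]]) -> float:
    """Late interaction: each query token takes its best match among document tokens; the maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy unit-length token embeddings (2 dimensions for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[0.6, 0.8], [0.8, 0.6], [1.0, 0.0]]
passage_b = [[1.0, 0.0]]
print(maxsim_score(query, passage_a))  # 1.0 + 0.8
print(maxsim_score(query, passage_b))  # 1.0 + 0.0
```

Passage embeddings do not depend on the query, so they can be precomputed and indexed. Only the cheap max-and-sum step runs per query, which is the efficiency advantage over a cross-encoder.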

chronos-2

time-series-forecasting

Chronos-2 is Amazon's second-generation pretrained foundation model for zero-shot time-series forecasting. It frames forecasting as a language modeling problem over quantized time-series tokens using a T5 encoder-decoder architecture, enabling it to forecast across diverse domains without per-dataset training. Released under Apache 2.0.

15,009,050 333

bge-large-en-v1.5

feature-extraction

BGE-Large-EN-v1.5 is BAAI's highest-capacity English embedding model in the v1.5 series, producing 1024-dimensional vectors. It achieves top MTEB retrieval scores among its generation of English-only embedding models, at the cost of higher compute and storage than BGE-small or BGE-base. MIT licensed with ONNX export support.

14,928,106 688

mobilenetv3_small_100.lamb_in1k

image-classification

MobileNetV3 small model with a 1.0 width multiplier, trained on ImageNet-1k using the LAMB optimizer via the timm library. At under 3M parameters, it targets image classification on mobile and edge hardware where latency and memory are primary constraints. Part of timm's standardized pretrained model zoo with consistent preprocessing and inference APIs.

14,369,555 78

chronos-bolt-small

time-series-forecasting

Chronos-Bolt-Small is a small time-series foundation model from AutoGluon, using a T5-based encoder-decoder architecture for zero-shot forecasting. The Bolt variant refines the original Chronos training recipe and architecture to improve both speed and accuracy. Apache 2.0 licensed and part of the AutoGluon time-series forecasting ecosystem.

13,532,584 44

roberta-base

fill-mask

RoBERTa base from Facebook AI, which uses the same architecture as BERT base but is trained on significantly more data, with longer training schedules, larger batch sizes, and dynamic masking. Its pre-training corpus combines BookCorpus, Wikipedia, CC-News, OpenWebText, and Stories (about 160GB of text, roughly ten times BERT's 16GB). MIT licensed with multi-framework support.

13,342,794 616

gpt2

text-generation

OpenAI's original GPT-2 at 124M parameters, an autoregressive language model trained on WebText (over 8 million web documents filtered from Reddit outlinks). Given a prompt, it continues English text by next-token prediction; it received no instruction tuning or RLHF. MIT licensed and runnable on commodity CPU hardware.

13,231,213 3,306
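Next-token prediction becomes text generation through a loop that feeds the model's output back in as input. The sketch below uses greedy decoding. `toy_model` is a hypothetical bigram table standing in for GPT-2, which scores byte-level BPE tokens using its full context rather than whole words based on the last word alone. Sampling strategies such as top-k are omitted.

```python
from typing import Callable

NextTokenScores = Callable[[list[str]], dict[str, float]]


def greedy_generate(model: NextTokenScores, prompt: list[str], max_new_tokens: int,
                    eos: str = "<eos>") -> list[str]:
    """Autoregressive decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = model(tokens)               # candidate next token -> score
        best = max(scores, key=scores.get)   # greedy choice
        if best == eos:
            break
        tokens.append(best)                  # the output becomes part of the next input
    return tokens


# Hypothetical bigram table: scores depend only on the last token.
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "sat": {"down": 0.6, "<eos>": 0.4},
    "down": {"<eos>": 1.0},
}


def toy_model(tokens: list[str]) -> dict[str, float]:
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


print(greedy_generate(toy_model, ["the"], max_new_tokens=10))  # ['the', 'cat', 'sat', 'down']
```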

adetailer

ADetailer is a collection of Ultralytics YOLO-based face, body, and hand detection models distributed for use with the Stable Diffusion WebUI's ADetailer extension. The models detect regions of interest in generated images (faces, hands) to trigger targeted inpainting passes for quality improvement. Trained on WIDER FACE and anime segmentation datasets, covering both photorealistic and anime styles.

12,867,952 728
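The detect-then-inpaint step can be sketched as turning detector boxes into padded regions for the inpainting pass. This is a hypothetical helper, not ADetailer's code. The confidence threshold and padding values are illustrative assumptions.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def inpaint_regions(detections: list[tuple[Box, float]], image_size: tuple[int, int],
                    min_confidence: float = 0.3, padding: int = 32) -> list[Box]:
    """Keep confident detections and grow each box by `padding` pixels, clamped to the image."""
    width, height = image_size
    regions = []
    for (x0, y0, x1, y1), confidence in detections:
        if confidence < min_confidence:
            continue  # weak detections are skipped rather than inpainted
        regions.append((max(0, x0 - padding), max(0, y0 - padding),
                        min(width, x1 + padding), min(height, y1 + padding)))
    return regions


detections = [((10, 50, 200, 260), 0.9), ((300, 300, 320, 320), 0.1)]
print(inpaint_regions(detections, image_size=(512, 512)))  # [(0, 18, 232, 292)]
```

The margin gives the inpainting pass some surrounding context, so the regenerated region can blend with the rest of the image.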

Open-source AI models, compared at a glance.

Browse by pipeline

text generation

image text to text

automatic speech recognition

sentence similarity

feature extraction

fill mask

text classification

image classification

time series forecasting

zero shot image classification

text ranking

translation

any to any

text to image

token classification

image feature extraction

text to speech

audio classification

image to text

object detection

image segmentation

image to video

zero shot classification

depth estimation

question answering

image to image

zero shot object detection

mask generation

summarization

audio to audio

audio text to text

image to 3d

video classification

voice activity detection

visual document retrieval

keypoint detection

robotics

text to audio

text to video

table question answering

other

tabular regression

tabular classification

visual question answering

image text to image

Top by downloads
