Distilled BERT model that encodes sentences into 384-dimensional vectors for measuring semantic similarity. Trained on over a billion sentence pairs spanning scientific papers, web QA, NLI datasets, and community forums. At 22M parameters and 6 transformer layers, it is fast enough for CPU inference while remaining competitive on standard sentence similarity benchmarks.
Cross-encoder reranker trained on the MS MARCO passage retrieval dataset, designed to score query-document pairs jointly rather than encoding them independently. Distilled from a 12-layer cross-encoder into 6 layers to reduce latency while retaining re-ranking accuracy. Used as a second-stage ranker on top of fast first-stage retrieval (BM25 or bi-encoder).
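The two-stage pattern described above can be sketched in a few lines. This is a minimal illustration, not the model's actual pipeline: `first_stage` is a crude term-overlap stand-in for BM25, and `toy_pair_scorer` is a hypothetical placeholder for the cross-encoder, which in practice would score each query-document pair with a forward pass.

```python
def first_stage(query, docs, k):
    """Cheap stand-in for BM25: rank documents by number of shared terms."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d.lower().split())), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest overlap first, ties by index
    return [i for _, i in scored[:k]]


def rerank(query, docs, candidate_ids, pair_scorer):
    """Second stage: score each (query, document) pair jointly and reorder."""
    return sorted(candidate_ids, key=lambda i: pair_scorer(query, docs[i]), reverse=True)


def toy_pair_scorer(query, doc):
    """Hypothetical placeholder for a cross-encoder: share of doc terms found in the query."""
    q_terms = set(query.lower().split())
    d_terms = doc.lower().split()
    return sum(t in q_terms for t in d_terms) / len(d_terms)


docs = [
    "bm25 is a ranking function",
    "cross encoders score query document pairs jointly",
    "a recipe for bread",
    "query document pairs",
]
query = "how do cross encoders score query document pairs"

candidates = first_stage(query, docs, k=2)                 # fast, recall-oriented
reranked = rerank(query, docs, candidates, toy_pair_scorer)  # slower, precision-oriented
print(candidates, reranked)
```

The expensive scorer only ever sees the `k` candidates that survive the first stage, which is what keeps the overall latency manageable.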
Small English dense embedding model from BAAI's BGE (BAAI General Embedding) series, producing 384-dimensional vectors and released under the MIT license. Optimized for MTEB retrieval benchmarks through a retrieval-focused training strategy, it achieves competitive scores relative to its parameter count. Suited for embedding workflows where throughput and cost matter more than peak accuracy.
Google's original BERT base model in uncased form, pre-trained on BookCorpus and English Wikipedia via masked language modeling. Tokens are lowercased before processing, making it insensitive to capitalization. It remains a standard fine-tuning base for classification, NER, and extractive QA, though newer encoders outperform it on most benchmarks.
Multilingual sentence embedding model covering 50+ languages, built on a 12-layer distilled MiniLM architecture. Produces 384-dimensional vectors designed for semantic similarity and paraphrase detection across language boundaries. Trained on multilingual paraphrase data to align semantically equivalent sentences even when expressed in different languages.
ELECTRA base discriminator from Google, pre-trained using replaced token detection rather than masked language modeling. A small generator produces candidate replacements; this model learns to identify which tokens were swapped — a task that uses every token for training signal, making pre-training more efficient than BERT per compute dollar. Intended as a fine-tuning base for classification and token-level tasks.
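As a rough illustration of the replaced token detection objective, the sketch below builds the per-token binary targets a discriminator would be trained on. The function name `rtd_labels` and the example sentence are illustrative assumptions; the real generator, tokenization, and loss are not shown.

```python
def rtd_labels(original_tokens, corrupted_tokens):
    """Return 1 for positions where the token was replaced, 0 where it is original."""
    if len(original_tokens) != len(corrupted_tokens):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original_tokens, corrupted_tokens)]


original = ["the", "chef", "cooked", "the", "meal"]
# Assume a small generator sampled a replacement at position 2.
corrupted = ["the", "chef", "ate", "the", "meal"]

labels = rtd_labels(original, corrupted)
print(labels)  # one target per token, not only for the replaced position
```

Every position receives a label, whereas a masked language modeling loss is computed only on the masked subset of tokens. That difference is the source of the efficiency argument above.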
Sentence embedding model based on the MPNet architecture, producing 768-dimensional vectors. Trained on over a billion sentence pairs from MS MARCO, NLI datasets, and community QA forums, it is frequently chosen among English embedding models when accuracy matters more than inference speed. The MPNet backbone was pre-trained with masked and permuted language modeling, which is intended to yield stronger representations.
BAAI's BGE-M3 embedding model supporting over 100 languages with a unified architecture capable of dense, sparse (lexical), and late-interaction (ColBERT-style) retrieval modes from a single checkpoint. Built on XLM-RoBERTa with large-scale multilingual training, it targets multilingual and cross-lingual retrieval where a single model must handle diverse language inputs.
Qwen3-0.6B is the 0.6-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 series, fine-tuned from the Qwen3-0.6B-Base for conversational and task-following use. It targets deployment in environments where even a 1B model is too large — edge hardware, mobile devices, or ultra-low-latency services. Apache 2.0 licensed.
OpenAI's CLIP model using a ViT-B/32 image encoder, the smaller of the two widely deployed CLIP variants. Trained contrastively on 400 million image-text pairs, it aligns image and text representations in a shared embedding space for zero-shot classification and retrieval. The B/32 variant sacrifices accuracy versus ViT-L/14 for faster inference.
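Zero-shot classification in a shared embedding space reduces to comparing one image vector against one text vector per candidate label. The sketch below uses hand-written toy vectors in place of real encoder outputs; the `scale` value and the function names are assumptions for illustration, not the model's actual parameters or API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_vec, label_vecs, scale=10.0):
    """Softmax over scaled image-text similarities; returns label -> probability."""
    sims = {label: scale * cosine(image_vec, vec) for label, vec in label_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy stand-ins for encoder outputs (a real model would produce these vectors).
image_vec = [0.9, 0.1, 0.2]
label_vecs = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.1],
}

probs = zero_shot_classify(image_vec, label_vecs)
best = max(probs, key=probs.get)
print(best, probs)
```

No classifier head is trained here: adding a new class only requires embedding one more text prompt.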
XLM-RoBERTa base from Facebook AI, pre-trained on 2.5TB of filtered CommonCrawl text across 100 languages using the RoBERTa training procedure. Enables cross-lingual transfer: models fine-tuned on labeled English data can be applied to other languages without parallel annotations. The standard starting point for multilingual classification and token-level tasks.
Nomic Embed Text v1.5 is a matryoshka-capable English embedding model from Nomic AI, built on a custom nomic-BERT architecture trained with contrastive learning on large-scale text pairs. Matryoshka Representation Learning allows truncating embeddings to shorter dimensions (e.g. 64, 128, 256) without retraining, enabling flexible precision-cost tradeoffs. The model is transformers.js-compatible for browser-side inference.
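Truncating a Matryoshka-style embedding amounts to keeping a prefix of the vector and renormalizing it. The following is a generic sketch with a hypothetical helper name and a toy vector; a specific model may require extra steps before truncation, so its model card should be consulted.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]          # toy 3-dimensional "embedding"
short = truncate_embedding(full, 2)
print(short)
```

Renormalizing matters when downstream code assumes unit-length vectors, for example when dot product is used as a proxy for cosine similarity.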
Kokoro-82M is a compact 82-million-parameter text-to-speech model fine-tuned from StyleTTS2, targeting natural-sounding English speech synthesis at a size runnable on CPU or modest GPU. Released under Apache 2.0 with a HuggingFace DOI, it gained attention as a high-quality open TTS model at significantly smaller scale than most alternatives. It supports multiple English voice styles.
LAION's CLAP (Contrastive Language-Audio Pretraining) model using the HTSAT (Hierarchical Token-Semantic Audio Transformer) encoder, fused with a text encoder to align audio and text in a shared embedding space. Analogous to CLIP for images, it enables zero-shot audio classification and retrieval using natural language descriptions without task-specific labeled audio data.
Qwen3-4B is Alibaba's 4B parameter model from the Qwen3 series, which introduced a hybrid thinking mode allowing the model to switch between fast direct answering and extended chain-of-thought reasoning. It is a compact model capable of running on consumer hardware while outperforming many 7B predecessors on reasoning benchmarks. Apache 2.0 licensed.
BGE-Reranker-v2-M3 is BAAI's multilingual cross-encoder reranker built on XLM-RoBERTa, designed for re-ranking retrieved passages in multilingual RAG or search pipelines. It jointly encodes query-passage pairs to produce relevance scores, providing higher accuracy than bi-encoder similarity for the same candidate set. Apache 2.0 licensed with text-embeddings-inference support.
ColBERTv2 is a late-interaction retrieval model from Stanford that encodes queries and passages as per-token embeddings rather than a single vector, allowing MaxSim matching at retrieval time. This token-level interaction yields higher accuracy than bi-encoders on many retrieval benchmarks while remaining more efficient than cross-encoders. The model is MIT licensed and implemented in PyTorch with ONNX support.
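The MaxSim operation mentioned above can be written out directly: for each query token embedding, take its best match among the passage token embeddings, then sum those maxima. The sketch uses toy 2-dimensional vectors and a hypothetical function name; it omits the compression and indexing that a production system would use.

```python
def maxsim_score(query_vecs, passage_vecs):
    """Sum over query tokens of the maximum dot product with any passage token."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, p)) for p in passage_vecs)
    return total


# Toy per-token embeddings (real models emit one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has an exact match for each query token
passage_b = [[0.5, 0.5]]                           # only a partial match for each

score_a = maxsim_score(query, passage_a)
score_b = maxsim_score(query, passage_b)
print(score_a, score_b)
```

Because passage token embeddings can be computed offline, only the query has to be encoded at search time. A cross-encoder, by contrast, must run a full forward pass for every query-passage pair.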
Chronos-2 is Amazon's second-generation pretrained foundation model for zero-shot time-series forecasting. It frames forecasting as a language modeling problem over quantized time-series tokens using a T5 encoder-decoder architecture, enabling it to forecast across diverse domains without per-dataset training. Released under Apache 2.0.
BGE-Large-EN-v1.5 is BAAI's highest-capacity English embedding model in the v1.5 series, producing 1024-dimensional vectors. It achieves top MTEB retrieval scores among its generation of English-only embedding models, at the cost of higher compute and storage than BGE-small or BGE-base. MIT licensed with ONNX export support.
MobileNetV3 small model at 100% width multiplier, trained on ImageNet-1k using the LAMB optimizer via the timm library. At under 3M parameters, it targets image classification on mobile and edge hardware where latency and memory are primary constraints. Part of timm's standardized pretrained model zoo with consistent preprocessing and inference APIs.
Chronos-Bolt-Small is a small time-series foundation model from AutoGluon, using a T5-based encoder-decoder architecture for zero-shot forecasting. The 'Bolt' variant improves over original Chronos through training and architectural refinements for better speed and accuracy. Apache 2.0 licensed and part of the AutoGluon time-series forecasting ecosystem.
RoBERTa base from Facebook AI, trained with the same architecture as BERT base but significantly more data, longer training schedules, larger batch sizes, and dynamic masking. Pre-trained on BookCorpus, Wikipedia, CC-News, OpenWebText, and Stories — substantially more data than the original BERT. MIT licensed with multi-framework support.
OpenAI's original GPT-2 at 124M parameters, an autoregressive language model trained on WebText (over 8 million web documents filtered from Reddit outlinks). It generates English text continuations from a prompt using next-token prediction, and was trained without any instruction tuning or RLHF. MIT licensed and runnable on commodity CPU hardware.
ADetailer is a collection of Ultralytics YOLO-based face, body, and hand detection models distributed for use with the Stable Diffusion WebUI's ADetailer extension. The models detect regions of interest in generated images (faces, hands) to trigger targeted inpainting passes for quality improvement. Trained on WIDER FACE and anime segmentation datasets, covering both photorealistic and anime styles.