Qwen3-0.6B is the 0.6-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 series, fine-tuned from the Qwen3-0.6B-Base for conversational and task-following use. It targets deployment in environments where even a 1B model is too large — edge hardware, mobile devices, or ultra-low-latency services. Apache 2.0 licensed.
27,358,157 ↓ · 1,351 ♡
Qwen3-4B is Alibaba's 4B parameter model from the Qwen3 series, which introduced a hybrid thinking mode allowing the model to switch between fast direct answering and extended chain-of-thought reasoning. It is a compact model capable of running on consumer hardware while outperforming many 7B predecessors on reasoning benchmarks. Apache 2.0 licensed.
16,075,125 ↓ · 640 ♡
OpenAI's original GPT-2 at 124M parameters, an autoregressive language model trained on WebText (over 8 million web documents filtered from Reddit outlinks). It generates English text continuation given a prompt using next-token prediction, trained without any instruction tuning or RLHF. MIT licensed and runnable on commodity CPU hardware.
13,231,213 ↓ · 3,306 ♡
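As a rough illustration of the next-token prediction loop described for GPT-2, the sketch below generates text one token at a time with greedy decoding. The "model" is a hypothetical bigram count table built from a toy corpus, not GPT-2 itself; `train_bigram` and `generate` are names invented for this example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token (toy stand-in for a model)."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table

def generate(table, prompt, max_new_tokens=5):
    """Greedy autoregressive decoding: append the most likely next token, then repeat."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        candidates = table.get(out[-1])
        if not candidates:
            break  # no continuation was ever seen for this token
        out.append(candidates.most_common(1)[0][0])
    return out

corpus = "the cat sat on the mat and the cat sat".split()
table = train_bigram(corpus)
print(" ".join(generate(table, ["the"], max_new_tokens=3)))
```

A real language model replaces the count table with a neural network that scores every vocabulary token given the whole preceding context, but the outer loop has the same shape.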
Qwen2.5-7B-Instruct is Alibaba Cloud's 7-billion-parameter instruction-tuned language model from the Qwen2.5 series, supporting English and a range of other languages. It targets applications requiring more reasoning and knowledge than sub-3B models, while remaining deployable on a single consumer GPU. Apache 2.0 licensed with text-generation-inference compatibility.
12,806,691 ↓ · 1,377 ♡
Qwen3-8B is the 8-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 family, positioned at the competitive midpoint between 4B and 14B+ tiers. It targets deployment on single consumer or workstation GPUs while providing strong reasoning and multilingual capabilities. Apache 2.0 licensed with text-generation-inference compatibility.
12,750,554 ↓ · 1,151 ♡
OPT-125M is the smallest model in Meta's Open Pretrained Transformer series, a 125-million-parameter decoder-only LLM trained on a dataset comparable to GPT-3's training mix. Released as part of Meta's effort to make large language model weights accessible for research. At 125M parameters it is primarily used for prototyping, educational purposes, and compute-constrained environments.
11,836,914 ↓ · 267 ♡
Qwen2.5-3B-Instruct is a 3-billion-parameter instruction-tuned language model from Alibaba Cloud's Qwen2.5 series, positioned between the 1.5B and 7B tiers. It targets lightweight server deployments and on-device inference scenarios where 7B is too large. The license is listed as 'other', so review the specific Qwen 2.5 license terms before commercial deployment.
11,422,175 ↓ · 509 ♡
Qwen2.5-1.5B-Instruct is a 1.5-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen2.5 series, targeting edge and embedded deployment scenarios where even a 3B model is too large. Apache 2.0 licensed, it focuses on basic instruction following and short-context tasks at minimal compute cost.
10,545,806 ↓ · 747 ♡
Llama 3.1-8B-Instruct is Meta's 8-billion-parameter instruction-tuned model, supporting 8 languages: English, German, French, Spanish, Italian, Portuguese, Hindi, and Thai. Released under the Llama 3.1 license (permissive, with restrictions for products with over 700M monthly active users), it was a leading open-weight model at its scale at release. The context window extends to 128K tokens.
9,833,276 ↓ · 6,122 ♡
gemma-3-270m handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
8,300,076 ↓ · 1,035 ♡
Llama 3.2-1B-Instruct is Meta's 1-billion-parameter instruction-tuned model from the Llama 3.2 family, the smallest Llama release targeting ultra-low-resource inference scenarios. It is designed for edge deployment on devices that cannot accommodate even 3B models. The Llama 3.2 license restricts use by products/services with over 700M monthly users.
8,080,501 ↓ · 1,489 ♡
A minimal Qwen2-architecture causal LM created by the TRL (Transformer Reinforcement Learning) team for internal testing purposes. It is not intended for any production use or meaningful text generation — it exists to provide a tiny, fast-loading model compatible with Qwen2 tokenization for unit testing TRL training scripts.
7,158,836 ↓ · 7 ♡
DeepSeek-R1-0528 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
7,150,965 ↓ · 2,453 ♡
DeepSeek-R1 is a 671B parameter mixture-of-experts reasoning model from DeepSeek AI, trained with reinforcement learning to produce explicit chain-of-thought reasoning before answering. It achieves GPT-4-class performance on math, coding, and logical inference benchmarks and is released under an MIT license. Active parameters per forward pass are a subset of the 671B total, reducing compute per generated token.
6,809,722 ↓ · 13,404 ♡
GPT-OSS-20B is a roughly 20-billion-parameter open-weight language model released by OpenAI under Apache 2.0, notable as OpenAI's first substantial open-weight release after years of closed-weights policy. Based on the gpt_oss architecture, it targets high-quality text generation at a scale deployable on research and enterprise GPU infrastructure. FP8 and MXFP4 quantized variants reduce memory requirements.
6,787,695 ↓ · 4,718 ♡
Qwen3-1.7B is a 1.7-billion-parameter instruction-tuned language model from Alibaba Cloud's Qwen3 series, filling the gap between the 0.6B and 4B tiers. It targets constrained deployment scenarios where sub-1B quality is insufficient but 4B VRAM requirements are too high. Apache 2.0 licensed.
5,729,967 ↓ · 486 ♡
Qwen3-4B-Instruct-2507 is a 4-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 series, updated in July 2025. It targets the mid-range deployment tier between ultra-compact sub-2B models and the 7-8B tier requiring heavier hardware. Apache 2.0 licensed with text-generation-inference compatibility.
5,500,426 ↓ · 881 ♡
Qwen2.5-0.5B-Instruct is Alibaba Cloud's 0.5-billion-parameter instruction-tuned model, the smallest in the Qwen2.5 family. It targets the most resource-constrained deployment scenarios, prioritizing the ability to run on any hardware over output quality. Apache 2.0 licensed and, like the rest of the Qwen2.5 series, multilingual rather than English-only.
4,711,052 ↓ · 535 ♡
Dolphin 2.9.1 is a community fine-tune of Yi-1.5-34B intended to remove safety filtering and produce an 'uncensored' instruction-tuned model that follows user requests without refusal. Trained by Cognitive Computations on OpenHermes, DolphinCoder, and similar datasets. Licensing follows the Yi-1.5-34B base model, so check that license before commercial use.
4,626,366 ↓ · 64 ♡
Qwen2.5-Coder-14B-Instruct is the 14-billion-parameter instruction-tuned model in Alibaba's code-specialized Qwen2.5-Coder series. It targets code generation, explanation, and debugging rather than general-purpose chat.
4,416,447 ↓ · 166 ♡
A randomly initialized minimal LlamaForCausalLM instance with a tiny vocabulary and hidden dimension, used exclusively for fast unit testing and CI pipelines that need a real model interface without meaningful weights.
4,388,195 ↓ · 0 ♡
DistilGPT2 is a knowledge-distilled version of GPT-2 small, with 82M parameters (vs GPT-2 small's 124M) and approximately 2x faster inference. It gives up some language modeling quality relative to GPT-2 small in exchange for being lighter to serve.
4,306,658 ↓ · 629 ♡
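To make "knowledge-distilled" concrete, the following is a minimal sketch of the soft-target term commonly used in distillation: the student is trained to match the teacher's temperature-softened output distribution. This is the generic formulation, not necessarily the exact objective used to train DistilGPT2; the function names and the temperature value are assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]
print(distillation_loss([3.5, 1.2, 0.4], teacher))  # student close to teacher: small loss
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # student far from teacher: larger loss
```

In practice this term is usually combined with the ordinary next-token cross-entropy loss on the training text.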
A GGUF conversion of DeepSeek V4 by antirez (Salvatore Sanfilippo, creator of Redis), packaged for local inference via llama.cpp. The model represents antirez's personal interest in local AI and has gathered community attention partly due to the author's reputation.
4,064,333 ↓ · 262 ♡
OpenAI's 120B parameter open-weight language model released under Apache 2.0 in 2025. Supports MXFP4 and 8-bit quantization for multi-GPU deployment via vLLM. Competitive on reasoning and instruction-following benchmarks within the open-weight tier.
3,987,781 ↓ · 4,906 ♡
Qwen3-32B is Alibaba Cloud's 32-billion-parameter instruction-tuned model from the Qwen3 series, targeting deployments requiring stronger reasoning, coding, and instruction following than 7-8B models while remaining lighter than 70B+ alternatives. Apache 2.0 licensed with text-generation-inference compatibility for production serving.
3,938,129 ↓ · 704 ♡
Qwen2-1.5B-Instruct is Alibaba's 1.5B parameter instruction-tuned chat model from the Qwen2 series. Designed to run efficiently on CPU or low-VRAM hardware, it handles short-context instruction-following, summarization, and Q&A tasks in English. It is the practical choice when memory constraints prevent running larger Qwen2 variants.
3,694,383 ↓ · 162 ♡
A community GGUF-quantized finetune that merges Gemma-3-1B-it with elements from GLM-4.7-Flash-Thinking, configured to remove default safety refusals. Primarily targeting users who want a small, locally-runnable model with reduced content restrictions.
3,675,155 ↓ · 71 ♡
Qwen3.6-35B-A3B-NVFP4 is an NVIDIA-optimized FP4 quantization of Qwen3.6-35B-A3B, produced with the ModelOpt toolkit for deployment on NVIDIA Blackwell-generation GPUs, the first generation with native FP4 tensor-core support. FP4 weights reduce weight memory by roughly 3.5-4x compared to BF16 (about 2x compared to FP8) while maintaining most of the original accuracy for conversational tasks. It is intended for inference on NVIDIA TensorRT-LLM or vLLM backends, not for further fine-tuning.
3,616,724 ↓ · 251 ♡
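As a back-of-the-envelope check on memory figures for low-precision weights, the sketch below estimates weight storage at different bit widths. It counts weights only (no KV cache, activations, or runtime overhead), takes the 35B total from the model name, and assumes a block-scaled 4-bit layout with one 8-bit scale per 16 weights; `weight_memory_gb` is a helper invented for this example.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS = 35e9  # total parameters implied by "35B" in the model name

bf16 = weight_memory_gb(PARAMS, 16)
fp8 = weight_memory_gb(PARAMS, 8)
fp4 = weight_memory_gb(PARAMS, 4)
# Assumption: one 8-bit scale shared by each block of 16 weights adds 0.5 bit per weight.
fp4_scaled = weight_memory_gb(PARAMS, 4 + 8 / 16)

for name, gb in [("BF16", bf16), ("FP8", fp8), ("FP4", fp4), ("FP4 + block scales", fp4_scaled)]:
    print(f"{name:>20}: {gb:6.1f} GB ({bf16 / gb:.2f}x smaller than BF16)")
```

Under these assumptions, 4-bit weights come out around 3.5x to 4x smaller than BF16, and about half the size of FP8.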
DeepSeek-V3.2 is a Mixture-of-Experts (MoE) large language model from DeepSeek AI, fine-tuned from DeepSeek-V3.2-Exp-Base. It activates a subset of expert parameters per token rather than the full model, enabling high effective parameter counts at lower per-token compute cost. MIT licensed, making it freely deployable commercially despite its scale.
3,416,736 ↓ · 1,450 ♡
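The idea of activating only a subset of experts per token can be sketched with generic top-k routing. This is a simplified illustration, not DeepSeek's actual router: the scalar "experts", the router logits, and the function names are all hypothetical.

```python
import math

def top_k_route(router_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Four toy experts; only two are evaluated for this token.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
logits = [0.1, 2.0, -1.0, 2.0]  # hypothetical router scores for one token
print(top_k_route(logits))
print(moe_forward(3.0, experts, logits))
```

Because only `k` experts run per token, compute per token scales with the active parameters rather than with the total parameter count.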
Qwen2.5-7B-Instruct-AWQ is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
3,283,043 ↓ · 46 ♡
Pythia-160M is the second-smallest model in EleutherAI's original Pythia suite (after the 70M model), trained on the Pile with 154 intermediate checkpoints published across training. It is designed for mechanistic interpretability and scaling-law research rather than production use.
2,950,327 ↓ · 42 ♡
DeepSeek-V4-Pro is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
2,797,050 ↓ · 4,988 ♡
MiniMax-M2.7 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
2,713,609 ↓ · 1,217 ♡
DeepSeek-V4-Flash is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
2,481,903 ↓ · 1,540 ♡
Qwen 2.5 0.5B is the smallest base model in Alibaba's Qwen 2.5 family, designed for on-device scenarios requiring minimal memory. It shares the Qwen 2.5 tokenizer with larger models, enabling consistent prompt formatting across the family.
2,395,904 ↓ · 424 ♡
Qwen3-30B-A3B handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
2,392,771 ↓ · 900 ♡
GPT-2 Large is OpenAI's 774M-parameter version of the original GPT-2 autoregressive language model from 2019. It produces more coherent text than GPT-2 medium but is significantly outdated compared to modern LLMs.
2,241,831 ↓ · 354 ♡
gemma-3-1b-it is a generative model in the Gemma family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
2,205,530 ↓ · 1,015 ♡
GLM-5-FP8 is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
2,201,357 ↓ · 181 ♡
Qwen2.5-Coder 7B is a code-specialized instruction model trained on 5.5 trillion tokens of code-heavy data, covering 92 programming languages. It achieves competitive performance against much larger code models on pass@1 benchmarks.
2,181,610 ↓ · 737 ♡
TinyLlama 1.1B Chat is a compact instruction-tuned language model trained on 3 trillion tokens with the Llama 2 architecture. It targets deployment on devices with limited RAM while retaining basic instruction-following capability.
2,153,716 ↓ · 1,632 ♡
Llama-3.2-1B is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
2,125,753 ↓ · 2,449 ♡
pythia-70m-deduped is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
2,106,072 ↓ · 28 ♡
Qwen 3 14B is Alibaba's 14-billion-parameter text generation model, offering a significant capacity step up from the 7B class with competitive performance on reasoning, math, and multilingual tasks.
2,068,678 ↓ · 412 ♡
Rio-3.0-Open-Mini is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
2,059,281 ↓ · 9 ♡
SmolLM2-135M-Instruct is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
2,032,740 ↓ · 356 ♡
Qwen 2.5 14B Instruct is Alibaba's mid-tier instruction model with strong multilingual, coding, and math capabilities. It fills the gap between 7B-class models and the more expensive 32B/72B variants for production deployments.
2,024,471 ↓ · 350 ♡
FP8-quantized Qwen3 0.6B, the smallest model in the Qwen3 series. At 0.6B parameters and FP8 precision, it is primarily useful for ultra-low-latency classification or extraction tasks where quality requirements are minimal.
2,019,999 ↓ · 62 ♡
Qwen3-Coder 30B is a code-specialized Mixture-of-Experts model with 30B total and 3B active parameters, instruction-tuned for programming tasks. It targets agentic coding workflows including multi-file editing, tool use, and repository-level understanding.
1,886,299 ↓ · 1,116 ♡
Red Hat's dynamically FP8-quantized version of Llama 3.2 1B Instruct, produced using llm-compressor for deployment on FP8-capable GPUs. Reduces memory and increases throughput while maintaining close-to-full-precision instruction following quality.
1,850,105 ↓ · 4 ♡
Kimi-K2-Instruct-0905 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,842,225 ↓ · 737 ♡
GLM-4.7-Flash is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,782,599 ↓ · 1,751 ♡
Llama 3.2 3B Instruct is Meta's compact instruction-tuned model designed for on-device and edge inference, with strong performance for its size on reasoning and instruction following benchmarks.
1,775,716 ↓ · 2,251 ♡
Qwen2.5-Coder-32B-Instruct is the 32-billion-parameter instruction-tuned model in Alibaba's code-specialized Qwen2.5-Coder series, and the largest model in that series. It targets code generation, explanation, and debugging rather than general-purpose chat.
1,773,555 ↓ · 2,046 ♡
Qwen3-14B-AWQ handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,728,091 ↓ · 69 ♡
NVIDIA's NVFP4-quantized version of Google's Gemma-4-26B-A4B mixture-of-experts model, optimized for Blackwell-generation GPUs using Model Optimizer (ModelOpt). NVFP4 is a 4-bit floating-point format with native support on Blackwell (Hopper's lowest native precision is FP8), providing better accuracy retention than INT4 at similar memory savings. Requires NIM or TensorRT-LLM for deployment.
1,667,148 ↓ · 83 ♡
tiny-random-Llama-3 is a randomly initialized, very small Llama 3-architecture model intended for unit tests and CI pipelines. It exposes the usual causal LM interface, but its outputs are not meaningful text.
1,664,412 ↓ · 3 ♡
NVIDIA's FP4-quantized version of Gemma 4 31B instruction-tuned, optimized for deployment on Blackwell GPU architecture (B100/B200). Represents the current extreme of low-precision quantization for LLM serving.
1,662,099 ↓ · 512 ♡
NVIDIA-Nemotron-3-Nano-4B-BF16 is NVIDIA's Nemotron Nano 4B, an instruction-tuned LLM derived from a larger Nemotron-H backbone via Neural Architecture Search. Although small at 4B parameters, it is post-trained with NVIDIA's Nemotron dataset stack covering math, coding, instruction following, and agentic tool use. BF16 weights are provided for direct inference on A100/H100 GPUs.
1,661,155 ↓ · 93 ♡
OpenELM-1_1B-Instruct handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,577,675 ↓ · 75 ♡
PowerMoE-3b is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,538,155 ↓ · 21 ♡
NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,525,289 ↓ · 358 ♡
Llama-3.1-8B is Meta's 8-billion-parameter pretrained base model from the Llama 3.1 family. It is not instruction-tuned: it continues text from a prompt, and chat-style use with system instructions is the role of Llama 3.1-8B-Instruct.
1,506,730 ↓ · 2,267 ♡
Qwen3-Coder-Next in FP8 precision, targeting high-throughput code generation on FP8-capable hardware (H100 SXM, H200). The FP8 format halves memory requirements vs BF16 while using tensor-core FP8 instructions for near-BF16 throughput. 'Next' in the name indicates this is a more capable successor to the base Qwen3-Coder, with improved instruction following for agentic coding tasks.
1,467,639 ↓ · 153 ♡
Qwen2.5-Coder-32B-Instruct-AWQ is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,454,738 ↓ · 37 ♡
Qwen2.5-1.5B is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,448,942 ↓ · 189 ♡
AWQ 4-bit quantized version of Qwen2.5-14B-Instruct, reducing memory requirements from ~28GB to ~8–10GB while maintaining most of the original model's instruction-following quality through activation-aware quantization.
1,434,354 ↓ · 37 ♡
Meta-Llama-3-8B-Instruct is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,421,713 ↓ · 4,618 ♡
Qwen3-VL-30B-A3B-Instruct-AWQ is an AWQ-quantized version of Qwen3-VL-30B-A3B-Instruct, a vision-language Mixture-of-Experts model in the Qwen family with 30B total and roughly 3B active parameters. It accepts images as well as text in its prompts.
1,397,071 ↓ · 43 ♡
NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,371,764 ↓ · 772 ♡
SmolLM2-135M is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,341,619 ↓ · 204 ♡
GLM-5.1-FP8 is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,328,464 ↓ · 117 ♡
Kimi-K2.5-NVFP4 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,303,613 ↓ · 86 ♡
LLaMA-1B fine-tuned on 150B tokens of RedPajama data filtered and refined by Data-Juicer, a data-cleaning toolkit from Alibaba DAMO. The training corpus was pruned using quality heuristics across Wikipedia, arXiv, Books, and Common Crawl slices. At 1B parameters it trades capability for low inference cost.
1,297,632 ↓ · 3 ♡
Meta's Llama 3 8B base model, pretrained on over 15 trillion tokens with an expanded 128K token vocabulary. It serves as the foundation for instruction-tuned and task-specific finetunes in the Llama 3 ecosystem.
1,278,612 ↓ · 6,583 ♡
DeepSeek-R1-0528-NVFP4-v2 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,234,836 ↓ · 23 ♡
Qwen2.5-72B-Instruct-AWQ is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,214,437 ↓ · 78 ♡
Mistral 7B Instruct v0.2 improved on v0.1 with a 32K context window (up from 8K, and without v0.1's sliding-window attention) and better instruction following. It was among the strongest 7B open-weight instruction models available when released, although later releases have since raised the bar.
1,214,137 ↓ · 3,163 ♡
bart-large-emojilm is a generative model in the BART family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,195,711 ↓ · 0 ♡
DeepSeek-Coder-V2-Lite-Instruct handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,146,238 ↓ · 611 ♡
Phi-3.5-mini-instruct is a Phi decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,141,985 ↓ · 990 ♡
h2ovl-mississippi-800m is the roughly 0.8B-parameter model in H2O.ai's H2OVL Mississippi series of small vision-language models. It takes images together with text prompts, with an emphasis on document and OCR-style tasks.
1,133,453 ↓ · 40 ♡
h2ovl-mississippi-2b is the roughly 2B-parameter model in H2O.ai's H2OVL Mississippi series of small vision-language models. It handles image-plus-text prompts as well as instruction prompts and multi-turn dialogue.
1,100,280 ↓ · 42 ♡
Qwen2.5-32B-Instruct handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,097,987 ↓ · 352 ♡
Qwen2-0.5B is the smallest base model in Alibaba's Qwen2 family, with 0.5B parameters and a 32K token context window. As a base (non-instruct) model it requires fine-tuning or custom prompting for task-specific behavior. Despite its size, it outperforms several older models of similar scale on standard benchmarks.
1,096,378 ↓ · 168 ♡
Qwen3-Coder-Next is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,091,200 ↓ · 1,472 ♡
SmolLM-1.7B-Instruct-quantized.w4a16 is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,089,245 ↓ · 0 ♡
tiny-gpt2 is a miniature GPT-2-architecture checkpoint used for fast tests of code that needs a GPT-2-shaped model. It is not instruction-tuned and does not produce meaningful dialogue.
1,079,540 ↓ · 36 ♡
DeepSeek-V2-Lite-Chat is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,062,155 ↓ · 141 ♡
Qwen2.5-1.5B-quantized.w8a8 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,056,746 ↓ · 4 ♡
Kimi-K2.6-NVFP4 is an NVIDIA-optimized FP4 quantization of Kimi-K2.6, produced with the ModelOpt toolkit for deployment on NVIDIA Blackwell-generation GPUs, the first generation with native FP4 tensor-core support. FP4 weights reduce weight memory by roughly 3.5-4x compared to BF16 (about 2x compared to FP8) while maintaining most of the original accuracy for conversational tasks. It is intended for inference on NVIDIA TensorRT-LLM or vLLM backends, not for further fine-tuning.
1,045,205 ↓ · 36 ♡
DeepSeek-V3 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,036,965 ↓ · 4,091 ♡
NVIDIA-Nemotron-3-Super-120B-A12B-BF16 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,036,879 ↓ · 385 ♡
Qwen3-8B-AWQ is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
1,036,077 ↓ · 47 ♡
phi-4 is a generative model in the Phi family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
1,011,127 ↓ · 2,258 ♡
DeepSeek-V3-0324 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
1,009,176 ↓ · 3,131 ♡
TinyLlama-1.1B-Chat-v0.3-GPTQ handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
996,517 ↓ · 10 ♡
Qwen3-Coder-30B-A3B-Instruct-FP8 is the FP8-quantized release of Qwen3-Coder-30B-A3B-Instruct, a code-specialized Mixture-of-Experts model with 30B total and 3B active parameters. FP8 weights roughly halve memory requirements relative to BF16.
990,211 ↓ · 184 ♡
Qwen2.5-Coder-14B-Instruct-AWQ is an AWQ-quantized version of Qwen2.5-Coder-14B-Instruct, Alibaba's code-specialized instruction model. Quantization trades a small amount of accuracy for a much smaller memory footprint.
936,705 ↓ · 21 ♡
Llama-3.2-1B-Instruct-FP8 is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
895,772 ↓ · 4 ♡
Phi-4-mini-instruct is a generative model in the Phi family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
877,058 ↓ · 776 ♡
Qwen2.5-1.5B-Instruct-AWQ handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
870,118 ↓ · 7 ♡
Qwen3-4B-Instruct-2507-FP8 is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
835,580 ↓ · 78 ♡
Phi-tiny-MoE-instruct is a Phi decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
816,429 ↓ · 38 ♡
Llama-3_3-Nemotron-Super-49B-v1_5 is a 49B model from NVIDIA's Nemotron line, derived from Llama 3.3 70B via Neural Architecture Search, which prunes the original into a smaller parameter footprint while retaining most of its quality. It targets reasoning, coding, and instruction following with near-70B quality at reduced inference cost.
799,463 ↓ · 234 ♡
NVIDIA-Nemotron-Nano-9B-v2 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
794,692 ↓ · 495 ♡
A randomly initialized, architecturally minimal Bamba model used for unit-testing the BambaForCausalLM implementation in Hugging Face Transformers. Bamba is a hybrid SSM-attention architecture. This model has no trained weights — it exists purely for pipeline and shape verification in CI environments.
790,735 ↓ · 0 ♡
Qwen3-30B-A3B-Instruct-2507 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
788,613 ↓ · 815 ♡
Llama-3.2-3B is Meta's 3-billion-parameter pretrained base model from the Llama 3.2 family. It is not instruction-tuned; for chat-style prompting with system instructions, use Llama 3.2 3B Instruct.
776,590 ↓ · 831 ♡
Mamba-130M is a selective state space model (SSM) from the Mamba architecture, offering linear-time inference complexity versus transformer quadratic attention. At 130M parameters it's a research checkpoint used to study SSM behavior, not a production text generator. The HF suffix indicates it's adapted for the Transformers interface.
776,194 ↓ · 73 ♡
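The linear-time claim for state space models can be illustrated with a toy recurrence: the model carries a fixed-size state forward and does constant work per token, instead of comparing every token with every other token. The sketch below is a scalar simplification under stated assumptions, not Mamba's implementation (which uses vector-valued states, a discretization step, and a hardware-aware parallel scan); `selective_scan` and `params` are hypothetical names.

```python
def selective_scan(xs, params):
    """Run h_t = a_t * h_{t-1} + b_t * x_t and y_t = c_t * h_t in one pass.

    `params(x)` returns (a_t, b_t, c_t) for the current input, which is what
    makes the recurrence "selective": the update depends on the token.
    """
    h = 0.0
    ys = []
    for x in xs:  # one constant-cost step per token, so O(sequence length)
        a, b, c = params(x)
        h = a * h + b * x
        ys.append(c * h)
    return ys

# Fixed parameters: the state decays by half at each step.
print(selective_scan([1.0, 0.0, 0.0], lambda x: (0.5, 1.0, 1.0)))

# Input-dependent parameters: large inputs reset the state, small ones are ignored.
print(selective_scan([1.0, 0.1, 5.0, 0.1], lambda x: (0.0, 1.0, 1.0) if x > 1 else (1.0, 0.0, 1.0)))
```

The state `h` has the same size regardless of how long the sequence is, which is also why memory use during generation does not grow with context the way an attention KV cache does.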
Nemotron-3 Nano is NVIDIA's 30B-parameter Mixture-of-Experts model with only 3B active parameters per forward pass, quantized to NVFP4 for deployment on Blackwell-generation GPUs. The model supports six languages and was trained on NVIDIA's Nemotron dataset family spanning code, math, and instruction following. NVFP4 quantization targets the FP4 tensor cores introduced with Blackwell; H100/H200 hardware supports FP8 but has no native FP4 path.
772,279 ↓ · 158 ♡
Qwen2.5-32B-Instruct-AWQ is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
770,569 ↓ · 101 ♡
Mistral-7B-v0.1 is a generative model in the Mistral family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
761,785 ↓ · 4,112 ♡
Llama-2-7b-hf is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
758,340 ↓ · 2,323 ♡
Qwen3-1.7B-Base is the pretrained base checkpoint behind Qwen3-1.7B. It is meant for text continuation and as a starting point for fine-tuning, not for out-of-the-box instruction following.
749,379 ↓ · 74 ♡
Qwen3-4B-Base is the pretrained base checkpoint behind Qwen3-4B. It is meant for text continuation and as a starting point for fine-tuning, not for out-of-the-box instruction following.
745,564 ↓ · 95 ♡
Unsloth's GGUF quantizations of Qwen3-Coder-Next, a code-focused model from the Qwen3 family with extended training on programming datasets. Unsloth applies imatrix calibration during quantization, which improves accuracy at lower bit-widths compared to naive GGUF conversion. Available in multiple quant levels (Q4_K_M, Q8_0, etc.).
737,229 ↓ · 707 ♡
Qwen2.5-Coder-1.5B-Instruct is a compact instruction-tuned code model from Alibaba designed to handle code generation, explanation, and debugging tasks at 1.5B parameters. Despite its small size it scores competitively on HumanEval for its parameter class, making it a practical choice for on-device code assistants or latency-sensitive completion tools.
736,899 ↓ · 128 ♡
Qwen3-235B-A22B is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
730,355 ↓ · 1,097 ♡
Qwen2.5-Math-1.5B is the 1.5B base model in Alibaba's math-specialized Qwen2.5-Math series. It is intended for solving math problems in English and Chinese rather than for general-purpose tasks such as summarization or translation.
715,175 ↓ · 109 ♡
Qwen2-7B-Instruct handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
688,460 ↓ · 687 ♡
GLM-4.5-Air-AWQ-4bit is a 4-bit AWQ quantization of Z.ai's GLM-4.5-Air, a MoE language model optimized for bilingual Chinese-English use. The Air variant is a smaller, lower-compute sibling of GLM-4.5 designed for efficient serving, and AWQ (Activation-aware Weight Quantization) further reduces VRAM requirements while preserving most of the output quality.
685,644 ↓ · 29 ♡
Qwen2-0.5B-Instruct is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
682,174 ↓ · 201 ♡
Llama-3.3-70B-Instruct handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
672,608 ↓ · 2,838 ♡
DeepSeek-R1-Distill-Qwen-32B handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
670,484 ↓ · 1,568 ♡
Qwen3-4B-Thinking-2507 is a thinking-mode variant of Qwen3-4B, fine-tuned to generate extended chain-of-thought reasoning before producing answers. The 2507 suffix indicates a July 2025 update. The explicit reasoning traces increase token count but improve accuracy on structured tasks.
668,219 ↓ · 599 ♡
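Because thinking-mode output mixes a reasoning trace with the final answer, downstream code usually needs to separate the two. The helper below is a minimal sketch that assumes the trace is terminated by a `</think>` tag and may or may not begin with `<think>`; the tag names and the `split_reasoning` function are assumptions for illustration, so check the model card for the exact format.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer); reasoning is '' if no closing tag is present."""
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        return "", text.strip()  # no trace found, treat everything as the answer
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()

raw = "<think>2 + 2 is 4, and 4 * 3 is 12.</think>\nThe answer is 12."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

Splitting on the last closing tag keeps the answer intact even if the trace itself mentions the tag, and it makes it easy to log or discard the trace when only the answer is needed.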
Llama-3.1-70B-Instruct is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
662,065 ↓ · 927 ♡
Llama-3.2-1B-Instruct-Q8_0-GGUF is a Llama decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
657,068 ↓ · 48 ♡
gpt-neox-20b is EleutherAI's 20B autoregressive language model, trained on the Pile dataset and released in 2022 as the largest dense open-weights LLM available at the time. It uses the GPT-NeoX architecture with rotary position embeddings and was trained in fp16 mixed precision on A100 GPUs. While now superseded by much larger models, it remains historically significant and is a baseline for open LLM research.
654,187 ↓ · 584 ♡
Qwen3.5-397B-A17B-NVFP4 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
634,562 ↓ · 100 ♡
DeepSeek-R1-Distill-Qwen-1.5B distills DeepSeek-R1's chain-of-thought reasoning traces into a 1.5B Qwen2.5-Math base model. The distillation process transfers structured thinking patterns rather than raw capability, producing a model that generates explicit reasoning steps before answers. The MIT license makes it broadly usable.
627,219 ↓ · 1,526 ♡
Phi-3-mini-4k-instruct is a generative model in the Phi family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
623,819 ↓ · 1,434 ♡
tiny-random-OPTForCausalLM is a tiny, randomly initialised OPT checkpoint intended for testing. Its outputs are not meaningful; it exists to give unit tests and CI pipelines a fast-loading OPT model class.
620,384 ↓ · 0 ♡
SmolLM3-3B is HuggingFace's 3B instruction-tuned language model, the third generation of the SmolLM family targeting on-device and resource-constrained deployment. It is multilingual (English, French, Spanish, Italian, Portuguese, Chinese, Arabic, Russian) and achieves competitive instruction-following quality at the 3B parameter scale. Apache-2.0 licensing makes it a viable base for commercial on-device AI applications.
608,332 ↓ · 975 ♡
Phi-3-mini-4k-instruct-gptq-4bit is a Phi decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
607,129 ↓ · 2 ♡
Rio-3.0-Open is an open-weights LLM released by the Prefeitura do Rio de Janeiro (Rio de Janeiro city government), fine-tuned from Qwen3-235B-A22B on Portuguese and English data for civic and administrative use cases. It is a MoE architecture fine-tune targeting Brazilian Portuguese language understanding and public service applications. MIT licensed for open use.
606,651 ↓ · 5 ♡
TinyLLama-v0 is an early community repackage of the TinyLlama 1.1B base model, offering PyTorch and ONNX checkpoints for fast local experimentation. This is the pre-instruction-tuned base variant; it generates continuations rather than following instructions. The primary value is quick prototyping on hardware too constrained for larger models.
604,555 ↓ · 43 ♡
Nemotron-Labs-Diffusion-8B-Base is NVIDIA's diffusion language model base, applying discrete diffusion to text generation instead of autoregressive decoding. At 8B parameters, it generates text by iteratively denoising token sequences rather than predicting them left-to-right. This enables parallel token generation but requires different inference tooling than standard transformer LLMs.
600,727 ↓ · 6 ♡
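For the Nemotron-Labs-Diffusion-8B-Base entry above, the contrast with left-to-right decoding can be illustrated with a toy sketch. This is not NVIDIA's inference code: `toy_denoiser`, its fixed proposals, and the confidence-ordered fill schedule are hypothetical stand-ins, chosen only to show several positions being committed per denoising step.

```python
MASK = "<mask>"


def toy_denoiser(tokens):
    """Hypothetical stand-in for a diffusion LM.

    Returns one (token, confidence) proposal per position. A real model
    would condition on the partially filled sequence; this toy ignores it.
    """
    proposals = [("the", 0.9), ("cat", 0.6), ("sat", 0.8), ("down", 0.7)]
    return proposals[: len(tokens)]


def diffusion_decode(length, steps):
    """Fill a fully masked sequence over a fixed number of denoising steps."""
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceiling division: positions filled per step
    trace = [list(tokens)]
    for _ in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        proposals = toy_denoiser(tokens)
        # Commit the most confident masked positions first, several at once.
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = proposals[i][0]
        trace.append(list(tokens))
    return tokens, trace


final_tokens, trace = diffusion_decode(length=4, steps=2)
for snapshot in trace:
    print(snapshot)
```

Each snapshot in `trace` fills positions out of order and two at a time, which is the property that separates this style of decoding from token-by-token autoregression.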
MiniMax-M2.5 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
598,953 ↓ · 1,497 ♡
deepseek-coder-7b-instruct-v1.5 is DeepSeek AI's 7B-parameter instruction-tuned code model, the v1.5 release built on a Llama-style architecture. It is optimized for code generation, completion, and debugging across common programming languages. Unlike the original DeepSeek-Coder releases, v1.5 was continue-pretrained from DeepSeek-LLM 7B with a 4K-token window and a next-token-prediction objective, then instruction-tuned for coding-specific prompts.
570,530 ↓ · 156 ♡
Qwen2.5-72B-Instruct is Alibaba's 72B instruction-tuned model from the Qwen 2.5 series, trained on up to 18 trillion tokens with improvements in math, coding, and long-context handling up to 128K tokens. It supports over 29 languages and is released under the Qwen license rather than the Apache 2.0 license used for most smaller Qwen2.5 models.
554,048 ↓ · 954 ♡
Qwen2.5-7B-Instruct-bnb-4bit is Unsloth's bitsandbytes 4-bit quantized version of Qwen2.5-7B-Instruct, packaged for efficient fine-tuning and inference via the Unsloth framework. The bnb-4bit format enables QLoRA fine-tuning on a single consumer GPU (12-16GB VRAM), making Qwen2.5-7B accessible for custom instruction tuning without requiring multi-GPU setups.
550,127 ↓ · 23 ♡
tiny-random-LlamaForCausalLM is a tiny, randomly initialised Llama checkpoint intended for testing. Its outputs are not meaningful; it exists to give unit tests and CI pipelines a fast-loading Llama model class.
548,815 ↓ · 8 ♡
Kimi-K2-Instruct is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
548,684 ↓ · 2,366 ♡
tiny-GptOssForCausalLM is a tiny checkpoint of the GptOss causal LM architecture. At this size it is suited to testing model-loading and training code rather than to real text generation.
547,733 ↓ · 4 ♡
DeepSeek-R1-Distill-Qwen-14B is a 14B model that distills DeepSeek-R1's extended chain-of-thought reasoning into a Qwen2.5-14B backbone. It strikes a better capability/size balance than the 1.5B or 7B distillations, handling moderately complex math and coding problems with explicit reasoning traces. The MIT license permits commercial use and modification.
544,123 ↓ · 656 ♡
RnJ-1-Instruct in FP8 precision from Doradus AI, a reasoning-focused instruct model targeting code and logical problem-solving. FP8 quantization reduces memory footprint while preserving most of the original model's task accuracy.
543,799 ↓ · 4 ♡
Qwen3-8B in FP8 precision from Alibaba, targeting high-throughput serving on Hopper-generation GPUs. FP8 roughly halves the weight memory of the BF16 checkpoint and runs natively on the FP8 tensor cores of H100/H200 GPUs. Qwen3-8B is instruction-tuned with a hybrid reasoning mode that toggles between chain-of-thought and direct answers via a flag.
538,830 ↓ · 61 ♡
Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
538,762 ↓ · 14 ♡
Qwen2.5-3B is the 3B base (non-instruct) model from Alibaba's Qwen2.5 series, with a 32K token context window. Base models in this series are primarily useful as fine-tuning starting points. The instruct variant is recommended for most direct applications.
533,489 ↓ · 192 ♡
Granite 4.0-H-Small is IBM's latest Granite generation using a hybrid SSM-Transformer architecture (GraniteMoEHybrid), combining state space models with attention layers for improved long-context efficiency. The small variant targets edge and on-premise deployments where the compute budget is constrained. This is IBM's first Granite model with a hybrid non-pure-Transformer design.
525,622 ↓ · 308 ♡
LLaDA-8B-Instruct is an 8B instruction-tuned diffusion language model. Rather than decoding autoregressively, it generates text by iteratively unmasking tokens across the whole sequence, so it needs different inference tooling than standard left-to-right LLMs.
516,939 ↓ · 358 ♡
Meta-Llama-3.1-8B-Instruct-FP8 handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
512,254 ↓ · 44 ♡
Qwen3-0.6B-Base is Alibaba's smallest Qwen3 model, a base (non-instruct) LLM at 0.6 billion parameters. It targets on-device, edge, and resource-constrained deployments where even 1.5B models are too large. As a base model it requires instruction tuning or few-shot prompting for task-specific use; the primary value is as a fine-tuning starting point.
507,987 ↓ · 174 ♡
tiny-random-LlamaForCausalLM is a tiny, randomly initialised Llama checkpoint intended for testing. Its outputs are not meaningful; it exists to give unit tests and CI pipelines a fast-loading Llama model class.
504,565 ↓ · 20 ♡
Qwen3-8B-Base is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
501,989 ↓ · 107 ♡
An AWQ 4-bit quantisation of DeepSeek V3.2, packaged for vLLM inference. AWQ (Activation-aware Weight Quantisation) identifies and preserves the most salient weights at higher precision, typically losing less perplexity than naive 4-bit approaches. This checkpoint lets teams run the large DeepSeek V3.2 on fewer GPUs than the BF16 original while retaining most benchmark performance.
497,763 ↓ · 11 ♡
Qwen 2.5 7B is Alibaba's base (non-instruction-tuned) language model at the 7B scale, pretrained on 18 trillion tokens. It serves as the foundation for Qwen 2.5 7B Instruct and downstream fine-tunes requiring a strong base without chat formatting.
496,539 ↓ · 292 ♡
BLOOM-560M is the smallest model in the BLOOM family, a collaborative multilingual language model trained under the BigScience initiative on 46 natural languages and 13 programming languages. At 560M parameters it's primarily useful for multilingual research and teaching rather than competitive NLP tasks. The RAIL license restricts certain harmful use cases.
496,322 ↓ · 374 ♡
Qwen2.5-Coder-3B is the 3B base (non-instruct) model from Alibaba's code-specialized Qwen2.5 Coder series, trained on a large corpus of code and programming-related text. As a base model it lacks instruction following and requires fine-tuning or prompting strategies to use for code generation tasks. The instruct variant is better suited for direct use.
491,543 ↓ · 52 ♡
Text-only NVFP4-quantized Qwen3.6-27B with multi-token prediction (MTP) for speculative decoding, optimized for Blackwell and Hopper GPUs via NVIDIA ModelOpt. Stripping vision components reduces memory footprint and inference latency when only text output is needed. Supports 13 languages including Chinese, Japanese, and Korean.
482,608 ↓ · 76 ♡
diffusiongemma-26B-A4B-it-NVFP4 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
474,663 ↓ · 81 ♡
Qwen2.5-32B-Instruct-GPTQ-Int4 is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
470,832 ↓ · 40 ♡
A tiny Qwen3 causal LM checkpoint used for TRL (Transformer Reinforcement Learning) library internal testing. Not a functional AI model; exists to provide a minimal forward-pass target for unit tests and CI pipelines in the Hugging Face TRL codebase.
464,356 ↓ · 1 ♡
Llama-3.1-8B-Instruct-FP8 is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
460,235 ↓ · 37 ♡
Llama-3.3-70B-Instruct-AWQ is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
460,034 ↓ · 11 ♡
Nemotron-3 Super is NVIDIA's 120B MoE model with 12B active parameters per token, quantised to FP8 for Hopper GPU deployment. It uses a Latent MoE architecture with Multi-Token Prediction and is trained on NVIDIA's full Nemotron dataset suite including code, math, and multilingual instruction data. At 120B total capacity it targets tasks that require deep knowledge without the cost of dense 120B inference.
456,307 ↓ · 260 ♡
Meta-Llama-3.1-70B-Instruct-AWQ-INT4 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
451,423 ↓ · 109 ♡
OPT-1.3B is Meta's Open Pre-trained Transformer at 1.3 billion parameters, released in 2022 as part of a suite ranging from 125M to 175B. The model was trained on a curated mix of publicly available datasets and released with full weights and training logs to enable reproducibility research. It has largely been superseded by later open LLMs but remains a useful controlled baseline.
447,564 ↓ · 184 ♡
llama-3.3-70b-instruct-awq is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
446,437 ↓ · 46 ♡
Qwen3-1.7B-GPTQ-Int8 is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
445,296 ↓ · 7 ♡
DeepSeek-R1-0528-Qwen3-8B is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
441,262 ↓ · 1,077 ♡
GPT-Neo-125M is EleutherAI's open recreation of the GPT-2 class of models, pre-trained on the Pile dataset as part of their open language model initiative. At 125M parameters it's a pedagogical and baseline research model rather than a practical text generator. MIT-licensed and available in multiple frameworks.
440,189 ↓ · 228 ♡
MLX 8-bit quantization of Qwen3-Coder-Next for Apple Silicon inference, targeting macOS M-series hardware via the MLX framework. 8-bit quantization preserves more model quality than 4-bit at the cost of higher memory use. Apache-2.0 licensed.
438,855 ↓ · 3 ♡
tiny-random-Gemma2ForCausalLM is a minimal Gemma 2 architecture stub used for unit testing HuggingFace transformers code. Its weights are randomly initialised and it produces meaningless outputs — it exists solely to provide a fast-loading Gemma2 model class for CI and integration tests without requiring the full multi-GB production checkpoint.
438,340 ↓ · 0 ♡
lynx-instruct-30b is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
431,704 ↓ · 3 ♡
Meta-Llama-3.1-8B-Instruct is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
430,911 ↓ · 97 ♡
VertaLily-1.2-1B-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
430,880 ↓ · 5 ♡
Bielik-11B v3.0 is the SpeakLeash community's Polish-focused 11B instruct model, trained on a large Polish text corpus. The third major version targets comprehensive Polish language tasks including complex reasoning, summarization, and instruction following.
429,715 ↓ · 64 ♡
Hermes-4-14B-AWQ-4bit is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
429,668 ↓ · 4 ♡
Qwen3-235B-A22B-Instruct-2507-FP8 is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
428,035 ↓ · 147 ♡
phi-2 is a generative model in the Phi family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
426,768 ↓ · 3,471 ♡
Granite 3.3-8B Instruct is IBM's latest iteration in the Granite 3.x series, an 8B instruction-tuned model trained on IBM's curated dataset blend emphasising enterprise tasks like code, retrieval-augmented generation, and document understanding. The 3.3 update improves on 3.1 and 3.2 in function calling reliability and structured output generation, both critical for agentic enterprise workflows.
419,279 ↓ · 155 ♡
Qwen3-Next-80B-A3B-Instruct-FP8 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
418,920 ↓ · 90 ♡
GPT-Neo 2.7B was EleutherAI's 2021 open replication of GPT-3 architecture trained on the Pile dataset. At release it was one of the largest freely available autoregressive LLMs. By current standards it is a historical baseline — useful for studying early large-scale open LM behaviour and running ablation experiments where reproducibility of older results matters.
418,301 ↓ · 503 ♡
Qwen3-30B-A3B-FP8 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
417,572 ↓ · 84 ♡
t5gemma-s-s-prefixlm is a small encoder-decoder checkpoint from Google's T5Gemma family, pretrained with a prefix language modeling (PrefixLM) objective. It is a pretrained rather than instruction-tuned checkpoint, so it is better suited to fine-tuning than to chat-style prompting.
415,214 ↓ · 4 ♡
Qwen3-32B-AWQ is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
411,965 ↓ · 136 ♡
Qwen3-30B-A3B-abliterated is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
411,727 ↓ · 38 ♡
Qwen2.5-Math-7B is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
409,520 ↓ · 111 ♡
SmolLM-135M is HuggingFace's 135M-parameter LLM trained from scratch on HuggingFace's curated SmolLM-Corpus, designed to push the boundary of what is achievable in extremely compact language models. At 135M it outperforms many prior sub-500M models on standard benchmarks. The model uses the Llama architecture for easy ecosystem integration and is English-focused.
408,004 ↓ · 257 ♡
gpt2-medium is the 355M-parameter version of OpenAI's GPT-2, a base autoregressive model trained on WebText without instruction tuning. It continues English text from a prompt rather than following instructions.
402,062 ↓ · 205 ♡
OLMo-2-0425-1B is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
400,712 ↓ · 79 ♡
DeepSeek-R1-Distill-Llama-8B is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
397,797 ↓ · 866 ♡
NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
395,114 ↓ · 350 ♡
Saiga Llama3 8B is a Russian instruction-tuned model based on Llama 3 8B, fine-tuned by Ilya Gusev using the Saiga dataset collection of Russian dialogues and instructions. It is among the most capable open Russian chat models at this parameter count, offering idiomatic Russian language understanding and generation beyond what vanilla Llama 3 provides on Russian prompts.
394,033 ↓ · 141 ♡
Nemotron-Mini-4B-Instruct is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
393,627 ↓ · 183 ♡
gpt-oss-20b-MXFP4-Q8 is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
393,626 ↓ · 67 ♡
An abliterated (safety-removed) version of Qwen2.5-72B-Instruct by huihui-ai, where refusal mechanisms have been removed using directional activation manipulation. This allows the model to respond to requests the original would decline. The abliteration technique is reversible but the resulting model lacks safety guardrails.
392,488 ↓ · 49 ♡
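The "directional activation manipulation" mentioned in the abliterated Qwen2.5-72B-Instruct entry above comes down to removing the component of an activation that lies along a chosen direction. The following is a minimal linear-algebra sketch only, assuming a single direction is already known; the vectors are hypothetical toy values, and this is not huihui-ai's actual procedure.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def ablate_direction(activation, direction):
    """Remove the component of `activation` along `direction`.

    Computes h' = h - (h . r) r, where r is the unit vector of `direction`.
    """
    r = normalize(direction)
    coeff = sum(h * ri for h, ri in zip(activation, r))
    return [h - coeff * ri for h, ri in zip(activation, r)]


# Toy values, purely illustrative.
activation = [3.0, 4.0, 0.0]
direction = [0.0, 2.0, 0.0]

ablated = ablate_direction(activation, direction)
print(ablated)  # [3.0, 0.0, 0.0]
```

After the projection, the result has zero dot product with the ablated direction, while components orthogonal to it are left unchanged.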
VLM2Vec-Full is TIGER Lab's vision-language embedding model that adapts a multimodal LLM (based on Phi-3.5-V) into a dual-encoder for multimodal retrieval. It enables text-image retrieval and text-text retrieval in a single embedding space.
391,500 ↓ · 29 ♡
LFM2.5-1.2B-Instruct is Liquid AI's 1.2B instruction-tuned model using their Liquid Foundation Model architecture, which combines recurrent and attention mechanisms for improved long-context efficiency. Supports 9 languages and is positioned as an edge-friendly model from a non-transformer architecture lineage. License is listed as 'other' — check Liquid AI's terms.
391,492 ↓ · 600 ♡
tiny-Qwen3MoeForCausalLM is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
389,597 ↓ · 1 ♡
mistral-nemo-instruct-2407-awq is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
389,582 ↓ · 12 ♡
DeepSeek-Coder-V2-Lite-Instruct-AWQ is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
389,440 ↓ · 9 ♡
Qwen3-32B-FP8 is Alibaba's official FP8-quantized checkpoint of the Qwen3-32B instruction-tuned model, targeting Hopper (H100) GPU inference with FP8 tensor core support. FP8 quantization reduces memory by ~50% vs bf16 while preserving most of the model's accuracy. Apache-2.0 licensed.
389,347 ↓ · 83 ♡
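The "~50% vs bf16" figure in the Qwen3-32B-FP8 entry above follows from bytes-per-weight arithmetic. The sketch below is a back-of-the-envelope estimate, assuming the nominal 32B parameter count from the model name and counting weights only; KV cache, activations, and quantization scale factors are ignored, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 32e9  # assumption: nominal count taken from the model name

bf16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)
fp8_gb = weight_memory_gb(NOMINAL_PARAMS, 8)

print(f"bf16: {bf16_gb:.0f} GB, fp8: {fp8_gb:.0f} GB, ratio: {fp8_gb / bf16_gb:.2f}")
```

The same helper can be applied to the 4-bit entries in this list by passing `bits_per_weight=4`, with the caveat that real checkpoints carry some extra overhead for scales and unquantized layers.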
Qwen3-Coder-30B-A3B-Instruct-AWQ handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
388,715 ↓ · 8 ♡
Olmo-3-7B-Instruct handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
385,034 ↓ · 128 ♡
Mistral 7B v0.3 in BitsAndBytes 4-bit quantisation, packaged by Unsloth for memory-efficient fine-tuning and inference. The bnb-4bit format uses NF4 quantisation with double quantisation, reducing VRAM for fine-tuning from ~16GB to ~5-6GB for a 7B model. Unsloth applies custom kernels to accelerate LoRA training further on top of the bitsandbytes quantisation.
382,163 ↓ · 22 ♡
Falcon-7B was TII UAE's 7B autoregressive language model released in 2023, trained on the RefinedWeb dataset derived from Common Crawl with aggressive deduplication and filtering. At release it outperformed comparable open models such as MPT-7B on public benchmarks while being fully open-weight. Falcon-7B is a base model without instruction tuning; it is notable historically as an early high-quality openly-licensed 7B LLM.
380,206 ↓ · 1,104 ♡
OTel-LLM-1.2B-IT is a transformer decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
379,356 ↓ · 1 ♡
Gemma 2 2B Instruct is Google's smallest instruction-tuned model in the Gemma 2 family, using the same sliding window + full attention hybrid and logit soft-capping as the 9B variant but at 2.6 billion parameters. At release it set a new bar for sub-3B instruction models on standard benchmarks. It is released under Google's Gemma license rather than Apache 2.0 and runs on consumer hardware.
378,754 ↓ · 1,396 ♡
DeepSeek-V2-Lite is a lightweight variant of DeepSeek's MoE architecture, designed to bring V2's Multi-head Latent Attention (MLA) and DeepSeekMoE designs to a smaller footprint. It activates fewer experts per token than V2-full while sharing the same architectural innovations. Uses custom model code, requiring trust_remote_code=True.
376,926 ↓ · 180 ♡
Qwen3 80B MoE instruct model activating 3B parameters per token, offering a high-capacity but compute-efficient inference profile. Positioned as a next-generation step-up from the Qwen3-30B-A3B series with additional pretraining compute.
375,814 ↓ · 1,024 ♡
T5-base fine-tuned to paraphrase text in a ChatGPT-style manner, using T5's text-to-text framework. The model was trained on paraphrase datasets with the goal of rewording inputs while preserving meaning. OpenRAIL license applies — includes usage restrictions on harmful applications.
375,311 ↓ · 193 ♡
AWQ 4-bit quantization of Qwen3-Next-80B-A3B-Instruct, an 80B mixture-of-experts model activating approximately 3B parameters per token. At 80B total with AWQ compression, loading requires substantial RAM despite per-token compute being 3B-equivalent. compressed-tensors format targets vLLM.
374,576 ↓ · 66 ♡
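The Qwen3-Next-80B-A3B-Instruct AWQ entry above notes that memory and compute scale with different parameter counts in a MoE model. The sketch below makes that split explicit, assuming the nominal 80B total and 3B active figures from the model name; the "2 FLOPs per active parameter per token" figure is a common rule of thumb rather than a measurement, and `moe_profile` is a hypothetical helper.

```python
def moe_profile(total_params, active_params, bits_per_weight):
    """Return (weight memory in GB, approximate forward FLOPs per token).

    Memory is driven by total parameters, because every expert must be
    resident. Compute is driven by the parameters active for each token.
    """
    weights_gb = total_params * bits_per_weight / 8 / 1e9
    flops_per_token = 2 * active_params  # rule-of-thumb estimate
    return weights_gb, flops_per_token


weights_gb, flops_per_token = moe_profile(
    total_params=80e9,   # assumption: nominal total from the model name
    active_params=3e9,   # assumption: nominal active count from the model name
    bits_per_weight=4,   # AWQ 4-bit, ignoring scales and unquantized layers
)

print(f"weights: ~{weights_gb:.0f} GB, compute: ~{flops_per_token:.1e} FLOPs/token")
```

Under these assumptions the weights alone need about 40 GB even though each token costs roughly what a dense 3B model would.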
mini-coder-1.7b handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
373,536 ↓ · 5 ♡
ReaderLM-v2 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
372,442 ↓ · 791 ♡
MiniMax-M2.7-NVFP4 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
365,637 ↓ · 59 ♡
Hugging Quants' AWQ INT4 quantization of Meta's Llama-3.1-8B-Instruct model. Llama 3.1 8B Instruct is a well-characterized instruction-following model with solid multilingual coverage across 8 languages. The AWQ quantization uses autoawq and is calibrated for minimal accuracy regression on instruction tasks.
362,315 ↓ · 90 ♡
Llama-3.2-1B-Instruct is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
361,824 ↓ · 98 ♡
LM Studio Community's MLX 4-bit quantization of DeepSeek-R1-0528-Qwen3-8B, the DeepSeek-R1-0528 distillation onto a Qwen3-8B backbone. MLX format targets Apple Silicon (M-series) inference via the MLX framework. The 0528 suffix denotes a May 2025 update to the R1 series. 4-bit quantization reduces memory use to approximately 5-6 GB of unified memory.
361,158 ↓ · 12 ♡
Llama 2 7B Chat is Meta's 7B RLHF-aligned conversational model from 2023. While superseded by Llama 3 and later releases, it remains a well-understood reference model used for fine-tuning experiments, benchmarking, and educational purposes.
358,965 ↓ · 4,760 ♡
bartowski's GGUF conversion of Google's Gemma 2 2B Instruct, providing multiple quantisation levels for llama.cpp and similar runtimes. Gemma 2 2B Instruct is Google's smallest instruction model in the Gemma 2 family; at 2B parameters it runs on very limited hardware. bartowski maintains a well-regarded GGUF quantisation pipeline with imatrix calibration for quality retention at lower bit depths.
357,498 ↓ · 97 ♡
Zamba2-1.2B-instruct is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
357,076 ↓ · 30 ♡
GLM-4.7-Flash is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
356,098 ↓ · 15 ♡
GLM-4.7-Flash-AWQ-4bit is a generative model in the transformer family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
349,472 ↓ · 54 ♡
japanese-gpt-neox-small is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
349,018 ↓ · 15 ♡
Qwen2.5-Coder-1.5B is the 1.5B base (non-instruct) model from Alibaba's code-specialized Qwen2.5 Coder series. As a base model it completes code rather than following instructions; the instruct variant is better suited for direct use.
345,326 ↓ · 91 ♡
GLM-4.5-Air is Zhipu AI's lightweight MoE variant of their GLM-4.5 series, designed for fast Chinese-English bilingual inference at reduced serving cost. The 'Air' designation indicates a trimmed serving configuration balancing capability and speed. It uses the GLM4_MOE architecture and targets cloud API and enterprise deployments requiring GLM's strong Chinese language performance.
345,090 ↓ · 609 ♡
Qwen2.5-3B-Instruct-AWQ is a generative model in the Qwen family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
344,701 ↓ · 16 ♡
Llama-3.1-8B-Instruct is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
344,157 ↓ · 12 ♡
LLaMmlein 1B is a German-centric small language model from the University of Würzburg's LSX group, trained from scratch on German text. The 'prerelease' indicates this is a preliminary checkpoint shared before the final publication.
343,431 ↓ · 14 ♡
A GPT-2-scale 87M model from the Hub user entropy, fine-tuned on SMILES notation for chemical compounds from the ZINC database for molecular generation. It generates novel SMILES strings representing drug-like small molecules.
341,273 ↓ · 4 ♡
bloomz-560m is a generative model in the BLOOM family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
340,628 ↓ · 137 ♡
Qwen2.5-Coder-7B-Instruct-GGUF is a Qwen decoder-only language model for generative text tasks, packaged in GGUF format. It accepts a prompt and autoregressively produces token-by-token completions.
340,002 ↓ · 49 ♡
Jan v3.5-4B is Homebrew (Jan.ai)'s 4B instruction-tuned model in GGUF format, designed for local deployment via the Jan desktop application and llama.cpp. It is fine-tuned for general assistant tasks including math, coding, and identity-aware conversation. Jan.ai positions this as a private, on-device alternative to cloud AI assistants for consumer use.
339,299 ↓ · 21 ♡
Gemma 2 9B Instruct is Google's instruction-tuned 9B model from the Gemma 2 family, which introduced sliding window + full attention alternation and logit soft-capping for improved training stability. At release it outperformed Llama 3 8B on multiple benchmarks while remaining smaller, making it one of the most downloaded open instruction models in its size class. It is English-focused with some multilingual capability.
338,409 ↓ · 829 ♡
granite-4.1-3b is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
337,339 ↓ · 77 ♡
Bielik-11B-v3.0-Instruct-awq is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
336,461 ↓ · 1 ♡
gemma-4-31B-it-NVFP4-turbo is a Gemma decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
336,377 ↓ · 284 ♡
DeepSeek-R1-Distill-Qwen-7B is a Qwen decoder-only language model for generative text tasks. It accepts a prompt and autoregressively produces token-by-token completions.
334,543 ↓ · 847 ♡
NVFP4 quantization of a Gemma 4 26B MoE instruct model (4B active parameters) from bg-digitalservices, targeting H100/H200 GPU inference. The 26B MoE with 4B active parameters offers strong capability-per-token compute.
333,951 ↓ · 30 ♡
Qwen2.5-Math 1.5B instruct is a compact math-specialized language model from Alibaba, trained on mathematical corpora and fine-tuned for step-by-step problem solving. Despite its small size, it competes with much larger general models on MATH and GSM8K benchmarks.
333,486 ↓ · 55 ♡
An AWQ (Activation-aware Weight Quantization) conversion of Mistral Small 24B Instruct (January 2025), offering 4-bit quantized inference at reduced memory while preserving most of the original model's instruction-following quality.
333,422 ↓ · 29 ♡
Qwen3Guard-Gen is a 0.6B generative content safety model from Alibaba, designed to classify and explain potential policy violations in model outputs. It can generate natural language explanations of why content may be unsafe, unlike binary classifiers.
328,260 ↓ · 73 ♡
An MLX-format conversion of Moonshot AI's Kimi K2.5 MoE for Apple Silicon local inference. Kimi K2 models are large MoE language models from Moonshot AI with strong reasoning, available here for native Apple Silicon inference.
324,555 ↓ · 38 ♡
L3.3-GeneticLemonade-Final-v2-70B is a generative model in the Llama family. It covers a broad range of prompted tasks: summarization, translation, code assistance, and question answering.
323,585 ↓ · 11 ♡
Code Llama 7B is Meta's code-specialized 7B model, initialized from Llama 2 7B and further trained on code data. It supports code completion, infilling, and code instruction following at a practical 7B parameter budget.
321,306 ↓ · 377 ♡
Unsloth's optimized version of Qwen2.5-7B-Instruct, applying Unsloth's memory and speed improvements for fine-tuning and inference. Unsloth reduces VRAM usage during training by up to 60% through custom CUDA kernels and gradient checkpointing optimizations. Apache-2.0 licensed.
321,264 ↓ · 27 ♡
A 4-bit MLX quantization of LFM2-24B-A2B, Liquid AI's second-generation 24B MoE model activating ~2B parameters per token, prepared for Apple Silicon local inference by the LM Studio community.
320,702 ↓ · 4 ♡
Qwen2.5-Coder-3B-Instruct is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
320,309 ↓ · 105 ♡
OTel-LLM-0.6B-IT is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
318,539 ↓ · 0 ♡
Qwen3-4B-AWQ handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
318,220 ↓ · 29 ♡
Qwen2.5-3B-Instruct-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
317,263 ↓ · 128 ♡
An 8-bit MLX quantization of LFM2-24B-A2B for Apple Silicon, offering higher accuracy than the 4-bit variant at the cost of roughly double the memory requirement. Targets M2/M3 Ultra or M3 Max Macs with sufficient unified memory.
317,211 ↓ · 2 ♡
A 5-bit MLX quantization of LFM2-24B-A2B, sitting between the 4-bit and 8-bit variants in the accuracy/memory tradeoff. Useful for Apple Silicon users who want higher quality than 4-bit with lower memory use than 8-bit.
317,034 ↓ · 1 ♡
granite-4.0-micro is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
316,985 ↓ · 271 ♡
A 6-bit MLX quantization of Liquid AI's LFM2-24B-A2B for Apple Silicon, targeting the sweet spot between memory efficiency and output quality. 6-bit quantization typically preserves instruction-following quality well while needing about a quarter less weight memory than 8-bit.
316,848 ↓ · 3 ♡
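The memory tradeoff between the 4-, 5-, 6- and 8-bit LFM2-24B-A2B variants can be estimated with simple arithmetic. The sketch below is a rough lower bound, not a measurement: it assumes a nominal 24 billion total parameters (taken from the model name), counts weights only, and ignores quantization scales, any layers kept at higher precision, the KV cache, and runtime overhead. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


# Assumption: nominal total parameter count read off the model name.
TOTAL_PARAMS = 24e9

estimates = {bits: weight_memory_gb(TOTAL_PARAMS, bits) for bits in (4, 5, 6, 8)}

for bits, gb in estimates.items():
    print(f"{bits}-bit: ~{gb:.0f} GB of weights")
```

For a MoE model, all experts must be resident in memory, so the estimate uses the total parameter count rather than the ~2B parameters active per token; the active count mainly affects per-token compute.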
Mistral-7B-Instruct-v0.1 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
316,680 ↓ · 1,833 ♡
kogpt2-base-v2 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
312,018 ↓ · 61 ♡
Qwen3-30B-A3B-Instruct-2507-FP8 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
311,787 ↓ · 127 ♡
AWQ 4-bit quantization of Qwen3-Coder-30B-A3B, a MoE code model with 30B total and 3B active parameters per token. Quantization reduces the memory needed to hold all 30B weights, while the MoE architecture keeps per-token compute low, making this a practical option for running a large code model on a single high-VRAM GPU. The model is instruction-tuned for agentic coding tasks.
310,634 ↓ · 55 ♡
NVIDIA-Nemotron-Nano-9B-v2-FP8 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
310,539 ↓ · 9 ♡
gemma-2-2b is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
309,987 ↓ · 641 ♡
An abliterated (safety-filter-removed) version of Gemma 4 E4B instruct from the OBLITERATUS project. The abliteration technique modifies weight directions associated with refusal behavior, allowing unconstrained generation at the cost of losing safety alignment.
309,858 ↓ · 715 ♡
OTel-LLM-1.7B-IT is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
309,478 ↓ · 1 ♡
MiniMax-M2.7-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
309,470 ↓ · 174 ♡
OTel-LLM-1B-IT is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
309,136 ↓ · 1 ♡
HyperCLOVAX-SEED-Think-14B-GPTQ is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
309,011 ↓ · 0 ♡
GLM-5 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
308,513 ↓ · 2,085 ♡
Gemma 2B is Google's 2B-parameter open language model from early 2024, trained on 2T tokens of web, code, and math data. It was notable at release for outperforming other 2B-class models of the time on standard benchmarks.
307,263 ↓ · 1,185 ♡
OTel-LLM-270M-IT is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
305,687 ↓ · 0 ♡
OTel-LLM-8.3B-IT handles instruction prompts, multi-turn dialogue, and open-ended text generation. It follows chat template conventions and supports system-level role instructions.
305,168 ↓ · 2 ♡
Qwen3.5-397B-A17B-Opus-4.6-Reasoning-Uncensored-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
304,096 ↓ · 22 ♡
gpt-oss-20b-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
303,945 ↓ · 684 ♡
GLM-4.7-FP8 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
301,918 ↓ · 123 ♡
gpt-j-6b is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
299,770 ↓ · 1,524 ♡
GLM-4.7-Flash-MLX-8bit is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
299,295 ↓ · 11 ♡
DialoGPT-medium is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
298,332 ↓ · 436 ♡
Solar-Open-100B is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
297,968 ↓ · 475 ♡
Qwen2.5-1.5B-Instruct-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
297,188 ↓ · 95 ♡
GLM-5.1 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
296,811 ↓ · 1,612 ♡
Meta-Llama-3.3-70B-Instruct-AWQ-INT4 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
296,047 ↓ · 31 ♡
EXAONE-Deep-7.8B is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
295,722 ↓ · 102 ♡
An 8-bit MLX quantization of the DeepSeek R1-0528 distillation built on the Qwen3-8B backbone, packaged by the LM Studio community for native Apple Silicon inference. DeepSeek R1 is a reasoning model that generates extended chain-of-thought traces before answering; this variant transfers the reasoning behavior of the larger R1-0528 model to the 8B backbone through distillation. The MLX format enables Metal GPU acceleration on M-series Macs.
295,466 ↓ · 18 ♡
Qwen2.5-Coder-7B-Instruct in AWQ 4-bit quantization, the official Alibaba release for memory-efficient code generation serving. AWQ uses activation statistics to identify the most salient weight channels and rescales them before quantization so that they lose less precision, enabling deployment of the 7B code model on a single GPU with ~8GB VRAM. It achieves competitive HumanEval and MBPP scores relative to the BF16 original while reducing weight memory by more than half.
294,709 ↓ · 25 ♡
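The idea behind activation-aware quantization can be illustrated with a toy example. The sketch below is a simplified illustration in plain Python, not the AWQ reference implementation: it scales each input channel of a weight matrix by a power of its mean activation magnitude, quantizes with plain round-to-nearest, folds the scale back out, and keeps whichever scaling strength gives the lowest output error on a calibration batch. Real AWQ additionally uses group-wise quantization and weight clipping. All function names, sizes, and the synthetic data are assumptions made for illustration.

```python
import math
import random


def quantize_row(row, bits=4):
    """Symmetric round-to-nearest quantization with one scale per weight row."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in row) / qmax or 1.0  # avoid dividing by zero
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in row]


def output_error(w, w_hat, xs):
    """L2 norm of the difference between layer outputs x @ w.T and x @ w_hat.T."""
    total = 0.0
    for x in xs:
        for row, row_hat in zip(w, w_hat):
            diff = sum(xi * (a - b) for xi, a, b in zip(x, row, row_hat))
            total += diff * diff
    return math.sqrt(total)


def awq_style_quantize(w, xs, bits=4, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try several scaling strengths and keep the one with the lowest output error."""
    n_in = len(w[0])
    # Mean absolute activation per input channel, from the calibration batch.
    act = [max(sum(abs(x[j]) for x in xs) / len(xs), 1e-8) for j in range(n_in)]
    best_w, best_err = None, math.inf
    for alpha in alphas:
        s = [a ** alpha for a in act]
        norm = math.sqrt(max(s) * min(s))
        s = [v / norm for v in s]
        # Scale up salient input channels, quantize, then fold the scale back out.
        w_hat = [
            [q / sj for q, sj in zip(quantize_row([v * sj for v, sj in zip(row, s)], bits), s)]
            for row in w
        ]
        err = output_error(w, w_hat, xs)
        if err < best_err:
            best_w, best_err = w_hat, err
    return best_w, best_err


rng = random.Random(0)
n_out, n_in, n_samples = 8, 16, 32
w = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
# Hypothetical calibration data: the first two input channels are much larger.
xs = [[rng.gauss(0, 1) * (20.0 if j < 2 else 1.0) for j in range(n_in)]
      for _ in range(n_samples)]

rtn_err = output_error(w, [quantize_row(row) for row in w], xs)
w_awq, awq_err = awq_style_quantize(w, xs)
print(f"round-to-nearest error: {rtn_err:.3f}, activation-aware error: {awq_err:.3f}")
```

Because the search includes a scaling strength of 0 (no scaling, identical to plain round-to-nearest), the selected result can never be worse than the baseline on the calibration batch. Note that every weight is still stored at 4 bits; the protection comes from the scaling, not from mixed precision.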
GLM-4.7-Flash-MLX-6bit is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
294,338 ↓ · 8 ♡
HyperCLOVAX-SEED-Omni-8B is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
293,359 ↓ · 186 ♡
Llama-3.2-3B-Instruct-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
293,029 ↓ · 206 ♡
Meta-Llama-3.1-8B-Instruct-bnb-4bit is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
291,341 ↓ · 99 ♡
Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
289,276 ↓ · 256 ♡
dummy-GPT2-correct-vocab is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
286,982 ↓ · 0 ♡
Dream-v0-Instruct-7B is a diffusion-based language model, distinct from autoregressive LLMs, that generates text by iteratively denoising a masked sequence rather than left-to-right token prediction. It is instruction-tuned and supports bidirectional context at inference time, which enables flexible text infilling without explicit prompting tricks. This is an early research release exploring the diffusion LM paradigm.
271,115 ↓ · 157 ♡
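Iterative denoising of a masked sequence can be sketched generically as follows. This is a toy illustration of confidence-ordered unmasking, not Dream's actual decoding procedure: the `toy_denoiser` is a hypothetical stand-in that looks tokens up in a fixed reference sentence and reports higher confidence when neighbouring tokens on either side are already known, which mimics the use of bidirectional context. All names here are assumptions.

```python
from typing import Callable, Dict, List, Tuple

MASK = "<mask>"
Denoiser = Callable[[List[str]], Dict[int, Tuple[str, float]]]


def iterative_unmask(tokens: List[str], denoiser: Denoiser, steps: int) -> List[str]:
    """Fill masked positions over several steps, committing the most confident first."""
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        proposals = denoiser(tokens)  # a (token, confidence) guess per masked position
        # Commit an even share of the remaining masks at each step (ceiling division),
        # so the final step always fills whatever is left.
        k = -(-len(masked) // (steps - step))
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens


# Hypothetical stand-in for a trained denoiser.
REFERENCE = "the cat sat on the mat".split()


def toy_denoiser(tokens: List[str]) -> Dict[int, Tuple[str, float]]:
    out = {}
    for i, t in enumerate(tokens):
        if t != MASK:
            continue
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
        known = sum(tokens[j] != MASK for j in neighbours)
        out[i] = (REFERENCE[i], 0.5 + 0.25 * known)
    return out


# Infilling: the masked span sits between known tokens on both sides.
filled = iterative_unmask(["the", MASK, MASK, MASK, "the", "mat"], toy_denoiser, steps=3)
print(" ".join(filled))
```

Because positions are chosen by confidence rather than by index, the masked span can sit anywhere in the sequence, which is what makes infilling natural in this style of decoding.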
Qwen2.5-Coder-7B is the 7B base (pretrained, non-instruct) model of Alibaba's Qwen2.5-Coder series. It targets code completion and serves as a starting point for fine-tuning; the separate Instruct variant is the one tuned for chat-style prompts.
265,736 ↓ · 148 ♡
Step-3.5-Flash is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
248,316 ↓ · 819 ♡
Darwin-9B-NEG is a 9B model from ansulev. The NEG suffix suggests a negation-aware variant trained to improve handling of negative statements, which remains a known weakness in many transformer language models.
231,963 ↓ · 15 ♡
NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
231,881 ↓ · 191 ♡
SmolLM2-360M-Instruct is an open-source text-generation model available on HuggingFace. Details are sourced from the public model registry.
229,139 ↓ · 196 ♡