Gemma 4-E4B-IT is Google DeepMind's edge-optimized 4-billion-parameter any-to-any multimodal model from the Gemma 4 family, designed for deployment on mobile and edge devices rather than servers. The 'any-to-any' pipeline_tag indicates multimodal input and output capability beyond standard image-text-to-text. Apache 2.0 licensed.
6,138,750 ↓ · 1,269 ♡
Gemma 4 E2B is Google's efficient 2B-parameter multimodal model, instruction-tuned for both image-text and text-only prompts. It targets edge and on-device deployment where a sub-3B footprint is necessary.
2,390,353 ↓ · 767 ♡
Qwen3-Omni-30B-A3B-Instruct is a mixture-of-experts omni model (30B total parameters, roughly 3B active per token). It accepts text, image, audio, and video input and generates both text and natural speech within a single unified architecture.
2,020,526 ↓ · 943 ♡
gemma-4-12B-it is Google's Gemma 4 multimodal (text + image) instruction-tuned model. It accepts both text and image inputs and produces text, making it suitable for document analysis, visual Q&A, and structured data extraction. Released under Apache-2.0, it targets users who need a capable VLM without access restrictions.
1,696,240 ↓ · 1,108 ♡
Qwen2.5-Omni-3B is the 3B-parameter member of the Qwen2.5-Omni family. It accepts text, image, audio, and video input and generates both text and natural speech within a single unified architecture.
1,667,766 ↓ · 336 ♡
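Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 is an NVFP4 (4-bit floating point) quantization of NVIDIA's Nemotron-3 Nano Omni reasoning model (30B total parameters, 3B active per token). It reduces memory use while keeping cross-modal reasoning in a single model call.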
1,369,439 ↓ · 143 ♡
gemma-4-12B-it-qat-w4a16-ct is a quantization-aware-trained (QAT) build of Google's Gemma 4 12B multimodal (text + image) instruction-tuned model, prepared for W4A16 deployment (4-bit weights, 16-bit activations). Storing the 12B parameters at 4-bit precision suits memory-constrained hardware, and quality degradation is typically small for general chat tasks. The base model is Apache-2.0 licensed.
1,270,771 ↓ · 29 ♡
A 4-bit MLX quantization of Google's Gemma 4 E4B instruct model (an efficient 4B-equivalent MoE variant) for Apple Silicon. Targets developers who want Gemma 4 running locally on MacBook-class hardware.
1,083,883 ↓ · 12 ♡
An 8-bit MLX quantization of Google's Gemma 4 E4B instruct model for Apple Silicon. Higher quality than the 4-bit variant at the cost of roughly double the memory, targeting M2/M3 Pro or Max class machines.
1,056,151 ↓ · 7 ♡
gemma-4-E4B-it-MLX-6bit is a 6-bit MLX quantization of Google's Gemma 4 MoE-based multimodal (text + image) instruction-tuned model, optimized for inference on Apple Silicon. Reducing the weights to 6-bit precision suits memory-constrained machines, with quality degradation typically small for general chat tasks. The base model is Apache-2.0 licensed.
1,039,621 ↓ · 3 ♡
gemma-4-E4B-it-MLX-5bit is a 5-bit MLX quantization of Google's Gemma 4 MoE-based multimodal (text + image) instruction-tuned model, optimized for inference on Apple Silicon. Reducing the weights to 5-bit precision suits memory-constrained machines, with quality degradation typically small for general chat tasks. The base model is Apache-2.0 licensed.
1,038,809 ↓ · 0 ♡
Qwen2.5-Omni-7B is the 7B-parameter member of the Qwen2.5-Omni family. It accepts text, image, audio, and video input and produces both text and natural speech output.
722,512 ↓ · 1,910 ♡
gemma-4-E4B is the pretrained (non-instruction-tuned) base version of Google's Gemma 4 E4B model, tagged any-to-any on Hugging Face for its multimodal capabilities.
596,652 ↓ · 323 ♡
gemma-4-31B-it-assistant is an open-source any-to-any model available on Hugging Face.
489,708 ↓ · 304 ♡
Nemotron-3 Nano Omni is NVIDIA's multimodal reasoning model (30B total parameters, 3B active per token) that extends the Nemotron-H architecture to support any-to-any input and output modalities, including audio, image, and text. The Reasoning variant includes a thinking mode for extended chain-of-thought. It is distributed as unquantized BF16 weights, targeting multi-GPU H100/H200 deployments.
445,501 ↓ · 357 ♡
gemma-4-12B-it-qat-q4_0-gguf is a GGUF build of the quantization-aware-trained (QAT) Gemma 4 12B instruction-tuned model at Q4_0 (4-bit) precision, for llama.cpp-compatible runtimes.
441,974 ↓ · 178 ♡
OneThinker-SFT is a Qwen3-8B-based model trained by OneThink with supervised fine-tuning (SFT) on a vision-language task mixture, using the Qwen3-VL architecture for any-to-any multimodal output. Apache-2.0 licensed.
431,837 ↓ · 4 ♡
MiniCPM-o 2.6 is an omnimodal 8B model from OpenBMB supporting speech, image, and text inputs with real-time audio output. It targets on-device multimodal scenarios, particularly mobile and edge deployments, with end-to-end speech conversation capability.
424,139 ↓ · 1,292 ♡
Gemma-4-E2B is Google's 2B edge model from the Gemma-4 family, designed for on-device deployment with multimodal any-to-any capability. The 'E' prefix indicates edge-optimized — smaller memory footprint and lower latency are prioritized over raw capability. Supports image and text input/output in a single model.
391,355 ↓ · 352 ♡
gemma-4-12B-it-qat-GGUF is a GGUF packaging of the quantization-aware-trained (QAT) Gemma 4 12B instruction-tuned model, available on Hugging Face.
383,211 ↓ · 269 ♡
Qwen3-Omni-30B-A3B-Thinking is the reasoning variant of Qwen3-Omni-30B-A3B (30B total parameters, roughly 3B active per token). It accepts text, image, audio, and video input and produces text output with extended chain-of-thought.
339,575 ↓ · 307 ♡
gemma-4-12B is the pretrained (non-instruction-tuned) base version of Google's Gemma 4 12B model, tagged any-to-any on Hugging Face.
253,618 ↓ · 566 ♡