ultravox-v0_5-llama-3_2-1b
ultravox-v0_5-llama-3_2-1b is released without a specific pipeline. Common uses include feature extraction, encoder probing, and domain-specific fine-tuning.
4 models · ranked by HuggingFace downloads
ultravox-v0_5-llama-3_2-1b is released without a specific pipeline. Common uses include feature extraction, encoder probing, and domain-specific fine-tuning.
Qwen2-Audio-7B-Instruct is Alibaba's multimodal model handling audio and text inputs, capable of audio analysis, speech-to-text transcription, and audio-grounded Q&A. It's instruction-tuned for dialog about audio content. Apache-2.0 licensed and compatible with the Transformers qwen2_audio model type.
VibeVoice-ASR is Microsoft's HuggingFace-packaged automatic speech recognition model, likely a Whisper-style or custom encoder-decoder ASR system targeting informal or conversational speech. The 'Vibe' branding suggests orientation toward natural conversational audio.
ultravox-v0_6-llama-3_1-8b is an open-source audio-text-to-text model available on HuggingFace. Details are sourced from the public model registry.