Use cases
- Local TTS for accessibility tools and screen readers without API cost
- Podcast and audiobook content creation from text
- Voice assistant response generation on-device or in lightweight servers
- Narration generation for video content at low compute cost
- Research into efficient TTS at sub-100M parameter scale
Pros
- Apache 2.0 license permits commercial use
- At 82M parameters, small enough for CPU and low-end GPU inference
- Natural prosody for its parameter count; the architecture is based on StyleTTS2
- Multiple English voice styles available from a single checkpoint
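As a rough check on the parameter-count point above, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic only: it counts weights, ignores activations and runtime overhead, and the function name is illustrative rather than part of any library.

```python
def weight_megabytes(num_params: int, bytes_per_param: int) -> float:
    """Approximate size of model weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


PARAMS = 82_000_000  # nominal parameter count taken from the model name

fp32_mb = weight_megabytes(PARAMS, 4)  # 32-bit floats, 4 bytes each
fp16_mb = weight_megabytes(PARAMS, 2)  # 16-bit floats, 2 bytes each
print(f"fp32: {fp32_mb:.0f} MB, fp16: {fp16_mb:.0f} MB")
```

Under these assumptions the weights come to roughly 328 MB in fp32 and 164 MB in fp16, which is consistent with running on a CPU or a low-end GPU.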
Cons
- English-only; no multilingual TTS capability
- Prosody and naturalness fall short of larger TTS models for demanding audiobook production
- Limited control over speaking rate and emphasis compared to larger commercial TTS APIs
- Community model without a major lab's production testing or SLA
- Fine-tuning requires StyleTTS2 training expertise
FAQ
What is Kokoro-82M used for?
Kokoro-82M is used for local TTS in accessibility tools and screen readers without API cost, podcast and audiobook content creation from text, voice assistant responses on-device or in lightweight servers, low-compute narration for video content, and research into efficient TTS at sub-100M parameter scale.
Is Kokoro-82M free to use?
Yes. Kokoro-82M is an open-source model published on HuggingFace under the Apache 2.0 license, which permits commercial use. Check the model card for the current license text.
How do I run Kokoro-82M locally?
Kokoro-82M is distributed on HuggingFace, and at 82M parameters it can run on a CPU or a low-end GPU. See the model card for the supported inference library, framework-specific instructions, and hardware requirements.
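A minimal sketch of local synthesis, assuming the `kokoro` Python package and its `KPipeline` interface as described on the model card at the time of writing. The package name, the `lang_code="a"` setting, the `af_heart` voice, and the 24 kHz sample rate are all assumptions to verify against the current model card, and `synthesize` is a hypothetical helper, not part of the library.

```python
SAMPLE_RATE = 24_000  # Hz; assumed output rate, verify against the model card


def synthesize(text: str, voice: str = "af_heart", out_prefix: str = "kokoro") -> list[str]:
    """Write one WAV file per generated segment and return the paths written."""
    # Imported lazily so this module loads even without the TTS stack installed.
    # Assumed install step: pip install kokoro soundfile
    from kokoro import KPipeline
    import soundfile as sf

    # "a" is assumed to select American English, following the model card.
    pipeline = KPipeline(lang_code="a")

    paths = []
    # The pipeline is assumed to yield (graphemes, phonemes, audio) per segment.
    for i, (_graphemes, _phonemes, audio) in enumerate(pipeline(text, voice=voice)):
        path = f"{out_prefix}_{i}.wav"
        sf.write(path, audio, SAMPLE_RATE)
        paths.append(path)
    return paths


# Example call (downloads model weights on first use):
# synthesize("Kokoro runs locally without an API key.")
```

If the model card documents a different entry point or sample rate, prefer it over this sketch.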