XTTS-v2

XTTS-v2 is Coqui's multilingual text-to-speech model. It supports 17 languages and can clone a voice from a short audio sample. A GPT-style decoder generates the speech tokens, which enables zero-shot speaker cloning without fine-tuning. The model was released before Coqui shut down and remains available under the Coqui Public Model License, a non-standard license that limits use to non-commercial purposes.

Use cases

  • Multilingual voice cloning for localization workflows
  • Zero-shot TTS from a 6-second speaker audio sample
  • Audiobook narration in supported languages
  • Game character voice generation with consistent speaker identity
  • Accessibility tools requiring personalized voice output

Pros

  • 17-language multilingual support including Portuguese, Polish, Turkish, and Arabic
  • Voice cloning from a short audio sample without fine-tuning
  • GPT-based decoder produces more natural prosody than older TTS models
  • Widely tested in the Coqui TTS open-source ecosystem

Cons

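  • License is the Coqui Public Model License (tagged 'other' on HuggingFace), not Apache/MIT. It limits use to non-commercial purposes, and Coqui has closed operations, so review the terms carefully before any commercial use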
  • Voice cloning quality varies significantly with audio sample quality and duration
  • Inference requires more compute than simpler TTS architectures
  • No active maintenance following Coqui's closure
  • Output quality for low-resource languages in the 17-language set varies substantially
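
Because cloning quality depends on the reference clip, a quick length check can catch obviously short samples before synthesis. The following is a minimal sketch using only the Python standard library. The 6-second threshold mirrors the figure quoted under Use cases and is an assumption, not a documented minimum; the function names are hypothetical, and the check says nothing about noise or recording quality.

```python
import wave

# Assumption: threshold taken from the 6-second sample length quoted in this listing.
MIN_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """Return True when the reference clip is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

Calling `is_long_enough("speaker.wav")` on a hypothetical reference clip returns a boolean that can gate the cloning step.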

FAQ

What is XTTS-v2 used for?

XTTS-v2 is used for multilingual voice cloning in localization workflows, zero-shot TTS from a 6-second speaker audio sample, audiobook narration in supported languages, game character voices with a consistent speaker identity, and accessibility tools that need personalized voice output.

Is XTTS-v2 free to use?

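The XTTS-v2 weights can be downloaded from HuggingFace at no cost, but the model is not under a standard open-source license. It is released under the Coqui Public Model License, which limits use to non-commercial purposes. Check the model card for the full terms before building a product on it.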

How do I run XTTS-v2 locally?

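XTTS-v2 is not loaded through transformers. It runs through the Coqui TTS Python library, which fetches the model by name and takes the input text, a reference speaker clip, and a language code. A GPU speeds up inference considerably; see the model card for installation instructions and hardware requirements.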
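
A minimal sketch of a local run, assuming the Coqui `TTS` package is installed (`pip install TTS`) and that its `TTS.api.TTS` interface behaves as described in the Coqui documentation. The helper names, file paths, and the `use_gpu` flag are hypothetical and only for illustration. The language codes are the 17 listed on the model card. On first use the library downloads the model and may ask you to accept the license terms.

```python
from pathlib import Path

# Language codes listed on the XTTS-v2 model card (17 in total).
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}

# Model identifier used by the Coqui TTS library.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def check_language(language):
    """Normalize a language code and reject codes outside the supported set."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return code


def clone_to_file(text, speaker_wav, language, out_path="output.wav", use_gpu=False):
    """Speak `text` in the voice of `speaker_wav` and write the result as a WAV file."""
    code = check_language(language)
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(str(speaker_wav))

    # Imported lazily: the package is heavy and is only needed for synthesis.
    from TTS.api import TTS

    tts = TTS(MODEL_NAME)
    if use_gpu:
        tts = tts.to("cuda")
    tts.tts_to_file(
        text=text,
        speaker_wav=str(speaker_wav),
        language=code,
        file_path=str(out_path),
    )
    return out_path
```

With a hypothetical reference clip named `speaker.wav`, a call such as `clone_to_file("Hello from a cloned voice.", "speaker.wav", "en")` would write `output.wav` to the working directory. Validating the language and the clip path before importing the library keeps simple input mistakes from triggering a model download.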

Tags

coqui, text-to-speech, license:other, region:us