AI Tools.

automatic speech recognition

wav2vec2-large-xlsr-53-portuguese

wav2vec2-large-xlsr-53-portuguese is an XLSR-53 model fine-tuned on Portuguese Common Voice data for automatic speech recognition. It uses CTC decoding and expects 16 kHz mono audio. It achieves competitive word error rates on both European and Brazilian Portuguese test sets. The model came out of the community XLSR fine-tuning effort organized by HuggingFace in 2021, and its tags also reference the Robust Speech Event.
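To make the CTC decoding step concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not taken from the model's code. The toy vocabulary, the blank ID of 0, and the `|` word delimiter are assumptions for illustration, and the frame IDs stand in for the per-frame argmax of the model's output.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join tokens.

    frame_ids: one predicted token ID per audio frame (argmax of the logits).
    id_to_token: mapping from token ID to its character.
    """
    # Step 1: merge runs of identical consecutive IDs into a single ID.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the blank symbol, which only separates repeated characters.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; a real model ships its own vocabulary file.
VOCAB = {0: "<pad>", 1: "|", 2: "o", 3: "s", 4: "l", 5: "á"}

# Hypothetical frame-level predictions for the phrase "olá osso".
frames = [2, 2, 0, 4, 5, 5, 1, 1, 2, 0, 3, 3, 0, 3, 2, 0]

print(ctc_greedy_decode(frames, VOCAB))  # prints "olá osso"
```

The blank between the two `s` frames is what lets the double letter in "osso" survive the collapse step. Decoding this way uses no language model, so each frame's best guess is taken at face value.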

Use cases

  • Transcribing Portuguese audio recordings and podcast content
  • Voice command recognition in Portuguese-language applications
  • Baseline Portuguese ASR model to start from before fine-tuning on custom domain data
  • Academic benchmarking on Common Voice Portuguese test splits
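Benchmarking an ASR model usually means reporting word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. The function name and the example sentences are illustrative, and it applies no text normalization (casing, punctuation), which real evaluations typically define explicitly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one of five reference words is missing.
print(word_error_rate("o gato dorme no sofá", "o gato dorme sofá"))  # 0.2
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.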

Pros

  • Apache 2.0 license enables commercial transcription deployment
  • Compatible with the standard HuggingFace ASR pipeline out of the box
  • Fine-tuned on Common Voice Portuguese, covering both PT-PT and PT-BR accents

Cons

  • CTC decoding without a language model produces higher WER on noisy audio
  • Requires 16 kHz mono audio input, so other formats must be downmixed and resampled first, which adds preprocessing overhead
  • Significantly outperformed by Whisper-large-v3-turbo on Portuguese transcription
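The 16 kHz mono requirement comes down to two preprocessing steps: averaging the channels, then changing the sample rate. The following is a dependency-free sketch of both. The function names are hypothetical, and the linear interpolation used here applies no anti-aliasing filter, so real pipelines would normally rely on a dedicated resampler from an audio library instead.

```python
def downmix_to_mono(frames):
    """Average the channels of each frame.

    frames: sequence of per-frame tuples, e.g. (left, right) for stereo.
    """
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal by linear interpolation (no low-pass filtering)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    if src_rate == dst_rate:
        return list(samples)

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    out = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        if i >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
        else:
            frac = pos - i
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


# One second of silent 44.1 kHz stereo becomes 16,000 mono samples.
stereo = [(0.0, 0.0)] * 44_100
mono_16k = resample_linear(downmix_to_mono(stereo), src_rate=44_100)
print(len(mono_16k))  # 16000
```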

FAQ

What is wav2vec2-large-xlsr-53-portuguese used for?

It is used for transcribing Portuguese audio recordings and podcast content, recognizing voice commands in Portuguese-language applications, serving as a Portuguese ASR baseline before fine-tuning on custom domain data, and academic benchmarking on Common Voice Portuguese test splits.

Is wav2vec2-large-xlsr-53-portuguese free to use?

Yes. wav2vec2-large-xlsr-53-portuguese is an open-source model published on HuggingFace. Its tags list an Apache 2.0 license, which permits commercial use. Check the model card to confirm the current license terms.

How do I run wav2vec2-large-xlsr-53-portuguese locally?

The model can be loaded with the transformers library through the standard HuggingFace ASR pipeline; its tags list PyTorch and JAX support. Input audio must be 16 kHz mono. See the model card for framework-specific instructions and hardware requirements.
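As a rough illustration of local use, the sketch below wraps the transformers `pipeline` API for automatic speech recognition. It assumes `transformers` and a backend such as PyTorch are installed, and that ffmpeg is available for decoding audio files. The `transcribe` helper is hypothetical, and the full repository ID (including the owner namespace, which this page does not state) must be taken from the model card.

```python
def transcribe(audio_path: str, model_id: str) -> str:
    """Transcribe one audio file with a HuggingFace ASR pipeline.

    model_id: full repository ID from the model card (assumption: the caller
    supplies it; this page only gives the model name, not the owner).
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # When given a file path, the pipeline decodes the audio itself and
    # returns a dict whose "text" key holds the transcript.
    result = asr(audio_path)
    return result["text"]


# Example call (replace both arguments with real values):
# text = transcribe("gravacao.wav", "<owner>/wav2vec2-large-xlsr-53-portuguese")
```

Building the pipeline downloads the weights on first use, so for many files it is cheaper to construct it once and reuse it rather than calling this helper in a loop.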

Tags

transformers, pytorch, jax, wav2vec2, automatic-speech-recognition, audio, hf-asr-leaderboard, mozilla-foundation/common_voice_6_0, pt, robust-speech-event, speech, xlsr-fine-tuning-week, dataset:common_voice, dataset:mozilla-foundation/common_voice_6_0, doi:10.57967/hf/3572, license:apache-2.0, model-index, endpoints_compatible, deploy:azure, region:us