wespeaker-voxceleb-resnet34-LM

WeSpeaker's ResNet34 speaker embedding model, trained on the VoxCeleb dataset with Large Margin (LM) training and adapted for the pyannote ecosystem. It maps an audio segment to a fixed-dimensional speaker embedding, which is used for speaker verification and as the speaker clustering component in pyannote speaker diarization pipelines.

Use cases

  • Speaker verification — confirming whether two audio clips contain the same voice
  • Speaker embedding extraction for downstream clustering in diarization systems
  • Speaker identification in known-speaker enrollment scenarios
  • Audio segment comparison for voice similarity scoring
  • Component in pyannote speaker diarization pipeline
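
Verification and similarity scoring both reduce to comparing two embeddings. The sketch below shows one common way to do that, cosine similarity with a decision threshold. It uses toy 4-dimensional vectors in place of real model output, and the names `cosine_similarity`, `same_speaker`, and the 0.5 threshold are illustrative assumptions, not values from the model card; a real threshold should be tuned on held-out pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Accept the pair as one speaker if similarity reaches the threshold.

    The default threshold is a placeholder, not a recommended value.
    """
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for embeddings of three clips (hypothetical data).
clip_a = [0.9, 0.1, 0.3, 0.2]
clip_b = [0.8, 0.2, 0.4, 0.1]
clip_c = [-0.1, 0.9, -0.5, 0.3]

print(same_speaker(clip_a, clip_b))  # True: the vectors point the same way
print(same_speaker(clip_a, clip_c))  # False: similarity is negative
```

Cosine similarity ignores vector length, so two clips of different loudness or duration can still be compared as long as each yields a usable embedding.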

Pros

  • Large Margin training improves speaker discriminability over standard training
  • VoxCeleb training covers thousands of speakers with varied accents, nationalities, and recording conditions
  • CC-BY-4.0 license permits commercial use with attribution
  • Integrates directly with the pyannote speaker diarization pipeline

Cons

  • No pipeline_tag is set on Hugging Face, so inference requires pyannote.audio or custom code
  • Performance degrades on non-speech audio and noisy recordings
  • Channel mismatch between VoxCeleb (YouTube) and other microphone types can reduce accuracy
  • Does not identify who a speaker is without an enrollment database
  • Speaker embedding quality depends on audio quality and segment length
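
The enrollment limitation above can be made concrete: the model only produces vectors, so naming a speaker means comparing a new embedding against stored reference embeddings. The following is a minimal sketch under that assumption. The `identify` function, the in-memory dictionary used as an enrollment database, the toy 3-dimensional vectors, and the 0.5 threshold are all hypothetical.

```python
import math


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(embedding, enrolled, threshold=0.5):
    """Return (name, score) for the closest enrolled speaker.

    Returns (None, score) when no enrolled speaker reaches the threshold,
    which is how an unknown voice is rejected.
    """
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = _cosine(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical enrollment database: one reference embedding per known speaker.
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, -0.4],
}

print(identify([0.8, 0.2, 0.2], enrolled)[0])    # alice
print(identify([-0.5, -0.5, 0.6], enrolled)[0])  # None: no enrolled match
```

In practice each reference would typically be an average over several clean clips of the same speaker, which also softens the audio-quality and segment-length sensitivity listed above.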

FAQ

What is wespeaker-voxceleb-resnet34-LM used for?

It is used for speaker verification (confirming whether two audio clips contain the same voice), speaker embedding extraction for downstream clustering in diarization systems, speaker identification in known-speaker enrollment scenarios, voice similarity scoring between audio segments, and as a component in pyannote speaker diarization pipelines.

Is wespeaker-voxceleb-resnet34-LM free to use?

Yes. wespeaker-voxceleb-resnet34-LM is an open-source model published on Hugging Face under the CC-BY-4.0 license, which permits commercial use with attribution. Check the model card for the full license terms.

How do I run wespeaker-voxceleb-resnet34-LM locally?

The model has no pipeline_tag, so it does not load through the generic transformers pipelines. Run it with pyannote.audio (built on PyTorch) or with custom inference code. See the model card for installation instructions and hardware requirements.
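
As a rough illustration, a local run with pyannote.audio might look like the sketch below. It assumes the model is hosted under the repository id `pyannote/wespeaker-voxceleb-resnet34-LM`, that a recent pyannote.audio release plus numpy and scipy are installed, and that `speaker1.wav` and `speaker2.wav` are placeholder files you supply. The helper names are hypothetical; confirm the exact calls and minimum version against the model card.

```python
# Assumed Hugging Face repository id; verify it on the model card.
MODEL_ID = "pyannote/wespeaker-voxceleb-resnet34-LM"


def load_embedder(model_id=MODEL_ID):
    """Return a callable that maps an audio file path to one embedding."""
    # Imported lazily so this module can be read without pyannote.audio installed.
    from pyannote.audio import Inference, Model

    model = Model.from_pretrained(model_id)
    # window="whole" asks for a single embedding covering the entire file.
    return Inference(model, window="whole")


def cosine_distance_between(path_a, path_b, embedder):
    """Cosine distance between the embeddings of two audio files."""
    import numpy as np
    from scipy.spatial.distance import cdist

    # atleast_2d makes the shape (1, D) regardless of how the embedding is returned.
    emb_a = np.atleast_2d(embedder(path_a))
    emb_b = np.atleast_2d(embedder(path_b))
    return float(cdist(emb_a, emb_b, metric="cosine")[0, 0])


def main():
    # Placeholder file names; replace with real recordings.
    embedder = load_embedder()
    print(cosine_distance_between("speaker1.wav", "speaker2.wav", embedder))
```

Calling `main()` downloads the model weights on first use, so it needs network access. A smaller distance means the two voices are more alike; the cut-off for "same speaker" has to be chosen on your own data.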

Tags

pyannote-audio, pytorch, pyannote, pyannote-audio-model, wespeaker, audio, voice, speech, speaker, speaker-recognition, speaker-verification, speaker-identification, speaker-embedding, dataset:voxceleb, license:cc-by-4.0, region:us