roberta-base vs xlm-roberta-base

roberta-base and xlm-roberta-base are both fill-mask models from Facebook AI. roberta-base is trained on English text only, while xlm-roberta-base applies the same training procedure to 100 languages. See each entry for specifics.

roberta-base

Pipeline
fill mask
Downloads
18,684,651
Likes
595

RoBERTa base from Facebook AI, trained with the same architecture as BERT base but significantly more data, longer training schedules, larger batch sizes, and dynamic masking. Pre-trained on BookCorpus, Wikipedia, CC-News, OpenWebText, and Stories — substantially more data than the original BERT. MIT licensed with multi-framework support.
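Since roberta-base is published as a fill-mask model, a minimal sketch of querying it with the transformers `pipeline` API looks like the following (the prompt text is illustrative; RoBERTa checkpoints use `<mask>` as the mask token):

```python
from transformers import pipeline

# Load roberta-base through the fill-mask pipeline.
# Note: RoBERTa's mask token is "<mask>", not BERT's "[MASK]".
unmasker = pipeline("fill-mask", model="roberta-base")

# Returns the top candidates (5 by default), each a dict with
# "score", "token", "token_str", and "sequence" keys.
preds = unmasker("The capital of France is <mask>.")
for p in preds:
    print(p["token_str"], round(p["score"], 3))
```

The first call downloads the checkpoint from the Hugging Face Hub; subsequent calls use the local cache.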

xlm-roberta-base

Pipeline
fill mask
Downloads
18,605,818
Likes
822

XLM-RoBERTa base from Facebook AI, pre-trained on 2.5TB of filtered CommonCrawl text across 100 languages using the RoBERTa training procedure. Enables cross-lingual transfer — models fine-tuned on labeled English data can infer on other languages without parallel annotations. The standard starting point for multilingual classification and token-level tasks.
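Because xlm-roberta-base shares RoBERTa's `<mask>` convention, a single fill-mask pipeline can serve prompts in different languages without any per-language setup. A minimal sketch (the English and French prompts are illustrative):

```python
from transformers import pipeline

# Load xlm-roberta-base through the fill-mask pipeline;
# the same <mask> token works regardless of the input language.
unmasker = pipeline("fill-mask", model="xlm-roberta-base")

# One model, prompts in two languages (English and French here).
prompts = [
    "The capital of France is <mask>.",
    "La capitale de la France est <mask>.",
]
results = [unmasker(p) for p in prompts]
for preds in results:
    # Each result is the top candidates (5 by default) with scores.
    print(preds[0]["token_str"], round(preds[0]["score"], 3))
```

This is the "cross-lingual" property in miniature: the shared SentencePiece vocabulary and multilingual pretraining let one checkpoint handle all 100 languages it was trained on.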

Key differences

  • Language coverage: roberta-base is English-only; xlm-roberta-base covers 100 languages and supports cross-lingual transfer.
  • Training data: roberta-base is pre-trained on BookCorpus, Wikipedia, CC-News, OpenWebText, and Stories; xlm-roberta-base on 2.5TB of filtered CommonCrawl text.
  • Popularity: the two are nearly tied on downloads (~18.7M vs ~18.6M), while xlm-roberta-base has more likes (822 vs 595).

Common ground

  • Both are open-source fill-mask models from Facebook AI, hosted on Hugging Face.
  • Both use the RoBERTa pre-training procedure (dynamic masking, large batches, no next-sentence prediction) on a BERT-base-sized architecture.

Which should you pick?

Pick roberta-base if your data is English-only: it generally gives stronger English performance and has a smaller vocabulary (and thus fewer parameters) than its multilingual counterpart. Pick xlm-roberta-base when your inputs span multiple languages, or when you want to fine-tune on labeled English data and run inference on other languages without parallel annotations.