roberta-base vs roberta-large

roberta-base and roberta-large are both fill-mask models from Facebook AI. The entries below cover each model, followed by key differences and guidance on choosing between them.

roberta-base

Pipeline: fill mask
Downloads: 18,684,651
Likes: 595

RoBERTa base from Facebook AI shares the BERT base architecture but was trained on significantly more data, with longer training schedules, larger batch sizes, and dynamic masking. It was pre-trained on BookCorpus, Wikipedia, CC-News, OpenWebText, and Stories, substantially more data than the original BERT. MIT licensed with multi-framework support.
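
A minimal sketch of running roberta-base in a fill-mask pipeline with the Hugging Face transformers library; the model name comes from this page, and the example sentence is illustrative:

# Fill-mask inference with roberta-base. Note that RoBERTa uses "<mask>"
# as its mask token, not BERT's "[MASK]".
from transformers import pipeline

unmasker = pipeline("fill-mask", model="roberta-base")

# Each candidate dict includes the predicted token and its probability.
for candidate in unmasker("The capital of France is <mask>."):
    print(f"{candidate['token_str']!r}: {candidate['score']:.3f}")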

roberta-large

Pipeline: fill mask
Downloads: 18,627,609
Likes: 283

RoBERTa large is the 355M-parameter version of Facebook AI's robustly optimized BERT variant, with twice the depth of RoBERTa base (24 layers vs. 12), a larger hidden size (1024 vs. 768), and more attention heads (16 vs. 12). It delivers stronger English NLU accuracy at roughly 4x the inference compute cost of the base variant, and suits workloads where task accuracy outweighs latency constraints.
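
Since roberta-large is typically fine-tuned rather than used as-is, here is a minimal sketch of loading it with a fresh classification head via standard transformers APIs; num_labels=2 and the input sentence are illustrative assumptions:

# Load roberta-large for sequence classification. The classification head
# is randomly initialized, so transformers will warn that it needs training.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large", num_labels=2  # illustrative: binary classification
)

inputs = tokenizer("RoBERTa large trades latency for accuracy.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])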

Key differences

  • Size: roberta-base has ~125M parameters (12 layers, hidden size 768, 12 attention heads); roberta-large has ~355M parameters (24 layers, hidden size 1024, 16 attention heads). The sketch after this list shows one way to verify these counts.
  • Accuracy vs. cost: roberta-large consistently scores higher on English NLU benchmarks, at roughly 4x the inference compute of roberta-base.
  • Hub stats: downloads are nearly identical (about 18.7M each), while roberta-base has more likes (595 vs. 283).
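
A quick sketch for checking the parameter counts quoted above on your own machine, using only standard transformers APIs:

# Count trainable parameters in each checkpoint. Expect roughly 125M for
# roberta-base and roughly 355M for roberta-large.
from transformers import AutoModel

for name in ("roberta-base", "roberta-large"):
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")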

Common ground

  • Both are fill-mask models from Facebook AI, pre-trained on the same corpora (BookCorpus, Wikipedia, CC-News, OpenWebText, and Stories) with dynamic masking.
  • Both are MIT licensed, hosted on Hugging Face with multi-framework support, and use RoBERTa's byte-level BPE tokenizer with "<mask>" as the mask token.

Which should you pick?

Pick roberta-base when latency, memory, or serving cost matters, or as a default starting point for fine-tuning; its accuracy is strong for most English NLU tasks. Pick roberta-large when accuracy on harder language-understanding tasks justifies roughly 4x the inference compute. When in doubt, fine-tune both on your task and compare; the timing sketch below shows a quick way to measure the latency side of the trade-off.
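
A rough sketch for measuring the base-vs-large latency gap on your own hardware; the iteration count and example sentence are illustrative, and results will vary with batch size, sequence length, and device:

# Time a single fill-mask call for each model. The first call is a warm-up
# so model download and initialization are not counted.
import time
from transformers import pipeline

SENTENCE = "The capital of France is <mask>."

for name in ("roberta-base", "roberta-large"):
    unmasker = pipeline("fill-mask", model=name)
    unmasker(SENTENCE)  # warm-up run
    start = time.perf_counter()
    for _ in range(20):
        unmasker(SENTENCE)
    elapsed = (time.perf_counter() - start) / 20
    print(f"{name}: {elapsed * 1000:.1f} ms per call")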