bert-base-uncased vs roberta-large

bert-base-uncased and roberta-large are both masked-language (fill-mask) encoder models. The entries below cover each in turn, followed by a comparison.

bert-base-uncased

Pipeline: fill-mask
Downloads: 59,598,776
Likes: 2,641

Google's original BERT base model in uncased form, pre-trained on BookCorpus and English Wikipedia with a masked language modeling objective. Input text is lowercased during tokenization, so the model is insensitive to capitalization. It remains a standard fine-tuning base for classification, NER, and extractive QA, though newer encoders outperform it on most benchmarks.
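
As a minimal sketch of how the fill-mask pipeline is used (assuming the Hugging Face transformers library with a PyTorch backend is installed; the example sentence is illustrative):

    from transformers import pipeline

    # bert-base-uncased marks the blank with the [MASK] token.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")

    # The pipeline returns the top candidate tokens with scores.
    for prediction in unmasker("The capital of France is [MASK]."):
        print(prediction["token_str"], round(prediction["score"], 3))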

roberta-large

Pipeline: fill-mask
Downloads: 18,627,609
Likes: 283

RoBERTa large, the 355M-parameter version of Facebook AI's robustly optimized BERT variant. Compared with RoBERTa base it has twice as many layers (24 vs. 12), a larger hidden size (1,024 vs. 768), and more attention heads (16 vs. 12). It delivers stronger NLU accuracy at roughly 4x the inference compute of the base variant, and is used where task accuracy on complex English language understanding outweighs latency constraints.
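
The same pipeline works with roberta-large, with one practical difference worth flagging: RoBERTa's byte-level BPE tokenizer uses <mask> as its mask token rather than BERT's [MASK]. A minimal sketch under the same assumptions as above:

    from transformers import pipeline

    # roberta-large expects <mask>, not [MASK]; the fill-mask
    # pipeline rejects inputs missing the model's mask token.
    unmasker = pipeline("fill-mask", model="roberta-large")

    for prediction in unmasker("The capital of France is <mask>."):
        print(prediction["token_str"], round(prediction["score"], 3))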

Key differences

  • Size and speed: bert-base-uncased has roughly 110M parameters; roberta-large has 355M, making it about 3x larger and correspondingly more expensive at inference.
  • Pretraining: RoBERTa trains longer on substantially more data than BERT, drops the next-sentence-prediction objective, and uses dynamic masking.
  • Casing and tokenization: bert-base-uncased lowercases all input, while roberta-large is case-sensitive and uses a byte-level BPE tokenizer whose mask token is <mask> rather than [MASK].

Common ground

  • Both are open-source transformer encoders on the Hugging Face Hub, pre-trained on English text with masked language modeling, served through the fill-mask pipeline, and intended primarily as bases for fine-tuning.

Which should you pick?

Pick bert-base-uncased when you want a lightweight, widely supported fine-tuning base and can accept somewhat lower accuracy; pick roberta-large when accuracy on demanding English NLU tasks outweighs its roughly 3x higher parameter count and inference cost. If capitalization carries signal in your task (as it often does in NER), keep in mind that bert-base-uncased discards it.
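
One quick way to ground the size trade-off is to compare parameter counts directly. A short sketch, again assuming transformers with a PyTorch backend (downloading both checkpoints takes a few GB of disk and bandwidth):

    from transformers import AutoModel

    # Parameter count is a rough proxy for inference cost:
    # bert-base-uncased is ~110M parameters, roberta-large ~355M.
    for name in ("bert-base-uncased", "roberta-large"):
        model = AutoModel.from_pretrained(name)
        params = sum(p.numel() for p in model.parameters())
        print(f"{name}: {params / 1e6:.0f}M parameters")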