Use cases
- Dense retrieval in domains lacking labeled query-document pairs
- Zero-shot dense retrieval baseline comparison
- Unsupervised passage retrieval for domain-specific corpora
- Initial retrieval stage in RAG pipelines where fine-tuning data is scarce
- Research into unsupervised vs. supervised dense retrieval tradeoffs
Pros
- Unsupervised training enables retrieval without any labeled data
- BERT backbone with standard HuggingFace transformers integration
- Publicly released weights on HuggingFace, freely downloadable for research (check the model card for exact license terms)
- Strong baseline for evaluating dense retrieval without supervision
Cons
- No pipeline_tag; inference requires manual transformers integration with mean pooling over token embeddings (see the sketch after this list)
- Outperformed by supervised models (BGE, E5, nomic-embed) on standard benchmarks
- No instruction tuning or asymmetric query-passage training
- Domain-specific retrieval often requires fine-tuning despite unsupervised pretraining
- Less maintained than BAAI BGE and similar production-ready embedding models
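A minimal sketch of that manual integration, assuming the facebook/contriever checkpoint and the mean-pooling recipe shown on the model card; the example sentences are purely illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# facebook/contriever is the published unsupervised checkpoint
tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    # Tokenize a batch of strings and run the BERT encoder
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool token embeddings, masking out padding positions
    token_embeddings = outputs.last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1).bool()
    token_embeddings = token_embeddings.masked_fill(~mask, 0.0)
    return token_embeddings.sum(dim=1) / inputs["attention_mask"].sum(dim=1, keepdim=True)

embeddings = embed(["Where was Marie Curie born?", "Maria Sklodowska was born in Warsaw."])
print(embeddings.shape)  # (2, 768) for the BERT-base backbone
```

The mean pooling step is what a sentence-transformers config would normally handle automatically; here it has to be written out by hand.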
FAQ
What is Contriever used for?
Contriever is an unsupervised dense retriever: it embeds queries and passages into a shared vector space without needing labeled query-document pairs. Typical uses include zero-shot dense retrieval baselines, passage retrieval over domain-specific corpora, the first-stage retriever in RAG pipelines where fine-tuning data is scarce, and research comparing unsupervised and supervised dense retrieval.
Is Contriever free to use?
The weights are published openly on HuggingFace and cost nothing to download. License terms are set per model, so check the model card for the specific license, especially before any commercial use.
How do I run Contriever locally?
Contriever loads through the standard HuggingFace transformers library as a BERT-base-sized encoder, so it runs on CPU or any modest GPU. There is no pipeline_tag, so you apply mean pooling over the token embeddings yourself; a minimal end-to-end sketch follows. See the model card for full instructions and hardware notes.
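Here is a sketch of a small local retrieval run. The facebook/contriever checkpoint is the published model; the toy passages and query are invented for illustration, and Contriever scores query-passage pairs by dot product:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs).last_hidden_state
    # Mean pooling over non-padding tokens
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (out * mask).sum(dim=1) / mask.sum(dim=1)

# Toy corpus; a real pipeline would embed passages offline and index them
passages = [
    "Contriever is trained with contrastive learning on unlabeled text.",
    "The Eiffel Tower is located in Paris.",
    "Dense retrievers map queries and documents into a shared vector space.",
]
query = "How is Contriever trained?"

# Rank passages by dot-product similarity to the query
scores = embed([query]) @ embed(passages).T
best = scores.argmax().item()
print(f"best passage ({scores[0, best].item():.2f}): {passages[best]}")
```

For corpora beyond a few thousand passages, the precomputed passage embeddings would normally go into a vector index (e.g. FAISS) rather than a dense matrix multiply.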