nomic-embed-text-v1

Nomic Embed Text v1 is the original release of Nomic AI's English text embedding model, built on the nomic-BERT architecture; it precedes the v1.5 matryoshka update. It produces 768-dimensional embeddings trained with contrastive learning and is fully open: model weights, training code, and training data are all publicly available. The model is released under the Apache 2.0 license.

Use cases

  • Semantic search and retrieval in English text corpora
  • RAG pipeline embedding where training data transparency matters
  • Research reproducibility for open embedding model benchmarks
  • Integrating with transformers.js for browser-side embedding
  • Building auditable ML pipelines requiring open training data
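
The semantic search use case above comes down to ranking documents by how similar their embeddings are to a query embedding. The following is a minimal sketch in Python, assuming the embeddings have already been computed. The helper names `cosine_similarity` and `rank_documents` and the toy 3-dimensional vectors are illustrative only and not part of any Nomic API; real vectors from this model would have 768 dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings (hypothetical data).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]
ranking = rank_documents(query, docs, top_k=2)
print(ranking)
```

In a RAG pipeline the same ranking step would select the passages handed to the generator; at larger scale a vector index would typically replace the linear scan shown here.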

Pros

  • Fully open under Apache 2.0: weights, training code, and training data are released
  • transformers.js support for browser-side inference
  • ONNX-compatible for production deployment

Cons

  • Superseded by v1.5, which adds matryoshka (variable-dimension) embeddings; new projects should generally start with v1.5
  • English-only; no multilingual capability
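  • Custom nomic_bert architecture requires opting in to remote code (the custom_code tag; trust_remote_code=True in transformers and sentence-transformers)
  • Fixed 768-dimensional output at compute comparable to BGE-base, with no matryoshka option for truncating embeddings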
  • Smaller community adoption than sentence-transformers family models

FAQ

What is nomic-embed-text-v1 used for?

It is used for semantic search and retrieval over English text, for embedding documents in RAG pipelines where training-data transparency matters, for reproducible research on open embedding benchmarks, for browser-side embedding through transformers.js, and for auditable ML pipelines that require open training data.

Is nomic-embed-text-v1 free to use?

Yes. nomic-embed-text-v1 is an open model published on Hugging Face under the Apache 2.0 license, which permits commercial use. Check the model card for the authoritative license text.

How do I run nomic-embed-text-v1 locally?

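The model can be loaded with sentence-transformers or transformers. Because it uses the custom nomic_bert architecture, loading requires opting in to remote code (trust_remote_code=True). ONNX weights are also tagged for use with transformers.js and other ONNX runtimes. See the model card for framework-specific instructions and hardware requirements.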
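
As a concrete starting point, one way to run the model locally is through sentence-transformers. The sketch below makes several assumptions that should be verified against the model card: the Hugging Face repository id `nomic-ai/nomic-embed-text-v1`, an installed `sentence-transformers` package (the remote code may also need `einops`), and the convention that inputs carry a task prefix such as `search_document: ` or `search_query: `. The helper names `add_prefix` and `embed` are hypothetical.

```python
from typing import List

MODEL_ID = "nomic-ai/nomic-embed-text-v1"  # assumed repository id

# Task prefixes assumed from the model card; confirm before relying on them.
DOCUMENT_PREFIX = "search_document: "
QUERY_PREFIX = "search_query: "


def add_prefix(texts: List[str], prefix: str) -> List[str]:
    """Prepend the task prefix to each input text."""
    return [prefix + t for t in texts]


def embed(texts: List[str], prefix: str):
    """Load the model and return one embedding per input text."""
    # Imported lazily so add_prefix works without the library installed.
    from sentence_transformers import SentenceTransformer

    # trust_remote_code=True is needed because nomic_bert is a custom architecture.
    model = SentenceTransformer(MODEL_ID, trust_remote_code=True)
    return model.encode(add_prefix(texts, prefix), normalize_embeddings=True)


# Show what the model would actually receive for a query.
print(add_prefix(["what is an embedding model?"], QUERY_PREFIX))
```

Calling `embed(["some passage"], DOCUMENT_PREFIX)` downloads the weights on first use and returns an array with one row per input; each row should have 768 entries, matching the dimensionality stated above. Reloading the model on every call is wasteful, so a real pipeline would load it once and reuse it.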

Tags

sentence-transformers, pytorch, onnx, safetensors, nomic_bert, feature-extraction, sentence-similarity, mteb, transformers, transformers.js, custom_code, en, arxiv:2402.01613, license:apache-2.0, model-index, text-embeddings-inference, endpoints_compatible, region:us