text classification models

47 models · ranked by HuggingFace downloads

bge-reranker-v2-m3

BGE-Reranker-v2-M3 is BAAI's multilingual cross-encoder reranker built on XLM-RoBERTa, designed for re-ranking retrieved passages in multilingual RAG or search pipelines. It encodes each query-passage pair jointly to produce a relevance score, which is typically more accurate than bi-encoder similarity over the same candidate set. It is released under Apache 2.0 and is supported by text-embeddings-inference.

15,789,545 ↓ · 1,047 ♡

finbert

FinBERT is a BERT model fine-tuned on financial news and communications for sentiment analysis, classifying text as positive, negative, or neutral from a finance-domain perspective. Developed by Prosus AI (Naspers), it targets applications where general-purpose sentiment models misread financial jargon and market-specific framing.

7,754,098 ↓ · 1,179 ♡

bge-reranker-base

Cross-encoder reranker built on XLM-RoBERTa and trained on bilingual (English and Chinese) data. It scores each query-document pair directly rather than comparing separately computed embeddings, which makes it more accurate than a bi-encoder in retrieval pipelines at the cost of higher latency.

3,935,024 ↓ · 238 ♡

twitter-roberta-base-sentiment-latest

RoBERTa-base pre-trained on ~124M tweets and then fine-tuned for three-class sentiment classification (positive/neutral/negative) on the TweetEval benchmark. Released by Cardiff NLP, it is a common default for tweet-specific sentiment analysis.

3,737,670 ↓ · 809 ♡

distilbert-base-uncased-finetuned-sst-2-english

distilbert-base-uncased-finetuned-sst-2-english is a DistilBERT model fine-tuned on the Stanford Sentiment Treebank v2 (SST-2) for binary positive/negative sentiment classification of English text. It is one of the most downloaded sentiment classifiers on HuggingFace and commonly used as a demonstration or fast baseline. The model achieves approximately 91% accuracy on the SST-2 validation set.

3,468,481 ↓ · 908 ♡
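A binary classifier such as distilbert-base-uncased-finetuned-sst-2-english returns one raw logit per class, and a softmax turns those logits into probabilities. The sketch below shows only that post-processing step; the logits and the label mapping are illustrative assumptions, not output captured from the model.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities; subtracting the max keeps exp() stable."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Assumed label mapping and made-up logits for a single sentence.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
label, prob = predict([-2.0, 3.0], ID2LABEL)
print(label, round(prob, 4))
```

Reporting the probability alongside the label makes it easy to route low-confidence predictions to a fallback.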

twitter-xlm-roberta-base-sentiment

twitter-xlm-roberta-base-sentiment is a multilingual XLM-RoBERTa-base model trained on tweets and fine-tuned for three-class sentiment classification (negative/neutral/positive). It outputs per-class logits.

1,409,824 ↓ · 268 ♡

RADAR-Vicuna-7B

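RADAR-Vicuna-7B is an AI-generated-text detector from the RADAR (Robust AI-Text Detection via Adversarial Learning) project. Despite the name, the detector itself is a RoBERTa-based binary classifier; it was trained adversarially against a paraphraser, with Vicuna-7B as the language model producing the AI-written text.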

1,398,099 ↓ · 13 ♡

Prompt-Guard-86M

Prompt-Guard-86M is Meta's prompt-attack classifier built on a multilingual DeBERTa backbone. Given an input string, it scores the text as benign, injection, or jailbreak, and is intended as a lightweight filter in front of LLM applications.

1,351,778 ↓ · 347 ♡

roberta-base-go_emotions

RoBERTa-base fine-tuned on Google's GoEmotions dataset to classify English text into 28 categories (27 emotions plus neutral). It is a multi-label classifier: each label is scored independently, so a sentence can be tagged with several emotions at once. Typical uses include sentiment analytics pipelines and social media monitoring.

1,239,075 ↓ · 679 ♡
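Multi-label classification, as described for roberta-base-go_emotions, differs from single-label classification in the post-processing: each logit goes through its own sigmoid and every label above a threshold is kept. A minimal sketch, assuming hypothetical logits, a four-label subset of the GoEmotions label set, and a 0.5 threshold chosen only for illustration:

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def multi_label(logits, labels, threshold=0.5):
    """Keep every label whose independent probability clears the threshold."""
    scored = [(name, sigmoid(z)) for name, z in zip(labels, logits)]
    return [(name, p) for name, p in scored if p >= threshold]

# Illustrative subset of labels and made-up logits for one sentence.
LABELS = ["joy", "gratitude", "anger", "neutral"]
logits = [2.0, 0.5, -3.0, -1.0]
picked = multi_label(logits, LABELS)
print([name for name, _ in picked])
```

Because the probabilities are independent, they do not sum to 1, and the threshold can be tuned per label on a validation set.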

tiny-Qwen2ForSequenceClassification-2.5

tiny-Qwen2ForSequenceClassification-2.5 is a deliberately tiny sequence classifier with the Qwen2 architecture, published for unit tests and continuous integration of training libraries. It is not trained for any real classification task and is not intended for production use.

1,212,423 ↓ · 1 ♡

roberta-base-openai-detector

roberta-base-openai-detector is a RoBERTa-base binary classifier trained to distinguish GPT-2-generated text from human-written text. It was released by OpenAI alongside the largest GPT-2 model and was obtained by fine-tuning on paired human/machine samples from the GPT-2 output corpus. As an early-generation AI text detector, it is most accurate on GPT-2 output and significantly less reliable on text from newer LLMs.

987,894 ↓ · 134 ♡

bert-base-multilingual-uncased-sentiment

bert-base-multilingual-uncased-sentiment is a multilingual BERT model fine-tuned on product reviews in six languages (English, Dutch, German, French, Spanish, and Italian). It predicts the review's sentiment as a star rating from 1 to 5.

951,325 ↓ · 478 ♡

twitter-roberta-base-sentiment

twitter-roberta-base-sentiment is the earlier Cardiff NLP tweet sentiment model: RoBERTa-base pre-trained on ~58M tweets and fine-tuned on the TweetEval benchmark for three-class sentiment classification (negative/neutral/positive).

899,594 ↓ · 337 ♡

emotion-english-distilroberta-base

emotion-english-distilroberta-base is a DistilRoBERTa-base model fine-tuned to classify English text into Ekman's six basic emotions (anger, disgust, fear, joy, sadness, surprise) plus a neutral class. It outputs per-class logits.

870,505 ↓ · 497 ♡

finbert-tone

finbert-tone is a FinBERT model fine-tuned on manually annotated sentences from analyst reports to classify financial tone. Given a string, it scores the positive, negative, and neutral labels and returns the highest-confidence prediction.

783,029 ↓ · 220 ♡

sentiment-polish-gpt2-small

sentiment-polish-gpt2-small is a GPT-2 small model fine-tuned for Polish sentiment classification on the PolEmo 2.0 dataset, with four sentiment classes (positive, negative, ambivalent, neutral). Although GPT-2 is a causal language model, it adapts to sequence classification by adding a classification head on top of the final token's hidden state. It is one of the few openly available Polish-specific sentiment models at small scale.

776,849 ↓ · 1 ♡

turn-detector

turn-detector is an end-of-turn classifier built on a Llama-architecture backbone. Given the transcript of a conversation so far, it estimates whether the speaker has finished their turn, which lets a voice agent decide when to respond.

718,977 ↓ · 111 ♡

distilbert-base-multilingual-cased-sentiments-student

distilbert-base-multilingual-cased-sentiments-student is a multilingual DistilBERT sentiment classifier trained by distillation: a larger zero-shot teacher model labeled the training data, and this student learned to reproduce its positive/neutral/negative predictions at lower inference cost.

667,380 ↓ · 313 ♡

bge-reranker-v2-gemma

bge-reranker-v2-gemma is BAAI's cross-encoder reranker built on a Gemma backbone and trained for multilingual passage reranking. Unlike bi-encoders, which embed query and document independently, it attends to both jointly, which yields higher reranking accuracy. It is designed as the second stage of a two-stage RAG pipeline, after a bi-encoder retrieval step.

657,857 ↓ · 85 ♡
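The two-stage pattern described for bge-reranker-v2-gemma (cheap retrieval first, pairwise reranking second) can be sketched without any model. Everything below is a stand-in: `embed` is a toy bag-of-words replacement for a bi-encoder, `pair_score` is a toy replacement for a cross-encoder, and the function and parameter names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pair_score(query, passage):
    """Toy stand-in for a cross-encoder: it sees both texts at once.
    Here: the fraction of query tokens that occur in the passage."""
    q = query.lower().split()
    p = set(passage.lower().split())
    return sum(t in p for t in q) / len(q) if q else 0.0

def retrieve_then_rerank(query, corpus, k_retrieve=3, k_final=2, scorer=pair_score):
    # Stage 1: independent embeddings narrow the corpus to k_retrieve candidates.
    q_vec = embed(query)
    candidates = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    candidates = candidates[:k_retrieve]
    # Stage 2: the pairwise scorer re-orders only those candidates.
    return sorted(candidates, key=lambda d: scorer(query, d), reverse=True)[:k_final]

corpus = [
    "rerankers score query and passage together",
    "bi-encoders embed query and passage separately",
    "the weather is sunny today",
    "passage reranking improves search quality",
]
top = retrieve_then_rerank("how do rerankers score a passage", corpus)
print(top)
```

The point of the split is cost: the pairwise scorer runs once per candidate, so it is applied to `k_retrieve` passages rather than to the whole corpus.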

xlm-roberta-base-language-detection

xlm-roberta-base-language-detection is an XLM-RoBERTa-base model fine-tuned for language identification. Given a text string, it predicts which of its supported languages (around 20) the text is written in.

550,325 ↓ · 374 ♡

jina-reranker-m0

jina-reranker-m0 is Jina AI's multilingual, multimodal reranker. It scores a query against candidate documents that may be plain text or images of visually rich pages, so it can rerank results in mixed text-and-image retrieval pipelines.

480,468 ↓ · 120 ♡

robertuito-sentiment-analysis

robertuito-sentiment-analysis is a Spanish sentiment classifier built on RoBERTuito, a RoBERTa model pre-trained on Spanish tweets. It classifies text as positive, negative, or neutral and outputs per-class logits.

477,762 ↓ · 100 ♡

koelectra-small-v3-nsmc

KoELECTRA-small-v3 fine-tuned on the Naver Sentiment Movie Corpus (NSMC) for binary sentiment classification of Korean text. It uses the ELECTRA discriminator architecture, whose replaced-token-detection pre-training is more compute-efficient than BERT's masked language modeling and helps small models reach competitive accuracy.

455,626 ↓ · 7 ♡

Qwen2.5-1.5B-apeach

Qwen2.5-1.5B-apeach is a Korean hate speech classifier that fine-tunes Qwen2.5-1.5B on the APEACH dataset. APEACH (Attacking Pejorative Expressions with Analysis on Crowd-generated Hate speech evaluation datasets) is a Korean benchmark for detecting offensive and hateful content. The model puts a classification head on the LLM backbone, turning it into a binary sequence classifier for Korean content moderation.

455,185 ↓ · 6 ♡

MedCPT-Cross-Encoder

MedCPT-Cross-Encoder is a BERT-based cross-encoder from NCBI fine-tuned for medical text relevance scoring, trained on PubMed query-article pairs. It takes a query and a candidate passage and scores their relevance, making it a reranking component in medical information retrieval pipelines. The model is typically paired with MedCPT's query encoder for a full retrieval-reranking system.

427,815 ↓ · 31 ♡

fasttext-language-identification

Meta's fastText-based language identification model, capable of identifying more than 200 languages from short text strings (the earlier lid.176 release covered 176). Extremely fast CPU inference makes it practical for preprocessing pipelines that need to route text by language.

426,910 ↓ · 269 ♡

Qwen3-Reranker-4B-W4A16-G128

A W4A16 (4-bit weights, 16-bit activations) quantized version of Qwen3-Reranker-4B that reduces the memory footprint of reranking inference. With a group size of 128, every 128 consecutive weights share their quantization parameters, which balances compression ratio against accuracy retention.

409,991 ↓ · 2 ♡
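To make "4-bit weights with group size 128" concrete, the sketch below applies plain round-to-nearest symmetric quantization to a flat list of weights, with one shared scale per group of 128. This illustrates the general idea only: the actual method used to produce the Qwen3-Reranker-4B-W4A16-G128 checkpoint (calibration, zero points, packing format) is not stated in the listing, and all function names here are hypothetical.

```python
import math

def quantize_group(weights, bits=4):
    """Symmetric quantization of one group: a shared scale plus signed integer codes."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4-bit
    qmin = -(2 ** (bits - 1))            # -8 for 4-bit
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak else 1.0
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale

def quantize(weights, group_size=128, bits=4):
    """Split a flat weight row into groups and quantize each one independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

def dequantize(groups):
    """Reconstruct approximate weights: each code times the scale of its group."""
    return [code * scale for codes, scale in groups for code in codes]

# Synthetic weights whose magnitude grows along the row, so the groups get different scales.
weights = [math.sin(i) * (1 + i / 64) for i in range(256)]
groups = quantize(weights)
restored = dequantize(groups)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), "groups, worst absolute error:", round(worst, 4))
```

Under these assumptions (one 16-bit scale per 128 weights, no zero point), storage works out to 4 + 16/128 = 4.125 bits per weight. Smaller groups track local weight ranges more closely but spend more bits on scales.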

KR-FinBert-SC

KR-FinBert-SC is a Korean financial-domain BERT model (KR-FinBert) fine-tuned for sentiment classification of Korean financial text.

371,938 ↓ · 39 ♡

OTel-Reranker-0.6B

A 0.6B cross-encoder reranker from farbodtavakkoli specialized for reranking OpenTelemetry log, trace, and metric documents. Pairs with OTel-Embedding models for a two-stage observability retrieval pipeline.

360,544 ↓ · 1 ♡

roberta-large-mnli

RoBERTa-large fine-tuned on the Multi-Genre Natural Language Inference (MNLI) corpus. It is commonly used for zero-shot text classification via entailment: each candidate label is turned into a hypothesis, and the label whose hypothesis the model judges most entailed by the input wins. It was among the most frequently used zero-shot classifiers before DeBERTa-based MNLI models improved accuracy further.

349,227 ↓ · 210 ♡
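The entailment approach to zero-shot classification can be sketched as follows. The NLI model is replaced by a toy function so the example runs on its own; the hypothesis template, the logit order (contradiction, neutral, entailment), and all names are assumptions made for illustration.

```python
import math

def zero_shot(text, labels, nli_logits, template="This example is about {}."):
    """Rank candidate labels by how strongly the text entails a label-derived hypothesis.

    nli_logits(premise, hypothesis) must return (contradiction, neutral, entailment) logits.
    """
    entail = [nli_logits(text, template.format(label))[2] for label in labels]
    # Softmax over the entailment logits so the scores across labels sum to 1.
    peak = max(entail)
    exps = [math.exp(x - peak) for x in entail]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_nli(premise, hypothesis):
    """Stand-in for an MNLI model: favors entailment when the hypothesis'
    last word occurs in the premise, contradiction otherwise."""
    topic = hypothesis.rstrip(".").split()[-1].lower()
    if topic in premise.lower():
        return (-2.0, 0.0, 3.0)
    return (2.0, 0.0, -3.0)

ranking = zero_shot(
    "The new sports stadium opens on Friday.",
    ["politics", "sports", "cooking"],
    toy_nli,
)
print(ranking[0][0])
```

No label-specific training is involved: changing the candidate labels only changes the hypotheses fed to the NLI model.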

cryptobert

cryptobert is a sentiment classifier for cryptocurrency-related social media text, built on a BERT-family encoder adapted to that domain.

344,003 ↓ · 190 ♡

deberta-large-mnli

deberta-large-mnli is DeBERTa-large fine-tuned on the MNLI corpus. Given a premise-hypothesis pair, it outputs logits for contradiction, neutral, and entailment, which also makes it usable for zero-shot classification via entailment.

341,460 ↓ · 32 ♡

internlm2-1_8b-reward

internlm2-1_8b-reward is a reward model built on InternLM2-1.8B. Rather than predicting a class, it assigns a scalar preference score to a chat response, for use in RLHF training or best-of-N response selection.

339,215 ↓ · 16 ♡

deberta-v3-base-prompt-injection-v2

deberta-v3-base-prompt-injection-v2 is a prompt-injection detector built on a DeBERTa-v3-base backbone. Given a string, it scores it as safe or as an injection attempt and returns the highest-confidence prediction.

329,160 ↓ · 108 ♡

beto-sentiment-analysis

BETO-based sentiment analysis model from finiteautomata, fine-tuned for Spanish sentiment classification. BETO is the Spanish BERT model, and this checkpoint targets positive/negative/neutral classification on Spanish text.

324,795 ↓ · 35 ♡

xlm-emo-t

xlm-emo-t is a multilingual emotion classifier built on an XLM-RoBERTa model adapted to Twitter text.

309,766 ↓ · 11 ♡

phishing-email-detection-distilbert_v2.4.1

phishing-email-detection-distilbert_v2.4.1 is a DistilBERT-based classifier that flags phishing content in email text.

308,997 ↓ · 26 ♡

inclusively-classification

inclusively-classification is an open-source text-classification model available on HuggingFace. Details are sourced from the public model registry.

307,175 ↓ · 1 ♡

distilroberta-finetuned-financial-news-sentiment-analysis

distilroberta-finetuned-financial-news-sentiment-analysis is a DistilRoBERTa model fine-tuned for positive/negative/neutral sentiment classification of English financial news sentences.

304,970 ↓ · 458 ♡

distilbert-base-uncased-emotion

distilbert-base-uncased-emotion is a DistilBERT model fine-tuned for emotion classification of English text. Given a string, it scores six emotion labels (sadness, joy, love, anger, fear, surprise) and returns the highest-confidence prediction.

303,852 ↓ · 164 ♡

rubert-base-cased-sentiment-rusentiment

rubert-base-cased-sentiment-rusentiment is a RuBERT (Russian BERT) model fine-tuned on the RuSentiment dataset for sentiment classification of Russian text.

302,698 ↓ · 15 ♡

phobert-base-vietnamese-sentiment

phobert-base-vietnamese-sentiment is a PhoBERT-base model fine-tuned for sentiment classification of Vietnamese text.

302,680 ↓ · 16 ♡

multilingual-sentiment-analysis

multilingual-sentiment-analysis classifies text in multiple languages into graded sentiment categories using a multilingual DistilBERT encoder fine-tuned with a classification head. It outputs per-class logits.

300,736 ↓ · 373 ♡

FinBERT-PT-BR

FinBERT-PT-BR is a BERT-based sentiment classifier for financial text in Brazilian Portuguese.

295,424 ↓ · 29 ♡

roberta_toxicity_classifier

roberta_toxicity_classifier is a RoBERTa model fine-tuned for binary toxicity classification of English text (toxic versus neutral). It outputs per-class logits.

287,679 ↓ · 72 ♡

twitter-xlm-roberta-base-sentiment-multilingual

twitter-xlm-roberta-base-sentiment-multilingual is a Cardiff NLP checkpoint that fine-tunes the Twitter-adapted XLM-RoBERTa-base model for three-class sentiment classification of tweets in multiple languages.

286,538 ↓ · 31 ♡

deberta-xlarge-mnli

DeBERTa-XLarge fine-tuned on the Multi-Genre Natural Language Inference (MNLI) dataset, usable for zero-shot text classification via entailment. DeBERTa's disentangled attention mechanism improves on BERT for NLI tasks. The xlarge variant (about 750M parameters) provides strong NLI accuracy but is expensive for high-throughput use.

261,138 ↓ · 23 ♡