token classification models

20 models · ranked by HuggingFace downloads

indonesian-roberta-base-posp-tagger

indonesian-roberta-base-posp-tagger performs sequence labeling: each input token receives a class label aligned to its text position. As its name indicates, it is an Indonesian RoBERTa-base model used as a part-of-speech tagger, so each label is a POS tag.
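Fine-tuning a tagger like this one means spreading word-level tags over subword tokens. A common convention labels only the first subword of each word and marks every other position with -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The minimal sketch below assumes a `word_ids` list that maps each subword position to its word index (None for special tokens); the tag ids and the subword split are made up for illustration.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips targets equal to this by default


def align_labels(word_labels: List[int], word_ids: List[Optional[int]]) -> List[int]:
    """Copy each word's tag onto its first subword; later subwords of the same
    word and special tokens (word id None) get IGNORE_INDEX."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[wid])
        previous = wid
    return aligned


# "Saya makan nasi" ("I eat rice") with a hypothetical split of "nasi" into two
# subwords, wrapped in two special tokens.
word_labels = [3, 7, 5]  # hypothetical tag ids, one per word
word_ids = [None, 0, 1, 2, 2, None]
print(align_labels(word_labels, word_ids))  # → [-100, 3, 7, 5, -100, -100]
```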

2,780,091 ↓ · 10 ♡

bert-base-NER

As its name states, bert-base-NER is a BERT-base model for named entity recognition: each input token receives an entity label aligned to its text position.

1,687,260 ↓ · 718 ♡

stanford-deidentifier-base

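stanford-deidentifier-base uses a BERT encoder with a per-token classification head to tag the tokens that make up personal identifiers, so they can be removed from the text. BIO tagging, the standard scheme for NER-style fine-tunes, marks the first token of each span with B-, the following tokens of the same span with I-, and every other token with O.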
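Downstream code usually needs spans rather than per-token tags. The sketch below decodes a BIO sequence into (type, start, end) spans with an exclusive end. It treats an I- tag that does not continue an open span of the same type as the start of a new span, which is a lenient choice; stricter decoders discard such tags. The label names are hypothetical, not this model's actual label set.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (entity_type, start, end) spans from BIO tags; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    open_type: Optional[str] = None  # entity type of the span being built
    open_start = 0
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")  # "B-NAME" -> ("B", "-", "NAME")
        continues = prefix == "I" and etype == open_type
        if open_type is not None and not continues:
            spans.append((open_type, open_start, i))  # close the open span
            open_type = None
        if prefix in ("B", "I") and not continues:
            open_type, open_start = etype, i  # start a new span
    if open_type is not None:
        spans.append((open_type, open_start, len(tags)))
    return spans


tokens = ["Seen", "by", "Jane", "Smith", "at", "Mercy", "Hospital"]
tags = ["O", "O", "B-NAME", "I-NAME", "O", "B-HOSPITAL", "I-HOSPITAL"]  # hypothetical labels
print([(t, " ".join(tokens[s:e])) for t, s, e in bio_to_spans(tags)])
# → [('NAME', 'Jane Smith'), ('HOSPITAL', 'Mercy Hospital')]
```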

1,220,506 ↓ · 81 ♡

wikineural-multilingual-ner

wikineural-multilingual-ner performs multilingual named entity recognition: each input token receives an entity label aligned to its text position.

823,591 ↓ · 165 ♡

finetuned-bertweet-poskd3

finetuned-bertweet-poskd3 is a token classification model fine-tuned on Twitter/X data, most likely for part-of-speech tagging, as the "pos" in its name suggests; the "kd" may refer to knowledge distillation, though this is not documented. Its BERTweet initialization (a RoBERTa-style encoder pretrained on English tweets) means it handles informal spelling, hashtags, and abbreviations better than standard BERT. The training split and label schema are not publicly documented.
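BERTweet's authors describe normalizing tweets by replacing user mentions with `@USER` and links with `HTTPURL`. Whether finetuned-bertweet-poskd3 expects exactly this preprocessing is not documented, so the sketch below is an assumption, and its regular expressions are simplified illustrations rather than BERTweet's own normalizer.

```python
import re

# Simplified, assumed patterns modeled on BERTweet's described normalization.
URL = re.compile(r"(?:https?://|www\.)\S+")
MENTION = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Replace links with HTTPURL, then user mentions with @USER."""
    text = URL.sub("HTTPURL", text)  # links first, so an "@" inside a URL is not treated as a mention
    return MENTION.sub("@USER", text)


print(normalize_tweet("@jack check this https://t.co/abc123 lol #nlp"))
# → @USER check this HTTPURL lol #nlp
```

Hashtags and informal spellings are left untouched here, since the encoder's vocabulary is meant to cover them directly.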

749,718 ↓ · 0 ♡

fullstop-punctuation-multilang-large

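fullstop-punctuation-multilang-large restores punctuation in several languages: each token receives a label naming the punctuation mark (for example a period or comma) that should follow it, or no mark at all. This suits text that lacks punctuation, such as speech-recognition transcripts.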
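Turning per-word punctuation labels back into text is a simple join. The sketch below assumes one label per word, with "0" meaning no punctuation and any other label being the mark to append; that label set is a hypothetical choice for illustration, and the real model's labels may differ.

```python
from typing import List


def restore_punctuation(words: List[str], labels: List[str]) -> str:
    """Append each word's predicted mark ("0" = none) and join with spaces."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    return " ".join(w if lab == "0" else w + lab for w, lab in zip(words, labels))


words = ["hello", "how", "are", "you", "i", "am", "fine"]
labels = [",", "0", "0", "?", "0", "0", "."]  # hypothetical model output
print(restore_punctuation(words, labels))  # → hello, how are you? i am fine.
```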

667,216 ↓ · 177 ♡

bert-large-cased-finetuned-conll03-english

bert-large-cased-finetuned-conll03-english is a cased BERT-large model fine-tuned on the English CoNLL-2003 named entity recognition data, so it labels tokens with the CoNLL entity classes (PER, ORG, LOC, MISC).
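CoNLL-2003 results are conventionally reported as entity-level F1: a predicted entity counts as correct only if its type and both boundaries match a gold entity exactly. A minimal sketch over sets of (type, start, end) spans follows; the gold and predicted spans are invented.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def entity_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)  # spans matching in type and boundaries
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {("PER", 0, 2), ("ORG", 5, 6), ("LOC", 8, 9)}
pred = {("PER", 0, 2), ("ORG", 5, 7), ("LOC", 8, 9), ("MISC", 10, 11)}
print(entity_f1(gold, pred))  # precision 0.5, recall 2/3, F1 4/7
```

Note that the ORG prediction with a one-token boundary error earns no credit at all under this metric.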

660,324 ↓ · 96 ♡

punctuate-all

punctuate-all uses a RoBERTa-family encoder with a per-token classification head. As its name indicates, it restores punctuation: each token's label is the mark, if any, that should follow it, rather than a BIO entity tag.

525,030 ↓ · 28 ♡

xlm-roberta-large-ner-hrl

xlm-roberta-large-ner-hrl uses an XLM-RoBERTa-large encoder, the multilingual variant of RoBERTa, with a per-token classification head for named entity recognition. BIO tagging is the standard scheme for NER fine-tunes of this kind.
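XLM-RoBERTa tokenizes with SentencePiece, which prefixes each piece that begins a new word with `▁` (U+2581). To report entities per word, a common approach merges pieces back into words and keeps the first piece's label. The sketch assumes per-piece labels are already decoded to strings; the piece split shown is made up, not the tokenizer's actual output.

```python
from typing import List, Tuple

WORD_START = "▁"  # U+2581, SentencePiece's word-start marker


def merge_pieces(pieces: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Join subword pieces into words; each word keeps its first piece's label."""
    if len(pieces) != len(labels):
        raise ValueError("need exactly one label per piece")
    words: List[Tuple[str, str]] = []
    for piece, label in zip(pieces, labels):
        if piece.startswith(WORD_START) or not words:
            words.append((piece.lstrip(WORD_START), label))  # a new word begins
        else:
            text, first_label = words[-1]
            words[-1] = (text + piece, first_label)  # continuation piece
    return words


pieces = ["▁Angela", "▁Merkel", "▁besuchte", "▁Wolfs", "burg"]  # hypothetical split
labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(merge_pieces(pieces, labels))
# → [('Angela', 'B-PER'), ('Merkel', 'I-PER'), ('besuchte', 'O'), ('Wolfsburg', 'B-LOC')]
```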

524,801 ↓ · 15 ♡

layoutreader

layoutreader is an open-source token-classification model available on HuggingFace.

508,613 ↓ · 43 ♡

sat-3l-sm

SAT-3l-sm (Segment Any Text; "3l" means 3 layers, and "sm" marks the variant trained on a supervised mixture of segmentation data) is a multilingual text segmentation model supporting over 85 languages, designed to split continuous text into sentence or paragraph segments. Unlike rule-based sentence tokenisers that rely on punctuation, SAT uses a contextual XLM-based token classifier, so it can handle languages with unusual or absent punctuation conventions. The 3-layer variant trades some accuracy for faster inference than deeper SAT models.
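Framed as token classification, segmentation becomes a per-token decision: close a segment after any token whose predicted boundary probability clears a threshold. The sketch below assumes such probabilities are already available; the tokens, probabilities, and 0.5 threshold are invented, and the example deliberately contains no punctuation.

```python
from typing import List


def split_at_boundaries(tokens: List[str], boundary_probs: List[float],
                        threshold: float = 0.5) -> List[str]:
    """Cut after each token whose boundary probability is >= threshold."""
    if len(tokens) != len(boundary_probs):
        raise ValueError("need one probability per token")
    segments: List[str] = []
    current: List[str] = []
    for token, prob in zip(tokens, boundary_probs):
        current.append(token)
        if prob >= threshold:  # a boundary is predicted right after this token
            segments.append("".join(current).strip())
            current = []
    if current:  # trailing text with no predicted boundary
        segments.append("".join(current).strip())
    return segments


tokens = ["hello", " world", " how", " are", " you"]
probs = [0.01, 0.92, 0.02, 0.03, 0.40]  # hypothetical model output
print(split_at_boundaries(tokens, probs))  # → ['hello world', 'how are you']
```

Raising the threshold yields fewer, longer segments; lowering it splits more eagerly.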

459,079 ↓ · 12 ♡

deid_roberta_i2b2

deid_roberta_i2b2 is a RoBERTa model fine-tuned on the i2b2 de-identification dataset to detect and classify protected health information (PHI) in clinical notes. It identifies PHI spans such as names, dates, locations, and IDs. It is MIT-licensed, which permits its integration into clinical NLP de-identification pipelines.
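Once PHI spans are detected, a pipeline typically replaces them with placeholders. The sketch assumes the model's output has already been converted to non-overlapping character spans; the note text, offsets, and label names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (char start, char end exclusive, PHI label)


def redact(text: str, spans: List[Span]) -> str:
    """Replace each PHI span with a [LABEL] placeholder."""
    out: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("spans must not overlap")
        out.append(text[cursor:start])  # keep the text before this span
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


note = "Seen by Dr. Lee on 03/14/2021 in Boston."
phi = [(12, 15, "NAME"), (19, 29, "DATE"), (33, 39, "LOCATION")]  # hypothetical output
print(redact(note, phi))  # → Seen by Dr. [NAME] on [DATE] in [LOCATION].
```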

431,180 ↓ · 39 ♡

llmlingua-2-xlm-roberta-large-meetingbank

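llmlingua-2-xlm-roberta-large-meetingbank is an XLM-RoBERTa-large token classifier for LLMLingua-2 prompt compression: rather than entity tags, it labels each token as keep or drop, and the kept tokens form a shorter prompt for a large language model. The "meetingbank" suffix refers to the MeetingBank meeting transcripts used to build its training data.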
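LLMLingua-2 treats compression as binary token classification (keep or drop). A minimal sketch of the final selection step, assuming per-token keep probabilities are already computed: retain the highest-scoring fraction of tokens in their original order. The target-rate rule here is a simplification, not the llmlingua library's exact procedure, and the probabilities are invented.

```python
import math
from typing import List


def compress(tokens: List[str], keep_probs: List[float], rate: float) -> List[str]:
    """Keep the ceil(rate * n) most probable tokens, preserving original order."""
    if len(tokens) != len(keep_probs) or not 0.0 <= rate <= 1.0:
        raise ValueError("need one probability per token and 0 <= rate <= 1")
    k = math.ceil(rate * len(tokens))
    ranked = sorted(range(len(tokens)), key=lambda i: keep_probs[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(ranked)]  # restore reading order


tokens = ["Um", "so", "the", "budget", "was", "approved", "by", "the", "council"]
probs = [0.05, 0.10, 0.30, 0.95, 0.40, 0.90, 0.35, 0.20, 0.85]  # hypothetical
print(compress(tokens, probs, 0.5))  # → ['budget', 'was', 'approved', 'by', 'council']
```

Restoring the original order matters because the compressed prompt is still read left to right by the downstream model.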

430,411 ↓ · 28 ♡

ner-english-fast

Flair's fast English NER model, a sequence labeler built on the Flair framework's character-level language model embeddings. 'Fast' indicates a smaller, speed-optimized variant of Flair's standard English NER model. It recognizes the standard NE classes (PER, ORG, LOC, MISC).

422,834 ↓ · 26 ♡

All 20 models, in ranking order:

indonesian-roberta-base-posp-tagger

bert-base-NER

stanford-deidentifier-base

wikineural-multilingual-ner

finetuned-bertweet-poskd3

fullstop-punctuation-multilang-large

bert-large-cased-finetuned-conll03-english

punctuate-all

xlm-roberta-large-ner-hrl

layoutreader

sat-3l-sm

deid_roberta_i2b2

llmlingua-2-xlm-roberta-large-meetingbank

ner-english-fast

bert-base-NER-Russian

privacy-filter

roberta-large-ner-english

bert-portuguese-ner

bert-base-chinese-ws

bert-base-multilingual-cased-ner-hrl