indonesian-roberta-base-posp-tagger
indonesian-roberta-base-posp-tagger performs sequence labeling: each input token receives a class label aligned to its text position. As its name indicates, it is a RoBERTa-base model fine-tuned for part-of-speech tagging of Indonesian text.
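RoBERTa-style tokenizers split words into sub-word pieces, so a per-token tagger produces more labels than there are words. A common convention, sketched here as an assumption rather than this model's documented post-processing, keeps the label of each word's first sub-word. The `word_ids` input mirrors what a Hugging Face fast tokenizer's `word_ids()` method returns (`None` for special tokens). The function name `word_labels` and the tag strings in the example are hypothetical.

```python
from typing import List, Optional


def word_labels(word_ids: List[Optional[int]], token_labels: List[str]) -> List[str]:
    """Return one label per word, taken from that word's first sub-word token.

    word_ids follows the fast-tokenizer convention: None for special tokens,
    otherwise the index of the word each token was split from.
    """
    labels: List[str] = []
    previous: Optional[int] = None
    for wid, label in zip(word_ids, token_labels):
        if wid is None:
            continue  # special tokens such as <s> and </s> belong to no word
        if wid != previous:
            labels.append(label)  # first sub-word of a new word
        previous = wid
    return labels
```

For example, `word_labels([None, 0, 0, 1, None], ["O", "PROPN", "X", "VERB", "O"])` returns `["PROPN", "VERB"]`.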
20 models · ranked by HuggingFace downloads
indonesian-roberta-base-posp-tagger is a RoBERTa-base model fine-tuned for Indonesian part-of-speech tagging, assigning one tag to each input token.
bert-base-NER is a BERT-base model fine-tuned for named entity recognition: each input token receives an entity label aligned to its text position.
stanford-deidentifier-base uses a BERT encoder with a per-token classification head for de-identification. Like most NER fine-tunes, it marks entity spans with the BIO tagging scheme.
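The BIO scheme marks the first token of an entity with `B-`, later tokens of the same entity with `I-`, and everything else with `O`. A minimal decoder, assuming plain string tags, turns such a sequence into spans. The function name `bio_to_spans` and the entity types in the example (`NAME`, `DATE`) are hypothetical and not necessarily this model's label set.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (entity_type, start, end) token spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open span, starts a new span.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type, start = tag[2:], i
        elif tag == "O":
            # Outside any entity: close the open span, if there is one.
            if current_type is not None:
                spans.append((current_type, start, i))
            current_type = None
        # Otherwise the tag is an I- tag continuing the current span.
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans
```

`bio_to_spans(["B-NAME", "I-NAME", "O", "B-DATE"])` returns `[("NAME", 0, 2), ("DATE", 3, 4)]`. A stray `I-` tag with no matching open span is treated as the start of a new span, a lenient choice many decoders make.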
wikineural-multilingual-ner is a multilingual named entity recognition model: each input token receives an entity label aligned to its text position.
A DistilBERT model fine-tuned on Twitter/X data for token classification tasks, likely part-of-speech tagging or named entity recognition on social media text. BERTweet-based initialization means it handles informal spelling, hashtags, and abbreviations better than standard BERT. The training split and label schema are not publicly documented.
fullstop-punctuation-multilang-large assigns a label to each token in a sequence. As its name indicates, the task is multilingual punctuation restoration: the model predicts which punctuation mark, if any, follows each word.
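In punctuation restoration, the per-word label is itself the punctuation mark to insert. Assuming a label set in which `"0"` means "no punctuation" (an assumption, not confirmed for this model), predictions can be folded back into text. The function name `restore_punctuation` is hypothetical.

```python
from typing import List


def restore_punctuation(words: List[str], labels: List[str], no_punct: str = "0") -> str:
    """Rebuild text from per-word punctuation labels.

    Each label is the mark to append after its word, or `no_punct` (assumed "0")
    when nothing follows the word.
    """
    pieces = []
    for word, label in zip(words, labels):
        pieces.append(word if label == no_punct else word + label)
    return " ".join(pieces)
```

`restore_punctuation(["hello", "how", "are", "you"], [",", "0", "0", "?"])` returns `"hello, how are you?"`. Capitalization is left untouched in this sketch.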
bert-large-cased-finetuned-conll03-english is a cased BERT-large model fine-tuned for English named entity recognition on CoNLL-2003. It assigns each token an entity label (PER, ORG, LOC, or MISC) or O for tokens outside any entity.
punctuate-all uses a RoBERTa-family encoder with a per-token classification head. As its name indicates, each token's label is the punctuation mark to restore after it rather than a BIO entity tag.
xlm-roberta-large-ner-hrl uses an XLM-RoBERTa large encoder with a per-token classification head for named entity recognition in high-resource languages (hrl). Like most NER fine-tunes, it tags entity spans with the BIO scheme.
layoutreader is an open-source token-classification model available on HuggingFace.
SAT-3l-sm (Segment Any Text, 3-layer small) is a multilingual text segmentation model supporting over 85 languages, designed to split continuous text into meaningful sentence or paragraph segments. Unlike rule-based sentence tokenisers that rely on punctuation, SAT uses a contextual XLM-based token classifier to handle languages with unusual or absent punctuation conventions. The small variant trades some accuracy for faster inference.
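Segmentation by token classification reduces to thresholding a per-token boundary score. The sketch below assumes the model has already produced, for each token, the probability that a segment ends after it. The function name `split_at_boundaries`, the 0.5 threshold, and the space-joining of tokens are illustrative assumptions; languages written without spaces would need a different join.

```python
from typing import List


def split_at_boundaries(tokens: List[str], boundary_probs: List[float],
                        threshold: float = 0.5) -> List[str]:
    """Group tokens into segments, closing a segment after any token whose
    predicted boundary probability exceeds `threshold`."""
    segments: List[str] = []
    current: List[str] = []
    for token, prob in zip(tokens, boundary_probs):
        current.append(token)
        if prob > threshold:
            segments.append(" ".join(current))
            current = []
    if current:
        # Trailing tokens with no predicted boundary still form a final segment.
        segments.append(" ".join(current))
    return segments
```

With tokens `["this", "is", "one", "this", "is", "two"]` and probabilities `[0.01, 0.02, 0.93, 0.01, 0.03, 0.40]`, the result is `["this is one", "this is two"]`. Lowering the threshold trades precision for recall on segment boundaries.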
deid_roberta_i2b2 is a RoBERTa model fine-tuned on the i2b2 de-identification dataset to detect and classify protected health information (PHI) in clinical notes. It identifies PHI spans such as names, dates, locations, and IDs. MIT-licensed for integration into clinical NLP de-identification pipelines.
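Once PHI spans have been detected, a typical downstream step replaces them with category placeholders. A minimal sketch, assuming non-overlapping character-offset spans; the function name `redact` and the labels `NAME` and `DATE` are hypothetical.

```python
from typing import List, Tuple


def redact(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) character span with a [label] placeholder.

    Spans are assumed not to overlap; end offsets are exclusive.
    """
    # Work from the rightmost span leftward so earlier offsets stay valid.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

`redact("Seen by Dr. Smith on 03/04/2021.", [(12, 17, "NAME"), (21, 31, "DATE")])` returns `"Seen by Dr. [NAME] on [DATE]."`.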
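llmlingua-2-xlm-roberta-large-meetingbank is an XLM-RoBERTa large token classifier for prompt compression (LLMLingua-2). Instead of entity or part-of-speech tags, it predicts for each token whether to keep or drop it, and it was trained on data built from MeetingBank meeting transcripts.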
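A token-classification compressor scores every token and keeps the most informative ones in their original order. This is a simplified sketch under stated assumptions, not the LLMLingua-2 library's API: `keep_probs` stands for the model's per-token probability of the keep class, and `rate` is a hypothetical fraction of tokens to retain.

```python
from typing import List


def compress(tokens: List[str], keep_probs: List[float], rate: float = 0.5) -> List[str]:
    """Keep the int(len * rate) tokens (at least one) with the highest keep
    probability, preserving their original order."""
    if not tokens:
        return []
    n_keep = max(1, int(len(tokens) * rate))
    # Rank token positions by keep probability, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: keep_probs[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original order
    return [tokens[i] for i in kept]
```

With tokens `["Um", "so", "the", "budget", "was", "approved", "today"]`, probabilities `[0.05, 0.1, 0.3, 0.9, 0.4, 0.95, 0.8]`, and `rate=0.5`, three tokens survive: `["budget", "approved", "today"]`.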
Flair's fast English NER model, a sequence labeler built on the Flair framework's character-level language-model embeddings. "Fast" denotes a smaller, speed-optimized variant of Flair's standard NER model. It recognizes the standard entity classes PER, ORG, LOC, and MISC.
A BERT-base model fine-tuned for named entity recognition on Russian text. It tags the usual NER categories (persons, organizations, locations) with a per-token classification head.
privacy-filter is an open-source token-classification model available on HuggingFace.
roberta-large-ner-english is an open-source RoBERTa-large model fine-tuned for English named entity recognition, available on HuggingFace.
bert-portuguese-ner is an open-source BERT model fine-tuned for Portuguese named entity recognition, available on HuggingFace.
bert-base-chinese-ws is an open-source BERT-base model for Chinese word segmentation (ws), available on HuggingFace. It segments text into words by labeling each character.
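Word segmentation is commonly cast as character tagging: `B` marks a character that begins a word and `I` one that continues it. Assuming that two-label scheme (an assumption about this model's label set), characters can be regrouped into words. The function name `segment` is hypothetical.

```python
from typing import List


def segment(chars: str, labels: List[str]) -> List[str]:
    """Join characters into words: 'B' starts a new word, 'I' extends the current one."""
    words: List[str] = []
    for ch, label in zip(chars, labels):
        if label == "B" or not words:
            words.append(ch)  # new word (a leading 'I' is treated as a start)
        else:
            words[-1] += ch  # continue the previous word
    return words
```

`segment("我喜歡台北", ["B", "B", "I", "B", "I"])` returns `["我", "喜歡", "台北"]`.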
bert-base-multilingual-cased-ner-hrl is an open-source multilingual BERT (cased) model fine-tuned for named entity recognition in high-resource languages (hrl), available on HuggingFace.