videomae-small-finetuned-kinetics-xd-violence-binary
A VideoMAE-Small model fine-tuned on XD-Violence, a multi-scene violence detection dataset of realistic video clips drawn from films and surveillance footage. The model performs binary video classification (violent vs. non-violent). It builds on VideoMAE's self-supervised pre-training, in which a masked autoencoder learns to reconstruct masked spatiotemporal patches of video. This pre-training is intended to reduce the number of labelled examples needed at fine-tuning time compared with supervised-only baselines for video tasks.
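The steps around a binary video classifier like this one can be sketched in plain Python: pick a fixed number of frames from a clip, then turn the two output logits into a violent/non-violent decision. This is a minimal sketch under stated assumptions, not the card author's pipeline. The helper names (`sample_frame_indices`, `classify`), the 16-frame clip length (the usual VideoMAE default), the label order in `ID2LABEL`, and the 0.5 threshold are all assumptions; check the model's own configuration for the actual values.

```python
import math

def sample_frame_indices(total_frames, num_frames=16):
    """Pick num_frames indices spread evenly across a clip (hypothetical helper).

    num_frames=16 is an assumption based on the usual VideoMAE default.
    """
    if total_frames <= 0:
        raise ValueError("clip has no frames")
    # Take the centre of each of num_frames equal segments.
    # Clips shorter than num_frames simply repeat some indices.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed label order; the real mapping lives in the model's config.
ID2LABEL = {0: "non-violent", 1: "violent"}

def classify(logits, threshold=0.5):
    """Map two logits to a label and the probability of the 'violent' class."""
    p_violent = softmax(logits)[1]
    label = ID2LABEL[1] if p_violent >= threshold else ID2LABEL[0]
    return label, p_violent

# Example with made-up numbers: a 320-frame clip and illustrative logits.
indices = sample_frame_indices(total_frames=320)
label, p_violent = classify([-1.2, 2.3])
print(indices[:4], label, round(p_violent, 3))
```

Raising `threshold` trades recall for precision on the violent class, which may matter more than raw accuracy in a moderation or surveillance setting.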