clip-vit-large-patch14
- Pipeline: zero-shot-image-classification
OpenAI's CLIP model with a ViT-L/14 image encoder, trained contrastively on 400 million image-text pairs collected from the internet. It aligns images and text in a shared embedding space, enabling zero-shot image classification: an image's embedding is compared against the embeddings of candidate text labels, and the closest label is predicted. The ViT-L/14 variant offers higher accuracy than the smaller ViT-B/32 at greater compute cost.