clip-vit-large-patch14-336
- Pipeline: zero-shot image classification
- Downloads: 14,075,831
- Likes: 304
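OpenAI CLIP ViT-L/14 at 336×336 px input resolution, a higher-resolution variant of the standard 224×224 px ViT-L/14 CLIP model. The patch size is unchanged at 14×14 px; the larger input instead yields more patches per image (24×24 rather than 16×16), so less detail is lost when the image is split into tokens. This improves performance on classification tasks that depend on fine-grained visual detail. The model shares the contrastive training on 400M image-text pairs used for the base ViT-L/14, followed by one additional epoch at 336 px resolution.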
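
To make the resolution difference concrete, here is a minimal sketch of the token arithmetic. It assumes standard ViT patching (non-overlapping 14×14 px patches plus one class token); the helper name `vit_token_count` is hypothetical and not part of any library.

```python
def vit_token_count(image_size: int, patch_size: int, class_token: bool = True) -> int:
    """Number of tokens a ViT sees for a square image (illustrative helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    # One token per patch, plus an optional class token.
    return patches_per_side * patches_per_side + (1 if class_token else 0)


base_tokens = vit_token_count(224, 14)   # 16 x 16 patches + class token
hires_tokens = vit_token_count(336, 14)  # 24 x 24 patches + class token
print(base_tokens, hires_tokens)
```

Under these assumptions the 336 px variant processes 577 tokens per image versus 257 for the 224 px model, which is why it tends to be slower and more memory-hungry at inference.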