Qwen2.5-VL-7B-Instruct
- Pipeline
- image text to text
- Downloads
- 8,919,144
- Likes
- 1,518
Qwen2.5-VL-7B-Instruct is Alibaba Cloud's 7-billion-parameter vision-language model from the Qwen2.5-VL series, accepting image and video inputs alongside text for visual question answering, document understanding, and grounding tasks. It supports multiple image resolutions dynamically and shows improved OCR and document reasoning compared to the earlier Qwen-VL series. Apache 2.0 licensed.