Qwen3-VL-2B-Instruct
- Pipeline
- image text to text
- Downloads
- 186,904,434
- Likes
- 386
Qwen3-VL-2B-Instruct is a 2-billion-parameter vision-language model from Alibaba Cloud that jointly processes images and text for visual question answering, captioning, and document understanding. Its 2B scale positions it as one of the smaller instruction-tuned VLMs capable of zero-shot visual reasoning. Apache 2.0 licensed.