Skip to content
#

qwen2-5-vl

Here are 44 public repositories matching this topic...

Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless image editing tasks.

  • Updated Nov 24, 2025
  • Python
Qwen-3VL-Multimodal-Understanding

Qwen3-VL-4B-Instruct model from Alibaba's Qwen series for multimodal tasks involving images and text. It enables users to upload an image and perform various vision-language tasks, such as querying details, generating captions, detecting points of interest.

  • Updated Nov 18, 2025
  • Python

Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen team. It leverages specialized LoRA adapters for efficient, low-step inference (as few as 4 steps).

  • Updated Nov 24, 2025
  • Python

Multimodal-OCR3 is an advanced Optical Character Recognition (OCR) application that leverages multiple state-of-the-art multimodal models to extract text from images.

  • Updated Nov 11, 2025
  • Python

A comprehensive multimodal OCR application that supports both image and video document processing using state-of-the-art vision-language models. This application provides an intuitive Gradio interface for extracting text, converting documents to markdown, and performing advanced document analysis.

  • Updated Dec 2, 2025
  • Python

Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface to upload images, query models, adjust generation settings, and export results in Markdown or PDF.

  • Updated Nov 26, 2025
  • Python

A Gradio-based demonstration for the Microsoft Fara-7B model, designed as a computer use agent. Users upload UI screenshots (e.g., desktop or app interfaces), provide task instructions (e.g., "Click on the search bar"), and receive parsed actions with visualized indicators overlaid on the image.

  • Updated Dec 8, 2025
  • Python

Improve this page

Add a description, image, and links to the qwen2-5-vl topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the qwen2-5-vl topic, visit your repo's landing page and select "manage topics."

Learn more