One-click conversion of static diagrams (flowcharts, architecture diagrams, technical schematics) into editable DrawIO (mxGraph) XML files.
Powered by SAM 3 and multimodal large language models, Edit Banana reconstructs diagrams with high fidelity, preserving the details and logical relationships of the original.
Click above or visit `https://db121-img2xml.cn/` to try Edit Banana online! Upload an image, get editable DrawIO XML in seconds.
To show the conversion fidelity, the following compares three pairs of original static images with their editable DrawIO reconstructions. All elements can be individually dragged, styled, and modified.
✨ Conversion Highlights:
- Preserves the layout logic, color matching, and element hierarchy of the original diagram
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)
- Accurate text recognition, supporting direct subsequent editing and format adjustment
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization
- Advanced Segmentation: Uses SAM 3 (Segment Anything Model 3) for state-of-the-art segmentation of diagram elements (shapes, arrows, icons).
- Fixed 4-Round VLM Scanning: A structured, iterative extraction process guided by Multimodal LLMs (Qwen-VL/GPT-4V) ensuring no element is left behind:
  - Initial Generic Extraction: Captures standard shapes and icons.
  - Single Word Scan: VLM scans blank areas for single objects.
  - Two-Word Scan: Refines extraction for specific attributes.
  - Phrase Scan: Captures complex descriptions or grouped objects.
- High-Quality OCR:
  - Azure Document Intelligence for precise text localization.
  - Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
  - Mistral Vision/MLLM for correcting text and converting mathematical formulas to LaTeX ($\int f(x)\,dx$).
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to LLMs for pixel-perfect recognition.
- Smart Background Removal: Integrated RMBG-2.0 model to automatically remove backgrounds from icons, pictures, and arrows, ensuring they layer correctly in DrawIO.
- Arrow Handling: Arrows are extracted as transparent images (rather than complex vector paths) to guarantee visual fidelity, handling dashed lines, curves, and complex routing without error.
- Vector Shape Recovery: Standard shapes are converted to native DrawIO vector shapes with accurate fill and stroke colors.
  - Supported Shapes: Rectangle, Rounded Rectangle, Diamond (Decision), Ellipse (Start/End), Cylinder (Database), Cloud, Hexagon, Triangle, Parallelogram, Actor, Title Bar, Text Bubble, Section Panel.
- User System:
  - Registration: New users receive 30 free credits.
  - Credit System: Pay-per-use model prevents resource abuse.
- Multi-User Concurrency: Built-in support for concurrent user sessions using a Global Lock mechanism for thread-safe GPU access and an LRU Cache (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.
- Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
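The Multi-User Concurrency item above combines a global lock for GPU access with an LRU cache for image embeddings. The following is a minimal sketch of that pattern, not the project's actual code: the names `EmbeddingCache`, `GPU_LOCK`, `get_embedding`, and the `capacity` parameter are all hypothetical, and `compute` stands in for whatever function runs the image encoder.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """Thread-safe LRU cache (hypothetical helper, not the project's class)."""

    def __init__(self, capacity=8):
        self._capacity = capacity
        self._items = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._items:
                return None
            # Mark the entry as most recently used.
            self._items.move_to_end(key)
            return self._items[key]

    def put(self, key, value):
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)
            # Evict least recently used entries beyond capacity.
            while len(self._items) > self._capacity:
                self._items.popitem(last=False)

    def __len__(self):
        with self._lock:
            return len(self._items)


# One process-wide lock so only one request uses the GPU at a time.
GPU_LOCK = threading.Lock()
CACHE = EmbeddingCache(capacity=2)


def get_embedding(image_id, compute):
    """Return a cached embedding, computing it under the GPU lock on a miss."""
    cached = CACHE.get(image_id)
    if cached is not None:
        return cached
    with GPU_LOCK:
        # Re-check: another thread may have filled the cache while we waited.
        cached = CACHE.get(image_id)
        if cached is None:
            cached = compute(image_id)
            CACHE.put(image_id, cached)
    return cached
```

Caching the embedding means a follow-up request on the same image (for example, a re-run of the mask decoder with new prompts) can skip the expensive encoder pass, while the lock keeps concurrent requests from contending for GPU memory.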
- Input: Image (PNG/JPG).
- Segmentation (SAM3):
  - Initial pass with standard prompts (rectangle, arrow, icon).
  - Iterative loop: Analyze unrecognized regions -> Ask MLLM for visual prompts -> Re-run SAM3 mask decoder.
- Element Processing:
  - Vector Shapes: Color extraction (Fill/Stroke) + Geometry mapping.
  - Image Elements (Icons/Arrows): Crop -> Padding -> Mask Filtering -> RMBG-2.0 Background Removal -> Base64 Encoding.
- Text Extraction (Parallel):
  - Azure OCR detects text bounding boxes.
  - High-res crops of text regions are sent to Mistral/LLM.
  - LaTeX conversion for formulas.
- XML Generation:
  - Merges spatial data from SAM3 and Text OCR.
  - Applies Z-Index sorting (Layers).
  - Generates the `.drawio.xml` file.
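To make the XML Generation step more concrete, here is a rough sketch of how detected elements could be written out as mxGraph cells. It is an illustration under stated assumptions rather than the logic in `merge_xml.py`: the element dictionary keys (`x`, `y`, `w`, `h`, `style`, `text`) and the function name `build_drawio_xml` are hypothetical, and the z-order rule (larger elements first, so they sit behind smaller ones) is an assumed heuristic.

```python
import xml.etree.ElementTree as ET


def build_drawio_xml(elements):
    """Serialize element dicts (hypothetical schema) into draw.io XML."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # mxGraph expects a root cell "0" and a default layer "1".
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")

    # Assumed z-order: cells written earlier are drawn underneath,
    # so emit large containers before the small shapes inside them.
    ordered = sorted(elements, key=lambda e: e["w"] * e["h"], reverse=True)
    for cell_id, el in enumerate(ordered, start=2):
        cell = ET.SubElement(
            root,
            "mxCell",
            id=str(cell_id),
            value=el.get("text", ""),
            style=el["style"],
            vertex="1",
            parent="1",
        )
        # "as" is a Python keyword, so attributes are passed as a dict.
        ET.SubElement(
            cell,
            "mxGeometry",
            {
                "x": str(el["x"]),
                "y": str(el["y"]),
                "width": str(el["w"]),
                "height": str(el["h"]),
                "as": "geometry",
            },
        )
    return ET.tostring(mxfile, encoding="unicode")


elements = [
    {"x": 40, "y": 40, "w": 120, "h": 60, "text": "Start",
     "style": "rounded=1;whiteSpace=wrap;html=1;fillColor=#dae8fc;strokeColor=#6c8ebf;"},
    {"x": 20, "y": 20, "w": 400, "h": 300, "text": "",
     "style": "rounded=0;whiteSpace=wrap;html=1;fillColor=#f5f5f5;strokeColor=#666666;"},
]
xml_text = build_drawio_xml(elements)
```

The fill and stroke colors recovered during Element Processing would go into the `style` string, and OCR text into `value`, which is what keeps each shape independently editable in DrawIO.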
sam3_workflow/
├── config/               # Configuration files
├── flowchart_text/       # OCR & Text Extraction Module
│   ├── src/              # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py           # OCR Entry point
├── frontend/             # React Web Application
├── input/                # [Manual] Input images directory
├── models/               # [Manual] Model weights (RMBG, SAM3)
│   └── rmbg/             # [Manual] RMBG-2.0
├── output/               # [Manual] Results directory
├── sam3/                 # SAM3 Model Library
├── scripts/              # Utility Scripts
│   └── merge_xml.py      # XML Merging & Orchestration
├── main.py               # CLI Entry point (Modular Pipeline)
├── server_pa.py          # FastAPI Backend Server (Service-based)
└── requirements.txt      # Python dependencies
Follow these steps to set up the project locally.
- Python 3.10+
- Node.js & npm (for the frontend)
- CUDA-capable GPU (Highly recommended)
git clone https://github.com/XiangjianYi/Image2DrawIO.git
cd Image2DrawIO

After cloning, you must manually create the following resource directories (ignored by Git):
# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output
# Create model directories
mkdir -p models/rmbg

Download the required models and place them in the correct paths:
| Model | Download | Target Path |
|---|---|---|
| RMBG-2.0 | RMBG-2.0 | models/rmbg/model.onnx |
| SAM 3 | https://modelscope.cn/models/facebook/sam3 | models/sam3.pt (or as configured) |
Note: For SAM 3 (or the specific segmentation checkpoint used), place the `.pt` file in `models/` and update `config.yaml`.
Backend:
pip install -r requirements.txt

Frontend:
cd frontend
npm install
cd ..

- Config File: Copy the example config.
cp config/config.yaml.example config/config.yaml
- Environment Variables: Create a `.env` file in the root directory.
AZURE_ENDPOINT=your_azure_endpoint
AZURE_API_KEY=your_azure_key
# Add other keys as needed
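As a quick sanity check before starting the server, the `.env` contents can be parsed and checked for the two Azure keys named above. This is a small standalone sketch; the helper names `parse_env_text` and `missing_keys` are hypothetical, and how the project itself loads the file is not stated here.

```python
from pathlib import Path

REQUIRED_KEYS = ("AZURE_ENDPOINT", "AZURE_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Read a .env file if it exists; return an empty dict otherwise."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


def missing_keys(values):
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = (
    "AZURE_ENDPOINT=your_azure_endpoint\n"
    "AZURE_API_KEY=your_azure_key\n"
    "# Add other keys as needed\n"
)
settings = parse_env_text(sample)
```

Calling `missing_keys(load_env_file())` from the project root returns an empty list when both keys are set.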
Start the Backend:
python server_pa.py
# Server runs at http://localhost:8000

Start the Frontend:
cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:5173

Open your browser, upload an image, and view the result in the embedded DrawIO editor.
To process a single image:
python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory.
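Until batch processing is available, several images can be processed by invoking the CLI once per file. The sketch below only relies on the `-i` flag shown above; the helper names `build_commands` and `run_batch` are hypothetical, and accepting `.jpeg` alongside `.png`/`.jpg` is an assumption.

```python
import subprocess
import sys
from pathlib import Path

# Assumed set of accepted extensions (the pipeline lists PNG/JPG as input).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(paths):
    """Return one `main.py -i <image>` command per image path, sorted."""
    commands = []
    for path in sorted(Path(p) for p in paths):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            commands.append([sys.executable, "main.py", "-i", path.as_posix()])
    return commands


def run_batch(input_dir="input"):
    """Run the CLI sequentially for every image in `input_dir`."""
    for command in build_commands(Path(input_dir).iterdir()):
        # check=True stops the batch at the first failing image.
        subprocess.run(command, check=True)
```

Running the images sequentially keeps GPU memory usage the same as a single CLI run; call `run_batch()` from the repository root so that `main.py` resolves correctly.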
Customize the pipeline behavior in config/config.yaml:
- sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
- paths: Set input/output directories.
- dominant_color: Fine-tune color extraction sensitivity.
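To illustrate what the score and NMS thresholds control, here is a generic sketch of score filtering followed by greedy non-maximum suppression on bounding boxes. It is a textbook version for intuition only: the function names and the parameters `score_threshold` and `nms_threshold` are illustrative and may not match the actual keys in `config.yaml` or the way SAM3 masks are filtered in this project.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.5, nms_threshold=0.5):
    """Drop low-score detections, then suppress overlapping duplicates."""
    candidates = sorted(
        (d for d in detections if d["score"] >= score_threshold),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a detection only if it does not overlap a better one too much.
        if all(iou(det["box"], k["box"]) <= nms_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    {"box": (0, 0, 10, 10), "score": 0.9},
    {"box": (1, 1, 11, 11), "score": 0.8},   # overlaps the first box heavily
    {"box": (20, 20, 30, 30), "score": 0.7},
    {"box": (0, 0, 5, 5), "score": 0.2},     # below the score threshold
]
kept = filter_detections(detections)
```

Raising the score threshold removes more uncertain detections, while lowering the NMS threshold merges overlapping detections more aggressively, which matters for tightly packed diagram elements.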
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | Planned | Support custom template import |
| Batch Export Optimization | Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | Planned | Support local VLM deployment, independent of APIs |
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
- Fork this repository
- Create a feature branch (`git checkout -b feature/xxx`)
- Commit your changes (`git commit -m 'feat: add xxx'`)
- Push to the branch (`git push origin feature/xxx`)
- Open a Pull Request
Bug Reports: Issues

Feature Suggestions: Discussions
Join our WeChat group to discuss and exchange ideas! Scan the QR code below to join:
Scan to join the Edit Banana community
💡 If the QR code has expired, please submit an Issue to request an updated one.
Thanks to all the developers who have contributed to the project and helped move it forward!
| Name/ID | Email |
|---|---|
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| Yang Haotian | |
| An Minghao | |
| Yu Mingjie | |
This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).
If this project helps you, please star it to show your support!
Star history: https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left








