🍌 Edit Banana

Image to DrawIO (XML) Converter

One-click conversion of static diagrams (flowcharts, architecture diagrams, technical schematics) into editable DrawIO (mxGraph) XML files.
Powered by SAM 3 and multimodal large language models (MLLMs), it reconstructs diagrams with high fidelity, preserving the original details and logical relationships.

Try Edit Banana online at `https://db121-img2xml.cn/`: upload an image and get editable DrawIO XML in seconds.

📸 Effect Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate conversion quality, the table below pairs each original static image with its editable DrawIO reconstruction across 4 typical scenarios. Every element in a reconstruction can be individually dragged, restyled, and modified.

Scenario No.	Original Static Diagram (Input · Non-editable)	DrawIO Reconstruction Result (Output · Fully Editable)
Scenario 1: Basic Flowchart
Scenario 2: Multi-level Architecture Diagram
Scenario 3: Technical Schematic
Scenario 4: Scientific Formula Diagram

✨ Conversion Highlights:

Preserves the layout logic, color matching, and element hierarchy of the original diagram

Faithful restoration of shape stroke and fill, and of arrow styles (dashed lines, line thickness)

Accurate text recognition; recognized text can be edited and reformatted directly

All elements are independently selectable, supporting native DrawIO template replacement and layout optimization

Key Features

Advanced Segmentation: Uses SAM 3 (Segment Anything Model 3) for state-of-the-art segmentation of diagram elements (shapes, arrows, icons).
Fixed 4-Round VLM Scanning: A structured, iterative extraction process guided by multimodal LLMs (Qwen-VL/GPT-4V) to minimize missed elements:
1. Initial Generic Extraction: Captures standard shapes and icons.
2. Single Word Scan: VLM scans blank areas for single objects.
3. Two-Word Scan: Refines extraction for specific attributes.
4. Phrase Scan: Captures complex descriptions or grouped objects.
High-Quality OCR:
- Azure Document Intelligence for precise text localization.
- Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
- Mistral Vision or another MLLM corrects recognized text and converts mathematical formulas to LaTeX (e.g. $\int f(x) dx$).
- Crop-Guided Strategy: Crops text and formula regions and sends the high-resolution crops to the LLM for high-accuracy recognition.
Smart Background Removal: Integrated RMBG-2.0 model to automatically remove backgrounds from icons, pictures, and arrows, ensuring they layer correctly in DrawIO.
Arrow Handling: Arrows are extracted as transparent images (rather than complex vector paths) to preserve visual fidelity, so dashed lines, curves, and complex routing appear exactly as in the source image.
Vector Shape Recovery: Standard shapes are converted to native DrawIO vector shapes with accurate fill and stroke colors.
- Supported Shapes: Rectangle, Rounded Rectangle, Diamond (Decision), Ellipse (Start/End), Cylinder (Database), Cloud, Hexagon, Triangle, Parallelogram, Actor, Title Bar, Text Bubble, Section Panel.
User System:
- Registration: New users receive 30 free credits.
- Credit System: Pay-per-use model prevents resource abuse.
Multi-User Concurrency: Built-in support for concurrent user sessions, using a global lock for thread-safe GPU access and an LRU (Least Recently Used) cache that keeps image embeddings across requests, for consistent performance and stability.
Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
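The concurrency design above, a single global lock around GPU work plus an LRU cache of image embeddings, can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in `server_pa.py`: the class name `EmbeddingCache`, the `compute_embedding` callable (standing in for the SAM 3 image encoder), and keying the cache on a SHA-256 hash of the uploaded bytes are all hypothetical choices.

```python
import hashlib
import threading
from collections import OrderedDict
from typing import Any, Callable


class EmbeddingCache:
    """Hypothetical thread-safe LRU cache for image embeddings."""

    def __init__(self, compute_embedding: Callable[[bytes], Any], max_items: int = 8):
        self._compute = compute_embedding          # stand-in for the SAM 3 image encoder
        self._max_items = max_items
        self._cache: "OrderedDict[str, Any]" = OrderedDict()
        self._cache_lock = threading.Lock()        # protects the OrderedDict
        self._gpu_lock = threading.Lock()          # the "global lock": one GPU job at a time

    def get(self, image_bytes: bytes) -> Any:
        key = hashlib.sha256(image_bytes).hexdigest()
        with self._cache_lock:
            if key in self._cache:
                self._cache.move_to_end(key)       # mark as most recently used
                return self._cache[key]
        # Cache miss: serialize GPU access across all request threads.
        with self._gpu_lock:
            embedding = self._compute(image_bytes)
        with self._cache_lock:
            self._cache[key] = embedding
            self._cache.move_to_end(key)
            while len(self._cache) > self._max_items:
                self._cache.popitem(last=False)    # evict the least recently used entry
        return embedding
```

A cache hit skips the GPU entirely, which is what makes repeated requests on the same image cheap. Two simultaneous misses for the same image may both run the encoder; a production version might track in-flight requests to avoid that.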

Architecture Pipeline

Input: Image (PNG/JPG).
Segmentation (SAM3):
- Initial pass with standard prompts (rectangle, arrow, icon).
- Iterative loop: Analyze unrecognized regions -> Ask MLLM for visual prompts -> Re-run SAM3 mask decoder.
Element Processing:
- Vector Shapes: Color extraction (Fill/Stroke) + Geometry mapping.
- Image Elements (Icons/Arrows): Crop -> Padding -> Mask Filtering -> RMBG-2.0 Background Removal -> Base64 Encoding.
Text Extraction (Parallel):
- Azure OCR detects text bounding boxes.
- High-res crops of text regions are sent to Mistral/LLM.
- LaTeX conversion for formulas.
XML Generation:
- Merges spatial data from SAM3 and Text OCR.
- Applies Z-Index sorting (Layers).
- Generates .drawio.xml file.
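The XML generation step can be illustrated with a minimal sketch that turns already-processed elements into mxGraph cells and writes them in z-order, since draw.io paints cells in document order (later cells appear on top). The element dictionaries (`kind`, `shape`, `bbox`, `z`, `fill`, `stroke`, `text`, `png_bytes`) and the function `build_drawio_xml` are hypothetical, and the style strings cover only a few of the supported shapes; the project's own mapping in `scripts/merge_xml.py` may differ.

```python
import base64
from xml.etree import ElementTree as ET

# Standard draw.io style fragments for a subset of the supported shapes.
SHAPE_STYLES = {
    "rectangle": "rounded=0;whiteSpace=wrap;html=1;",
    "rounded_rectangle": "rounded=1;whiteSpace=wrap;html=1;",
    "diamond": "rhombus;whiteSpace=wrap;html=1;",
    "ellipse": "ellipse;whiteSpace=wrap;html=1;",
    "cylinder": "shape=cylinder3;whiteSpace=wrap;html=1;",
    "cloud": "ellipse;shape=cloud;whiteSpace=wrap;html=1;",
}


def build_drawio_xml(elements):
    """Hypothetical: emit a .drawio document; lower `z` values end up on the bottom layer."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", {"name": "Page-1"})
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", {"id": "0"})                  # root cell
    ET.SubElement(root, "mxCell", {"id": "1", "parent": "0"})   # default layer

    # Z-index sorting: cells written later are drawn on top.
    for i, el in enumerate(sorted(elements, key=lambda e: e["z"]), start=2):
        x, y, w, h = el["bbox"]
        if el["kind"] == "shape":
            style = SHAPE_STYLES.get(el["shape"], SHAPE_STYLES["rectangle"])
            style += f"fillColor={el['fill']};strokeColor={el['stroke']};"
        elif el["kind"] == "image":
            # Transparent PNG crop (icon or arrow). By draw.io convention the data URI
            # omits ";base64", because ";" separates style entries.
            b64 = base64.b64encode(el["png_bytes"]).decode("ascii")
            style = f"shape=image;imageAspect=0;image=data:image/png,{b64};"
        else:  # text element from OCR
            style = "text;html=1;align=center;verticalAlign=middle;"
        cell = ET.SubElement(root, "mxCell", {
            "id": str(i), "value": el.get("text", ""), "style": style,
            "vertex": "1", "parent": "1"})
        ET.SubElement(cell, "mxGeometry", {
            "x": str(x), "y": str(y), "width": str(w), "height": str(h),
            "as": "geometry"})
    return ET.tostring(mxfile, encoding="unicode")
```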

Project Structure

sam3_workflow/
├── config/                 # Configuration files
├── flowchart_text/         # OCR & Text Extraction Module
│   ├── src/                # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py             # OCR Entry point
├── frontend/               # React Web Application
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (RMBG, SAM3)
│   └── rmbg/               # [Manual] RMBG-2.0
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 Model Library
├── scripts/                # Utility Scripts
│   └── merge_xml.py        # XML Merging & Orchestration
├── main.py                 # CLI Entry point (Modular Pipeline)
├── server_pa.py            # FastAPI Backend Server (Service-based)
└── requirements.txt        # Python dependencies

Installation & Setup

Follow these steps to set up the project locally.

1. Prerequisites

Python 3.10+
Node.js & npm (for the frontend)
CUDA-capable GPU (Highly recommended)

2. Clone Repository

git clone https://github.com/XiangjianYi/Image2DrawIO.git
cd Image2DrawIO

3. Initialize Directory Structure

After cloning, you must manually create the following resource directories (ignored by Git):

# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output

# Create model directories
mkdir -p models/rmbg

4. Download Model Weights

Download the required models and place them in the correct paths:

Model	Download	Target Path
RMBG-2.0	RMBG-2.0	`models/rmbg/model.onnx`
SAM 3	https://modelscope.cn/models/facebook/sam3	`models/sam3.pt` (or as configured)

Note: For SAM 3 (or whichever segmentation checkpoint you use), place the `.pt` file in `models/` and update the checkpoint path in `config/config.yaml`.
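A quick pre-flight check can confirm the weights are in place before the first run. This hypothetical helper (`missing_weights` is not part of the project) only checks the default paths from the table above; pass your own paths if you configured a different SAM 3 checkpoint.

```python
from pathlib import Path

# Default locations from the "Download Model Weights" table; adjust if you changed config.yaml.
EXPECTED_WEIGHTS = ("models/rmbg/model.onnx", "models/sam3.pt")


def missing_weights(project_root=".", expected=EXPECTED_WEIGHTS):
    """Hypothetical helper: return expected weight files that are absent or empty."""
    root = Path(project_root)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Call `missing_weights()` from the repository root; an empty list means both files were found.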

5. Install Dependencies

Backend:

pip install -r requirements.txt

Frontend:

cd frontend
npm install
cd ..

6. Configuration

Config File: Copy the example config.

cp config/config.yaml.example config/config.yaml

Environment Variables: Create a .env file in the root directory.

AZURE_ENDPOINT=your_azure_endpoint
AZURE_API_KEY=your_azure_key
# Add other keys as needed
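To fail fast on an incomplete `.env`, a startup check could parse the file and report missing Azure credentials. The sketch below is hypothetical and deliberately minimal: it understands plain `KEY=value` lines and `#` comments, not the full dotenv syntax, and it treats values still starting with `your_` (as in the template above) as unset. The project itself may load these variables differently.

```python
from pathlib import Path

REQUIRED_KEYS = ("AZURE_ENDPOINT", "AZURE_API_KEY")


def parse_env_file(path):
    """Hypothetical: parse simple KEY=value lines, skipping blanks and '#' comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")      # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a 'your_...' placeholder."""
    return [k for k in required
            if not values.get(k) or values[k].startswith("your_")]
```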

Usage

1. Web Interface (Recommended)

Start the Backend:

python server_pa.py
# Server runs at http://localhost:8000

Start the Frontend:

cd frontend
npm install   # skip if already run in step 5
npm run dev
# Frontend runs at http://localhost:5173

Open http://localhost:5173 in your browser, upload an image, and view the result in the embedded DrawIO editor.

2. Command Line Interface (CLI)

To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory.
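Since the CLI takes one image per call, a small wrapper can process everything in `input/` in turn. This sketch relies only on the `-i` flag shown above; `build_commands` and `run_all` are hypothetical helper names, and it should be run from the repository root so that `main.py` resolves.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(input_dir="input"):
    """Build one `python main.py -i <image>` command per PNG/JPG file in input_dir."""
    images = sorted(p for p in Path(input_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, "main.py", "-i", str(p)] for p in images]


def run_all(input_dir="input"):
    """Run the CLI sequentially; return the images whose run exited with a non-zero code."""
    failed = []
    for cmd in build_commands(input_dir):
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Runs are sequential on purpose: each invocation is a separate process that loads the models itself, so parallel runs would likely compete for GPU memory.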

Configuration `config.yaml`

Customize the pipeline behavior in config/config.yaml:

sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and the maximum number of iteration loops.
paths: Set input/output directories.
dominant_color: Fine-tune color extraction sensitivity.
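To see what the `sam3` score and NMS thresholds control, here is textbook greedy non-maximum suppression on axis-aligned boxes: candidates below the score threshold are discarded, then each remaining candidate is kept only if it does not overlap an already-kept, higher-scoring one by more than the IoU threshold. This is a generic illustration, not the project's implementation (which may operate on masks rather than boxes), and the parameter names are not the actual `config.yaml` keys.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Raising `iou_threshold` keeps more overlapping detections (useful for nested shapes); raising `score_threshold` drops more low-confidence ones.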

📌 Development Roadmap

Feature Module	Status	Description
Core Conversion Pipeline	✅ Completed	Full pipeline of segmentation, reconstruction and OCR
Intelligent Arrow Connection	⚠️ In Development	Automatically associate arrows with target shapes
DrawIO Template Adaptation	📍 Planned	Support custom template import
Batch Export Optimization	📍 Planned	Batch export to DrawIO files (.drawio)
Local LLM Adaptation	📍 Planned	Support local VLM deployment, independent of APIs

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

Fork this repository
Create a feature branch (git checkout -b feature/xxx)
Commit your changes (git commit -m 'feat: add xxx')
Push to the branch (git push origin feature/xxx)
Open a Pull Request

Bug Reports: Issues
Feature Suggestions: Discussions

💬 Join WeChat Group

You are welcome to join our WeChat group to discuss the project and exchange ideas. Scan the QR code below to join:

Scan to join the Edit Banana community

💡 If the QR code has expired, please submit an Issue to request an updated one.

🤩 Contributors

Thanks to all the developers who have contributed to the project and helped it evolve!

Name/ID	Email
Chai Chengliang	ccl@bit.edu.cn
Zhang Chi	zc315@bit.edu.cn
Deng Qiyan
Rao Sijing
Yi Xiangjian
Li Jianhui
Shen Chaoyuan
Zhang Junkai
Han Junyi
You Zirui
Xu Haochen
Yang Haotian
An Minghao
Yu Mingjie

📄 License

This project is open source under the Apache License 2.0, which permits commercial use and derivative works provided the copyright notice is retained.

🌟 Star History

🌟 If this project helps you, please star it to show your support!

Star History chart: https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left
