Edit Banana: A framework for converting statistical figures into editable formats.

BIT-DataLab/Edit-Banana

🍌 Edit Banana

Image to DrawIO (XML) Converter

One-click conversion of static diagrams (flowcharts, architecture diagrams, technical schematics) into editable DrawIO (mxGraph) XML files.
Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.


Try Online Demo

👆 Click above or `https://db121-img2xml.cn/` to try Edit Banana online! Upload an image, get editable DrawIO XML in seconds.


📸 Effect Demonstration

High-Definition Input-Output Comparison (4 Typical Scenarios)

To demonstrate conversion fidelity, the table below pairs 4 original static images with their editable DrawIO reconstructions. All elements can be individually dragged, styled, and modified.

| Scenario | Original Static Diagram (Input · Non-editable) | DrawIO Reconstruction Result (Output · Fully Editable) |
| --- | --- | --- |
| Scenario 1: Basic Flowchart | Original Diagram 1 | Reconstruction Result 1 |
| Scenario 2: Multi-level Architecture Diagram | Original Diagram 2 | Reconstruction Result 2 |
| Scenario 3: Technical Schematic | Original Diagram 3 | Reconstruction Result 3 |
| Scenario 4: Scientific Formula Diagram | Original Diagram 4 | Reconstruction Result 4 |

✨ Conversion Highlights:

  1. Preserves the layout logic, color matching, and element hierarchy of the original diagram
  2. 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)
  3. Accurate text recognition, supporting direct subsequent editing and format adjustment
  4. All elements are independently selectable, supporting native DrawIO template replacement and layout optimization

Key Features

  • Advanced Segmentation: Uses SAM 3 (Segment Anything Model 3) for state-of-the-art segmentation of diagram elements (shapes, arrows, icons).
  • Fixed 4-Round VLM Scanning: A structured, iterative extraction process guided by multimodal LLMs (Qwen-VL/GPT-4V), designed so that elements missed in one round are picked up in a later one:
    1. Initial Generic Extraction: Captures standard shapes and icons.
    2. Single Word Scan: VLM scans blank areas for single objects.
    3. Two-Word Scan: Refines extraction for specific attributes.
    4. Phrase Scan: Captures complex descriptions or grouped objects.
  • High-Quality OCR:
    • Azure Document Intelligence for precise text localization.
    • Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
    • Mistral Vision/MLLM for correcting text and converting mathematical formulas to LaTeX ($\int f(x) dx$).
    • Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to LLMs for pixel-perfect recognition.
  • Smart Background Removal: Integrated RMBG-2.0 model to automatically remove backgrounds from icons, pictures, and arrows, ensuring they layer correctly in DrawIO.
  • Arrow Handling: Arrows are extracted as transparent images (rather than complex vector paths) to preserve visual fidelity, so dashed lines, curves, and complex routing are reproduced as they appear in the source image.
  • Vector Shape Recovery: Standard shapes are converted to native DrawIO vector shapes with accurate fill and stroke colors.
    • Supported Shapes: Rectangle, Rounded Rectangle, Diamond (Decision), Ellipse (Start/End), Cylinder (Database), Cloud, Hexagon, Triangle, Parallelogram, Actor, Title Bar, Text Bubble, Section Panel.
  • User System:
    • Registration: New users receive 30 free credits.
    • Credit System: Pay-per-use model prevents resource abuse.
  • Multi-User Concurrency: Built-in support for concurrent user sessions. A Global Lock serializes GPU access so it stays thread-safe, and an LRU (Least Recently Used) Cache persists image embeddings across requests so they can be reused between requests.
  • Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
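
The Global Lock and LRU Cache described under Multi-User Concurrency can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `EmbeddingCache`, `GPU_LOCK`, `get_embedding`, and the `compute_embedding` callback are hypothetical names, and the real server may structure this differently.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """Small thread-safe LRU cache keyed by an image identifier (hypothetical)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._items = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._items:
                return None
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]

    def put(self, key, value):
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)
            while len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict least recently used


# One process-wide lock: only one thread runs GPU work at a time.
GPU_LOCK = threading.Lock()


def get_embedding(image_id, compute_embedding, cache):
    """Return a cached embedding, computing it under the GPU lock on a miss."""
    cached = cache.get(image_id)
    if cached is not None:
        return cached
    with GPU_LOCK:
        # Re-check: another thread may have filled the cache while we waited.
        cached = cache.get(image_id)
        if cached is not None:
            return cached
        embedding = compute_embedding(image_id)
        cache.put(image_id, embedding)
        return embedding
```

The second cache lookup inside the lock keeps two requests for the same image from both paying for the expensive computation.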

Architecture Pipeline

  1. Input: Image (PNG/JPG).
  2. Segmentation (SAM3):
    • Initial pass with standard prompts (rectangle, arrow, icon).
    • Iterative loop: Analyze unrecognized regions -> Ask MLLM for visual prompts -> Re-run SAM3 mask decoder.
  3. Element Processing:
    • Vector Shapes: Color extraction (Fill/Stroke) + Geometry mapping.
    • Image Elements (Icons/Arrows): Crop -> Padding -> Mask Filtering -> RMBG-2.0 Background Removal -> Base64 Encoding.
  4. Text Extraction (Parallel):
    • Azure OCR detects text bounding boxes.
    • High-res crops of text regions are sent to Mistral/LLM.
    • LaTeX conversion for formulas.
  5. XML Generation:
    • Merges spatial data from SAM3 and Text OCR.
    • Applies Z-Index sorting (Layers).
    • Generates .drawio.xml file.
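
The XML Generation step can be sketched with Python's standard library. The element records, the area-based z-order rule, and the name `build_drawio_xml` are assumptions made for illustration; only the mxGraph cell layout and the style strings follow common draw.io conventions, and the project's own merge script may work differently.

```python
import xml.etree.ElementTree as ET

# Common draw.io style strings for a few of the supported shapes.
SHAPE_STYLES = {
    "rectangle": "rounded=0;whiteSpace=wrap;html=1;",
    "rounded_rectangle": "rounded=1;whiteSpace=wrap;html=1;",
    "diamond": "rhombus;whiteSpace=wrap;html=1;",
    "ellipse": "ellipse;whiteSpace=wrap;html=1;",
}


def build_drawio_xml(elements):
    """Build an mxGraphModel string from a list of element dicts.

    Each element is a hypothetical record with keys kind, x, y, w, h and
    optional text, fill, stroke.
    """
    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")              # diagram root cell
    ET.SubElement(root, "mxCell", id="1", parent="0")  # default layer

    # Assumed z-order rule: larger elements are written first, so smaller
    # ones (written later) are drawn on top of them.
    ordered = sorted(elements, key=lambda e: e["w"] * e["h"], reverse=True)

    for index, el in enumerate(ordered, start=2):
        style = SHAPE_STYLES.get(el["kind"], SHAPE_STYLES["rectangle"])
        style += "fillColor={};strokeColor={};".format(
            el.get("fill", "#ffffff"), el.get("stroke", "#000000"))
        cell = ET.SubElement(root, "mxCell", id=str(index),
                             value=el.get("text", ""), style=style,
                             vertex="1", parent="1")
        # "as" is a Python keyword, so it is passed through a dict.
        ET.SubElement(cell, "mxGeometry", x=str(el["x"]), y=str(el["y"]),
                      width=str(el["w"]), height=str(el["h"]),
                      **{"as": "geometry"})
    return ET.tostring(model, encoding="unicode")
```

Sorting by area is only one plausible layering heuristic; it keeps large container panels from covering the shapes inside them.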

Project Structure

sam3_workflow/
├── config/                 # Configuration files
├── flowchart_text/         # OCR & Text Extraction Module
│   ├── src/                # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py             # OCR Entry point
├── frontend/               # React Web Application
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (RMBG, SAM3)
│   └── rmbg/               # [Manual] RMBG-2.0
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 Model Library
├── scripts/                # Utility Scripts
│   └── merge_xml.py        # XML Merging & Orchestration
├── main.py                 # CLI Entry point (Modular Pipeline)
├── server_pa.py            # FastAPI Backend Server (Service-based)
└── requirements.txt        # Python dependencies

Installation & Setup

Follow these steps to set up the project locally.

1. Prerequisites

  • Python 3.10+
  • Node.js & npm (for the frontend)
  • CUDA-capable GPU (Highly recommended)

2. Clone Repository

git clone https://github.com/XiangjianYi/Image2DrawIO.git
cd Image2DrawIO

3. Initialize Directory Structure

After cloning, you must manually create the following resource directories (ignored by Git):

# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output

# Create model directories
mkdir -p models/rmbg

4. Download Model Weights

Download the required models and place them in the correct paths:

| Model | Download | Target Path |
| --- | --- | --- |
| RMBG-2.0 | RMBG-2.0 | models/rmbg/model.onnx |
| SAM 3 | https://modelscope.cn/models/facebook/sam3 | models/sam3.pt (or as configured) |

Note: For SAM 3 (or the specific segmentation checkpoint used), place the .pt file in models/ and update config.yaml.

5. Install Dependencies

Backend:

pip install -r requirements.txt

Frontend:

cd frontend
npm install
cd ..

6. Configuration

  1. Config File: Copy the example config.
    cp config/config.yaml.example config/config.yaml
  2. Environment Variables: Create a .env file in the root directory.
    AZURE_ENDPOINT=your_azure_endpoint
    AZURE_API_KEY=your_azure_key
    # Add other keys as needed
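
Before starting the server it can help to confirm that the `.env` file defines the Azure settings listed above. The sketch below is a hedged illustration: `parse_env` and `missing_settings` are hypothetical helpers, the parser only handles plain `KEY=VALUE` lines, and how the project itself loads `.env` is not specified here.

```python
# The two keys named in the .env example above.
REQUIRED_KEYS = ("AZURE_ENDPOINT", "AZURE_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_settings(parse_env(open(".env").read()))` returns an empty list when both keys are set.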

Usage

1. Web Interface (Recommended)

Start the Backend:

python server_pa.py
# Server runs at http://localhost:8000

Start the Frontend:

cd frontend
npm install   # skip if already run in step 5 of Installation & Setup
npm run dev
# Frontend runs at http://localhost:5173

Open your browser, upload an image, and view the result in the embedded DrawIO editor.

2. Command Line Interface (CLI)

To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory.
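
To convert several images, the single-image command can be repeated in a loop. The helper below is a sketch that relies only on the `-i` flag shown above; `build_commands` and `IMAGE_SUFFIXES` are hypothetical names, and the script is assumed to be run from the repository root.

```python
from pathlib import Path

# Input formats listed in the pipeline description (PNG/JPG).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(paths, python="python"):
    """Return one `python main.py -i <image>` command per image path."""
    selected = sorted(
        str(p) for p in paths if Path(p).suffix.lower() in IMAGE_SUFFIXES
    )
    return [[python, "main.py", "-i", p] for p in selected]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`, for example over `build_commands(Path("input").iterdir())`, so the images are processed one after another.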

Configuration (config.yaml)

Customize the pipeline behavior in config/config.yaml:

  • sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
  • paths: Set input/output directories.
  • dominant_color: Fine-tune color extraction sensitivity.
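
To make the effect of the score and NMS thresholds concrete, here is a generic greedy Non-Maximum Suppression sketch over axis-aligned boxes. It illustrates the standard technique only; the function names and the `(box, score)` layout are assumptions, and the project's SAM 3 post-processing may differ.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold, nms_threshold):
    """Drop low-score boxes, then suppress boxes overlapping a better one.

    detections is a list of (box, score) pairs.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou(box, k[0]) <= nms_threshold for k in kept):
            kept.append((box, score))
    return kept
```

Raising the score threshold removes weak detections; lowering the NMS threshold removes more overlapping duplicates at the risk of dropping genuinely adjacent elements.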

📌 Development Roadmap

| Feature Module | Status | Description |
| --- | --- | --- |
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | ⚠️ In Development | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📝 Planned | Support custom template import |
| Batch Export Optimization | 📝 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📝 Planned | Support local VLM deployment, independent of APIs |

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/xxx)
  3. Commit your changes (git commit -m 'feat: add xxx')
  4. Push to the branch (git push origin feature/xxx)
  5. Open a Pull Request

Bug Reports: Issues
Feature Suggestions: Discussions

💬 Join WeChat Group

Join our WeChat group to discuss and exchange ideas. Scan the QR code below to join:

WeChat Group QR Code
Scan to join the Edit Banana community

💡 If the QR code has expired, please submit an Issue to request an updated one.

🤩 Contributors

Thanks to all developers who have contributed to the project and helped it evolve!

| Name/ID | Email |
| --- | --- |
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| Yang Haotian | |
| An Minghao | |
| Yu Mingjie | |

📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).


🌟 Star History

🌟 If this project helps you, please star it to show your support!

Star History Chart: https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left
