Widget2Code Logo

🎨 Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs

Widget2Code is a baseline framework that strengthens both perceptual understanding and system-level generation for transforming visual widgets into UI code. It leverages advanced vision-language models to automatically generate production-ready WidgetDSL from screenshots, featuring icon retrieval across a library of 57,000+ icons, layout analysis, and component recognition and generation. This repository provides the implementation and tools needed to generate high-fidelity widget code.

🔥🔥🔥 News

  • 📦 Dec 22, 2025: Benchmark dataset uploaded to Hugging Face
  • 📄 Dec 22, 2025: Paper uploaded to arXiv
  • 🚀 Dec 16, 2025: We release the complete Widget2Code framework including inference code, interactive playground, batch processing scripts, and evaluation tools.

🎥 Demo

playground.mp4

📖 Overview

Widget2Code is a baseline framework that strengthens both perceptual understanding and system-level generation for transforming visual widgets into UI code.

πŸ—οΈ Architecture

Widget2Code employs a sophisticated multi-stage generation pipeline:

Generation Pipeline

  1. Image Preprocessing: Resolution normalization, format conversion, and quality analysis
  2. Layout Detection: Multi-stage layout analysis with intelligent retry mechanism for robust component positioning
  3. Icon Retrieval: FAISS-based similarity search across 57,000+ icons with dual-encoder (text + image) matching
  4. Chart Recognition: Specialized detection and classification for 8 chart types using vision models
  5. Color Extraction: Advanced palette and gradient analysis with perceptual color matching
  6. DSL Generation: LLM-based structured output generation with domain-specific prompts
  7. Validation: Schema validation, constraint checking, and error correction
  8. Compilation: DSL to React JSX/HTML transformation with optimization
  9. Rendering: Rendering the generated code to PNG in a headless browser
Widget2Code Architecture
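To illustrate the idea behind the icon-retrieval stage (step 3): a FAISS inner-product search over L2-normalized embeddings is equivalent to cosine-similarity ranking. The sketch below uses a NumPy stand-in for the FAISS index and random data; all names are illustrative, not the repo's actual API.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """L2-normalize rows so inner product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def retrieve_icons(query_emb, icon_embs, top_k=5):
    """Return indices and scores of the top_k most similar icons.

    Mimics a FAISS IndexFlatIP search over normalized embeddings; in the
    real pipeline the index would hold 57,000+ icon vectors produced by
    the dual (text + image) encoders.
    """
    sims = normalize(icon_embs) @ normalize(query_emb)
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

# Tiny demo with random embeddings (100 "icons", dim 8)
rng = np.random.default_rng(0)
icons = rng.normal(size=(100, 8))
query = icons[42] + 0.01 * rng.normal(size=8)  # a query close to icon 42
idx, scores = retrieve_icons(query, icons)
print(idx[0])  # icon 42 ranks first
```

In the real system the query vector would come from an encoder applied to a cropped icon region, and FAISS would replace the brute-force matrix product for scalability.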

📜 System Requirements

Hardware Requirements

  • GPU: NVIDIA GPU with CUDA support (recommended for icon retrieval acceleration)
  • Memory: Minimum 8GB RAM, 16GB+ recommended for batch processing

Software Requirements

  • Operating System: Linux, macOS, or Windows (WSL2)
  • Node.js: 18.x or higher
  • Python: 3.10 or higher

πŸ› οΈ Dependencies and Installation

Quick Install

One-Command Setup:

./scripts/setup/install.sh

This installs all dependencies, including Node.js packages and an isolated Python environment.

βš™οΈ Configuration

Create .env file with API credentials and ground truth directory:

cp .env.example .env
# Edit .env and configure:
# - API credentials
# - GT_DIR: Path to ground truth directory for evaluation (e.g., ./data/widget2code-benchmark/test)
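A resulting .env might look like the fragment below. The credential variable name is hypothetical (check .env.example for the exact keys this repo expects); only GT_DIR is named in the instructions above.

```shell
# Illustrative .env — key names other than GT_DIR are placeholders
API_KEY=your-api-key-here
GT_DIR=./data/widget2code-benchmark/test
```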

🚀 Quick Start

Step 1: Start API Service

# Start API backend (required for batch processing)
npm run api

Step 2: Generate Widgets (Batch)

# Batch generation with 5 concurrent workers
./scripts/generation/generate-batch.sh ./mockups ./output 5

# Force regenerate all images
./scripts/generation/generate-batch.sh ./mockups ./output 5 --force
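The worker-pool pattern the batch script uses (N concurrent workers over a directory of mockups) can be sketched in a few lines of Python. `process_image` here is a placeholder for one generation call against the API backend, not the repo's actual worker.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def process_image(path: Path) -> str:
    """Placeholder for a single generation request to the API backend."""
    return f"generated:{path.name}"

def run_batch(input_dir: str, workers: int = 5) -> list[str]:
    """Process every .png in input_dir with `workers` concurrent workers,
    mirroring `generate-batch.sh <input> <output> 5`."""
    images = sorted(Path(input_dir).glob("*.png"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_image, images))
```

I/O-bound API calls are a good fit for threads; results come back in input order because `pool.map` preserves ordering.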

Step 3: Render Widgets (Batch)

# Batch rendering with 5 concurrent workers
./scripts/rendering/render-batch.sh ./output 5

# Force rerender all widgets
./scripts/rendering/render-batch.sh ./output 5 --force

Step 4: Evaluate Results

# Evaluate generated widgets against ground truth
# If GT_DIR is set in .env, -g flag is optional
./scripts/evaluation/run_evaluation.sh ./output

# Or specify ground truth directory explicitly
./scripts/evaluation/run_evaluation.sh ./output -g ./data/widget2code-benchmark/test

# Use GPU and more workers for faster evaluation
./scripts/evaluation/run_evaluation.sh ./output -g ./data/widget2code-benchmark/test --cuda -w 16

Interactive Playground (Optional)

# Start interactive playground
npm run playground

📊 Benchmarks & Evaluation

Performance Comparison

Widget2Code achieves state-of-the-art performance across multiple quality dimensions including layout accuracy, legibility, style preservation, perceptual similarity, and geometric precision.
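As a concrete illustration of one such dimension: layout accuracy is commonly scored with bounding-box intersection-over-union (IoU) between predicted and ground-truth component boxes. The sketch below shows the standard IoU computation; it is not necessarily the exact metric this repo's evaluation scripts implement.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```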

Benchmark Results

Compared Methods

Evaluation results cover 13 methods and model baselines:

  1. Seed1.6-Thinking
  2. Gemini2.5-Pro
  3. GPT-4o
  4. Qwen3-VL
  5. Qwen3-VL-235b
  6. Design2Code
  7. DCGen
  8. LatCoder
  9. UICopilot
  10. WebSight-VLM-8B
  11. ScreenCoder
  12. UI-UG
  13. Widget2Code

Download Benchmarks

Download the Widget2Code Benchmark Dataset to the ./data/ folder.

After downloading, set GT_DIR=./data/widget2code-benchmark/test in your .env file, or use the -g flag when running evaluation scripts. The test split (./data/widget2code-benchmark/test) should be used as ground truth for evaluation.

Benchmark Results: All Methods Results (465MB) - Download evaluation results across all 13 benchmark datasets and methods from Google Drive.

To use the benchmark results:

# Install gdown (if not already installed)
pip install gdown

# Download using gdown (465MB)
gdown --fuzzy "https://drive.google.com/file/d/1LAYReu4fUES1IE0qM7h-zNGvyUgYnqwz/view?usp=sharing"

# If download fails, manually download from the link above

# Extract to project root directory
unzip benchmarks_backup_20251216.zip

# Run evaluation on all benchmarks (using test split as ground truth)
./scripts/evaluation/run_all_benchmarks.sh -g ./data/widget2code-benchmark/test --cuda -w 16

📚 Citation

If you find Widget2Code useful for your research or projects, please cite our work:

@article{widget2code2025,
  title={Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs},
  author={Houston H. Zhang and Tao Zhang and Baoze Lin and Yuanqi Xue and Yincheng Zhu and Huan Liu and Li Gu and Linfeng Ye and Ziqiang Wang and Xinxin Zuo and Yang Wang and Yuanhao Yu and Zhixiang Chi},
  journal={arXiv preprint},
  year={2025}
}

About

Official implementation of "Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs"
