- Overview
- Features
- Tech Stack
- Quick Start
- Architecture
- Project Structure
- API Documentation
- ML Pipeline
- Deployment
- Contributing
- License
- Acknowledgments
UtsukushiiAI (Japanese: 美しい - "Beautiful") is an open-source, local-first generative media platform designed to automate the creation of high-energy, beat-synced "Manga Music Videos" (MMVs).
Designed for transparency and privacy, UtsukushiiAI runs entirely on your local hardware. It utilizes Computer Vision (CV) to segment static manga pages and Generative AI to animate them—without ever requiring your sensitive creative assets to leave your machine.
- Local-First Processing: Keep your storage and compute on your own machine.
- Environment-Driven Configuration: Set up your own MongoDB URIs and third-party API keys (YouTube, OpenAI, etc.) strictly via `.env` files.
- Rhythm Intelligence: Precisely sync visual transitions to audio beats automatically.
- Privacy by Design: User data stays on their machine unless explicitly sent to third-party APIs.
- Efficiency: Reduce manual video editing time from 4 hours to < 2 minutes using local ML pipelines.
- Multi-modal drag-and-drop for Manga (PDF, PNG, JPG) and Music (MP3, WAV)
- Automatic manga page extraction from PDF files
- YouTube audio download via yt-dlp for "Vibe-Matching" music
- Interactive Bounding Box (BBox) adjustment for AI-detected panels
- Real-time preview of panel segmentation
- Layer management for foreground/background elements
- Visual waveform display with automated beat-marker detection
- Millisecond-accurate sync point placement
- Drag-and-drop transition effects
- YOLOv12 panel detection with Manga109 fine-tuning
- MiDaS depth estimation for 3D "Wiggle" parallax effects
- Stable Video Diffusion (SVD) for character animation
- FFmpeg/MoviePy video composition
- Multiple aspect ratios: 9:16 (Stories/Reels), 16:9 (YouTube), 1:1 (Instagram)
- Quality presets: Draft, Standard, High, Ultra
- Direct upload to social platforms
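The export presets above boil down to an aspect-ratio crop decision. A minimal sketch of that math, assuming a simple centered crop (illustrative only, not the project's actual implementation):

```python
# Sketch: center-crop a source frame to one of the export aspect ratios.
# Ratio names mirror the presets above; the function itself is an
# illustrative assumption, not UtsukushiiAI code.

ASPECT_RATIOS = {
    "9:16": (9, 16),   # Stories / Reels
    "16:9": (16, 9),   # YouTube
    "1:1": (1, 1),     # Instagram
}

def center_crop(width: int, height: int, preset: str):
    """Return (x, y, w, h) of the largest centered crop with the given ratio."""
    num, den = ASPECT_RATIOS[preset]
    if width * den > height * num:          # source wider than target: trim sides
        w, h = height * num // den, height
    else:                                   # source taller: trim top/bottom
        w, h = width, width * den // num
    w -= w % 2                              # keep dimensions even for encoders
    h -= h % 2
    return ((width - w) // 2, (height - h) // 2, w, h)

print(center_crop(1920, 1080, "9:16"))  # → (657, 0, 606, 1080)
```

Integer arithmetic avoids float-rounding surprises, and the even-dimension clamp matches what H.264 encoders expect.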
| Technology | Purpose |
|---|---|
| Next.js 15 | App Router, React Server Components |
| TypeScript | Type safety across full stack |
| Zustand | Global state management |
| Remotion | Frame-accurate video preview |
| Framer Motion | Cinematic UI animations |
| Tailwind CSS | Utility-first styling |
| Technology | Purpose |
|---|---|
| Express.js | REST API server |
| Socket.io | Real-time progress updates |
| JWT | Authentication |
| File System | Local storage management |
| Technology | Purpose |
|---|---|
| FastAPI | High-performance ML API |
| Python 3.11+ | ML runtime |
| PyTorch | Deep learning framework |
| YOLOv12 | Panel detection |
| SAM 2 | Instance segmentation |
| MiDaS | Depth estimation |
| Stable Video Diffusion | Character animation |
| Librosa | Audio analysis |
| FFmpeg | Video processing |
| Technology | Purpose |
|---|---|
| MongoDB | Project metadata, panel JSON |
| Redis | Render queue, session cache |
| Local Storage | Object storage (disk) |
| Docker | Containerization |
- Node.js 20.x+
- Python 3.11+
- Docker & Docker Compose
- MongoDB 6.x
- Redis 7.x
- Local Storage (Disk Space)
- Clone the repository

```bash
git clone https://github.com/utsukushii/utsukushii-ai.git
cd utsukushii-ai
```

- Environment Configuration

```bash
# Copy environment files
cp apps/web/.env.example apps/web/.env.local
cp apps/api/.env.example apps/api/.env
cp apps/worker/.env.example apps/worker/.env

# Edit with your credentials
# ALL infrastructure must be configured here before starting!
# Required: MONGODB_URI, REDIS_URL, YOUTUBE_API_KEY, OPENAI_API_KEY, etc.
```

- Start with Docker Compose

```bash
# Start all services
docker-compose up -d

# Or start individual services
docker-compose up -d mongodb redis
```

- Install and Run Locally

```bash
# Frontend
cd apps/web
npm install
npm run dev

# Backend API
cd apps/api
npm install
npm run dev

# ML Worker
cd apps/worker
pip install -r requirements.txt
uvicorn main:app --reload
```

- Access the Application
- Frontend: http://localhost:3000
- API: http://localhost:4000
- ML Worker: http://localhost:8000
UtsukushiiAI follows a Polyglot Microservices architecture:
┌─────────────────────────────────────────────────────────────┐
│ Frontend (Next.js) │
│ http://localhost:3000 │
└─────────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ API Gateway (Express.js) │
│ http://localhost:4000 │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Auth API │ │ Project API │ │ Render API │ │
│ └─────────────┘ └─────────────┘ └─────────────────────┘ │
└─────────────────────────┬───────────────────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────────┐
│ MongoDB │ │ Redis │ │ Local Storage │
│ (State) │ │ (Queue) │ │ (Storage) │
└─────────────┘ └─────────────┘ └─────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ ML Worker (FastAPI + Python) │
│ http://localhost:8000 │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │Detector │ │Segmenter │ │DepthEst │ │AudioAnalyzer│ │
│ └──────────┘ └──────────┘ └──────────┘ └────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Video Composer (FFmpeg) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Event-Driven: Real-time progress via WebSockets
- CQRS: Separate read/write models for project data
- Repository Pattern: Data abstraction layer
- Factory Pattern: Component creation for different video formats
- Observer Pattern: Job status updates
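The Observer pattern above can be sketched for job status updates; the class and method names here are illustrative assumptions, not taken from the codebase:

```python
# Sketch of the Observer pattern applied to render-job status updates:
# the job publishes status changes to every subscribed callback.

from typing import Callable

class RenderJob:
    """Publishes status changes to subscribed observers."""

    def __init__(self, job_id: str):
        self.job_id = job_id
        self._observers: list[Callable[[str, str], None]] = []

    def subscribe(self, observer: Callable[[str, str], None]) -> None:
        self._observers.append(observer)

    def set_status(self, status: str) -> None:
        # Notify every observer of the new status.
        for observer in self._observers:
            observer(self.job_id, status)

log = []
job = RenderJob("job-42")
job.subscribe(lambda jid, status: log.append(f"{jid}: {status}"))
job.set_status("processing")
job.set_status("complete")
print(log)  # → ['job-42: processing', 'job-42: complete']
```

In the real system the "observers" would be the Socket.io layer pushing `render:progress` events to clients.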
utsukushii-ai/
├── apps/
│ ├── web/ # Next.js 15 Frontend
│ │ ├── src/
│ │ │ ├── app/ # App Router pages
│ │ │ ├── components/ # React components
│ │ │ ├── hooks/ # Custom hooks
│ │ │ ├── lib/ # Utilities
│ │ │ ├── stores/ # Zustand stores
│ │ │ └── types/ # TypeScript types
│ │ └── public/ # Static assets
│ │
│ ├── api/ # Express.js Backend
│ │ ├── src/
│ │ │ ├── controllers/ # Route handlers
│ │ │ ├── middleware/ # Express middleware
│ │ │ ├── models/ # Business logic
│ │ │ ├── routes/ # API routes
│ │ │ ├── services/ # Core services
│ │ │ └── utils/ # Utilities
│ │ └── tests/ # API tests
│ │
│ └── worker/ # FastAPI ML Worker
│ ├── src/
│ │ ├── models/ # ML models
│ │ ├── pipelines/ # Processing pipelines
│ │ ├── services/ # ML services
│ │ └── utils/ # Utilities
│ ├── models/ # Downloaded model weights
│ └── tests/ # ML tests
│
├── packages/
│ ├── shared/ # Shared types & utilities
│ │ ├── types/ # TypeScript types
│ │ └── utils/ # Shared utilities
│ │
│ ├── database/ # MongoDB connection & schemas
│ │
│ ├── cache/ # Redis client & utilities
│ │
│ └── storage/ # Local storage utilities
│
├── tools/
│ ├── scripts/ # Build & deployment scripts
│ └── configs/ # Configuration files
│
├── docs/ # Documentation
│ ├── architecture/
│ ├── api/
│ └── assets/
│
├── docker-compose.yml # Local development
├── docker-compose.prod.yml # Production deployment
├── turbo.json # Turborepo config
├── package.json # Root package.json
└── README.md # This file
| Endpoint | Method | Description |
|---|---|---|
| /api/auth/register | POST | Register new user |
| /api/auth/login | POST | Login user |
| /api/auth/logout | POST | Logout user |
| /api/auth/refresh | POST | Refresh JWT token |
| /api/auth/me | GET | Get current user |
| Endpoint | Method | Description |
|---|---|---|
| /api/projects | GET | List user projects |
| /api/projects | POST | Create new project |
| /api/projects/:id | GET | Get project details |
| /api/projects/:id | PUT | Update project |
| /api/projects/:id | DELETE | Delete project |
| /api/projects/:id/panels | GET | Get project panels |
| /api/projects/:id/panels | POST | Add panel to project |
| /api/projects/:id/panels/:panelId | PUT | Update panel |
| /api/projects/:id/panels/:panelId | DELETE | Delete panel |
| Endpoint | Method | Description |
|---|---|---|
| /api/render/start | POST | Start render job |
| /api/render/:jobId | GET | Get render status |
| /api/render/:jobId | DELETE | Cancel render job |
| /api/render/:jobId/download | GET | Download rendered video |
| Endpoint | Method | Description |
|---|---|---|
| /api/upload/direct | POST | Get direct upload URL |
| /api/upload/complete | POST | Confirm upload complete |
| Event | Direction | Description |
|---|---|---|
| render:progress | Server→Client | Render progress update |
| render:complete | Server→Client | Render completed |
| render:error | Server→Client | Render error |
| panel:detected | Server→Client | Panel detection complete |
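To illustrate how a client might react to these events, here is a minimal handler registry. The payload fields (`jobId`, `percent`, `url`) are assumptions, and a real client would use a Socket.io library rather than this toy dispatcher:

```python
# Illustration of dispatching the Server→Client events listed above.
# Payload field names are assumptions, not the documented schema.

handlers = {}

def on(event):
    """Register a handler function for a named event."""
    def register(fn):
        handlers[event] = fn
        return fn
    return register

@on("render:progress")
def handle_progress(data):
    return f"job {data['jobId']}: {data['percent']}%"

@on("render:complete")
def handle_complete(data):
    return f"job {data['jobId']} done -> {data['url']}"

# Simulated incoming events:
print(handlers["render:progress"]({"jobId": "j1", "percent": 40}))
print(handlers["render:complete"]({"jobId": "j1", "url": "/videos/j1.mp4"}))
```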
Stage 1: Panel Detection (YOLOv12)
- Input: Manga image (PNG/JPG/PDF page)
- Output: Bounding box coordinates for each panel
- Model: Fine-tuned on Manga109 dataset
- Coordinates: Normalized (0.0 - 1.0) for scale invariance
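Because panel coordinates are normalized, consumers must map them back to pixel space for each page. A minimal sketch (the `(x, y, w, h)` tuple layout is an assumption about the panel JSON):

```python
# Sketch: converting normalized (0.0–1.0) panel coordinates back to
# pixel space for a given page size.

def denormalize_bbox(bbox, page_w, page_h):
    """bbox = (x, y, w, h) in 0.0–1.0 → pixel-space ints."""
    x, y, w, h = bbox
    return (round(x * page_w), round(y * page_h),
            round(w * page_w), round(h * page_h))

# A detected panel on an 827×1170 manga page:
print(denormalize_bbox((0.05, 0.1, 0.4, 0.3), 827, 1170))  # → (41, 117, 331, 351)
```

Normalization is what makes the same detection output usable at preview and render resolutions.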

Stage 2: Character Segmentation (SAM 2)
- Input: Panel bounding boxes + original image
- Output: Binary masks for characters/foreground
- Model: SAM 2 (Segment Anything 2)

Stage 3: Depth Estimation (MiDaS)
- Input: Original image
- Output: Depth map for parallax effect
- Effect: "Wiggle" 3D animation based on depth

Stage 4: Audio Analysis (Librosa)
- Input: Audio file (MP3/WAV)
- Output: BPM, onset timestamps, beat markers
- Algorithm: Harmonic-Percussive Source Separation (HPSS)
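Downstream, these onset timestamps drive the beat-synced cuts. A sketch of mapping them to 30 fps frame indices (the timestamps are hard-coded here, where the real pipeline would take Librosa's output):

```python
# Sketch: turning onset timestamps (seconds) into 30 fps frame indices
# for beat-synced transitions. Timestamps are hard-coded for illustration.

FPS = 30

def beats_to_frames(beat_times, fps=FPS):
    """Map beat timestamps (seconds) to the nearest video frame index."""
    return [round(t * fps) for t in beat_times]

# 120 BPM → a beat every 0.5 s:
print(beats_to_frames([0.0, 0.5, 1.0, 1.5]))  # → [0, 15, 30, 45]
```

Snapping transitions to frame indices rather than raw seconds is what keeps cuts millisecond-accurate at a fixed frame rate.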

Stage 5: Character Animation (Stable Video Diffusion)
- Input: Segmented character images
- Output: Animated frames with subtle movement
- Model: Stable Video Diffusion

Stage 6: Video Composition (FFmpeg)
- Input: All layers + audio + beat markers
- Output: Final MP4 video (9:16, 30fps)
- Effects: Glow, glitch, transitions synced to beats
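The final mux can be pictured as an FFmpeg invocation assembled programmatically. The paths and codec settings below are illustrative assumptions, and the command is only built here, not executed:

```python
# Sketch: assembling an FFmpeg command for the final mux, as the
# composition stage might do. Filenames are illustrative assumptions.

def build_ffmpeg_cmd(frames_pattern, audio_path, out_path, fps=30):
    """Build (but do not run) an FFmpeg argument list for frames + audio."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", frames_pattern,  # rendered frame sequence
        "-i", audio_path,                              # analyzed audio track
        "-c:v", "libx264", "-pix_fmt", "yuv420p",      # broadly compatible H.264
        "-c:a", "aac",
        "-shortest",                                   # stop at the shorter stream
        out_path,
    ]

cmd = build_ffmpeg_cmd("frames/%05d.png", "track.mp3", "out.mp4")
print(" ".join(cmd))
```

A real composition step would run this via `subprocess` (or delegate to MoviePy) after the effect layers are rendered.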
```bash
docker-compose up -d
```

See the .env.example files in each app directory for required variables.
We welcome contributions! Please read our Contributing Guide for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Use TypeScript for all new code
- Follow ESLint and Prettier configurations
- Write unit tests for new features
- Document public APIs with JSDoc
This project is licensed under the MIT License - see the LICENSE file for details.
- Manga109 Dataset for training data
- Ultralytics for YOLOv12
- Meta AI for SAM 2
- Stability AI for Stable Video Diffusion
- The community for feedback and support
Made with ❤️ by the UtsukushiiAI Team