Mananwebdev160408/UtsukushiiAI

UtsukushiiAI - Generative Manga-to-Shorts (MMV) Animation Platform

Overview

UtsukushiiAI (Japanese: 美しい - "Beautiful") is an open-source, local-first generative media platform designed to automate the creation of high-energy, beat-synced "Manga Music Videos" (MMVs).

Designed for transparency and privacy, UtsukushiiAI runs on your local hardware. It uses computer vision (CV) to segment static manga pages and generative AI to animate them, and your creative assets stay on your machine unless you explicitly send them to a third-party API.

Core Objectives

  • Local-First Processing: Keep your storage and compute on your own machine.
  • Environment-Driven Configuration: Set up your own MongoDB URIs and third-party API keys (YouTube, OpenAI, etc.) strictly via .env files.
  • Rhythm Intelligence: Precisely sync visual transitions to audio beats automatically.
  • Privacy by Design: User data stays on their machine unless explicitly sent to third-party APIs.
  • Efficiency: Reduce manual video editing time from about 4 hours to under 2 minutes using local ML pipelines.

Features

The Forge (Upload)

  • Multi-modal drag-and-drop for Manga (PDF, PNG, JPG) and Music (MP3, WAV)
  • Automatic manga page extraction from PDF files
  • YouTube audio download via yt-dlp for "Vibe-Matching" music

The Canvas Studio

  • Interactive Bounding Box (BBox) adjustment for AI-detected panels
  • Real-time preview of panel segmentation
  • Layer management for foreground/background elements

Rhythm Timeline

  • Visual waveform display with automated beat-marker detection
  • Millisecond-accurate sync point placement
  • Drag-and-drop transition effects

Render Engine

  • YOLOv12 panel detection with Manga109 fine-tuning
  • MiDaS depth estimation for 3D "Wiggle" parallax effects
  • Stable Video Diffusion (SVD) for character animation
  • FFmpeg/MoviePy video composition

Export Options

  • Multiple aspect ratios: 9:16 (Stories/Reels), 16:9 (YouTube), 1:1 (Instagram)
  • Quality presets: Draft, Standard, High, Ultra
  • Direct upload to social platforms
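
The aspect ratios above only become concrete frame sizes once a resolution is chosen. The following is a minimal sketch of that mapping, not the project's actual export code: `frame_size` is a hypothetical helper, and the 1080-pixel short side is an illustrative default. Dimensions are rounded to even numbers because the widely used yuv420p pixel format needs even width and height.

```python
from fractions import Fraction

# Aspect ratios listed under Export Options (width:height).
ASPECT_RATIOS = {
    "9:16": Fraction(9, 16),
    "16:9": Fraction(16, 9),
    "1:1": Fraction(1, 1),
}


def _even(value) -> int:
    """Round to the nearest even integer (yuv420p needs even dimensions)."""
    return int(round(value / 2)) * 2


def frame_size(aspect: str, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) for an aspect ratio; short_side is an assumed default."""
    ratio = ASPECT_RATIOS[aspect]
    if ratio >= 1:  # landscape or square: height is the short side
        return _even(short_side * ratio), _even(short_side)
    # portrait: width is the short side
    return _even(short_side), _even(short_side / ratio)
```

For example, `frame_size("9:16")` gives a portrait 1080x1920 frame, while `frame_size("16:9", 720)` gives 1280x720.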

Tech Stack

Frontend

| Technology | Purpose |
| --- | --- |
| Next.js 15 | App Router, React Server Components |
| TypeScript | Type safety across full stack |
| Zustand | Global state management |
| Remotion | Frame-accurate video preview |
| Framer Motion | Cinematic UI animations |
| Tailwind CSS | Utility-first styling |

Backend (Orchestration)

| Technology | Purpose |
| --- | --- |
| Express.js | REST API server |
| Socket.io | Real-time progress updates |
| JWT | Authentication |
| File System | Local storage management |

Backend (ML Inference)

| Technology | Purpose |
| --- | --- |
| FastAPI | High-performance ML API |
| Python 3.11+ | ML runtime |
| PyTorch | Deep learning framework |
| YOLOv12 | Panel detection |
| SAM 2 | Instance segmentation |
| MiDaS | Depth estimation |
| Stable Video Diffusion | Character animation |
| Librosa | Audio analysis |
| FFmpeg | Video processing |

Data & Infrastructure

| Technology | Purpose |
| --- | --- |
| MongoDB | Project metadata, panel JSON |
| Redis | Render queue, session cache |
| Local Storage | Object storage (disk) |
| Docker | Containerization |

Quick Start

Prerequisites

  • Node.js 20.x+
  • Python 3.11+
  • Docker & Docker Compose
  • MongoDB 6.x
  • Redis 7.x
  • Local Storage (Disk Space)

Local Development Setup

  1. Clone the repository
git clone https://github.com/utsukushii/utsukushii-ai.git
cd utsukushii-ai
  2. Environment Configuration
# Copy environment files
cp apps/web/.env.example apps/web/.env.local
cp apps/api/.env.example apps/api/.env
cp apps/worker/.env.example apps/worker/.env

# Edit with your credentials
# ALL infrastructure must be configured here before starting!
# Required: MONGODB_URI, REDIS_URL, YOUTUBE_API_KEY, OPENAI_API_KEY, etc.
  3. Start with Docker Compose
# Start all services
docker-compose up -d

# Or start only the infrastructure services
docker-compose up -d mongodb redis
  4. Install and Run Locally
# Run each service in its own terminal, starting from the repository root

# Frontend
cd apps/web
npm install
npm run dev

# Backend API
cd apps/api
npm install
npm run dev

# ML Worker
cd apps/worker
pip install -r requirements.txt
uvicorn main:app --reload
  5. Access the Application
# Addresses as shown in the Architecture diagram
Frontend:  http://localhost:3000
API:       http://localhost:4000
ML Worker: http://localhost:8000

Architecture

UtsukushiiAI follows a Polyglot Microservices architecture:

┌─────────────────────────────────────────────────────────────┐
│                        Frontend (Next.js)                    │
│                  http://localhost:3000                       │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                   API Gateway (Express.js)                  │
│                  http://localhost:4000                       │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐ │
│  │  Auth API   │  │ Project API │  │   Render API        │ │
│  └─────────────┘  └─────────────┘  └─────────────────────┘ │
└─────────────────────────┬───────────────────────────────────┘
                          │
         ┌────────────────┼────────────────┐
         ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────────┐
│  MongoDB    │  │    Redis    │  │  Local Storage  │
│  (State)    │  │   (Queue)   │  │  (Storage)      │
└─────────────┘  └─────────────┘  └─────────────────┘
         │
         ▼
┌─────────────────────────────────────────────────────────────┐
│               ML Worker (FastAPI + Python)                 │
│                  http://localhost:8000                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐ │
│  │Detector  │  │Segmenter │  │DepthEst  │  │AudioAnalyzer│ │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘ │
│  ┌──────────────────────────────────────────────────────┐ │
│  │              Video Composer (FFmpeg)                 │ │
│  └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Design Patterns

  • Event-Driven: Real-time progress via WebSockets
  • CQRS: Separate read/write models for project data
  • Repository Pattern: Data abstraction layer
  • Factory Pattern: Component creation for different video formats
  • Observer Pattern: Job status updates

Project Structure

utsukushii-ai/
├── apps/
│   ├── web/                    # Next.js 15 Frontend
│   │   ├── src/
│   │   │   ├── app/           # App Router pages
│   │   │   ├── components/    # React components
│   │   │   ├── hooks/         # Custom hooks
│   │   │   ├── lib/           # Utilities
│   │   │   ├── stores/        # Zustand stores
│   │   │   └── types/         # TypeScript types
│   │   └── public/            # Static assets
│   │
│   ├── api/                   # Express.js Backend
│   │   ├── src/
│   │   │   ├── controllers/  # Route handlers
│   │   │   ├── middleware/   # Express middleware
│   │   │   ├── models/        # Business logic
│   │   │   ├── routes/        # API routes
│   │   │   ├── services/      # Core services
│   │   │   └── utils/         # Utilities
│   │   └── tests/             # API tests
│   │
│   └── worker/               # FastAPI ML Worker
│       ├── src/
│       │   ├── models/        # ML models
│       │   ├── pipelines/    # Processing pipelines
│       │   ├── services/      # ML services
│       │   └── utils/         # Utilities
│       ├── models/            # Downloaded model weights
│       └── tests/             # ML tests
│
├── packages/
│   ├── shared/               # Shared types & utilities
│   │   ├── types/           # TypeScript types
│   │   └── utils/           # Shared utilities
│   │
│   ├── database/            # MongoDB connection & schemas
│   │
│   ├── cache/               # Redis client & utilities
│   │
│   └── storage/             # Local storage utilities
│
├── tools/
│   ├── scripts/             # Build & deployment scripts
│   └── configs/            # Configuration files
│
├── docs/                    # Documentation
│   ├── architecture/
│   ├── api/
│   └── assets/
│
├── docker-compose.yml       # Local development
├── docker-compose.prod.yml  # Production deployment
├── turbo.json              # Turborepo config
├── package.json            # Root package.json
└── README.md               # This file

API Documentation

Authentication

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/auth/register | POST | Register new user |
| /api/auth/login | POST | Login user |
| /api/auth/logout | POST | Logout user |
| /api/auth/refresh | POST | Refresh JWT token |
| /api/auth/me | GET | Get current user |

Projects

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/projects | GET | List user projects |
| /api/projects | POST | Create new project |
| /api/projects/:id | GET | Get project details |
| /api/projects/:id | PUT | Update project |
| /api/projects/:id | DELETE | Delete project |
| /api/projects/:id/panels | GET | Get project panels |
| /api/projects/:id/panels | POST | Add panel to project |
| /api/projects/:id/panels/:panelId | PUT | Update panel |
| /api/projects/:id/panels/:panelId | DELETE | Delete panel |

Rendering

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/render/start | POST | Start render job |
| /api/render/:jobId | GET | Get render status |
| /api/render/:jobId | DELETE | Cancel render job |
| /api/render/:jobId/download | GET | Download rendered video |
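
As an illustration of how a client might call the render endpoints, the sketch below builds (but does not send) the two most common requests using only the standard library. Several details are assumptions rather than documented behavior: the `projectId` body field is hypothetical, the `Authorization: Bearer` header is the usual way to pass a JWT but is not confirmed here, and the base URL is the API address shown in the architecture diagram.

```python
import json
import urllib.request

API_BASE = "http://localhost:4000"  # API address from the architecture diagram


def start_render_request(project_id: str, token: str) -> urllib.request.Request:
    """Build a POST /api/render/start request (payload shape is hypothetical)."""
    body = json.dumps({"projectId": project_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/render/start",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed JWT bearer scheme
        },
    )


def render_status_request(job_id: str, token: str) -> urllib.request.Request:
    """Build a GET /api/render/:jobId request for polling job status."""
    return urllib.request.Request(
        f"{API_BASE}/api/render/{job_id}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing either request object to `urllib.request.urlopen` would send it to a running API server.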

Presigned URLs

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/upload/direct | POST | Get direct upload URL |
| /api/upload/complete | POST | Confirm upload complete |

WebSocket Events

| Event | Direction | Description |
| --- | --- | --- |
| render:progress | Server→Client | Render progress update |
| render:complete | Server→Client | Render completed |
| render:error | Server→Client | Render error |
| panel:detected | Server→Client | Panel detection complete |
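
The events listed above follow a publish/subscribe shape. The sketch below mimics that flow with a tiny in-process event bus so the handler pattern can be tried without a server. It is a stand-in, not the Socket.io API: `EventBus` is a hypothetical class, and the payload fields (`jobId`, `percent`) are assumptions, since the payload schemas are not documented here.

```python
from collections import defaultdict
from typing import Any, Callable

# Event names taken from the list above.
EVENTS = {"render:progress", "render:complete", "render:error", "panel:detected"}

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Tiny in-process stand-in for a real-time event client."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler; reject event names that are not in EVENTS."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict[str, Any]) -> int:
        """Call every handler registered for `event`; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


progress_log: list[int] = []
bus = EventBus()
# "percent" is a hypothetical payload field used only for this example.
bus.on("render:progress", lambda p: progress_log.append(p["percent"]))
bus.emit("render:progress", {"jobId": "job-1", "percent": 40})
```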

ML Pipeline

Stage 1: Panel Detection (YOLOv12)

  • Input: Manga image (PNG/JPG/PDF page)
  • Output: Bounding box coordinates for each panel
  • Model: Fine-tuned on Manga109 dataset
  • Coordinates: Normalized (0.0 - 1.0) for scale invariance
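
Normalized coordinates have to be scaled back to pixels before a panel can be cropped or drawn. A minimal sketch of that conversion follows, assuming a corner-based box layout (`x_min`, `y_min`, `x_max`, `y_max`); the detector's actual output format may differ, and both `NormalizedBox` and `to_pixels` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedBox:
    """Panel box with corners in the 0.0-1.0 range (layout is an assumption)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_pixels(box: NormalizedBox, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to integer pixel corners, clamped to the image."""
    def clamp(v: float) -> float:
        return min(1.0, max(0.0, v))

    return (
        round(clamp(box.x_min) * width),
        round(clamp(box.y_min) * height),
        round(clamp(box.x_max) * width),
        round(clamp(box.y_max) * height),
    )
```

Because the box is stored relative to the page, the same `NormalizedBox` works unchanged on a full-resolution scan and on a downscaled preview; only `width` and `height` change.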

Stage 2: Instance Segmentation (SAM 2)

  • Input: Panel bounding boxes + original image
  • Output: Binary masks for characters/foreground
  • Model: SAM 2 (Segment Anything 2)
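
Once a binary mask is available, it can be used to split a panel into foreground and background layers. The sketch below shows the idea on plain nested lists of grayscale values so it runs without dependencies; a real pipeline would more likely operate on NumPy arrays or tensors. `split_layers` is a hypothetical helper, not part of SAM 2.

```python
Pixel = int                # grayscale value 0-255, for brevity
Image = list[list[Pixel]]
Mask = list[list[int]]     # 1 = foreground, 0 = background


def split_layers(image: Image, mask: Mask, fill: Pixel = 0) -> tuple[Image, Image]:
    """Return (foreground, background); pixels outside each layer are set to `fill`."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    foreground = [
        [p if m else fill for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
    background = [
        [fill if m else p for p, m in zip(row, mrow)]
        for row, mrow in zip(image, mask)
    ]
    return foreground, background
```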

Stage 3: Depth Estimation (MiDaS)

  • Input: Original image
  • Output: Depth map for parallax effect
  • Effect: "Wiggle" 3D animation based on depth
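
One simple way to turn a depth map into a "wiggle" is to shift each pixel horizontally by an amount that depends on its depth and on a slow sinusoidal camera sway. The sketch below illustrates that idea only; it is not necessarily how the render engine does it. It assumes depth has been normalized to 0.0-1.0 with 1.0 nearest, and the amplitude and period defaults are illustrative.

```python
import math


def wiggle_offset(depth: float, t: float,
                  amplitude_px: float = 12.0, period_s: float = 2.0) -> float:
    """Horizontal shift in pixels for a pixel with normalized `depth` at time `t`.

    depth: 0.0 = far, 1.0 = near (assumed convention after normalization).
    Pixels nearer than the mid-plane (0.5) move with the sway, farther ones
    move against it, which produces the parallax impression.
    """
    sway = math.sin(2.0 * math.pi * t / period_s)
    return amplitude_px * (depth - 0.5) * sway
```

With the defaults, a pixel at the mid-plane never moves, while the nearest and farthest pixels swing by up to 6 pixels in opposite directions.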

Stage 4: Audio Analysis (Librosa)

  • Input: Audio file (MP3/WAV)
  • Output: BPM, onset timestamps, beat markers
  • Algorithm: Harmonic-Percussive Source Separation (HPSS)
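
To show how beat markers can drive transition timing, the sketch below builds a beat grid from a BPM value and snaps an arbitrary timestamp to the nearest beat. It assumes a constant tempo, which is a simplification: the actual pipeline presumably takes beat times from Librosa's analysis rather than computing an even grid. `beat_grid` and `snap_to_beat` are hypothetical helpers.

```python
from bisect import bisect_left


def beat_grid(bpm: float, duration_s: float, first_beat_s: float = 0.0) -> list[float]:
    """Evenly spaced beat times in seconds for a constant-tempo track."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    interval = 60.0 / bpm
    count = int((duration_s - first_beat_s) / interval) + 1
    # Round to 3 decimals, i.e. millisecond precision.
    return [round(first_beat_s + i * interval, 3) for i in range(count)]


def snap_to_beat(t: float, beats: list[float]) -> float:
    """Return the beat time closest to `t` (beats must be sorted and non-empty)."""
    i = bisect_left(beats, t)
    candidates = beats[max(0, i - 1): i + 1]
    return min(candidates, key=lambda b: abs(b - t))
```

For a 120 BPM track, beats fall every 0.5 seconds, so a transition dropped at 1.3 s would snap to the beat at 1.5 s.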

Stage 5: Character Animation (SVD)

  • Input: Segmented character images
  • Output: Animated frames with subtle movement
  • Model: Stable Video Diffusion

Stage 6: Video Composition (FFmpeg)

  • Input: All layers + audio + beat markers
  • Output: Final MP4 video (9:16, 30fps)
  • Effects: Glow, glitch, transitions synced to beats
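
The final mux step can be pictured as a single FFmpeg invocation that combines rendered frames with the audio track. The sketch below only assembles the argument list; the file paths, the frame-sequence layout, and the codec choices (libx264 video, AAC audio) are assumptions, and beat-synced effects are out of scope here. `ffmpeg_compose_args` is a hypothetical helper.

```python
def ffmpeg_compose_args(frames_pattern: str, audio_path: str, output_path: str,
                        width: int = 1080, height: int = 1920, fps: int = 30) -> list[str]:
    """Build an FFmpeg command that muxes an image sequence with an audio file."""
    return [
        "ffmpeg", "-y",                        # overwrite output without prompting
        "-framerate", str(fps),                # input frame rate of the image sequence
        "-i", frames_pattern,                  # e.g. frames/%05d.png
        "-i", audio_path,
        "-vf", f"scale={width}:{height}",      # 1080x1920 matches the 9:16 output
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                           # stop at the shorter of video and audio
        output_path,
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where FFmpeg is installed.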

Deployment

Docker Compose

docker-compose up -d

Environment Variables

See .env.example files in each app directory for required variables.
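
Since every service depends on its environment being filled in, a startup check that fails fast on missing values can save debugging time. The sketch below is one way to do that and is not taken from the codebase: it assumes the variables are already exported into the process environment (it does not parse .env files), and it checks only two names mentioned in the Quick Start notes; the authoritative list is in each .env.example.

```python
import os
from collections.abc import Mapping

# Names taken from the Quick Start notes; the full list lives in each .env.example.
REQUIRED_VARS = ("MONGODB_URI", "REDIS_URL")


def missing_vars(env: Mapping[str, str] = os.environ,
                 required: tuple[str, ...] = REQUIRED_VARS) -> list[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

A service could call `missing_vars()` at startup and exit with a clear message if the returned list is not empty.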


Contributing

We welcome contributions! Please read our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Code Style

  • Use TypeScript for all new code
  • Follow ESLint and Prettier configurations
  • Write unit tests for new features
  • Document public APIs with JSDoc

License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by the UtsukushiiAI Team
