Hi, I'm Daniel Lee. I build Agentic AI Systems and Multimodal AI pipelines, from autonomous LLM agents with complex tool use to vision-language models that bridge perception and reasoning. As a full-stack AI engineer, I ship end-to-end products, from model architecture to production-ready web services. I strive to be a Problem Definer who doesn't just solve tasks but poses the next big challenges.
2025
- Top Excellence Award (1st Prize)
- Top Excellence Award (Institute for Information & Communication Technology Planning & Evaluation Director's Award)
- Silver Medal
- [HCLT 2025] Enhancing Multi-Hop Complex Query Retrieval Efficiency through the Integration of RAG and Graph RAG
Scene24 - AI Cinematic Ad Generator for SaaS
- Product: Generates 30-second cinematic launch ad videos from a product URL — built for solo SaaS founders and indie hackers underserved by Synthesia (avatar-heavy) and Runway (agency-priced). Direct-manipulation editor with Canva-grade snap / multi-select / alignment lets you fine-tune after generation.
- AI Architecture: Brand, motion-director, and critic sub-agents built on the Claude Agent SDK orchestrate Remotion for programmatic rendering. No avatars and no AI-generated UI: real product screenshots are animated with cinematic motion patterns.
LecTranscribe - AI Lecture Transcription SaaS
- Live SaaS: Full-stack lecture transcription platform with credit-based billing (Lemon Squeezy) and a Chrome Extension on the Web Store for one-click LMS & YouTube transcription.
- AI Agent: LangGraph-based router dispatching to specialized nodes (note/exam/prompt/QA), Gemini 2.5 Flash streaming with automatic model fallback on rate limits.
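
The router-plus-fallback idea above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the LecTranscribe implementation: it uses no LangGraph or Gemini calls, and `RateLimitError`, `call_with_fallback`, `route`, `primary`, `backup`, and `make_node` are all hypothetical names introduced here.

```python
from typing import Callable, Dict, Sequence


class RateLimitError(Exception):
    """Hypothetical error a model client raises when it is rate limited."""


# Hypothetical model client: takes a prompt, returns generated text.
ModelFn = Callable[[str], str]


def call_with_fallback(prompt: str, models: Sequence[ModelFn]) -> str:
    """Try each model in order, moving on only when one is rate limited."""
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except RateLimitError as err:
            last_error = err
    raise RuntimeError("all models were rate limited") from last_error


def route(task: str, text: str, nodes: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the input to the node registered for this task type."""
    if task not in nodes:
        raise ValueError(f"unknown task: {task}")
    return nodes[task](text)


def primary(prompt: str) -> str:
    # Stand-in for a primary model that is currently rate limited.
    raise RateLimitError("429")


def backup(prompt: str) -> str:
    # Stand-in for a fallback model that answers normally.
    return f"[backup] {prompt}"


def make_node(instruction: str) -> Callable[[str], str]:
    """Build a node that prefixes its instruction and calls the model chain."""
    def node(text: str) -> str:
        return call_with_fallback(f"{instruction}: {text}", [primary, backup])
    return node


# One node per task type named in the bullet above.
NODES = {name: make_node(name) for name in ("note", "exam", "prompt", "qa")}
result = route("note", "lecture 3 transcript", NODES)
```

Here the `note` request reaches the backup model because the primary raises `RateLimitError`. Any other exception propagates unchanged, so only rate limits trigger the fallback.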
IP-to-Portrait - High-Fidelity Face Synthesis Pipeline
- Advanced AI Pipeline: End-to-end face synthesis preserving identity, background, and lighting using SDXL Inpainting & IP-Adapter FaceID Plus v2.
- Multimodal Integration: Auto-prompting via Gemini 2.5 Flash VLM and precision masking with BiSeNet & InsightFace.
- Tech: Next.js, FastAPI, Celery, Redis, PyTorch, Diffusers, ONNX Runtime.
Research Focus
- Vision-Language Models (VLM) & Vision-Language-Action (VLA): Interested in multimodal understanding and reasoning.
- Agentic Systems: Developing autonomous decision-making loops and agentic workflows.
- RAG & Graph RAG: Exploring advanced retrieval and knowledge graph integration for agents.

| AI apps, demos & services | Impl of Multimodal model | Scheduling, Logic, Multicycle |




