11 changes: 7 additions & 4 deletions docs.json
@@ -266,7 +266,8 @@
"tutorials/video/wan/fun-control",
"tutorials/video/wan/fun-camera",
"tutorials/video/wan/fun-inp",
-"tutorials/video/wan/wan-flf"
+"tutorials/video/wan/wan-flf",
+"tutorials/video/wan/wan-infinitetalk"
]
}
]
@@ -2449,7 +2450,8 @@
"zh/tutorials/video/wan/fun-control",
"zh/tutorials/video/wan/fun-camera",
"zh/tutorials/video/wan/fun-inp",
-"zh/tutorials/video/wan/wan-flf"
+"zh/tutorials/video/wan/wan-flf",
+"zh/tutorials/video/wan/wan-infinitetalk"
]
}
]
@@ -4637,7 +4639,8 @@
"ja/tutorials/video/wan/fun-control",
"ja/tutorials/video/wan/fun-camera",
"ja/tutorials/video/wan/fun-inp",
-"ja/tutorials/video/wan/wan-flf"
+"ja/tutorials/video/wan/wan-flf",
+"ja/tutorials/video/wan/wan-infinitetalk"
]
}
]
@@ -6854,4 +6857,4 @@
"destination": "/zh/:slug*"
}
]
-}
+}
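The three hunks above register the same page slug under the default, `zh/`, and `ja/` prefixes. A check that all three entries are present could look like the sketch below. It assumes that `docs.json` nests page slugs in arrays under a `pages` key (the diff shows the arrays but not the key name), and the helper names `collect_pages` and `missing_locales` are hypothetical.

```python
import json

# Locale prefixes exactly as they appear in the diff ("" is the default locale).
LOCALES = ("", "zh/", "ja/")
PAGE = "tutorials/video/wan/wan-infinitetalk"


def collect_pages(node):
    """Recursively gather every string found in a list stored under a "pages" key.

    The "pages" key name is an assumption about the docs.json schema.
    """
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "pages" and isinstance(value, list):
                for item in value:
                    if isinstance(item, str):
                        found.add(item)
                    else:
                        # Nested group objects can hold their own "pages" list.
                        found |= collect_pages(item)
            else:
                found |= collect_pages(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_pages(item)
    return found


def missing_locales(config, page=PAGE, locales=LOCALES):
    """Return the prefixed slugs that are not registered anywhere in the config."""
    pages = collect_pages(config)
    return [prefix + page for prefix in locales if prefix + page not in pages]


if __name__ == "__main__":
    with open("docs.json", encoding="utf-8") as handle:
        print(missing_locales(json.load(handle)))
```

An empty list means every locale lists the page. A non-empty list names the slugs that still need a navigation entry.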
122 changes: 122 additions & 0 deletions ja/tutorials/video/wan/wan-infinitetalk.mdx
@@ -0,0 +1,122 @@
---
title: "ComfyUI Wan2.1 InfiniteTalk Workflow Example"
description: "InfiniteTalk is an audio-driven full-body video dubbing model based on Wan2.1. It automatically synchronizes a character's lip-sync and body motion with the input audio."
sidebarTitle: "InfiniteTalk"
---

import UpdateReminder from '/snippets/ja/tutorials/update-reminder.mdx'

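**Wan2.1 InfiniteTalk** is an open-source, audio-driven video generation model based on Wan2.1, developed through a collaboration between Comfy Org and the Wan community. It generates full-body talking videos from a single reference image and an audio input: the character's mouth movements and body motion are automatically synchronized with the input audio.

**Key Features**:
- **Audio-Driven Lip Sync**: Generates natural mouth movements that match the input audio
- **Full-Body Motion**: Adds synchronized body motion while preserving the original identity, background, and camera movement
- **Dual Mode**: Supports both single-character and multi-character scenes
- **ComfyUI Native**: The `WanInfiniteTalkToVideo` node is built in, so no custom nodes are required

**Related Links**:
- [Wan2.1 Code Repository](https://github.com/Wan-Video/Wan2.1)
- [Wan2.1 Model Repository](https://huggingface.co/Wan-AI)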

<Card title="About Subgraph" icon="book-open" href="/ja/interface/features/subgraph">
This workflow uses Subgraph nodes for modular processing. See the Subgraph documentation to learn how to customize and extend the workflow.
</Card>

## InfiniteTalk Image-to-Video Workflow

<CardGroup cols={2}>
<Card title="Run in Comfy Cloud" icon="cloud" href="https://cloud.comfy.org/?template=video_wan2_1_infinitetalk&utm_source=docs&utm_medium=referral&utm_campaign=wan2-1-infinitetalk">
Open in Comfy Cloud
</Card>
<Card title="Download Workflow" icon="download" href="https://github.com/Comfy-Org/workflow_templates/blob/main/templates/video_wan2_1_infinitetalk.json">
Download the JSON, or search for "InfiniteTalk" in the Template Library
</Card>
</CardGroup>

![Wan2.1 InfiniteTalk Workflow](https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/templates/video_wan2_1_infinitetalk-1.webp)

<UpdateReminder />

## Model Installation

Download the following models and place them in the correct directories.

**diffusion_models**

- [Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors](https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/resolve/main/I2V/Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors)

**text_encoders**

- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)

**model_patches**

- [wan2.1_infiniteTalk_single_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/model_patches/wan2.1_infiniteTalk_single_fp16.safetensors): for single-character scenes
- [wan2.1_infiniteTalk_multi_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/model_patches/wan2.1_infiniteTalk_multi_fp16.safetensors): for multi-character scenes

**audio_encoders**

- [wav2vec2-chinese-base_fp16.safetensors](https://huggingface.co/Kijai/wav2vec2_safetensors/resolve/main/wav2vec2-chinese-base_fp16.safetensors)

**vae**

- [Wan2_1_VAE_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Wan2_1_VAE_bf16.safetensors)

**loras**

- [lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors)

### Model Storage Location

```
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 diffusion_models/
│ │ └── Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors
│ ├── 📂 text_encoders/
│ │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│ ├── 📂 model_patches/
│ │ ├── wan2.1_infiniteTalk_single_fp16.safetensors
│ │ └── wan2.1_infiniteTalk_multi_fp16.safetensors
│ ├── 📂 audio_encoders/
│ │ └── wav2vec2-chinese-base_fp16.safetensors
│ ├── 📂 vae/
│ │ └── Wan2_1_VAE_bf16.safetensors
│ └── 📂 loras/
│ └── lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors
```
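The download links and the directory tree above pair each file with one folder under `ComfyUI/models/`. That pairing can be written down as a small download plan, as in the sketch below. The URLs are copied from the list above. The `plan_downloads` helper is hypothetical and performs no network access, and the `ComfyUI` root path is an assumption to adjust for your installation.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Folder under ComfyUI/models -> download URLs, copied from the tutorial.
MODEL_URLS = {
    "diffusion_models": [
        "https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/resolve/main/I2V/Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors",
    ],
    "text_encoders": [
        "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    ],
    "model_patches": [
        "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/model_patches/wan2.1_infiniteTalk_single_fp16.safetensors",
        "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/model_patches/wan2.1_infiniteTalk_multi_fp16.safetensors",
    ],
    "audio_encoders": [
        "https://huggingface.co/Kijai/wav2vec2_safetensors/resolve/main/wav2vec2-chinese-base_fp16.safetensors",
    ],
    "vae": [
        "https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Wan2_1_VAE_bf16.safetensors",
    ],
    "loras": [
        "https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors",
    ],
}


def plan_downloads(comfy_root):
    """Return (url, destination_path) pairs; nothing is downloaded here."""
    models_dir = Path(comfy_root) / "models"
    plan = []
    for folder, urls in MODEL_URLS.items():
        for url in urls:
            # The file name is the last segment of the URL path.
            filename = PurePosixPath(urlparse(url).path).name
            plan.append((url, models_dir / folder / filename))
    return plan


if __name__ == "__main__":
    for url, destination in plan_downloads("ComfyUI"):
        print(f"{destination}  <-  {url}")
```

Each printed pair can then be handed to whichever download tool you prefer.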

### Sample Input Files

Download the following sample files and drag them into the corresponding nodes.

<CardGroup cols={2}>
<Card title="Download Sample Image" icon="image" href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/input/two_character_talking.png">
Character reference image
</Card>
<Card title="Download Speaker 1 Audio" icon="music" href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/input/audio_speaker1_woman.mp3">
Audio for character 1
</Card>
<Card title="Download Speaker 2 Audio" icon="music" href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/input/audio_speaker2_man.mp3">
Audio for character 2
</Card>
</CardGroup>

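## Workflow Steps

1. **Load the input image**: Drag the character reference image onto the `Load Image` node. For multiple characters, use the Mask Editor to create a mask for each character.
2. **Load the audio tracks**: Connect the audio files to the `Load Audio` nodes (one per character).
3. **Load the diffusion model**: Confirm that the `Load Diffusion Model` node is using `Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors`.
4. **Load the model patch**: Load the appropriate InfiniteTalk patch (single-character or multi-character) with the `ModelPatchLoader` node.
5. **Configure InfiniteTalk**: Adjust the `WanInfiniteTalkToVideo` node parameters (video length, amount of motion, and so on).
6. **Generate**: Run the workflow. The model generates a full-body talking video synchronized with the input audio.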
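
Steps 1, 2, and 4 above are tied together by the number of characters: one audio track per character, one mask per character when there are several, and a matching patch file. A pre-flight check along those lines might look like the sketch below. It is plain Python rather than a ComfyUI API, and the `choose_patch` helper and its arguments are hypothetical.

```python
# Patch file names as listed under model_patches in this tutorial.
PATCHES = {
    "single": "wan2.1_infiniteTalk_single_fp16.safetensors",
    "multi": "wan2.1_infiniteTalk_multi_fp16.safetensors",
}


def choose_patch(audio_tracks, masks=()):
    """Pick the InfiniteTalk patch file for the given inputs.

    audio_tracks: one entry per speaking character.
    masks: one entry per character; only checked for multi-character inputs.
    """
    count = len(audio_tracks)
    if count == 0:
        raise ValueError("at least one audio track is required")
    if count == 1:
        return PATCHES["single"]
    if len(masks) != count:
        raise ValueError(
            f"{count} audio tracks need {count} character masks, got {len(masks)}"
        )
    return PATCHES["multi"]


if __name__ == "__main__":
    print(choose_patch(["audio_speaker1_woman.mp3"]))
    print(
        choose_patch(
            ["audio_speaker1_woman.mp3", "audio_speaker2_man.mp3"],
            masks=["mask_character1", "mask_character2"],
        )
    )
```

The check only catches mismatched counts. Whether each mask actually covers the right character still has to be verified in the Mask Editor.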

### Extending Video Length

Each **Video Extend** group extends the video by about 3.24 seconds (81 frames at 25 fps). If your audio is longer, add more groups as follows:

1. Box-select the "Video Extend" group
2. Press `Ctrl-C` (copy), then `Ctrl-Shift-V` (paste with connections)
3. Connect the `Batch Images` node's `IMAGE` output to the new `WanInfiniteTalkToVideo` node's `previous_frames`
4. Connect the `Batch Images` node's `IMAGE` output to the new `Batch Images` node's `images`
5. Adjust the connection between the previous group's `WanInfiniteTalkToVideo` output and the new group's `VAEDecode` input
122 changes: 122 additions & 0 deletions tutorials/video/wan/wan-infinitetalk.mdx
@@ -0,0 +1,122 @@
---
title: "ComfyUI Wan2.1 InfiniteTalk Workflow Example"
description: "InfiniteTalk is an audio-driven full-body video dubbing model built on Wan2.1, enabling character lip-sync and body motion synchronization with any input audio."
sidebarTitle: "InfiniteTalk"
---

import UpdateReminder from '/snippets/tutorials/update-reminder.mdx'

**Wan2.1 InfiniteTalk** is an open-source audio-driven video generation model built on Wan2.1, developed by Comfy Org in partnership with the Wan community. It enables you to generate full-body talking videos from a single reference image and an audio input — the character's mouth movements and body motions are automatically synchronized to match the provided audio.

**Key Features**:
- **Audio-Driven Lip Sync** — Generate natural mouth movements that match the input audio
- **Full-Body Motion** — Preserves identity, background, and camera movement while adding synchronized body motion
- **Dual Mode** — Supports both single-character and multi-character scenarios
- **ComfyUI Native** — Built-in `WanInfiniteTalkToVideo` node, no custom nodes required

**Related Links**:
- [Wan2.1 Code Repository](https://github.com/Wan-Video/Wan2.1)
- [Wan2.1 Model Repository](https://huggingface.co/Wan-AI)

<Card title="Learn about Subgraph" icon="book-open" href="/interface/features/subgraph">
This workflow uses Subgraph nodes for modular processing. Check out the Subgraph documentation to learn how to customize and extend the workflow.
</Card>

## InfiniteTalk image-to-video workflow

<CardGroup cols={2}>
<Card title="Run in Comfy Cloud" icon="cloud" href="https://cloud.comfy.org/?template=video_wan2_1_infinitetalk&utm_source=docs&utm_medium=referral&utm_campaign=wan2-1-infinitetalk">
Open in Comfy Cloud
</Card>
<Card title="Download Workflow" icon="download" href="https://github.com/Comfy-Org/workflow_templates/blob/main/templates/video_wan2_1_infinitetalk.json">
Download the JSON, or search for "InfiniteTalk" in the Template Library
</Card>
</CardGroup>

![Wan2.1 InfiniteTalk Workflow Preview](https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/templates/video_wan2_1_infinitetalk-1.webp)

<UpdateReminder />

## Model Installation

Download the following models and place them in the directories listed below:

**diffusion_models**

- [Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors](https://huggingface.co/Kijai/WanVideo_comfy_fp8_scaled/resolve/main/I2V/Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors)

**text_encoders**

- [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)

**model_patches**

- [wan2.1_infiniteTalk_single_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/model_patches/wan2.1_infiniteTalk_single_fp16.safetensors) — For single-character scenarios
- [wan2.1_infiniteTalk_multi_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/model_patches/wan2.1_infiniteTalk_multi_fp16.safetensors) — For multi-character scenarios

**audio_encoders**

- [wav2vec2-chinese-base_fp16.safetensors](https://huggingface.co/Kijai/wav2vec2_safetensors/resolve/main/wav2vec2-chinese-base_fp16.safetensors)

**vae**

- [Wan2_1_VAE_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Wan2_1_VAE_bf16.safetensors)

**loras**

- [lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/resolve/main/Lightx2v/lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors)

### Model Storage Location

```
📂 ComfyUI/
├── 📂 models/
│ ├── 📂 diffusion_models/
│ │ └── Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors
│ ├── 📂 text_encoders/
│ │ └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│ ├── 📂 model_patches/
│ │ ├── wan2.1_infiniteTalk_single_fp16.safetensors
│ │ └── wan2.1_infiniteTalk_multi_fp16.safetensors
│ ├── 📂 audio_encoders/
│ │ └── wav2vec2-chinese-base_fp16.safetensors
│ ├── 📂 vae/
│ │ └── Wan2_1_VAE_bf16.safetensors
│ └── 📂 loras/
│ └── lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors
```
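
Before running the workflow, it can help to confirm that every file in the tree above is in place. The sketch below does that with the standard library only. The folder and file names are copied from the tree, while the `missing_models` helper and the `ComfyUI` root path are assumptions to adapt to your installation.

```python
from pathlib import Path

# Folder under ComfyUI/models -> expected file names, copied from the tree above.
EXPECTED_MODELS = {
    "diffusion_models": ["Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors"],
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "model_patches": [
        "wan2.1_infiniteTalk_single_fp16.safetensors",
        "wan2.1_infiniteTalk_multi_fp16.safetensors",
    ],
    "audio_encoders": ["wav2vec2-chinese-base_fp16.safetensors"],
    "vae": ["Wan2_1_VAE_bf16.safetensors"],
    "loras": ["lightx2v_I2V_14B_480p_cfg_step_distill_rank64_bf16.safetensors"],
}


def missing_models(comfy_root):
    """Return "folder/file" entries that do not exist under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    return [
        f"{folder}/{name}"
        for folder, names in EXPECTED_MODELS.items()
        for name in names
        if not (models_dir / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_models("ComfyUI")
    if missing:
        print("Missing model files:")
        for entry in missing:
            print(f"  {entry}")
    else:
        print("All model files are in place.")
```

The check looks only at file names and locations. It does not verify that a download completed or that the file contents are valid.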

### Sample Input Files

Download these sample input files and drag them into the corresponding nodes:

<CardGroup cols={2}>
<Card title="Download Sample Image" icon="image" href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/input/two_character_talking.png">
Character reference image
</Card>
<Card title="Download Speaker 1 Audio" icon="music" href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/input/audio_speaker1_woman.mp3">
Audio track for character 1
</Card>
<Card title="Download Speaker 2 Audio" icon="music" href="https://raw.githubusercontent.com/Comfy-Org/workflow_templates/main/input/audio_speaker2_man.mp3">
Audio track for character 2
</Card>
</CardGroup>

## Workflow Steps

1. **Load the input image** — Drag the character reference image to the `Load Image` node. For multi-character scenarios, use the Mask Editor to draw masks for each character.
2. **Load audio tracks** — Connect audio files to the `Load Audio` nodes (one per character).
3. **Load diffusion model** — Ensure the `Load Diffusion Model` node is using `Wan2_1-I2V-14B-480p_fp8_e4m3fn_scaled_KJ.safetensors`.
4. **Load model patches** — Load the appropriate InfiniteTalk patch (`single` or `multi` variant) via the `ModelPatchLoader` node.
5. **Configure InfiniteTalk** — Adjust the `WanInfiniteTalkToVideo` node parameters (video length, motion amount, etc.).
6. **Generate** — Run the workflow. The model will produce a full-body talking video synchronized with the input audio.

### Extending Video Length

Each **Video Extend** group extends the video by about 3.24 seconds (81 frames at 25 fps). If your audio is longer than the generated video, add more groups as follows:

1. Box-select the "Video Extend" group
2. Press `Ctrl-C` (copy), then `Ctrl-Shift-V` (paste with connections)
3. Connect the `Batch Images` node's `IMAGE` output to the new `WanInfiniteTalkToVideo` node's `previous_frames`
4. Connect the `Batch Images` node's `IMAGE` output to the new `Batch Images` node's `images`
5. Adjust the connection between the previous group's `WanInfiniteTalkToVideo` output and the new group's `VAEDecode` input
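
The figure of 81 frames at 25 fps also gives a rough way to estimate how many extra Video Extend groups a given audio clip needs. The sketch below works in whole frames. It assumes each added group contributes a full 81 new frames with no overlap with the previous segment, which may overstate the real gain, and it takes the number of frames the workflow already produces as an input because the tutorial does not state it. The `extend_groups_needed` helper is hypothetical.

```python
import math

FPS = 25               # frame rate quoted in the tutorial
FRAMES_PER_GROUP = 81  # frames added per Video Extend group (assumed non-overlapping)


def extend_groups_needed(audio_seconds, covered_frames):
    """Estimate how many additional Video Extend groups cover the audio.

    audio_seconds: length of the audio track in seconds.
    covered_frames: frames the current workflow already generates.
    """
    # Round first so float noise such as 162.00000000000003 does not add a frame.
    total_frames = math.ceil(round(audio_seconds * FPS, 6))
    remaining = total_frames - covered_frames
    if remaining <= 0:
        return 0
    return math.ceil(remaining / FRAMES_PER_GROUP)


if __name__ == "__main__":
    # A 10 second track with 81 frames already covered.
    print(extend_groups_needed(10.0, covered_frames=81))
```

Treat the result as a lower bound: if consecutive segments share frames, more groups are needed than this estimate suggests.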