EvolvingLMMs-Lab · anxiangsir · May 28, 2026 · May 28, 2026
diff --git a/README.md b/README.md
@@ -55,7 +55,7 @@ href="https://discord.gg/PmdGHMFNP">Discord</a></b>
 
 ### 🎬 Codec-Aligned Vision Encoders
 
-Forget uniform patchification. **OneVision-Encoder** and **OneVision-Encoder-Lang** are HEVC-style vision transformers that treat video like a codec stream — selecting only motion- and residual-rich patches and sampling dense frames sparsely instead of sparse frames densely. The result is dramatically longer temporal coverage under the same token budget, where prior ViT backbones simply run out of context.
+Beyond uniform patchification. **OneVision-Encoder** and **OneVision-Encoder-Lang** are HEVC-style vision transformers that add a codec-stream input mode alongside image and uniform-frame video — selecting only motion- and residual-rich patches and sampling dense frames sparsely instead of sparse frames densely. The result is dramatically longer temporal coverage under the same token budget, where prior ViT backbones simply run out of context.
 
 ### 🧊 One Model, Every Modality
 

diff --git a/docs/page/assets/codec-vs-frame.js b/docs/page/assets/codec-vs-frame.js
@@ -538,7 +538,7 @@
         x: L.sidePad, y: 58, 'class': 'cvf-title'
       }, 'Same token budget. Codec sampling unlocks low-frame regimes.'));
       var subText = el('text', { x: L.sidePad, y: 80, 'class': 'cvf-panel-sub' });
-      subText.innerHTML = 'Across four temporal grounding benchmarks, codec-stream input matches or exceeds uniform frame sampling, with the largest gains at <tspan font-weight="700" fill="#0d9488">low frame budgets</tspan> where uniform sampling starves the model. <tspan fill="#94a3b8">Hover any point. Click legend to toggle a series.</tspan>';
+      subText.innerHTML = 'Across four temporal grounding benchmarks, codec-stream input matches or exceeds uniform frame sampling, with the largest gains at <tspan font-weight="700" fill="#0d9488">low frame budgets</tspan>, which are too tight to cover the temporal signal. <tspan fill="#94a3b8">Hover any point. Click legend to toggle a series.</tspan>';
       svg.appendChild(subText);
       svg.appendChild(el('line', {
         x1: L.sidePad, y1: 100, x2: W - L.sidePad, y2: 100, 'class': 'cvf-divider'

diff --git a/docs/page/projects/index.html b/docs/page/projects/index.html
@@ -56,7 +56,7 @@ <h3 class="timeline-title">LLaVA-OneVision-2</h3>
                     <span class="i18n" data-lang="en">Video MLLM with codec-aligned dense input</span><span class="i18n" data-lang="zh">基于 codec 的视频多模态大模型</span>
                   </p>
                   <p class="timeline-description">
-                    <span class="i18n" data-lang="en">An 8B-class video MLLM trained with a four-stage progressive pipeline that scales comprehension from 30-second clips to 15-minute footage. Replaces uniform frame sampling with codec-aligned dense input to preserve native temporal signal, and ships every dataset, training recipe, and checkpoint as a fully reproducible release.</span><span class="i18n" data-lang="zh">8B 级视频多模态大模型，通过四阶段渐进式训练把视频理解能力从 30 秒短片扩展到 15 分钟长视频；用基于 codec 的密集输入取代均匀抽帧，保留原生时序信号；数据、配方与权重全流程开源、完全可复现。</span>
+                    <span class="i18n" data-lang="en">An 8B-class video MLLM trained with a four-stage progressive pipeline that scales comprehension from 30-second clips to 15-minute footage. Adds codec-aligned dense input as a new video input mode alongside image input and uniform frame sampling, preserving native temporal signal; ships every dataset, training recipe, and checkpoint as a fully reproducible release.</span><span class="i18n" data-lang="zh">8B 级视频多模态大模型，通过四阶段渐进式训练把视频理解能力从 30 秒短片扩展到 15 分钟长视频；在图像输入和视频均匀抽帧之外，新增基于 codec 的密集输入模式，保留原生时序信号；数据、配方与权重全流程开源、完全可复现。</span>
                   </p>
                   <div class="timeline-meta">
                     <span class="timeline-pill timeline-pill-latest">Latest</span>