Improve rendering: ink-bounds, draw batching, text cache #30

Merged
jserv merged 1 commit into main from rendering
Mar 8, 2026

@jserv jserv commented Mar 8, 2026

This introduces rendering optimizations that reduce backend overhead for partial-redraw and text-heavy UI scenes:

  1. iui_emit_box: a universal draw interception point that replaces all direct ctx->renderer.draw_box() calls. Box draws are routed through ink-bounds tracking and the optional batch system.
  2. Ink-bounds tracking: a per-frame union bounding box of all draw calls (boxes, text, lines, circles, arcs). The backend can query iui_ink_bounds_get() to blit only the dirty region instead of the full framebuffer. The bounds are initialized to FLT_MAX/-FLT_MAX to eliminate the first-extend branch on the Arm Cortex-M pipeline. Negative width/height inputs are normalized before accumulation.
  3. Draw call batching: a 256-command buffer with clip-rect grouping on flush. All primitive types (rect, text, line, circle, arc) are routed through the batch when it is enabled, preserving draw order. Vector font text correctly bypasses the text batch (it renders through iui_emit_box, which is already batch-aware).
  4. Text width cache: a 64-entry hash table with linear probing and LFU eviction. It eliminates 99% of backend text_width callbacks. The cache key includes font_height to prevent cross-size poisoning when typography helpers temporarily mutate font metrics. Amortized O(k) decay per frame avoids an O(N) scan. IUI_TEXT_CACHE_SIZE is enforced as a power of two via _Static_assert.
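
The FLT_MAX/-FLT_MAX initialization in item 2 can be illustrated with a minimal sketch. The type and function names below (ink_bounds_t, ink_bounds_reset, ink_bounds_extend, ink_bounds_valid) are hypothetical and are not taken from the library; the sketch only assumes that the bounds are stored as min/max corners.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical type: accumulated bounds stored as min/max corners. */
typedef struct {
    float min_x, min_y, max_x, max_y;
} ink_bounds_t;

/* Start of frame: an "inverted" box. Any real rect compares below FLT_MAX
 * and above -FLT_MAX, so the first extend needs no "is this empty?" branch. */
static void ink_bounds_reset(ink_bounds_t *b)
{
    b->min_x = b->min_y = FLT_MAX;
    b->max_x = b->max_y = -FLT_MAX;
}

/* Union the accumulated box with one draw rect (x, y, w, h). */
static void ink_bounds_extend(ink_bounds_t *b, float x, float y, float w, float h)
{
    /* Normalize negative sizes so (x, y) is always the top-left corner. */
    if (w < 0.0f) { x += w; w = -w; }
    if (h < 0.0f) { y += h; h = -h; }

    if (x < b->min_x) b->min_x = x;
    if (y < b->min_y) b->min_y = y;
    if (x + w > b->max_x) b->max_x = x + w;
    if (y + h > b->max_y) b->max_y = y + h;
}

/* The box is valid only after at least one extend in the current frame. */
static bool ink_bounds_valid(const ink_bounds_t *b)
{
    return b->min_x <= b->max_x && b->min_y <= b->max_y;
}
```

With this layout, a frame that draws nothing leaves the box inverted, which gives a cheap validity test without a separate flag.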

Benchmark (noop backend, 480x800, 5000 frames, median-of-3):

  • Settings scene: 58 draws/frame, 1.9 us baseline
  • Dashboard scene: 98 draws/frame, 2.4 us baseline
  • Form scene: 66 draws/frame, 0.8 us baseline
  • Text cache: 99% text_width elimination across all scenes
  • Ink-bounds coverage: 100-119% of screen area (tight)

The ink-bounds overhead (47-72%) measured with the noop backend is expected: when draw calls cost nothing, the tracking arithmetic dominates. With a real GPU/framebuffer backend, the dirty-rect savings from partial redraws offset this cost significantly.


Summary by cubic

Adds ink-bounds tracking, draw batching, and a text-width cache to cut redraw work and reduce backend text measurement calls. Routes all box draws through iui_emit_box and adds a rendering benchmark.

  • New Features

    • iui_emit_box intercepts box draws and flows through ink-bounds and optional batching.
    • Ink-bounds tracking (per-frame union of all draws) with iui_ink_bounds_get; reset in iui_begin_frame.
    • Draw batching (256-command buffer) with clip-rect grouping; preserves order across primitives.
    • Text width cache (64-entry LFU) keyed by string and font height; enforced power-of-two via _Static_assert.
    • Added tests/bench-render.c to profile Settings, Dashboard, and Form scenes.
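
As an illustration of the text width cache described above, here is a minimal sketch of a power-of-two hash table with linear probing, a key that mixes in the font height, least-frequently-used eviction, and a rotating decay cursor. All names (text_cache_t, text_cache_width, text_cache_decay, demo_measure), the probe limit of 4, and the FNV-1a hash are assumptions made for this example, not the library's implementation. For brevity the sketch compares hashes only; a production cache would also need to guard against hash collisions.

```c
#include <stdint.h>
#include <string.h>

#define TEXT_CACHE_SIZE 64   /* hypothetical stand-in for IUI_TEXT_CACHE_SIZE */
#define TEXT_CACHE_PROBES 4  /* assumed probe limit */
_Static_assert((TEXT_CACHE_SIZE & (TEXT_CACHE_SIZE - 1)) == 0,
               "cache size must be a power of two");
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

typedef float (*measure_fn)(const char *text, void *user);

typedef struct {
    uint32_t hash;
    float width;
    uint16_t hits;  /* use count, halved gradually by text_cache_decay */
    uint8_t used;
} text_cache_entry_t;

typedef struct {
    text_cache_entry_t e[TEXT_CACHE_SIZE];
    unsigned cursor;         /* next slot to decay */
    unsigned backend_calls;  /* misses that reached the backend */
} text_cache_t;

/* FNV-1a over the string, then the font height bits, so the same string
 * at two sizes maps to two different keys. */
static uint32_t text_cache_hash(const char *s, float font_height)
{
    uint32_t h = 2166136261u, fh;
    for (; *s; s++) { h ^= (uint8_t) *s; h *= 16777619u; }
    memcpy(&fh, &font_height, sizeof fh);
    h ^= fh;
    h *= 16777619u;
    return h;
}

static float text_cache_width(text_cache_t *c, const char *text,
                              float font_height, measure_fn measure, void *user)
{
    uint32_t h = text_cache_hash(text, font_height);
    unsigned victim = h & (TEXT_CACHE_SIZE - 1);
    for (unsigned i = 0; i < TEXT_CACHE_PROBES; i++) {
        unsigned idx = (h + i) & (TEXT_CACHE_SIZE - 1);  /* mask, no modulo */
        text_cache_entry_t *e = &c->e[idx];
        if (!e->used) { victim = idx; break; }           /* free slot: miss */
        if (e->hash == h) {                               /* hit */
            if (e->hits < UINT16_MAX) e->hits++;
            return e->width;
        }
        if (e->hits < c->e[victim].hits) victim = idx;    /* track LFU slot */
    }
    c->backend_calls++;
    float w = measure(text, user);
    c->e[victim] = (text_cache_entry_t){ .hash = h, .width = w, .hits = 1, .used = 1 };
    return w;
}

/* Called once per frame: age only k entries, advancing a rotating cursor,
 * instead of scanning the whole table. */
static void text_cache_decay(text_cache_t *c, unsigned k)
{
    for (unsigned i = 0; i < k; i++) {
        c->e[c->cursor].hits >>= 1;
        c->cursor = (c->cursor + 1) & (TEXT_CACHE_SIZE - 1);
    }
}

/* Stand-in backend for the example: 8 pixels per byte. */
static float demo_measure(const char *text, void *user)
{
    (void) user;
    return 8.0f * (float) strlen(text);
}
```

The power-of-two size is what lets the probe index be computed with a mask rather than a division, which is presumably why the constraint is enforced at compile time.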
  • Migration

    • Enable via iui_batch_enable, iui_dirty_enable, iui_ink_bounds_enable, and iui_text_cache_enable.
    • Backends: call iui_ink_bounds_get each frame to blit only the returned rect; fall back to full-frame if invalid.
    • Custom widgets should use iui_emit_box instead of ctx->renderer.draw_box. IUI_TEXT_CACHE_SIZE must be a power of two.
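
On the backend side, the "blit only the returned rect, fall back to full-frame if invalid" step might look like the sketch below. The signature of iui_ink_bounds_get is not shown in this PR, so the helper takes the bounds as plain min/max floats plus a validity flag; blit_rect_t and blit_rect_from_bounds are hypothetical names. Because the reported ink-bounds coverage can exceed 100% of the screen area, the sketch also clamps the rect to the screen.

```c
#include <stdbool.h>

/* Hypothetical integer rect handed to the backend's blit routine. */
typedef struct {
    int x, y, w, h;
} blit_rect_t;

/* Round down and clamp to [0, hi]. */
static int clamp_floor(float v, int hi)
{
    if (v <= 0.0f) return 0;
    if (v >= (float) hi) return hi;
    return (int) v;  /* truncation equals floor for non-negative values */
}

/* Round up and clamp to [0, hi]. */
static int clamp_ceil(float v, int hi)
{
    if (v <= 0.0f) return 0;
    if (v >= (float) hi) return hi;
    int i = (int) v;
    return ((float) i < v) ? i + 1 : i;
}

/* Convert float ink bounds to a pixel rect clamped to the screen.
 * Falls back to the full frame when the bounds are invalid or empty. */
static blit_rect_t blit_rect_from_bounds(bool valid,
                                         float min_x, float min_y,
                                         float max_x, float max_y,
                                         int screen_w, int screen_h)
{
    blit_rect_t full = { 0, 0, screen_w, screen_h };
    if (!valid) return full;

    int x0 = clamp_floor(min_x, screen_w), y0 = clamp_floor(min_y, screen_h);
    int x1 = clamp_ceil(max_x, screen_w), y1 = clamp_ceil(max_y, screen_h);
    if (x1 <= x0 || y1 <= y0) return full;  /* conservative fallback */

    blit_rect_t r = { x0, y0, x1 - x0, y1 - y0 };
    return r;
}
```

Rounding the minimum corner down and the maximum corner up keeps partially covered pixels inside the blit region.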

Written for commit 01c9607. Summary will update on new commits.

cubic-dev-ai[bot]

This comment was marked as resolved.

@jserv jserv merged commit eb8d9c8 into main Mar 8, 2026
12 checks passed
@jserv jserv deleted the rendering branch March 8, 2026 09:55