Quality regression: generated visualizations are subpar and broken #58

@GeneralJerel

Problem

The quality of generated visualizations from the deployed agent has degraded. The same prompt ("sphere icosahedron morph") produces broken or low-quality outputs across multiple attempts.

Evidence

Two downloaded outputs from the same prompt attached as .html files in .chalk/:

Attempt 1 — CSS 3D divs (sphere-icosahedron-morph (1).html)

  • Uses 20 <div> elements with CSS transform-style: preserve-3d and clip-path transitions
  • No WebGL / Three.js despite it being available via the import map
  • Hover interaction conflicts with the CSS spin animation (style.animation = 'none' vs CSS animation: spin 10s)
  • The morph is cosmetic: it just toggles between border-radius: 50% and clip-path: polygon() on flat divs
  • Faces positioned via manual JS math that approximates 3D but doesn't actually render proper geometry

Attempt 2 — Canvas 2D fake 3D (sphere-icosahedron-morph.html)

  • Uses canvas.getContext('2d') instead of WebGL
  • Sphere is just a radial gradient circle, not actual geometry
  • Icosahedron faces are projected triangles drawn with ctx.moveTo/lineTo — no real lighting
  • Uses the CSS color-mix() function, which is not supported in older browsers
  • The "morph" is interpolating between a gradient blob and wireframe triangles — visually unconvincing

What a good output would look like

  • Use Three.js (available in the import map at https://esm.sh/three)
  • Proper IcosahedronGeometry with MeshStandardMaterial or custom shaders
  • Real WebGL lighting and smooth vertex-level morphing between sphere and icosahedron
  • Orbit controls or smooth auto-rotation
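
To make "vertex-level morphing" concrete, here is a minimal sketch of the math only, in plain JavaScript with no Three.js dependency. The helper names (lerp3, normalizeTo, morphPoint, morphPositions) are hypothetical and not taken from either attached output. It assumes the caller already has vertex positions lying on the flat faces of a subdivided icosahedron. As far as I know, Three.js's IcosahedronGeometry with detail > 0 projects its vertices onto the sphere, so the flat positions would need to be computed separately.

```javascript
// Sketch only: all function names here are hypothetical helpers.

// Linear interpolation between two 3D points a and b, with t in [0, 1].
function lerp3(a, b, t) {
  return [
    a[0] + (b[0] - a[0]) * t,
    a[1] + (b[1] - a[1]) * t,
    a[2] + (b[2] - a[2]) * t,
  ];
}

// Scale a non-zero vector so that its length equals `radius`.
function normalizeTo(v, radius) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [(v[0] / len) * radius, (v[1] / len) * radius, (v[2] / len) * radius];
}

// Morph one vertex. `flat` lies on a flat icosahedron face; its sphere
// target is the same direction pushed out to `radius`.
// t = 0 gives the icosahedron, t = 1 gives the sphere.
function morphPoint(flat, radius, t) {
  return lerp3(flat, normalizeTo(flat, radius), t);
}

// Apply the morph to a flat [x0, y0, z0, x1, ...] buffer, the same layout
// a Three.js BufferGeometry position attribute uses.
function morphPositions(flatPositions, radius, t) {
  const out = new Float32Array(flatPositions.length);
  for (let i = 0; i < flatPositions.length; i += 3) {
    const p = morphPoint(
      [flatPositions[i], flatPositions[i + 1], flatPositions[i + 2]],
      radius,
      t
    );
    out[i] = p[0];
    out[i + 1] = p[1];
    out[i + 2] = p[2];
  }
  return out;
}
```

In a real widget the result would be written back into the geometry's position attribute each frame, with t driven by a slider or a GSAP tween, and the normals recomputed so the lighting follows the morph.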

Possible causes

  1. Model regression — GPT-5.4 (gpt-5.4-2026-03-05) may be producing lower-quality code for complex 3D visualizations compared to earlier versions
  2. System prompt lacks quality guidance — The current prompt mentions widgetRenderer capabilities but doesn't guide the model toward using Three.js/WebGL for 3D content, or set quality expectations for interactive visualizations
  3. No few-shot examples — The agent has no reference for what "good" output looks like, so it falls back to simpler CSS/Canvas approaches

Suggested investigation

  • Compare output quality between GPT-5.4 and other models (e.g. Claude) for the same prompts
  • Add quality guidance to the system prompt (e.g. "For 3D visualizations, use Three.js from the import map")
  • Consider adding few-shot examples of high-quality widget HTML in the agent skills
  • Test a broader set of prompts to determine if regression is model-wide or specific to 3D content
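
For the broader prompt test, a rough way to bucket outputs is to scan each generated HTML file for the rendering approach it uses. The sketch below is a heuristic, not a quality metric: the function names and labels are hypothetical, and the patterns are taken from the two attempts above (transform-style: preserve-3d, getContext('2d')) plus the Three.js import a good output would be expected to contain.

```javascript
// Hypothetical helper: label a generated widget by its rendering approach.
// Checks run from most to least capable, so a file that imports Three.js
// and also touches a 2D canvas is still labeled "three".
function classifyRenderer(html) {
  if (/\bfrom\s+["'](?:three|https:\/\/esm\.sh\/three)\b/.test(html)) {
    return 'three';
  }
  if (/getContext\(\s*["']webgl2?["']\s*\)/.test(html)) {
    return 'webgl';
  }
  if (/getContext\(\s*["']2d["']\s*\)/.test(html)) {
    return 'canvas-2d';
  }
  if (/transform-style\s*:\s*preserve-3d/.test(html)) {
    return 'css-3d';
  }
  return 'unknown';
}

// Count labels across a batch of outputs, e.g. several runs of one prompt.
function tallyRenderers(htmlOutputs) {
  const counts = {};
  for (const html of htmlOutputs) {
    const label = classifyRenderer(html);
    counts[label] = (counts[label] || 0) + 1;
  }
  return counts;
}
```

Running tallyRenderers over the same prompt set for each model would show whether the fallback to CSS or Canvas 2D is specific to one model or to 3D prompts in general. It says nothing about visual quality, so the outputs still need a manual look.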

Environment

  • Model: gpt-5.4-2026-03-05 via langchain_openai.ChatOpenAI
  • Agent: LangGraph with CopilotKit middleware
  • Widget renderer: sandboxed iframe with import map (Three.js, GSAP, D3, Chart.js available)
