Problem
The quality of generated visualizations from the deployed agent has degraded. The same prompt ("sphere icosahedron morph") produces broken or low-quality outputs across multiple attempts.
Evidence
Two downloaded outputs from the same prompt are attached as .html files in .chalk/:
Attempt 1: CSS 3D divs (sphere-icosahedron-morph (1).html)
- Uses 20 `<div>` elements with CSS `transform-style: preserve-3d` and `clip-path` transitions
- No WebGL / Three.js despite it being available via the import map
- Hover interaction conflicts with the CSS spin animation (`style.animation = 'none'` vs CSS `animation: spin 10s`)
- The morph is cosmetic: it just toggles `border-radius: 50%` ↔ `clip-path: polygon()` on flat divs
- Faces are positioned via manual JS math that approximates 3D but doesn't actually render proper geometry
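
One way the hover conflict in Attempt 1 could be avoided is to leave the stylesheet's `animation` declaration alone and toggle only `animation-play-state`. The sketch below is illustrative, not taken from the generated file: `attachHoverPause` is a hypothetical helper name, and it assumes the spinning element is the one that receives the hover events.

```javascript
// Hypothetical helper: pause and resume a CSS animation on hover without
// overwriting the `animation` shorthand declared in the stylesheet.
function attachHoverPause(el) {
  el.addEventListener('mouseenter', () => {
    // Freezes the animation at its current frame instead of removing it.
    el.style.animationPlayState = 'paused';
  });
  el.addEventListener('mouseleave', () => {
    // Continues from the frozen frame, so the rotation does not jump.
    el.style.animationPlayState = 'running';
  });
}
```

Because the inline style never sets `animation: none`, the CSS `spin` rule stays in effect and the two mechanisms no longer fight over the same property.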
Attempt 2: Canvas 2D fake 3D (sphere-icosahedron-morph.html)
- Uses `canvas.getContext('2d')` instead of WebGL
- Sphere is just a radial gradient circle, not actual geometry
- Icosahedron faces are projected triangles drawn with `ctx.moveTo/lineTo`, with no real lighting
- Uses `color-mix()` CSS, which has mixed browser support
- The "morph" interpolates between a gradient blob and wireframe triangles, which is visually unconvincing
What a good output would look like
- Use Three.js (available in the import map at `https://esm.sh/three`)
- Proper `IcosahedronGeometry` with `MeshStandardMaterial` or custom shaders
- Real WebGL lighting and smooth vertex-level morphing between sphere and icosahedron
- Orbit controls or smooth auto-rotation
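
To make "vertex-level morphing" concrete, here is a minimal, renderer-agnostic sketch of the math. It assumes the input is a flat array of xyz triples lying on the icosahedron's faces (for example, a subdivided icosahedron that has not yet been projected onto the sphere). `morphPositions` and its parameters are hypothetical names, not part of Three.js or of the agent's output.

```javascript
// Blend each vertex between its flat position (t = 0) and its radial
// projection onto a sphere of the given radius (t = 1).
// `flat` holds xyz triples; `out` is reused if supplied to avoid allocations.
function morphPositions(flat, radius, t, out = new Float32Array(flat.length)) {
  for (let i = 0; i < flat.length; i += 3) {
    const x = flat[i], y = flat[i + 1], z = flat[i + 2];
    const len = Math.hypot(x, y, z) || 1; // guard against a vertex at the origin
    // Scale factor lerp(1, radius / len, t): 1 keeps the flat point,
    // radius / len pushes it out (or in) to the sphere surface.
    const k = 1 + t * (radius / len - 1);
    out[i] = x * k;
    out[i + 1] = y * k;
    out[i + 2] = z * k;
  }
  return out;
}
```

In a Three.js scene, the result could be written into the geometry's position attribute each frame (`geometry.attributes.position.array`, then `needsUpdate = true` and `computeVertexNormals()`), so that `MeshStandardMaterial` lighting follows the changing shape. A custom vertex shader doing the same blend on the GPU would be an alternative.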
Possible causes
- Model regression: GPT-5.4 (`gpt-5.4-2026-03-05`) may be producing lower-quality code for complex 3D visualizations compared to earlier versions
- System prompt lacks quality guidance: the current prompt mentions `widgetRenderer` capabilities but doesn't guide the model toward using Three.js/WebGL for 3D content, or set quality expectations for interactive visualizations
- No few-shot examples: the agent has no reference for what "good" output looks like, so it falls back to simpler CSS/Canvas approaches
Suggested investigation
- Compare output quality between GPT-5.4 and other models (e.g. Claude) for the same prompts
- Add quality guidance to the system prompt (e.g. "For 3D visualizations, use Three.js from the import map")
- Consider adding few-shot examples of high-quality widget HTML in the agent skills
- Test a broader set of prompts to determine if regression is model-wide or specific to 3D content
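
For the model and prompt comparisons above, a rough automated check could flag which rendering approach each generated file uses. The sketch below is a heuristic only: `classifyRenderer` is a hypothetical helper, the patterns are assumptions based on the two attempts described in this issue, and string matching cannot judge visual quality.

```javascript
// Hypothetical heuristic: label a generated widget's HTML by the rendering
// approach it appears to use. Checks run from most to least capable.
function classifyRenderer(html) {
  const usesWebGL =
    /getContext\(\s*['"](webgl2?|experimental-webgl)['"]/.test(html) ||
    // Matches `from 'three'` (import map) or a direct esm.sh URL import.
    /from\s+['"](three|https:\/\/esm\.sh\/three)[^'"]*['"]/.test(html);
  if (usesWebGL) return 'webgl';
  if (/getContext\(\s*['"]2d['"]/.test(html)) return 'canvas2d';
  if (/transform-style\s*:\s*preserve-3d/.test(html)) return 'css3d';
  return 'unknown';
}
```

Running this over a batch of outputs per model would give a coarse count of how often each one reaches for WebGL on 3D prompts, which can help separate a model-wide regression from a prompt-guidance gap.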
Environment
- Model: `gpt-5.4-2026-03-05` via `langchain_openai.ChatOpenAI`
- Agent: LangGraph with CopilotKit middleware
- Widget renderer: sandboxed iframe with import map (Three.js, GSAP, D3, Chart.js available)