Skip to content

Fix/521 image collection placeholders#917

Open
matthewYang08 wants to merge 4 commits into
Pipelex:devfrom
matthewYang08:fix/521-image-collection-placeholders
Open

Fix/521 image collection placeholders#917
matthewYang08 wants to merge 4 commits into
Pipelex:devfrom
matthewYang08:fix/521-image-collection-placeholders

Conversation

@matthewYang08
Copy link
Copy Markdown

@matthewYang08 matthewYang08 commented May 18, 2026

🔗 Related Issues

#521

📝 Description

🔄 Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📚 Documentation update
  • 🧹 Code refactor
  • ⚡ Performance improvement
  • ✅ Test update

🧪 Tests

📋 Checklist

  • Linting / type-checking / unit tests pass locally with my changes
  • I've added tests that prove my fix is effective or that my feature works
  • My code follows the style guidelines of this project
  • I've made corresponding changes to the documentation
  • My changes generate no new warnings

Summary by cubic

Hide field names for image-only collections in prompt rendering to reduce noise, while keeping inner keys for dict image groups and labels for single-image fields. Extend nested image discovery to handle dict containers. Addresses Linear 521.

  • Bug Fixes
    • Rendering: omit outer field names for image-only collections (list, tuple, dict, ListContent); keep dict item keys for context; keep field names when mixed with non-images.
    • Discovery: treat dict[...] as a generic container when searching for nested image fields.
    • Tests: added coverage for dicts, tuples, and mixed dicts; updated expected prompt output.

Written for commit d632696. Summary will update on new commits. Review in cubic

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 18, 2026

Greptile Summary

This PR updates image collection handling in structured prompt rendering. The main changes are:

  • Adds dict support to nested image field discovery.
  • Omits labels for image-only list, tuple, and dict rendering.
  • Adds tests for dict and tuple image placeholders.

Confidence Score: 3/5

These issues should be fixed before merging.

  • Dict key types can be mistaken for extractable prompt images.

  • Optional image types inside containers can be missed by nested image discovery.

  • Some image collection prompts can lose role or field context.

  • pipelex/core/stuffs/image_field_search.py

  • pipelex/core/stuffs/structured_content.py

Important Files Changed

Filename Overview
pipelex/core/stuffs/image_field_search.py Adds dict traversal to image field detection, but key types and nested unions need stricter handling.
pipelex/core/stuffs/structured_content.py Changes prompt rendering for image-only collections, with label-loss and ListContent consistency issues.

Comments Outside Diff (3)

  1. pipelex/core/stuffs/image_field_search.py, line 147-157 (link)

    P1 Dict keys misclassified

    This now scans every dict type argument, including the key type. A field like dict[ImageContent, str] gets reported as an image field even though the rendered/extracted prompt images can only come from values. That can make template analysis classify obj.map as an image reference, and direct extraction later asks working memory for an ImageContent where the actual attribute is a dict.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: pipelex/core/stuffs/image_field_search.py
    Line: 147-157
    
    Comment:
    **Dict keys misclassified**
    
    This now scans every `dict` type argument, including the key type. A field like `dict[ImageContent, str]` gets reported as an image field even though the rendered/extracted prompt images can only come from values. That can make template analysis classify `obj.map` as an image reference, and direct extraction later asks working memory for an `ImageContent` where the actual attribute is a dict.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. pipelex/core/stuffs/image_field_search.py, line 147-153 (link)

    P1 Nested optionals are skipped

    Top-level optional fields are unwrapped before this helper runs, but optional image types inside containers are not. For list[ImageContent | None] or dict[str, ImageContent | None], the container arg is a union, so this branch neither recurses nor checks ImageContent. Those valid image collections are missed, which can make | with_images validation reject a structure that does contain nested images.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: pipelex/core/stuffs/image_field_search.py
    Line: 147-153
    
    Comment:
    **Nested optionals are skipped**
    
    Top-level optional fields are unwrapped before this helper runs, but optional image types inside containers are not. For `list[ImageContent | None]` or `dict[str, ImageContent | None]`, the container arg is a union, so this branch neither recurses nor checks `ImageContent`. Those valid image collections are missed, which can make `| with_images` validation reject a structure that does contain nested images.
    
    How can I resolve this? If you propose a fix, please make it concise.
  3. pipelex/core/stuffs/structured_content.py, line 172-173 (link)

    P2 ListContent bypasses detection

    ListContent implements ImageRenderable, so this branch returns before _is_image_collection_value() can inspect its items. A StructuredContent field containing ListContent[ImageContent] still renders with the field label, while an equivalent plain list[ImageContent] now renders without it. This makes equivalent image collections produce different prompt text.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: pipelex/core/stuffs/structured_content.py
    Line: 172-173
    
    Comment:
    **ListContent bypasses detection**
    
    `ListContent` implements `ImageRenderable`, so this branch returns before `_is_image_collection_value()` can inspect its items. A `StructuredContent` field containing `ListContent[ImageContent]` still renders with the field label, while an equivalent plain `list[ImageContent]` now renders without it. This makes equivalent image collections produce different prompt text.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
pipelex/core/stuffs/image_field_search.py:147-157
**Dict keys misclassified**

This now scans every `dict` type argument, including the key type. A field like `dict[ImageContent, str]` gets reported as an image field even though the rendered/extracted prompt images can only come from values. That can make template analysis classify `obj.map` as an image reference, and direct extraction later asks working memory for an `ImageContent` where the actual attribute is a dict.

### Issue 2 of 4
pipelex/core/stuffs/image_field_search.py:147-153
**Nested optionals are skipped**

Top-level optional fields are unwrapped before this helper runs, but optional image types inside containers are not. For `list[ImageContent | None]` or `dict[str, ImageContent | None]`, the container arg is a union, so this branch neither recurses nor checks `ImageContent`. Those valid image collections are missed, which can make `| with_images` validation reject a structure that does contain nested images.

### Issue 3 of 4
pipelex/core/stuffs/structured_content.py:148-151
**Image labels are dropped**

For image-only collections this drops the outer field name entirely, and the dict branch below drops keys as well. A structure like `photos_by_role={"before": img1, "after": img2}` now renders only bare `[Image N]` tokens, so the prompt no longer tells the model which image is which. That loses semantic context for common image comparison and gallery prompts.

### Issue 4 of 4
pipelex/core/stuffs/structured_content.py:172-173
**ListContent bypasses detection**

`ListContent` implements `ImageRenderable`, so this branch returns before `_is_image_collection_value()` can inspect its items. A `StructuredContent` field containing `ListContent[ImageContent]` still renders with the field label, while an equivalent plain `list[ImageContent]` now renders without it. This makes equivalent image collections produce different prompt text.

Reviews (1): Last reviewed commit: "Merge branch 'dev' into fix/521-image-co..." | Re-trigger Greptile

@matthewYang08
Copy link
Copy Markdown
Author

Hi maintainers, this PR is ready for CI.
Could you please approve workflows so checks can run? Thanks.

Comment on lines +148 to +151
if self._is_image_collection_value(field_value):
parts.append(rendered)
else:
parts.append(f"{field_name}: {rendered}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Image labels are dropped

For image-only collections this drops the outer field name entirely, and the dict branch below drops keys as well. A structure like photos_by_role={"before": img1, "after": img2} now renders only bare [Image N] tokens, so the prompt no longer tells the model which image is which. That loses semantic context for common image comparison and gallery prompts.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pipelex/core/stuffs/structured_content.py
Line: 148-151

Comment:
**Image labels are dropped**

For image-only collections this drops the outer field name entirely, and the dict branch below drops keys as well. A structure like `photos_by_role={"before": img1, "after": img2}` now renders only bare `[Image N]` tokens, so the prompt no longer tells the model which image is which. That loses semantic context for common image comparison and gallery prompts.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 6 files

Re-trigger cubic

# Check if this container or its nested contents have images
# Check if it's a generic container type (list/tuple/dict).
# Example: list[ImageContent], tuple[ImageContent, ...], dict[str, ImageContent].
if hasattr(field_specific_type, "__origin__") and field_specific_type.__origin__ in {list, tuple, dict}: # type: ignore[union-attr]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that dict is included here, check_generic_container_for_images iterates dict.__args__ indiscriminately, both the key type and value type. So a field typed dict[ImageContent, str] would be reported as
an image field via the key type, and downstream template analysis would treat it as an image reference.

In practice this is rare (Pipelex dicts are essentially always dict[str, ...]), but since we're adding the dict branch, let's make it robust. Could you tighten check_generic_container_for_images so that when
container_type.__origin__ is dict, it only recurses into the value type, not the key?

and add a test

Comment on lines +148 to 150
# Check if arg_type is itself a generic container - recurse!
if hasattr(arg_type, "__origin__") and arg_type.__origin__ in {list, tuple, dict}: # type: ignore[union-attr]
if check_generic_container_for_images(arg_type):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top-level optionals are unwrapped before this helper runs, but optionals inside containers aren't. For list[ImageContent | None] or dict[str, ImageContent | None], the inner arg_type is a Union /
UnionType. Its __origin__ is typing.Union, which isn't in {list, tuple, dict}, and isinstance(arg_type, type) is False — so the loop falls through and the collection is missed. | with_images
validation would then reject a perfectly valid structure.

This is technically pre-existing (the same gap exists for list[ImageContent | None] before your change), so if you'd rather scope this PR tightly, we can land it separately. But it'd be a clean addition here:
when an arg_type is a Union/UnionType, iterate its non-None members and run the same checks on each.

@thomashebrard
Copy link
Copy Markdown
Member

thomashebrard commented May 25, 2026

Hey @matthewYang08 , thanks for your contribution and the fix, its great! but there is something missing:

Issue #521 specifically names ListContent[ImageContent] as the broken scenario, and your new _is_image_collection_value helper has a dedicated
ListContent branch — but no test exercises it. Could you add a test in tests/unit/pipelex/core/stuffs/test_structured_content_render_with_images.py covering a StructuredContent field of type

@thomashebrard
Copy link
Copy Markdown
Member

You can also merge dev inside your branch, it is behind

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants