Conversation
Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text: I have read the DCO document and I hereby sign the DCO. You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.
Greptile Summary
This PR introduces Elorian, a new multimodal VLM evaluation pipeline built on top of DataDesigner. Key issues found:
| Filename | Overview |
|---|---|
| elorian/judges.py | LLM judge registry; contains two P1 issues: docstring example uses hyphenated alias and provider-prefixed model_id, and build_prompt uses str.format() which raises KeyError on custom templates with curly braces |
| elorian/pipeline.py | Core pipeline orchestrator; global monkey-patch on ImageContext applied at import time and unconditional os.environ mutation in _get_data_designer() are process-wide side effects; also minor describe() truncation bug |
| elorian/run_example.py | Working example script; commented-out judge block contains a hyphenated alias and a doubly-prefixed model_id that would fail at runtime if uncommented; also has an unused import |
| elorian/models.py | VLM model registry and provider management; clean implementation with clear separation between ModelSpec, ModelRegistry, and PROVIDERS dict |
| elorian/image_utils.py | Image loading utilities; module docstring claims URL loading support but no such function is implemented; otherwise straightforward PIL/HuggingFace loaders |
| elorian/__init__.py | Package init with clean __all__ exports; no issues |
| elorian/README.md | Comprehensive documentation for the Elorian pipeline; well-structured with good examples; consistently enforces the underscore alias rule |
Sequence Diagram
```mermaid
sequenceDiagram
    participant User
    participant Pipeline as MultimodalEvalPipeline
    participant ImgUtils as image_utils
    participant ModelReg as ModelRegistry
    participant JudgeReg as JudgeRegistry
    participant DD as DataDesigner
    User->>ImgUtils: load_images_from_hf_dataset()
    ImgUtils-->>User: seed_df (uuid, base64_image, label)
    User->>Pipeline: __init__(seed_df, model_registry, judge_registry)
    User->>Pipeline: preview(num_records)
    Pipeline->>ModelReg: to_model_configs()
    ModelReg-->>Pipeline: [ModelConfig(claude_vision), ModelConfig(gpt4o_vision)]
    Pipeline->>JudgeReg: specs
    JudgeReg-->>Pipeline: [JudgeSpec(judge_claude)]
    Pipeline->>Pipeline: _build_config()
    Note over Pipeline: Adds seed dataset<br/>Adds response_{alias} columns (LLMTextColumnConfig)<br/>Adds eval_{alias} columns (LLMJudgeColumnConfig)
    Pipeline->>Pipeline: _get_data_designer()
    Note over Pipeline: Sets DATA_DESIGNER_MODEL_BACKEND=litellm_bridge
    Pipeline->>DD: validate(builder)
    Pipeline->>DD: preview(builder, num_records)
    DD-->>User: PreviewResults (response_claude_vision, response_gpt4o_vision, eval_judge_claude)
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: elorian/judges.py
Line: 9-13
Comment:
**Docstring example contradicts usage rules on two counts**
The docstring example at the top of the file violates the rules established elsewhere:
1. `alias="my-judge"` uses a hyphen. The README and inline comments throughout the codebase explicitly state that aliases **must use underscores**, because hyphens break Jinja2 template rendering (e.g., `{{ response_my-judge }}` is parsed as `{{ response_my }} - judge`). This example will silently produce broken column references at runtime.
2. `model_id="anthropic/claude-sonnet-4-20250514"` includes the provider prefix. However, `JudgeSpec.to_model_config()` already passes `provider=self.provider` to `dd.ModelConfig`, so including the prefix in `model_id` would result in a doubly-prefixed LiteLLM call (`anthropic/anthropic/claude-sonnet-4-20250514`), causing an API routing failure. The actual registered default (`judge_claude`) correctly uses `model_id="claude-sonnet-4-20250514"` without a prefix.
```suggestion
registry = JudgeRegistry()
registry.register(JudgeSpec(
alias="my_judge",
model_id="claude-sonnet-4-20250514",
scores=[...],
prompt_template="...",
))
```
How can I resolve this? If you propose a fix, please make it concise.
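Beyond fixing the docstring, the alias rule can be enforced at registration time. The sketch below is a hypothetical helper (the real `JudgeSpec` may validate differently, or not at all); it rejects any alias that is not a valid Python identifier, which catches hyphens before they become broken Jinja2 column references:

```python
import re

def validate_alias(alias: str) -> str:
    """Reject aliases that are not valid identifiers; hyphens break
    Jinja2 references such as {{ response_<alias> }}."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", alias):
        raise ValueError(
            f"alias {alias!r} must use underscores only, so that "
            "response_/eval_ column references render correctly"
        )
    return alias
```

Calling this from `JudgeSpec.__post_init__` (if it is a dataclass) would turn the silent runtime breakage into a loud registration-time error.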
---
This is a comment left during a code review.
Path: elorian/judges.py
Line: 117-132
Comment:
**`str.format()` raises `KeyError` for custom templates containing curly braces**
`self.prompt_template.format(responses_block=responses_block)` will raise a `KeyError` for any custom `prompt_template` that contains `{...}` placeholders other than `{responses_block}` — for example a template that includes a JSON snippet like `{"key": "value"}` or a named slot like `{task_description}`. This is a common pattern in judge prompts.
The fix is to use a single `.replace()` call, which is immune to stray braces:
```python
def build_prompt(self, model_aliases: list[str]) -> str:
lines = []
for alias in model_aliases:
col_name = f"response_{alias}"
lines.append(f"--- Response from {alias} ---\n{{{{ {col_name} }}}}")
responses_block = "\n\n".join(lines)
return self.prompt_template.replace("{responses_block}", responses_block)
```
How can I resolve this? If you propose a fix, please make it concise.
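The failure mode is easy to reproduce with plain strings; the template text below is illustrative, not taken from the codebase:

```python
# A judge template that embeds a JSON example alongside the slot.
template = (
    'Rate the answers below.\n'
    'Return JSON like {"score": 1}.\n'
    '{responses_block}'
)
responses_block = "--- Response from claude_vision ---\n{{ response_claude_vision }}"

# str.format() treats the JSON braces as a field lookup and raises KeyError.
try:
    template.format(responses_block=responses_block)
    raise AssertionError("expected KeyError")
except KeyError:
    pass

# str.replace() only touches the literal {responses_block} slot.
filled = template.replace("{responses_block}", responses_block)
assert '{"score": 1}' in filled and responses_block in filled
```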
---
This is a comment left during a code review.
Path: elorian/pipeline.py
Line: 51-52
Comment:
**Global monkey-patch applied at module import time**
`ImageContext._format_base64_context` is patched unconditionally the moment `elorian.pipeline` is imported. Any other code in the same process that imports `data_designer.config.models.ImageContext` — including upstream DataDesigner code and other tests — will silently receive the patched behavior. If the upstream bug is ever fixed, this patch will silently override the fix.
Consider making the patch opt-in (e.g., apply it only inside `_get_data_designer()`) or, better, contributing the fix upstream and removing the patch entirely once it lands.
How can I resolve this? If you propose a fix, please make it concise.
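One opt-in shape is a generic context manager that applies the patch only for the duration of a run and always restores the original attribute. This is a sketch using a stand-in class; in the pipeline it would wrap the `ImageContext._format_base64_context` patch described above:

```python
from contextlib import contextmanager

@contextmanager
def patched_method(cls, name, replacement):
    """Temporarily replace cls.<name>, restoring the original on exit."""
    original = getattr(cls, name)
    setattr(cls, name, replacement)
    try:
        yield
    finally:
        setattr(cls, name, original)

# Usage with a stand-in class (the real target would be ImageContext):
class FakeImageContext:
    def _format_base64_context(self):
        return "original"

with patched_method(FakeImageContext, "_format_base64_context", lambda self: "patched"):
    assert FakeImageContext()._format_base64_context() == "patched"
assert FakeImageContext()._format_base64_context() == "original"
```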
---
This is a comment left during a code review.
Path: elorian/pipeline.py
Line: 143-151
Comment:
**`os.environ` mutation is a permanent, process-wide side effect**
Setting `os.environ["DATA_DESIGNER_MODEL_BACKEND"]` inside `_get_data_designer()` permanently overrides the environment for the entire process — including any other code that creates a `DataDesigner` instance later, and any subprocess spawned after the call. This can cause hard-to-debug behaviour when the pipeline is used as a library.
Consider passing the backend configuration through the `DataDesigner` constructor instead, or at least restoring the original value:
```python
_prev = os.environ.get("DATA_DESIGNER_MODEL_BACKEND")
os.environ["DATA_DESIGNER_MODEL_BACKEND"] = "litellm_bridge"
try:
return DataDesigner(...)
finally:
if _prev is None:
os.environ.pop("DATA_DESIGNER_MODEL_BACKEND", None)
else:
os.environ["DATA_DESIGNER_MODEL_BACKEND"] = _prev
```
How can I resolve this? If you propose a fix, please make it concise.
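The restore logic above generalizes to a small context manager, which keeps `_get_data_designer()` readable. This is a sketch; whether `DataDesigner` can instead accept the backend via its constructor depends on its API:

```python
import os
from contextlib import contextmanager

@contextmanager
def scoped_env(key: str, value: str):
    """Set an environment variable for the duration of the block only."""
    prev = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if prev is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = prev

# e.g. inside _get_data_designer():
# with scoped_env("DATA_DESIGNER_MODEL_BACKEND", "litellm_bridge"):
#     return DataDesigner(...)
```

Note the caveat: scoping around construction only helps if `DataDesigner` reads the variable eagerly; if it re-reads the environment later, the constructor-argument route is the safer fix.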
---
This is a comment left during a code review.
Path: elorian/pipeline.py
Line: 192-194
Comment:
**`...` is always appended even when the prompt is shorter than 80 characters**
`self.prompt[:80] + "..."` unconditionally appends the ellipsis. For a short custom prompt (e.g., `"Describe this image."`), the output reads `Prompt: Describe this image....` which is misleading.
```suggestion
lines.append(f"Prompt: {self.prompt[:80]}{'...' if len(self.prompt) > 80 else ''}")
```
How can I resolve this? If you propose a fix, please make it concise.
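If truncation is needed in more than one place, the conditional can be factored into a tiny helper (hypothetical name, not present in the codebase):

```python
def truncate(text: str, limit: int = 80) -> str:
    """Clip text to `limit` characters, appending '...' only when clipped."""
    return text if len(text) <= limit else text[:limit] + "..."

assert truncate("Describe this image.") == "Describe this image."
assert truncate("x" * 100) == "x" * 80 + "..."
```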
---
This is a comment left during a code review.
Path: elorian/image_utils.py
Line: 1-8
Comment:
**Module docstring lists URL loading as supported, but no such function exists**
The module docstring lists `- A list of image URLs` as a supported source, but there is no `load_images_from_urls()` function in the file. Either remove the bullet point from the docstring, or add a note that URL loading is not yet implemented, to avoid confusion for users reading the docs.
```suggestion
"""Utilities for loading and processing images into DataDesigner seed datasets.
Supports loading from:
- A local directory of image files
- A HuggingFace dataset with an image column
- A list of PIL images
"""
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: elorian/run_example.py
Line: 60-66
Comment:
**Commented example has a hyphenated alias and a doubly-prefixed `model_id`**
If a user uncomments this block as-is, two bugs will surface:
1. `alias="judge-gpt4o"` — the hyphen violates the underscore-only rule and will produce a broken Jinja2 column reference (`{{ eval_judge-gpt4o }}`).
2. `model_id="openai/gpt-4o"` — this includes the provider prefix. Because `JudgeSpec.to_model_config()` also sets `provider="openai"` (the default), LiteLLM will receive `openai/gpt-4o` with `provider=openai`, which may be interpreted as `openai/openai/gpt-4o` depending on routing, causing an API error.
```suggestion
# judge_registry.register(
# JudgeSpec(
# alias="judge_gpt4o",
# model_id="gpt-4o",
# provider="openai",
# description="GPT-4o as LLM judge",
# )
# )
```
How can I resolve this? If you propose a fix, please make it concise.
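If defensive handling is preferred over fixing the example alone, a registration-time guard can strip a duplicated provider prefix so the provider is supplied exactly once via the separate `provider` field. This helper is hypothetical; the real `JudgeSpec` fields are as described above:

```python
def normalize_model_id(model_id: str, provider: str) -> str:
    """Drop a leading '<provider>/' prefix from model_id, since the
    provider is passed separately to dd.ModelConfig."""
    prefix = f"{provider}/"
    return model_id[len(prefix):] if model_id.startswith(prefix) else model_id

assert normalize_model_id("openai/gpt-4o", "openai") == "gpt-4o"
assert normalize_model_id("gpt-4o", "openai") == "gpt-4o"
```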
---
This is a comment left during a code review.
Path: elorian/run_example.py
Line: 13
Comment:
**Unused import**
`import data_designer.config as dd` is only referenced inside the commented-out block (for `dd.ModelProvider`). As long as that block remains commented out, `dd` is never used and this import is dead code.
```suggestion
```
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "Merge branch 'main' into vllm_example"
No description provided.