
Conversation

@arin-deloatch
Contributor

@arin-deloatch arin-deloatch commented Jan 15, 2026

Description

Add all configuration information to the JSON and TXT summary files.
Add metric-level metadata to the CSV file.

Type of change

  • Refactor
  • [☑️] New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Assisted-by: Claude
Generated by: Claude-4.5-sonnet

Related Tickets & Documents

Related Issue: RSPEED-2121
Closes: LEADS-182

Checklist before requesting a review

  • [☑️] I have performed a self-review of my code.
  • [☑️] PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

test_generator.py and test_evaluator.py were updated to support new additions to the codebase.

Output Examples

leads-182-json-example.json
leads-182-metrics-metadata-example.csv
leads-182-txt-example.txt
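
The attached CSV example can be inspected with a few lines of Python. This is an illustrative sketch only: it assumes the column is named metrics_metadata (as documented below) and that its value is either empty or a JSON object.

import csv
import json

# Read the attached example and decode the new metrics_metadata column.
with open("leads-182-metrics-metadata-example.csv", newline="", encoding="utf-8") as handle:
    for row in csv.DictReader(handle):
        raw = row.get("metrics_metadata") or ""
        metadata = json.loads(raw) if raw else {}
        print(metadata)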

Summary by CodeRabbit

  • New Features

    • CSV output now includes per-row metrics metadata for richer evaluation details (applies to successful and error rows).
    • Summary reports (JSON and text) now show configurable system configuration parameters.
    • Summary sections are customizable; defaults include llm, embedding, and api.
  • Documentation

    • Configuration docs updated to show the new CSV column and summary section options.


@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Walkthrough

Adds a metrics_metadata CSV column and a configurable summary_config_sections output option. EvaluationResult now carries metrics_metadata, metric metadata is extracted during evaluation, and output generation serializes the selected system configuration sections into the JSON and text summaries.

Changes

  • Config & Constants (config/system.yaml, src/lightspeed_evaluation/core/constants.py): Added metrics_metadata to the supported CSV columns and summary_config_sections to the output configuration.
  • Data & System Models (src/lightspeed_evaluation/core/models/data.py, src/lightspeed_evaluation/core/models/system.py): EvaluationResult gains metrics_metadata: Optional[str]; OutputConfig gains summary_config_sections: list[str] with default ["llm", "embedding", "api"]. See the sketch after this list.
  • Output Generation (src/lightspeed_evaluation/core/output/generator.py): Added methods to serialize and format selected config sections (_get_included_config_sections, _convert_config_to_dict, _format_config_section, _write_config_params, _build_config_dict) and included the configuration in the JSON and text summaries.
  • Evaluation Pipeline (src/lightspeed_evaluation/pipeline/evaluation/evaluator.py): Added _extract_metadata_for_csv to build JSON metadata (excluding threshold and metric_identifier) and populate metrics_metadata on both success and error EvaluationResults.
  • Tests (tests/unit/core/output/test_generator.py, tests/unit/pipeline/evaluation/test_evaluator.py): Mocks adjusted to provide an iterable model_fields.keys() and a stubbed get_metric_metadata, avoiding iteration errors in the new config/metadata paths.
  • Docs (docs/configuration.md): Documented metrics_metadata in the output CSV columns and metrics section.
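
A minimal sketch of the model-level changes listed above; the field names, types, and defaults come from the walkthrough, while the surrounding class contents and the constant name are assumptions (the real models carry many more fields):

from typing import Optional

from pydantic import BaseModel, Field

# Hypothetical constant name; the review below suggests introducing one for the default.
DEFAULT_SUMMARY_CONFIG_SECTIONS = ["llm", "embedding", "api"]


class OutputConfig(BaseModel):
    """Output options (sketch)."""

    summary_config_sections: list[str] = Field(
        default=DEFAULT_SUMMARY_CONFIG_SECTIONS,
        description="Configuration sections to include in summary reports",
    )


class EvaluationResult(BaseModel):
    """Single evaluation result (sketch)."""

    # JSON-serialized metric metadata for the new CSV column; None when the
    # metric exposes no extra metadata.
    metrics_metadata: Optional[str] = None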

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as Client
participant Evaluator as MetricsEvaluator
participant MM as MetricManager
participant Result as EvaluationResult
participant Output as OutputHandler
participant CSV as CSV Writer

Client->>Evaluator: submit EvaluationRequest
Evaluator->>MM: get_metric_metadata(metric_identifier, level, context)
MM-->>Evaluator: metadata dict
Evaluator->>Result: create EvaluationResult(..., metrics_metadata=JSON)
Evaluator-->>Client: return EvaluationResult
Client->>Output: request write summary/csv
Output->>Result: read metrics_metadata
Output->>CSV: write row including metrics_metadata
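
A hedged Python sketch of the evaluator-side step in the diagram above. The method and manager names come from the walkthrough; the request attribute names, the MetricLevel member names, and the use of json.dumps (the real code reportedly reuses an existing _to_json_str helper) are assumptions.

import json
from typing import Any, Optional

from lightspeed_evaluation.core.metrics.manager import MetricLevel


def extract_metadata_for_csv(metric_manager: Any, request: Any) -> Optional[str]:
    """Sketch of the real _extract_metadata_for_csv helper (illustrative only)."""
    # Pick turn-level vs conversation-level metadata (member names assumed).
    level = MetricLevel.TURN if getattr(request, "turn_data", None) else MetricLevel.CONVERSATION
    metadata = metric_manager.get_metric_metadata(request.metric_identifier, level, request) or {}
    # threshold and metric_identifier already have dedicated CSV columns, so drop them here.
    filtered = {k: v for k, v in metadata.items() if k not in ("threshold", "metric_identifier")}
    return json.dumps(filtered, default=str) if filtered else None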

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • asamal4
  • VladimirKadlec
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately describes the main objective of the changeset (adding evaluation configuration data to reports) and aligns with the primary changes across JSON and TXT output generation.
  • Docstring Coverage: Passed. Docstring coverage is 95.24%, above the required threshold of 80.00%.



🧹 Recent nitpick comments
src/lightspeed_evaluation/core/models/system.py (1)

219-222: Consider adding validation for summary_config_sections values.

The new field lacks validation unlike sibling fields csv_columns and enabled_outputs. Invalid section names would be silently ignored in the generator, which could confuse users who mistype a section name.

♻️ Optional: Add a validator for valid section names
+    SUPPORTED_CONFIG_SECTIONS = {"llm", "embedding", "api", "core", "output", "logging", "visualization"}
+
     summary_config_sections: list[str] = Field(
         default=["llm", "embedding", "api"],
         description="Configuration sections to include in summary reports",
     )
+
+    @field_validator("summary_config_sections")
+    @classmethod
+    def validate_summary_config_sections(cls, v: list[str]) -> list[str]:
+        """Validate that all summary config sections are supported."""
+        for section in v:
+            if section not in cls.SUPPORTED_CONFIG_SECTIONS:
+                raise ValueError(
+                    f"Unsupported config section: {section}. "
+                    f"Supported sections: {sorted(cls.SUPPORTED_CONFIG_SECTIONS)}"
+                )
+        return v
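
With a validator along these lines, a mistyped section name such as "embeddings" would fail fast at configuration load time instead of being silently ignored by the generator.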

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25293d8 and 3ffbb1e.

📒 Files selected for processing (9)
  • config/system.yaml
  • docs/configuration.md
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • tests/unit/core/output/test_generator.py
  • tests/unit/pipeline/evaluation/test_evaluator.py
✅ Files skipped from review due to trivial changes (1)
  • docs/configuration.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/unit/core/output/test_generator.py
  • config/system.yaml
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/constants.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • tests/unit/pipeline/evaluation/test_evaluator.py
  • src/lightspeed_evaluation/core/output/generator.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Include type hints for all public functions and methods in Python
Use Google-style docstrings for all public APIs in Python

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
src/lightspeed_evaluation/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/lightspeed_evaluation/**/*.py: Use custom exceptions from core.system.exceptions module for error handling
Use structured logging with appropriate log levels in Python code

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest mocking with mocker fixture instead of unittest.mock
Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock
Mirror test directory structure to match source code structure
Use test_*.py naming convention for test files and test_ prefix for test functions
Use Test* prefix for test classes in Python test files
Mock LLM calls in tests using pytest mocker fixture

Files:

  • tests/unit/pipeline/evaluation/test_evaluator.py
🧠 Learnings (2)
📚 Learning: 2026-01-15T00:41:27.575Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 140
File: src/lightspeed_evaluation/runner/evaluation.py:41-41
Timestamp: 2026-01-15T00:41:27.575Z
Learning: In Python code reviews for the lightspeed-evaluation project, allow keeping the pylint disable: too-many-locals in functions that use lazy imports to group many local variables in one scope for readability. If a function uses lazy imports to justify this pattern, document why and avoid over-refactoring into multiple helpers just for lint reasons. If you can refactor without increasing complexity or number of globals, prefer refactoring; otherwise retain the pylint directive with a clear rationale.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
📚 Learning: 2025-09-10T15:48:14.671Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:43-49
Timestamp: 2025-09-10T15:48:14.671Z
Learning: In the lightspeed-evaluation framework, system configuration uses Pydantic data models (SystemConfig, OutputConfig, LoggingConfig, etc.) rather than plain dictionaries. Components like OutputHandler receive properly structured Pydantic models, so direct attribute access (e.g., system_config.output.enabled_outputs) is the correct approach.

Applied to files:

  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
🧬 Code graph analysis (2)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (2)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationRequest (484-537)
src/lightspeed_evaluation/core/metrics/manager.py (2)
  • MetricLevel (10-14)
  • get_metric_metadata (50-82)
tests/unit/pipeline/evaluation/test_evaluator.py (1)
src/lightspeed_evaluation/core/metrics/manager.py (1)
  • get_metric_metadata (50-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.11)
  • GitHub Check: tests (3.13)
  • GitHub Check: radon
🔇 Additional comments (11)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (3)

240-272: Well-structured helper method with proper type hints and documentation.

The method correctly:

  • Determines the appropriate metric level
  • Delegates to metric_manager.get_metric_metadata
  • Filters out redundant fields (threshold, metric_identifier) already present in CSV columns
  • Handles empty metadata gracefully by returning None
  • Reuses the existing _to_json_str helper for consistent serialization

196-196: LGTM!

The metrics_metadata field is correctly populated via the new helper method.


231-231: LGTM!

Consistent with the success path - error results also capture available metrics metadata for traceability.

src/lightspeed_evaluation/core/output/generator.py (7)

9-9: LGTM!

The BaseModel import enables proper type annotations for the new configuration conversion methods.


243-243: LGTM!

Adding the configuration dictionary to JSON output provides valuable context for reproducibility and debugging.


309-310: LGTM!

Configuration parameters are now included in the text summary, maintaining consistency with the JSON output.


429-461: LGTM!

The method properly iterates through configured sections, formats section names for display, and delegates formatting to the helper method.


475-486: LGTM!

Good security practice excluding cache_dir from output. The method handles both Pydantic models and plain dictionaries appropriately.
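
A hedged sketch of what such a conversion might look like (the real code is a method on the output generator; the cache_dir exclusion and the model-or-dict handling are taken from the comment above, everything else is assumed):

from typing import Any, Union

from pydantic import BaseModel


def convert_config_to_dict(section: Union[BaseModel, dict[str, Any]]) -> dict[str, Any]:
    """Sketch: turn a config section (Pydantic model or plain dict) into a plain dict."""
    data = section.model_dump() if isinstance(section, BaseModel) else dict(section)
    data.pop("cache_dir", None)  # keep cache paths out of the summary output
    return data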


488-523: LGTM!

The formatting logic handles different value types (lists, None, primitives) appropriately for readable text output.


525-546: LGTM!

The JSON configuration builder correctly reuses _get_included_config_sections and _convert_config_to_dict for consistency with the text output.
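
For reference, a free-function sketch of that composition (the real _build_config_dict is a generator method; the (name, section) pair shape produced by _get_included_config_sections is an assumption):

from typing import Any, Callable, Iterable, Tuple


def build_config_dict(
    included_sections: Iterable[Tuple[str, Any]],
    convert: Callable[[Any], dict[str, Any]],
) -> dict[str, Any]:
    """Sketch: assemble the configuration block embedded in the JSON summary."""
    return {name: convert(section) for name, section in included_sections}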

tests/unit/pipeline/evaluation/test_evaluator.py (1)

51-52: LGTM! Consider adding test coverage for _extract_metadata_for_csv with non-null metadata.

The mock correctly returns None to support the default test path. However, the new _extract_metadata_for_csv helper has logic for filtering and serializing metadata that isn't tested when get_metric_metadata always returns None.

Consider adding a test case that verifies the metadata extraction and filtering:

def test_extract_metadata_for_csv_filters_threshold_and_metric_id(
    self, config_loader, mock_metric_manager, mock_script_manager, mocker
):
    """Test _extract_metadata_for_csv excludes threshold and metric_identifier."""
    # Setup mocks
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics")

    # Configure mock to return metadata with fields that should be filtered
    mock_metric_manager.get_metric_metadata.return_value = {
        "threshold": 0.7,
        "metric_identifier": "ragas:faithfulness",
        "criteria": "Check faithfulness",
        "evaluation_steps": ["step1", "step2"],
    }

    evaluator = MetricsEvaluator(config_loader, mock_metric_manager, mock_script_manager)

    turn_data = TurnData(turn_id="1", query="Q", response="R")
    conv_data = EvaluationData(conversation_group_id="test", turns=[turn_data])
    request = EvaluationRequest.for_turn(conv_data, "ragas:faithfulness", 0, turn_data)

    result = evaluator._extract_metadata_for_csv(request)

    import json
    parsed = json.loads(result)
    assert "threshold" not in parsed
    assert "metric_identifier" not in parsed
    assert parsed["criteria"] == "Check faithfulness"
    assert parsed["evaluation_steps"] == ["step1", "step2"]




Contributor

@VladimirKadlec VladimirKadlec left a comment


Nice work, thank you. Please add this to the documentation.
Other than that LGTM.

Collaborator

@asamal4 asamal4 left a comment


Thank you, LGTM. One nit below.

FYI: we will also have to refactor the metadata mapping to simplify the logic (and remove the extra logic in evaluator.py), but that is not part of this story or PR.

if self.system_config is not None and hasattr(self.system_config, "output"):
    if hasattr(self.system_config.output, "summary_config_sections"):
        return self.system_config.output.summary_config_sections
# Default sections if not configured (see system.py:220)
Collaborator

@asamal4 asamal4 Jan 16, 2026


Please remove the code line number; it may change.

Collaborator


In fact, remove the comment entirely and use a constant; that will be self-explanatory.

arin-deloatch and others added 6 commits January 16, 2026 08:16
Mock system_config and metric_manager now properly support iteration
operations to prevent TypeError when production code iterates over
model_fields.keys() or metadata.items().

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
)

summary_config_sections: list[str] = Field(
default=["llm", "embedding", "api"],
Collaborator


Please use a constant for this.

Suggested change
- default=["llm", "embedding", "api"],
+ default=DEFAULT_STORED_CONFIGS,

"result",
"score",
"threshold",
"metrics_metadata",
Collaborator


Please rename the column.

Suggested change
- "metrics_metadata",
+ "metric_metadata",
