
Conversation

@arin-deloatch
Contributor

@arin-deloatch arin-deloatch commented Jan 15, 2026

Description

Add all configuration information to the JSON and TXT summary files.
Add metric-level metadata to the CSV file.

Type of change

  • Refactor
  • [☑️] New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Assisted-by: Claude
Generated by: Claude-4.5-sonnet

Related Tickets & Documents

Related Issue: RSPEED-2121
Closes: LEADS-182

Checklist before requesting a review

  • [☑️] I have performed a self-review of my code.
  • [☑️] PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

test_generator.py and test_evaluator.py were updated to support new additions to the codebase.

Output Examples

leads-182-json-example.json
leads-182-metrics-metadata-example.csv
leads-182-txt-example.txt
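
The attached CSV example can be inspected with a few lines of Python. This is an illustrative sketch only: it assumes the column is named metrics_metadata (as documented below) and that its value is either empty or a JSON object.

import csv
import json

# Read the attached example and decode the new metrics_metadata column.
with open("leads-182-metrics-metadata-example.csv", newline="", encoding="utf-8") as handle:
    for row in csv.DictReader(handle):
        raw = row.get("metrics_metadata") or ""
        metadata = json.loads(raw) if raw else {}
        print(metadata)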

Summary by CodeRabbit

  • New Features

    • CSV output now includes per-row metrics metadata for richer evaluation details (applies to successful and error rows).
    • Summary reports (JSON and text) now show configurable system configuration parameters.
    • Summary sections are customizable; defaults include llm, embedding, and api.
  • Documentation

    • Configuration docs updated to show the new CSV column and summary section options.


@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Walkthrough

Adds a metrics_metadata CSV column and a configurable summary_config_sections output option. EvaluationResult now carries metrics_metadata, metric metadata is extracted during evaluation, and output generation serializes the selected system configuration sections into the JSON and text summaries.

Changes

  • Config & Constants (config/system.yaml, src/lightspeed_evaluation/core/constants.py): Added metrics_metadata to the supported CSV columns and summary_config_sections to the output configuration.
  • Data & System Models (src/lightspeed_evaluation/core/models/data.py, src/lightspeed_evaluation/core/models/system.py): EvaluationResult gains metrics_metadata: Optional[str]; OutputConfig gains summary_config_sections: list[str] with default ["llm", "embedding", "api"]. See the sketch after this list.
  • Output Generation (src/lightspeed_evaluation/core/output/generator.py): Added methods to serialize and format selected config sections (_get_included_config_sections, _convert_config_to_dict, _format_config_section, _write_config_params, _build_config_dict) and included the configuration in the JSON and text summaries.
  • Evaluation Pipeline (src/lightspeed_evaluation/pipeline/evaluation/evaluator.py): Added _extract_metadata_for_csv to build JSON metadata (excluding threshold and metric_identifier) and populate metrics_metadata on both success and error EvaluationResults.
  • Tests (tests/unit/core/output/test_generator.py, tests/unit/pipeline/evaluation/test_evaluator.py): Mocks adjusted to provide an iterable model_fields.keys() and a stubbed get_metric_metadata, avoiding iteration errors in the new config/metadata paths.
  • Docs (docs/configuration.md): Documented metrics_metadata in the output CSV columns and metrics section.
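
A minimal sketch of the model-level changes listed above; the field names, types, and defaults come from the walkthrough, while the surrounding class contents and the constant name are assumptions (the real models carry many more fields):

from typing import Optional

from pydantic import BaseModel, Field

# Hypothetical constant name; the review below suggests introducing one for the default.
DEFAULT_SUMMARY_CONFIG_SECTIONS = ["llm", "embedding", "api"]


class OutputConfig(BaseModel):
    """Output options (sketch)."""

    summary_config_sections: list[str] = Field(
        default=DEFAULT_SUMMARY_CONFIG_SECTIONS,
        description="Configuration sections to include in summary reports",
    )


class EvaluationResult(BaseModel):
    """Single evaluation result (sketch)."""

    # JSON-serialized metric metadata for the new CSV column; None when the
    # metric exposes no extra metadata.
    metrics_metadata: Optional[str] = None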

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as Client
participant Evaluator as MetricsEvaluator
participant MM as MetricManager
participant Result as EvaluationResult
participant Output as OutputHandler
participant CSV as CSV Writer

Client->>Evaluator: submit EvaluationRequest
Evaluator->>MM: get_metric_metadata(metric_identifier, level, context)
MM-->>Evaluator: metadata dict
Evaluator->>Result: create EvaluationResult(..., metrics_metadata=JSON)
Evaluator-->>Client: return EvaluationResult
Client->>Output: request write summary/csv
Output->>Result: read metrics_metadata
Output->>CSV: write row including metrics_metadata
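
A hedged Python sketch of the evaluator-side step in the diagram above. The method and manager names come from the walkthrough; the request attribute names, the MetricLevel member names, and the use of json.dumps (the real code reportedly reuses an existing _to_json_str helper) are assumptions.

import json
from typing import Any, Optional

from lightspeed_evaluation.core.metrics.manager import MetricLevel


def extract_metadata_for_csv(metric_manager: Any, request: Any) -> Optional[str]:
    """Sketch of the real _extract_metadata_for_csv helper (illustrative only)."""
    # Pick turn-level vs conversation-level metadata (member names assumed).
    level = MetricLevel.TURN if getattr(request, "turn_data", None) else MetricLevel.CONVERSATION
    metadata = metric_manager.get_metric_metadata(request.metric_identifier, level, request) or {}
    # threshold and metric_identifier already have dedicated CSV columns, so drop them here.
    filtered = {k: v for k, v in metadata.items() if k not in ("threshold", "metric_identifier")}
    return json.dumps(filtered, default=str) if filtered else None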

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • asamal4
  • VladimirKadlec
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately describes the main objective of the changeset (adding evaluation configuration data to reports) and aligns with the primary changes across JSON and TXT output generation.
  • Docstring Coverage: Passed. Docstring coverage is 95.24%, above the required threshold of 80.00%.



🧹 Recent nitpick comments
src/lightspeed_evaluation/core/models/system.py (1)

219-222: Consider adding validation for summary_config_sections values.

The new field lacks validation unlike sibling fields csv_columns and enabled_outputs. Invalid section names would be silently ignored in the generator, which could confuse users who mistype a section name.

♻️ Optional: Add a validator for valid section names
+    SUPPORTED_CONFIG_SECTIONS = {"llm", "embedding", "api", "core", "output", "logging", "visualization"}
+
     summary_config_sections: list[str] = Field(
         default=["llm", "embedding", "api"],
         description="Configuration sections to include in summary reports",
     )
+
+    @field_validator("summary_config_sections")
+    @classmethod
+    def validate_summary_config_sections(cls, v: list[str]) -> list[str]:
+        """Validate that all summary config sections are supported."""
+        for section in v:
+            if section not in cls.SUPPORTED_CONFIG_SECTIONS:
+                raise ValueError(
+                    f"Unsupported config section: {section}. "
+                    f"Supported sections: {sorted(cls.SUPPORTED_CONFIG_SECTIONS)}"
+                )
+        return v
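
With a validator along these lines, a mistyped section name such as "embeddings" would fail fast at configuration load time instead of being silently ignored by the generator.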

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25293d8 and 3ffbb1e.

📒 Files selected for processing (9)
  • config/system.yaml
  • docs/configuration.md
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • tests/unit/core/output/test_generator.py
  • tests/unit/pipeline/evaluation/test_evaluator.py
✅ Files skipped from review due to trivial changes (1)
  • docs/configuration.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/unit/core/output/test_generator.py
  • config/system.yaml
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/constants.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • tests/unit/pipeline/evaluation/test_evaluator.py
  • src/lightspeed_evaluation/core/output/generator.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Include type hints for all public functions and methods in Python
Use Google-style docstrings for all public APIs in Python

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
src/lightspeed_evaluation/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/lightspeed_evaluation/**/*.py: Use custom exceptions from core.system.exceptions module for error handling
Use structured logging with appropriate log levels in Python code

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest mocking with mocker fixture instead of unittest.mock
Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock
Mirror test directory structure to match source code structure
Use test_*.py naming convention for test files and test_ prefix for test functions
Use Test* prefix for test classes in Python test files
Mock LLM calls in tests using pytest mocker fixture

Files:

  • tests/unit/pipeline/evaluation/test_evaluator.py
🧠 Learnings (2)
📚 Learning: 2026-01-15T00:41:27.575Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 140
File: src/lightspeed_evaluation/runner/evaluation.py:41-41
Timestamp: 2026-01-15T00:41:27.575Z
Learning: In Python code reviews for the lightspeed-evaluation project, allow keeping the pylint disable: too-many-locals in functions that use lazy imports to group many local variables in one scope for readability. If a function uses lazy imports to justify this pattern, document why and avoid over-refactoring into multiple helpers just for lint reasons. If you can refactor without increasing complexity or number of globals, prefer refactoring; otherwise retain the pylint directive with a clear rationale.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
📚 Learning: 2025-09-10T15:48:14.671Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:43-49
Timestamp: 2025-09-10T15:48:14.671Z
Learning: In the lightspeed-evaluation framework, system configuration uses Pydantic data models (SystemConfig, OutputConfig, LoggingConfig, etc.) rather than plain dictionaries. Components like OutputHandler receive properly structured Pydantic models, so direct attribute access (e.g., system_config.output.enabled_outputs) is the correct approach.

Applied to files:

  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/output/generator.py
🧬 Code graph analysis (2)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (2)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationRequest (484-537)
src/lightspeed_evaluation/core/metrics/manager.py (2)
  • MetricLevel (10-14)
  • get_metric_metadata (50-82)
tests/unit/pipeline/evaluation/test_evaluator.py (1)
src/lightspeed_evaluation/core/metrics/manager.py (1)
  • get_metric_metadata (50-82)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.11)
  • GitHub Check: tests (3.13)
  • GitHub Check: radon
🔇 Additional comments (11)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (3)

240-272: Well-structured helper method with proper type hints and documentation.

The method correctly:

  • Determines the appropriate metric level
  • Delegates to metric_manager.get_metric_metadata
  • Filters out redundant fields (threshold, metric_identifier) already present in CSV columns
  • Handles empty metadata gracefully by returning None
  • Reuses the existing _to_json_str helper for consistent serialization

196-196: LGTM!

The metrics_metadata field is correctly populated via the new helper method.


231-231: LGTM!

Consistent with the success path - error results also capture available metrics metadata for traceability.

src/lightspeed_evaluation/core/output/generator.py (7)

9-9: LGTM!

The BaseModel import enables proper type annotations for the new configuration conversion methods.


243-243: LGTM!

Adding the configuration dictionary to JSON output provides valuable context for reproducibility and debugging.


309-310: LGTM!

Configuration parameters are now included in the text summary, maintaining consistency with the JSON output.


429-461: LGTM!

The method properly iterates through configured sections, formats section names for display, and delegates formatting to the helper method.


475-486: LGTM!

Good security practice excluding cache_dir from output. The method handles both Pydantic models and plain dictionaries appropriately.
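
A hedged sketch of what such a conversion might look like (the real code is a method on the output generator; the cache_dir exclusion and the model-or-dict handling are taken from the comment above, everything else is assumed):

from typing import Any, Union

from pydantic import BaseModel


def convert_config_to_dict(section: Union[BaseModel, dict[str, Any]]) -> dict[str, Any]:
    """Sketch: turn a config section (Pydantic model or plain dict) into a plain dict."""
    data = section.model_dump() if isinstance(section, BaseModel) else dict(section)
    data.pop("cache_dir", None)  # keep cache paths out of the summary output
    return data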


488-523: LGTM!

The formatting logic handles different value types (lists, None, primitives) appropriately for readable text output.


525-546: LGTM!

The JSON configuration builder correctly reuses _get_included_config_sections and _convert_config_to_dict for consistency with the text output.
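
For reference, a free-function sketch of that composition (the real _build_config_dict is a generator method; the (name, section) pair shape produced by _get_included_config_sections is an assumption):

from typing import Any, Callable, Iterable, Tuple


def build_config_dict(
    included_sections: Iterable[Tuple[str, Any]],
    convert: Callable[[Any], dict[str, Any]],
) -> dict[str, Any]:
    """Sketch: assemble the configuration block embedded in the JSON summary."""
    return {name: convert(section) for name, section in included_sections}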

tests/unit/pipeline/evaluation/test_evaluator.py (1)

51-52: LGTM! Consider adding test coverage for _extract_metadata_for_csv with non-null metadata.

The mock correctly returns None to support the default test path. However, the new _extract_metadata_for_csv helper has logic for filtering and serializing metadata that isn't tested when get_metric_metadata always returns None.

Consider adding a test case that verifies the metadata extraction and filtering:

def test_extract_metadata_for_csv_filters_threshold_and_metric_id(
    self, config_loader, mock_metric_manager, mock_script_manager, mocker
):
    """Test _extract_metadata_for_csv excludes threshold and metric_identifier."""
    # Setup mocks
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.LLMManager")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.EmbeddingManager")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.RagasMetrics")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.DeepEvalMetrics")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.CustomMetrics")
    mocker.patch("lightspeed_evaluation.pipeline.evaluation.evaluator.ScriptEvalMetrics")

    # Configure mock to return metadata with fields that should be filtered
    mock_metric_manager.get_metric_metadata.return_value = {
        "threshold": 0.7,
        "metric_identifier": "ragas:faithfulness",
        "criteria": "Check faithfulness",
        "evaluation_steps": ["step1", "step2"],
    }

    evaluator = MetricsEvaluator(config_loader, mock_metric_manager, mock_script_manager)

    turn_data = TurnData(turn_id="1", query="Q", response="R")
    conv_data = EvaluationData(conversation_group_id="test", turns=[turn_data])
    request = EvaluationRequest.for_turn(conv_data, "ragas:faithfulness", 0, turn_data)

    result = evaluator._extract_metadata_for_csv(request)

    import json
    parsed = json.loads(result)
    assert "threshold" not in parsed
    assert "metric_identifier" not in parsed
    assert parsed["criteria"] == "Check faithfulness"
    assert parsed["evaluation_steps"] == ["step1", "step2"]




Contributor

@VladimirKadlec VladimirKadlec left a comment


Nice work, thank you. Please add this to the documentation.
Other than that LGTM.

Collaborator

@asamal4 asamal4 left a comment


Thank you, LGTM. One nit below.

FYI: we will also have to refactor the metadata mapping to simplify the logic (and remove the extra logic in evaluator.py), but that is not part of this story or PR.

if self.system_config is not None and hasattr(self.system_config, "output"):
    if hasattr(self.system_config.output, "summary_config_sections"):
        return self.system_config.output.summary_config_sections
# Default sections if not configured (see system.py:220)
Collaborator

@asamal4 asamal4 Jan 16, 2026


Please remove the code line number; it may change.

Collaborator


In fact, remove the comment entirely and use a constant; that will be self-explanatory.

arin-deloatch and others added 6 commits January 16, 2026 08:16
Mock system_config and metric_manager now properly support iteration
operations to prevent TypeError when production code iterates over
model_fields.keys() or metadata.items().

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
)

summary_config_sections: list[str] = Field(
default=["llm", "embedding", "api"],
Collaborator


Please use a constant for this.

Suggested change
- default=["llm", "embedding", "api"],
+ default=DEFAULT_STORED_CONFIGS,

"result",
"score",
"threshold",
"metrics_metadata",
Collaborator


Please rename the column.

Suggested change
- "metrics_metadata",
+ "metric_metadata",
