Report Generation: Adding basic online evaluation scores #46
Summary
Adding basic online evaluations for the report generation agent. These evaluations are meant to be run by the "production" environment and will produce and upload scores to Langfuse for the following:

- latency
- token count
- cost

The upload helpers live in langfuse.py so they can be easily reused by other agents.

This is how those scores are displayed in the Langfuse UI:

Trace detail page:
Dashboard page:

Clickup Ticket(s): NA
Type of Change
Changes Made
- Added helpers to langfuse.py to upload scores on latency, token count and cost for a trace (see the sketch below)
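A minimal sketch of what such helpers could look like, assuming the Langfuse Python SDK's v2-style `Langfuse.score()` API. The helper name `upload_trace_scores` and its parameters are illustrative, not the actual code in this PR:

```python
# Hypothetical sketch (not the PR's actual implementation): helpers in
# langfuse.py that attach latency, token count and cost scores to a trace.
# Assumes the Langfuse Python SDK v2-style API; credentials are read from
# the LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST env vars.
from langfuse import Langfuse

_client = Langfuse()


def upload_trace_scores(  # hypothetical helper name
    trace_id: str,
    latency_s: float,
    token_count: int,
    cost_usd: float,
) -> None:
    """Upload the three basic online-evaluation scores for one trace."""
    _client.score(trace_id=trace_id, name="latency", value=latency_s)
    _client.score(trace_id=trace_id, name="token_count", value=float(token_count))
    _client.score(trace_id=trace_id, name="cost", value=cost_usd)
    _client.flush()  # ensure the scores are sent before the process exits
```

Other agents could reuse the same helper by calling it with their own trace ID after a run completes, which is the reuse mentioned in the Summary.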
Testing

- Tests pass (`uv run pytest tests/`)
- Type checking passes (`uv run mypy <src_dir>`)
- Linting passes (`uv run ruff check src_dir/`)

Manual testing details:
Tested the UI and checked the resulting scores in Langfuse.
Checklist