Skip to content

Conversation

@evanjhoward11
Copy link

@evanjhoward11 evanjhoward11 commented Dec 20, 2025

Description

Please include a summary of the changes and the related issue. Include relevant motivation and context.
I fixed the type hints for the following files:

src/agentunit/adapters/agentops_adapter.py
src/agentunit/core/scenario.py
src/agentunit/datasets/builtins.py
src/agentunit/metrics/builtin.py
src/agentunit/production/integrations.py

Note: /metrics/builtin.py has one import-untyped error

Fixes #13

Type of Change

Please delete options that are not relevant:

  • Code refactoring
  • Test coverage improvement

Changes Made

Please provide a detailed list of changes:

  • Miscellaneous changes to all the files

Testing

Please describe the tests you ran to verify your changes:

  • Existing test suite passes (pytest)

Test Configuration

  • Python version: 3.11.4
  • Operating System: Windows 11
  • Relevant adapters tested: N/A

Test Results

# Paste relevant test output or results
$ mypy --follow-imports=silent src/agentunit/adapters/agentops_adapter.py
Success: no issues found in 1 source file

$ mypy --follow-imports=silent src/agentunit/core/scenario.py
Success: no issues found in 1 source file

$ mypy --follow-imports=silent src/agentunit/datasets/builtins.py
Success: no issues found in 1 source file

$ mypy --follow-imports=silent src/agentunit/metrics/builtin.py
src\agentunit\metrics\builtin.py:19: error: Skipping analyzing "ragas.metrics": module is installed, but missing library stubs or py.typed marker  [import-untyped]
src\agentunit\metrics\builtin.py:19: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 1 source file)

$ mypy --follow-imports=silent src/agentunit/production/integrations.py
Success: no issues found in 1 source file

Code Quality

  • My code follows the project's style guidelines (Ruff, Black)
  • I have run pre-commit run --all-files and addressed any issues
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have added type hints where appropriate
  • I have updated docstrings following Google style

Documentation

  • I have updated relevant documentation (README, docs/, inline comments)
  • I have added/updated docstrings for new/modified functions
  • I have updated CHANGELOG.md under [Unreleased]
  • I have added examples if introducing new features

Breaking Changes

If this PR introduces breaking changes, please describe:

  • What breaks:
  • Migration path for users:
  • Deprecation warnings added (if applicable):

Dependencies

  • No new dependencies added
  • New dependencies added (list below with justification)

New dependencies:

  • package-name: Reason for adding

Performance Impact

Describe any performance implications:

  • No performance impact
  • Performance improvement (describe and provide benchmarks)
  • Potential performance regression (describe and justify)

Screenshots/Recordings (if applicable)

Add screenshots or recordings to help explain your changes:

Additional Context

Add any other context about the PR here:

  • Links to related issues or PRs
  • References to external documentation
  • Design decisions and trade-offs
  • Known limitations or future work

Checklist

  • I have read the CONTRIBUTING.md guide
  • My branch name follows the convention (feature/, fix/, docs/, etc.)
  • My commit messages follow the conventional commit format
  • I have tested my changes locally
  • I have updated the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • All new and existing tests pass
  • I have checked for security vulnerabilities in any new dependencies

Reviewer Notes

Please pay special attention to:

Post-Merge Tasks

Tasks to complete after merging (if any):

  • Update external documentation
  • Announce in discussions/community
  • Create follow-up issues for future work

Summary by CodeRabbit

  • New Features

    • AgentOps integration for tracing and run lifecycle; tracing enabled by default and run IDs derived from traces
    • Public attributes for tracing control and client exposure
  • Refactor

    • Broadened factory method options for flexible integrations
    • Introduced structured dataset row format with optional metadata
    • Safer metadata access and clarified type annotations
  • Public API

    • Async scenario run signature, evaluation return widened, platform accessor removed

✏️ Tip: You can customize this high-level summary in your review settings.

@continue
Copy link

continue bot commented Dec 20, 2025

All Green - Keep your PRs mergeable

Learn more

All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts


Unsubscribe from All Green comments

@coderabbitai
Copy link

coderabbitai bot commented Dec 20, 2025

Walkthrough

Replaces LangSmith-specific flows with AgentOps tracing and session handling in the adapter; makes run_scenario async; broadens factory method option typings to Any; introduces a DatasetRow TypedDict; adds defensive metadata access and an explicit baseline_stats type annotation.

Changes

Cohort / File(s) Change Summary
AgentOps integration & tracing refactor
src/agentunit/adapters/agentops_adapter.py
Adds _LANGSMITH_UNSUPPORTED, enable_tracing, and public client; unconditionally initializes AgentOps session; converts run_scenario to async; starts/updates/ends AgentOps traces (start_trace, update_trace_metadata, end_trace); uses AgentOps TraceLog objects and project_id; send_message defaults to_agent to "broadcast" and records AgentOps events; several metrics/evaluation methods now raise NotImplementedError for LangSmith-specific behavior.
Factory method typing updates
src/agentunit/core/scenario.py
Changes many factory methods to accept **options: Any (replacing object); adds Any to TYPE_CHECKING imports; adjusts default name resolution in from_openai_agents.
Dataset row typing
src/agentunit/datasets/builtins.py
Introduces DatasetRow TypedDict (id, query, expected_output, tools, context, optional metadata); updates _GAIA_L1_SHOPPING and _SWE_BENCH_LITE types and _build_loader(rows: list[DatasetRow]); SWE items include optional metadata.
Defensive metadata access & explicit typing
src/agentunit/metrics/builtin.py, src/agentunit/production/integrations.py
Uses metadata = getattr(trace, "metadata", {}) in cost/token metrics to avoid AttributeError; annotates baseline_stats: dict[str, dict[str, dict[str, float]]] in baseline calculation.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Adapter as AgentOpsAdapter
  participant Runner as ScenarioRunner
  participant AgentOps as AgentOpsClient

  Client->>Adapter: run_scenario(scenario)
  Note right of Adapter: run_scenario is async
  Adapter->>AgentOps: start_trace(trace_metadata)
  AgentOps-->>Adapter: trace_id
  Adapter->>Runner: execute scenario (trace_id)
  Runner-->>Adapter: events / result
  Adapter->>AgentOps: update_trace_metadata(trace_id, metrics/result)
  alt success
    Adapter->>AgentOps: end_trace(trace_id, status="completed")
  else failure
    Adapter->>AgentOps: end_trace(trace_id, status="failed", details)
  end
  Adapter-->>Client: ScenarioResult (scenario_run_id from trace_id)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 4
❌ Failed checks (3 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The linked issue #13 specifically requires adding type hints to DatasetCase in src/agentunit/datasets/base.py, but the PR changes do not include modifications to base.py; instead, changes are in builtins.py. The core requirement to fix DatasetCase type hints in base.py is not met. Verify whether the changes to src/agentunit/datasets/builtins.py adequately address issue #13, or add the missing changes to src/agentunit/datasets/base.py to explicitly add type hints to DatasetCase fields.
Out of Scope Changes check ⚠️ Warning The PR includes significant changes to agentops_adapter.py (enable_tracing, client attributes, async run_scenario, AgentOps-centric refactoring) that go far beyond fixing type hints and are unrelated to the linked issue #13 about DatasetCase type hints. Either scope this PR to focus on type hint fixes only, or create separate PRs for the AgentOps adapter refactoring to keep changes focused and aligned with the linked issue.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Fix miscellaneous type hints' is vague and generic; it does not convey specific information about what type hints were fixed or which modules were affected. Revise the title to be more specific, such as 'Fix type hints across adapters, datasets, metrics, and core modules' or 'Add proper type annotations to DatasetRow and AgentOps adapter' to better reflect the actual changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The PR description covers the template structure adequately, listing the files modified, the issue fixed, type of change, testing results, and completing most relevant sections including verification that mypy type checking passes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/agentunit/adapters/agentops_adapter.py (1)

73-87: Remove inconsistent and outdated docstring block.

Lines 73-87 contain a duplicate/outdated docstring that references "LangSmith" instead of "AgentOps" and has incorrect parameter descriptions (e.g., "Langsmith project ID" on line 78). This conflicts with the actual AgentOps docstring above (lines 56-65).

🔎 Proposed fix
-        """
-        Initialize LangSmith adapter.
-
-        Args:
-            api_key: LangSmith API key
-            project_id: Langsmith project ID
-            endpoint: Optional custom LangSmith endpoint
-            enable_tracing: Whether to enable automatic tracing
-            enable_feedback: Whether to collect feedback data
-        """
-        self.api_key = api_key
-        self.project_id = project_id
-        self.default_tags = default_tags or []
-        self.auto_start_session = auto_start_session
-
🧹 Nitpick comments (2)
src/agentunit/adapters/agentops_adapter.py (2)

70-71: Type annotation added for new attributes.

The self.client: Any annotation provides some type information, though Any sacrifices type safety. Consider using a protocol or the actual agentops client type when available for better type checking.


103-103: Document the import suppression or address the underlying issue.

The # type: ignore[import-not-found] comment suppresses mypy's missing import error. This aligns with the PR description noting that ragas import issues remain. Consider documenting why this is acceptable (e.g., optional dependency) in a comment.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 315a9a4 and cec0c1d.

📒 Files selected for processing (5)
  • src/agentunit/adapters/agentops_adapter.py (13 hunks)
  • src/agentunit/core/scenario.py (12 hunks)
  • src/agentunit/datasets/builtins.py (4 hunks)
  • src/agentunit/metrics/builtin.py (2 hunks)
  • src/agentunit/production/integrations.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/agentunit/adapters/agentops_adapter.py (4)
src/agentunit/adapters/langsmith_adapter.py (2)
  • platform (104-106)
  • run_scenario (355-516)
src/agentunit/production/integrations.py (1)
  • MonitoringPlatform (33-41)
src/agentunit/core/scenario.py (1)
  • Scenario (23-278)
src/agentunit/reporting/results.py (1)
  • ScenarioResult (32-66)
src/agentunit/datasets/builtins.py (1)
src/agentunit/datasets/base.py (1)
  • DatasetCase (19-27)
src/agentunit/core/scenario.py (2)
src/agentunit/datasets/base.py (1)
  • name (38-39)
tests/test_framework_adapters.py (1)
  • flow (105-107)
🔇 Additional comments (11)
src/agentunit/production/integrations.py (1)

100-100: LGTM! Type annotation correctly describes the baseline stats structure.

The explicit type annotation accurately reflects the nested dictionary structure used for categorizing baseline statistics by metric category, metric name, and statistical measures.

src/agentunit/metrics/builtin.py (2)

146-148: LGTM! Defensive metadata access prevents AttributeError.

The change from direct trace.metadata access to getattr(trace, "metadata", {}) safely handles cases where the trace object may not have a metadata attribute, falling back to an empty dict.


172-174: LGTM! Consistent defensive pattern applied.

The same safe metadata access pattern is correctly applied here, maintaining consistency with CostMetric above.

src/agentunit/datasets/builtins.py (4)

16-24: LGTM! DatasetRow TypedDict correctly structures dataset rows.

The TypedDict definition properly mirrors the DatasetCase dataclass structure from base.py, with appropriate use of NotRequired for the optional metadata field. This provides strong type checking for dataset row dictionaries.


27-27: LGTM! Explicit type annotation improves type safety.

The type annotation list[DatasetRow] provides clear type information for the dataset, enabling better IDE support and type checking.


44-44: LGTM! Consistent type annotation applied.

The same list[DatasetRow] annotation is consistently applied to this dataset.


64-64: LGTM! Function signature properly typed.

The updated signature with list[DatasetRow] parameter type and Iterable[DatasetCase] return type provides clear type contracts for the loader function.

src/agentunit/core/scenario.py (2)

98-98: **LGTM! Using Any for kwargs is more appropriate than object.

The change from **options: object to **options: Any is the correct approach for variadic keyword arguments, providing better flexibility while maintaining type safety at call sites.

Also applies to: 116-116, 131-131, 146-146, 161-161, 176-176, 191-191, 207-207, 222-222, 242-242


71-71: Verify the necessity of the additional None check.

The change from getattr(flow, "__name__", "openai-agents-scenario") to getattr(flow, "__name__", None) or "openai-agents-scenario" adds protection against falsy __name__ values (like empty strings). However, this differs from the pattern used in other factory methods (lines 89, 107, 122, etc.).

Is there a specific case where flow.__name__ could be an empty string or other falsy value? If not, consider keeping consistency with other factory methods for maintainability.

#!/bin/bash
# Check if other factory methods in the same file use similar patterns
rg -n "getattr.*__name__" src/agentunit/core/scenario.py
src/agentunit/adapters/agentops_adapter.py (2)

355-355: LGTM! Explicit type annotation improves clarity.

The agent_counts: dict[str, int] annotation clearly documents the structure of the dictionary being built.


385-385: AgentOpsAdapter.run_scenario is async, but inconsistently implemented across adapters.

The run_scenario method in AgentOpsAdapter is now async, while LangSmithAdapter and WandbAdapter retain sync implementations. This creates API inconsistency: the abstract base class declares it as async, but not all adapters comply. Additionally, the runner's main execution path calls adapter.execute() rather than adapter.run_scenario(), so this breaking change only affects direct callers of the method. Consider either making all adapters async or clarifying whether run_scenario is the intended public interface (if not, it could be internal).

Comment on lines 342 to 350
if isinstance(interaction.from_agent, list):
unique_agents.update(interaction.from_agent)
else:
unique_agents.add(interaction.from_agent)
if interaction.to_agent:
unique_agents.add(interaction.to_agent)
if isinstance(interaction.to_agent, list):
unique_agents.update(interaction.to_agent)
else:
unique_agents.add(interaction.to_agent)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Find the agentops_adapter.py file
find . -type f -name "agentops_adapter.py" | head -20

# Check the file structure
fd "agentunit" --type d | head -10

Repository: aviralgarg05/agentunit

Length of output: 125


🏁 Script executed:

#!/bin/bash

# First, let's check if the file exists and get basic info
if [ -f "src/agentunit/adapters/agentops_adapter.py" ]; then
  echo "File found. Checking lines 240-260 for AgentInteraction definition..."
  sed -n '240,260p' src/agentunit/adapters/agentops_adapter.py
  echo ""
  echo "Checking lines 342-350 for the code in review..."
  sed -n '342,350p' src/agentunit/adapters/agentops_adapter.py
else
  echo "File not found at expected location. Searching for it..."
  find . -name "agentops_adapter.py" -type f 2>/dev/null
fi

# Also search for AgentInteraction class definition
echo ""
echo "Searching for AgentInteraction class definition..."
rg "class AgentInteraction" --type py -A 10 2>/dev/null || echo "Pattern not found with rg"

Repository: aviralgarg05/agentunit

Length of output: 2159


Remove unnecessary list handling for from_agent.

The to_agent field is properly typed as AgentID | list[AgentID] to support broadcast messaging, so the list checks for that field are correct. However, from_agent is typed only as AgentID (string), not as a union type. Remove the isinstance check for from_agent being a list, or update its type hint to AgentID | list[AgentID] if list support is intentional.

🤖 Prompt for AI Agents
In src/agentunit/adapters/agentops_adapter.py around lines 342 to 350, the code
treats interaction.from_agent as possibly being a list but its type is only
AgentID (string); remove the unnecessary isinstance(interaction.from_agent,
list) branch and simply call unique_agents.add(interaction.from_agent), unless
you intentionally want to support broadcasting from multiple agents—in that case
update the type hint for from_agent to AgentID | list[AgentID] and keep the
existing list handling; ensure any callers and type annotations are updated to
match the chosen approach.

Copy link
Owner

@aviralgarg05 aviralgarg05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the issues

@codecov-commenter
Copy link

codecov-commenter commented Dec 27, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 5.26316% with 36 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/agentunit/adapters/agentops_adapter.py 11.76% 15 Missing ⚠️
src/agentunit/datasets/builtins.py 0.00% 12 Missing ⚠️
src/agentunit/metrics/builtin.py 0.00% 6 Missing ⚠️
src/agentunit/core/scenario.py 0.00% 2 Missing ⚠️
src/agentunit/production/integrations.py 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
src/agentunit/adapters/agentops_adapter.py (8)

73-86: Remove duplicate LangSmith code block.

Lines 73-86 contain a duplicate docstring and attribute assignments copied from the LangSmith adapter. The docstring incorrectly states "Initialize LangSmith adapter" and "Langsmith project ID", and the attribute assignments (api_key, project_id, default_tags, auto_start_session) redundantly repeat lines 66-69. This dead code reduces maintainability and creates confusion.

🔎 Proposed fix
-        """
-        Initialize LangSmith adapter.
-
-        Args:
-            api_key: LangSmith API key
-            project_id: Langsmith project ID
-            endpoint: Optional custom LangSmith endpoint
-            enable_tracing: Whether to enable automatic tracing
-            enable_feedback: Whether to collect feedback data
-        """
-        self.api_key = api_key
-        self.project_id = project_id
-        self.default_tags = default_tags or []
-        self.auto_start_session = auto_start_session
-
         # Initialize AgentOps client
         self._initialize_agentops()

380-388: Update docstring and log message to reference AgentOps instead of LangSmith.

The docstring at lines 380-386 states "Run a scenario with LangSmith integration" and the log message at line 389 references "LangSmith", but this is the AgentOps adapter. Update all references to reflect the correct integration.

🔎 Proposed fix
     async def run_scenario(self, scenario: Scenario) -> ScenarioResult:
         """
-        Run a scenario with LangSmith integration.
+        Run a scenario with AgentOps integration.
 
         Args:
             scenario: Scenario to execute
 
         Returns:
-            ScenarioResult: Execution results with LangSmith trace data
+            ScenarioResult: Execution results with AgentOps trace data
         """
-        logger.info(f"Running scenario with LangSmith: {scenario.name}")
+        logger.info(f"Running scenario with AgentOps: {scenario.name}")
 
-        # Start LangSmith run for the scenario
+        # Start AgentOps run for the scenario
         scenario_run_id = None

484-496: Update remaining LangSmith references to AgentOps.

Lines 484, 524, and associated code comments still reference "LangSmith run" when they should reference AgentOps traces. Update these comments for consistency.

🔎 Proposed fix
-            # Update LangSmith run with results
+            # Update AgentOps trace with results
             if scenario_run_id and self.enable_tracing:
                 try:
                     self.agentops.update_trace_metadata(

Apply similar changes to line 524.

Also applies to: 524-534


537-548: Update collect_metrics docstring to reference AgentOps.

The docstring at lines 539-547 states "Collect production metrics from LangSmith" but this is the AgentOps adapter. Update the docstring to reflect the correct integration platform.

🔎 Proposed fix
     def collect_metrics(self, scenario: Any, result: Any, **kwargs) -> ProductionMetrics:
         """
-        Collect production metrics from LangSmith.
+        Collect production metrics from AgentOps.
 
         Args:
             scenario: The scenario being evaluated

612-677: Update establish_baseline docstring and log messages to reference AgentOps.

The docstring at line 616 states "Establish baseline metrics from historical LangSmith data" and log messages reference "LangSmith baseline" and "LangSmith" errors. Update all references to reflect AgentOps integration.

🔎 Proposed fix
     def establish_baseline(
         self, historical_data: list[dict[str, Any]], metrics: list[str], **kwargs
     ) -> BaselineMetrics:
         """
-        Establish baseline metrics from historical LangSmith data.
+        Establish baseline metrics from historical AgentOps data.
 
         Args:
             historical_data: Historical data for baseline calculation

Apply similar updates to log messages at lines 642, 670, etc.


679-697: Update helper method docstring to reference AgentOps.

Line 680 docstring states "Extract metrics from LangSmith runs" but should reference AgentOps for consistency with the adapter's purpose.

🔎 Proposed fix
     def _extract_run_metrics(self, runs):
-        """Extract metrics from LangSmith runs."""
+        """Extract metrics from AgentOps runs."""
         durations = []

713-747: Update create_evaluation_dataset docstring and log messages to reference AgentOps.

The docstring at line 717 and log messages at lines 742 and 746 reference "LangSmith" when they should reference AgentOps.

🔎 Proposed fix
     def create_evaluation_dataset(
         self, name: str, examples: list[dict[str, Any]], description: str | None = None
     ) -> str:
         """
-        Create an evaluation dataset in LangSmith.
+        Create an evaluation dataset in AgentOps.
 
         Args:
             name: Dataset name

Apply similar updates to log messages.


749-776: Critical: run_evaluation uses LangSmith SDK instead of AgentOps SDK.

The run_evaluation method at lines 749-776 imports and calls langsmith.evaluation.evaluate, but in an AgentOps adapter where self.client is initialized as the AgentOps SDK (line 71: self.client = agentops). This will cause a runtime failure. Additionally, the current AgentOps Python SDK does not provide evaluation dataset or evaluation-run APIs (marked as "forthcoming" in their roadmap). Either remove this method or completely rewrite it using AgentOps APIs once they become available. Also fix the duplicate LangSmith docstring in __init__ (lines 73-82).

🧹 Nitpick comments (1)
src/agentunit/adapters/agentops_adapter.py (1)

72-72: Consider making platform a property for consistency with other adapters.

Line 72 assigns self.platform directly as an instance attribute. However, other adapters (LangSmithAdapter and WandbAdapter) implement platform as a @property method returning MonitoringPlatform. For consistency across adapters, consider refactoring this to match the pattern:

🔎 Proposed refactor

Remove line 72 and add a property method after the __init__ method:

-        self.platform = MonitoringPlatform.AGENTOPS

Then add after the __init__ method (around line 98):

    @property
    def platform(self) -> MonitoringPlatform:
        """Return the monitoring platform type."""
        return MonitoringPlatform.AGENT_OPS

Note: Use MonitoringPlatform.AGENT_OPS (with underscore) to match the enum definition in src/agentunit/production/integrations.py.

Based on relevant code snippets showing the pattern used by LangSmithAdapter and WandbAdapter.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cec0c1d and e0c8efd.

📒 Files selected for processing (1)
  • src/agentunit/adapters/agentops_adapter.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/adapters/agentops_adapter.py (4)
src/agentunit/adapters/langsmith_adapter.py (1)
  • platform (104-106)
src/agentunit/adapters/wandb_adapter.py (1)
  • platform (117-119)
src/agentunit/production/integrations.py (1)
  • MonitoringPlatform (33-41)
src/agentunit/reporting/results.py (1)
  • ScenarioResult (32-66)
🔇 Additional comments (4)
src/agentunit/adapters/agentops_adapter.py (4)

245-245: LGTM! Default to "broadcast" addresses validation concern.

The fallback to_agent or "broadcast" correctly prevents the ValueError("to_agent is required") that would occur with an empty string, and aligns with the broadcast semantics used in other adapters.


348-348: LGTM! Explicit type annotation improves clarity.

Adding the explicit type hint dict[str, int] for agent_counts improves code clarity and helps with type checking.


70-71: LGTM! Good type hint additions.

Extracting enable_tracing from kwargs and adding the explicit type annotation self.client: Any improves code clarity. While Any is broad, it's reasonable for a dynamically-typed client library.


378-378: No changes needed - async signature is correct and required by MultiAgentAdapter abstract method.

The run_scenario method must be async because it implements the abstract method from MultiAgentAdapter (line 372 in src/agentunit/multiagent/__init__.py), which defines async def run_scenario. This is not a premature change—it is the required implementation signature for the base class. No internal callers were found because this is an abstract method implemented by subclasses; external consumers would already be designed to handle async execution when using these adapters.

Likely an incorrect or invalid review comment.

Copy link
Owner

@aviralgarg05 aviralgarg05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls fix the issues

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/agentunit/adapters/agentops_adapter.py (4)

570-634: Remove unreachable code after raise NotImplementedError.

Lines 573-634 are unreachable after the raise statement. This dead code should be removed to improve maintainability. If retained for future reference, move it to a separate method or document it elsewhere.

🔧 Proposed fix
     def collect_metrics(self, scenario: Any, result: Any, **kwargs) -> ProductionMetrics:
         """
         Collect production metrics from AgentOps.

         Args:
             scenario: The scenario being evaluated
             result: The result of the scenario execution
             **kwargs: Additional metadata

         Returns:
             ProductionMetrics: Current production metrics
         """
-
         raise NotImplementedError(_LANGSMITH_UNSUPPORTED)
-
-        try:
-            # Query recent runs from AgentOps
-            runs = list(self.client.list_runs(project_name=self.project_id, limit=100))
-            ... (remove all lines through 634)

636-703: Remove unreachable code and update docstring.

  1. The docstring at line 640 still references "LangSmith" - update to "AgentOps"
  2. Lines 652-703 are unreachable after raise NotImplementedError and should be removed
🔧 Proposed fix for docstring
     def establish_baseline(
         self, historical_data: list[dict[str, Any]], metrics: list[str], **kwargs
     ) -> BaselineMetrics:
         """
-        Establish baseline metrics from historical LangSmith data.
+        Establish baseline metrics from historical AgentOps data.

753-776: Remove unreachable code.

Lines 756-776 are unreachable after raise NotImplementedError. Additionally, log messages at lines 771 and 775 reference "LangSmith" which is inconsistent with this being the AgentOps adapter.


790-807: Remove unreachable code.

Lines 793-807 are unreachable after raise NotImplementedError. This block also imports from langsmith.evaluation (line 794), which would fail if LangSmith isn't installed and is inconsistent with the AgentOps adapter purpose.

🔧 Proposed consolidated fix for all unreachable code

For collect_metrics, establish_baseline, create_evaluation_dataset, and run_evaluation, truncate each method after the raise NotImplementedError line:

     def collect_metrics(self, scenario: Any, result: Any, **kwargs) -> ProductionMetrics:
         """Collect production metrics from AgentOps."""
         raise NotImplementedError(_LANGSMITH_UNSUPPORTED)
-        # Remove all code after raise

     def establish_baseline(
         self, historical_data: list[dict[str, Any]], metrics: list[str], **kwargs
     ) -> BaselineMetrics:
         """Establish baseline metrics from historical AgentOps data."""
         raise NotImplementedError(_LANGSMITH_UNSUPPORTED)
-        # Remove all code after raise

     def create_evaluation_dataset(
         self, name: str, examples: list[dict[str, Any]], description: str | None = None
     ) -> str:
         """Create an evaluation dataset in AgentOps."""
         raise NotImplementedError(_LANGSMITH_UNSUPPORTED)
-        # Remove all code after raise

     def run_evaluation(self, dataset_id: str, evaluator_function: Any, **kwargs) -> Any:
         """Run evaluation on an AgentOps dataset."""
         raise NotImplementedError(_LANGSMITH_UNSUPPORTED)
-        # Remove all code after raise
🤖 Fix all issues with AI agents
In `@src/agentunit/adapters/agentops_adapter.py`:
- Around line 536-541: In the exception handler where you build trace_metadata
for AgentOps runs, the success flag is incorrectly set to True; update the
trace_metadata dictionary (the variable named trace_metadata that currently
includes "scenario_name": scenario.name, "metrics":
session_summary.get("metrics", {}), "success": True) to set "success": False so
failed scenarios correctly report failure; ensure this change is made in the
exception handling block that updates AgentOps run with error.
- Around line 75-88: Remove the dead commented LangSmith initialization block
that duplicates attribute handling; specifically delete the commented lines
referring to api_key, project_id, endpoint, enable_tracing, enable_feedback and
the assignments to self.api_key, self.project_id, self.default_tags, and
self.auto_start_session so only the active __init__ attribute assignments (the
existing code handling api_key/project_id/default_tags/auto_start_session around
the AgentOpsAdapter constructor) remain.
🧹 Nitpick comments (2)
src/agentunit/adapters/agentops_adapter.py (2)

386-389: Remove or clarify the misleading warning.

The warning states "LangSmith-style APIs which are not fully supported by AgentOps" but the code now correctly uses AgentOps APIs (start_trace(trace_name=..., tags=...), update_trace_metadata(...), end_trace(...)). This warning is outdated and may confuse maintainers.

🔧 Proposed fix
-        if self.enable_tracing:
-            logger.warning(
-                "run_scenario tracing uses LangSmith-style APIs which are not fully supported by AgentOps"
-            )
-
         logger.info(f"Running scenario with AgentOps: {scenario.name}")

553-554: Variable shadowing: inner exception shadows outer exception.

The exception variable e at line 553 shadows the outer e from line 510. While the scope is limited here, renaming to trace_err or similar improves clarity.

🔧 Proposed fix
-                except Exception as e:
-                    logger.warning(f"Failed to finalize failed scenario trace: {e}")
+                except Exception as trace_err:
+                    logger.warning(f"Failed to finalize failed scenario trace: {trace_err}")

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/agentunit/adapters/agentops_adapter.py`:
- Around line 522-538: The error path assumes session_summary exists and builds
trace_metadata but neither is guarded or used; to fix, ensure session_summary is
safely initialized or defaulted before use (e.g., if session_summary is
undefined set session_summary = {} or use session_summary.get(...) with a safe
default), use the constructed trace_metadata when calling
self.agentops.update_trace_metadata (pass trace_metadata or merge it with
{"error": str(e)}), and keep the existing calls to
self.agentops.update_trace_metadata and self.agentops.end_trace inside the
trace_log and self.enable_tracing guard so agentops.update_trace_metadata,
agentops.end_trace, session_summary, trace_metadata, trace_log, and
enable_tracing are all referenced and handled safely.
🧹 Nitpick comments (2)
src/agentunit/adapters/agentops_adapter.py (2)

544-621: Unreachable LangSmith blocks after NotImplementedError.
These blocks will never run; consider removing them or gating behind a feature flag to avoid dead code and reduce maintenance surface.

Also applies to: 636-690


725-794: Evaluation methods still contain unreachable LangSmith code.
Same pattern as above — consider removing or isolating the dead code to avoid confusion.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/agentunit/adapters/agentops_adapter.py (1)

362-544: Remove async keyword—method is fully synchronous and called without await.

The method is declared async def but the body contains no awaits and calls only synchronous methods. When called at core/runner.py:51 without await, this returns a coroutine object instead of a ScenarioResult, causing a type mismatch and runtime failure. Either remove async to match the synchronous implementation, or refactor the body to be truly asynchronous and update call sites to await the result.

Note: Other adapters (LangSmith, Wandb) implement run_scenario as synchronous def, violating the abstract method's async signature. This inconsistency should also be resolved.

🧹 Nitpick comments (1)
src/agentunit/adapters/agentops_adapter.py (1)

559-560: Unreachable code after NotImplementedError.
The early raises make the remaining method bodies dead; consider removing or gating them with a TODO/feature flag to avoid confusion.

Also applies to: 638-639, 742-743, 779-780

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add missing type hints to DatasetCase

3 participants