Skip to content

Conversation

@juanmichelini
Copy link
Collaborator

Problem

The swebench-multimodal evaluation was only creating report.json, while other benchmarks (swebench, commit0, gaia, etc.) create output.report.json. This inconsistency caused the push-to-index workflow to fail when trying to find output.report.json.

Evidence: GitHub Actions run 21077636459 failed with:

[push-to-index] Failed to find report.json: No output.report.json found in /tmp/tmpyebg1v7b/extracted/eval_outputs

Solution

After the multimodal evaluation completes, copy report.json to output.report.json to match the behavior of other benchmarks.

This follows the same pattern used in:

  • benchmarks/swebench/eval_infer.py (lines 258-267)
  • Other benchmark evaluation scripts

Testing

The fix ensures that the archive structure matches what the push-to-index script expects:

  • ✅ Creates output.report.json alongside output.jsonl
  • ✅ Maintains backward compatibility (original report.json still exists)
  • ✅ Follows established patterns from other benchmarks

…ith other benchmarks

The swebench-multimodal evaluation was only creating report.json, while other
benchmarks (swebench, commit0, etc.) create output.report.json. This caused
the push-to-index workflow to fail when trying to find output.report.json.

This change copies report.json to output.report.json after evaluation completes,
matching the behavior of other benchmarks.

Fixes the issue where GitHub Actions run 21077636459 failed to create a PR in
openhands-index-results because the script couldn't find output.report.json.
@openhands-ai
Copy link

openhands-ai bot commented Jan 16, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #331 at branch `fix-swebench-multimodal-report-filename`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

@juanmichelini
Copy link
Collaborator Author

@OpenHands please fix the failing actions on PR #331 at branch fix-swebench-multimodal-report-filename

@openhands-ai
Copy link

openhands-ai bot commented Jan 16, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

The shutil module was used but not imported, causing pre-commit checks to fail.

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Jan 16, 2026

Summary

I fixed the failing pre-commit check on PR #331 by adding the missing shutil import.

Issue

The pre-commit check was failing because the code used shutil.copy() on line 288 but the shutil module was never imported.

Fix

Added import shutil to the imports section of benchmarks/swebenchmultimodal/eval_infer.py.

Changes Made

  • Added import shutil to the imports (line 14)
  • Minor formatting adjustments applied by ruff formatter (trailing whitespace and line wrapping)

Verification

  • ruff check passes
  • ruff format passes
  • pyright type checking passes
  • ✅ Changes pushed to fix-swebench-multimodal-report-filename branch

The PR should now pass the pre-commit checks when CI runs again.

View full conversation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants