feat(benchmark): add Hermes OpenViking LoCoMo scripts by ehz0ah · Pull Request #1985 · volcengine/OpenViking

ehz0ah · 2026-05-12T07:45:52Z

Summary

This adds a Hermes-backed LoCoMo benchmark runner under benchmark/locomo/hermes for comparing three memory paths:

Hermes native memory baseline
Hermes-to-OpenViking E2E ingestion
OpenViking pre-ingest queried through Hermes

The runner wires import, QA evaluation, judging, and final statistics into one repeatable flow while keeping the LoCoMo dataset and generated benchmark artifacts outside the PR.

Changes

Add run_full_eval.sh to orchestrate suite selection, result directories, retries, optional OpenViking checkpoints, and E2E target/archive readiness checks.
Add importers for native Hermes memory, Hermes/OpenViking E2E session ingestion, and direct OpenViking pre-ingest, using flattened LoCoMo session transcripts with timestamp and visual metadata.
Add shared QA, judge, and stats helpers with retry handling, tool-call accounting, Hermes state.db token/cache summaries, and OpenViking observer token deltas.
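The retry handling mentioned for the shared helpers is not shown in this description. As a rough illustration only, a retry wrapper with exponential backoff might look like the sketch below. The name `with_retries`, its parameters, and the choice of retryable exception types are all assumptions made for this example and are not taken from the PR's code.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: run call() up to `attempts` times with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            # Delay doubles after each failed attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt}/{attempts} failed ({exc!r}); retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

Only the listed exception types trigger a retry; anything else propagates immediately, which keeps programming errors visible instead of retrying them. The `sleep` parameter is injectable so the helper can be exercised without real waiting.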

Notes

The LoCoMo dataset is intentionally not included. Use LOCOMO_JSON=/path/to/locomo10.json or place a local copy at the documented path.
Benchmark tests and generated result/checkpoint directories are intentionally not included.
state.db is used when available for authoritative Hermes token/cache accounting because gateway CSV token fields can be lossy.
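Because the dataset is supplied by the user, a script needs to resolve its location before doing any work. The sketch below shows one way this could be done, assuming the `LOCOMO_JSON` environment variable takes precedence over a local default. The function name `resolve_locomo_json` and the `default_path` argument are hypothetical; the actual documented default path is not stated here, so the caller passes it in.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_locomo_json(
    default_path: Path,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Hypothetical helper: pick the dataset file from LOCOMO_JSON or a default."""
    env = os.environ if env is None else env
    override = env.get("LOCOMO_JSON", "").strip()
    # An explicit LOCOMO_JSON wins; otherwise fall back to the caller's default.
    candidate = Path(override).expanduser() if override else default_path
    if not candidate.is_file():
        raise FileNotFoundError(
            f"LoCoMo dataset not found at {candidate}; "
            "set LOCOMO_JSON=/path/to/locomo10.json or place a local copy "
            "at the documented path"
        )
    return candidate
```

Failing early with a message that names both options avoids a confusing error later in the import step.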

Validation

uv run ruff format --check benchmark/locomo/hermes/*.py
uv run ruff check benchmark/locomo/hermes/*.py
bash -n benchmark/locomo/hermes/run_full_eval.sh
uv run python -m py_compile benchmark/locomo/hermes/*.py
./benchmark/locomo/hermes/run_full_eval.sh --help
The --help paths of the Python import, eval, judge, and stats scripts load without errors.

github-actions · 2026-05-12T07:48:02Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add LoCoMo import scripts
Relevant files:
benchmark/locomo/hermes/import_e2e.py
benchmark/locomo/hermes/import_to_ov.py
benchmark/locomo/hermes/import_to_native.py

Sub-PR theme: Add LoCoMo eval and judge scripts
Relevant files:
benchmark/locomo/hermes/eval.py
benchmark/locomo/hermes/judge.py

Sub-PR theme: Add LoCoMo stats and runner script
Relevant files:
benchmark/locomo/hermes/stat_judge_result.py
benchmark/locomo/hermes/run_full_eval.sh
⚡ Recommended focus areas for review

Error Handling
Broad except Exception clauses without logging could hide real issues. Add logging (print is acceptable for benchmark scripts) or narrow the exception types. The same observation is raised for each of the following five snippets:

except Exception:
    return None

except Exception:
    continue

return None

resp = await client.get(f"{openviking_url}/api/v1/observer/models")
if resp.status_code != 200:

except Exception:
    return None
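The reviewer's suggestion (narrow the exception types and log the failure) can be illustrated with a small sketch. This is not code from the PR: the function `parse_total_tokens` and the `total_tokens` field are hypothetical, chosen only to show the pattern of catching specific exceptions and printing a warning instead of silently returning None.

```python
import json
import sys
from typing import Optional


def parse_total_tokens(payload: str) -> Optional[int]:
    """Hypothetical example: read `total_tokens` from a JSON payload.

    Returns None when the payload is unusable, but reports why on stderr
    so the failure is not silently swallowed.
    """
    try:
        data = json.loads(payload)
        return int(data["total_tokens"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Narrow exception list: malformed JSON, missing key, wrong container
        # type, or a non-numeric value. Anything else propagates to the caller.
        print(f"[warn] could not read total_tokens: {exc!r}", file=sys.stderr)
        return None
```

Compared with a bare `except Exception: return None`, unexpected errors still surface, and expected ones leave a trace in the benchmark logs.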

github-actions · 2026-05-12T07:50:21Z

PR Code Suggestions ✨

No code suggestions found for the PR.

github-project-automation Bot added this to OpenViking project May 12, 2026

github-project-automation Bot moved this to Backlog in OpenViking project May 12, 2026

github-actions Bot added the Review effort 3/5 label May 12, 2026

feat(benchmark): add Hermes OpenViking LoCoMo scripts

99abd11

ehz0ah force-pushed the feat/hermes-openviking-benchmark branch from 172c5fa to 99abd11 Compare May 12, 2026 07:55

ehz0ah wants to merge 1 commit into volcengine:main from ehz0ah:feat/hermes-openviking-benchmark