Add output validation to benchmarks for regression testing (Issue #267) by Copilot · Pull Request #295 · spcl/serverless-benchmarks

Copilot · 2026-04-15T15:56:12Z

Regression tests currently only verify that benchmark functions execute without errors; they do not check that the output is correct. This PR adds a `validate_output` hook to each benchmark, which the regression framework calls after each successful invocation.

Interface

- Added an optional `validate_output(input_config, output) -> bool` to `BenchmarkModuleInterface` (defaults to `True` for backwards compatibility)
- Added a `Benchmark.validate_output()` method that delegates to the benchmark's `input.py` if the function is defined
- `regression.py` now calls `benchmark.validate_output(input_config, ret.output.get("result", {}))` after each successful trigger invocation; failures are logged and mark the test as failed

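The `output` passed to validators, `ret.output.get("result", {})`, is the function handler's full return value, which itself contains `result` and `measurement` keys. Validators therefore read the benchmark's result from `output['result']`.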
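The delegation described in this section can be sketched as follows. This is a minimal illustration, not the code from the PR: `validate_with_module` and the two stand-in modules are hypothetical names, and the shape of `handler_output` is assumed from the description (a dict with `result` and `measurement` keys).

```python
from types import SimpleNamespace
from typing import Any


def validate_with_module(input_module: Any, input_config: dict, output: dict) -> bool:
    """Call the module's validate_output hook if it exists, otherwise pass."""
    validator = getattr(input_module, "validate_output", None)
    if validator is None:
        # No hook defined: keep the old behaviour and treat the run as valid.
        return True
    return bool(validator(input_config, output))


# Hypothetical stand-ins for two benchmarks' input.py modules.
legacy_module = SimpleNamespace()  # defines no validate_output
sleep_module = SimpleNamespace(
    validate_output=lambda cfg, out: out.get("result") == cfg.get("sleep")
)

# Assumed handler return value: result and measurement keys.
handler_output = {"result": 2, "measurement": {}}

no_hook = validate_with_module(legacy_module, {"sleep": 2}, handler_output)
matching = validate_with_module(sleep_module, {"sleep": 2}, handler_output)
mismatch = validate_with_module(sleep_module, {"sleep": 3}, handler_output)
print(no_hook, matching, mismatch)
```

Looking the hook up with `getattr` is one way to keep benchmarks that have no validator passing, which matches the stated default of `True`.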

Per-benchmark validators added to `input.py`

| Benchmark | Validation logic |
| --- | --- |
| `010.sleep` | `result == input['sleep']` (exact match) |
| `110.dynamic-html` | Non-empty HTML string containing the username |
| `120.uploader` | Non-empty storage key, URL echoed back correctly |
| `130.crud-api` | GET returns expected fields; PUT returns `{}` |
| `210.thumbnailer`, `220.video-processing`, `504.dna-visualisation` | Non-empty storage key in result |
| `311.compression` | Result key ends with `.zip` |
| `411.image-recognition` | Non-empty class label string and non-negative integer index |
| `501.graph-pagerank` | Float in `[0.0, 1.0]` |
| `502.graph-mst` | Result is not `None` |
| `503.graph-bfs` | Non-empty list/tuple |
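To make the graph rows of the table concrete, here is a hedged sketch of what such validators might look like. The function names are hypothetical (in the PR each benchmark defines its own `validate_output` in `input.py`), and the sketch assumes the value to check sits under `output['result']`; the actual implementations may differ.

```python
def validate_pagerank(input_config: dict, output: dict) -> bool:
    """501.graph-pagerank: result should be a float in [0.0, 1.0]."""
    value = output.get("result")
    # The isinstance check rejects ints, bools, and None; NaN fails the range test.
    return isinstance(value, float) and 0.0 <= value <= 1.0


def validate_mst(input_config: dict, output: dict) -> bool:
    """502.graph-mst: result only has to be present (not None)."""
    return output.get("result") is not None


def validate_bfs(input_config: dict, output: dict) -> bool:
    """503.graph-bfs: result should be a non-empty list or tuple."""
    result = output.get("result")
    return isinstance(result, (list, tuple)) and len(result) > 0


print(validate_pagerank({}, {"result": 0.25}))
print(validate_mst({}, {"measurement": {}}))
print(validate_bfs({}, {"result": [0, 1, 2]}))
```

These checks are deliberately loose: they catch empty or malformed results without pinning the benchmark to one exact answer.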

Example

# benchmarks/000.microbenchmarks/010.sleep/input.py
def validate_output(input_config: dict, output: dict) -> bool:
    return output.get('result') == input_config.get('sleep')
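The way the regression check might use the validator above can be sketched like this. `check_invocation` and the logger name are hypothetical, and the nested dict passed as `invocation_output` is an assumed shape that mirrors `ret.output.get("result", {})` from the Interface section; the real `regression.py` code is not shown in this PR description.

```python
import logging

logger = logging.getLogger("regression-sketch")  # hypothetical logger name


def validate_output(input_config: dict, output: dict) -> bool:
    # Same check as the 010.sleep example above.
    return output.get('result') == input_config.get('sleep')


def check_invocation(input_config: dict, invocation_output: dict) -> bool:
    """Return True if the invocation passes validation, False otherwise."""
    # Extract the handler's return value, as in ret.output.get("result", {}).
    handler_output = invocation_output.get("result", {})
    if validate_output(input_config, handler_output):
        return True
    # A failed validation is logged and reported so the caller can fail the test.
    logger.error("Output validation failed for input %s", input_config)
    return False


ok = check_invocation({"sleep": 1}, {"result": {"result": 1, "measurement": {}}})
wrong = check_invocation({"sleep": 1}, {"result": {"result": 5, "measurement": {}}})
missing = check_invocation({"sleep": 1}, {})
print(ok, wrong, missing)
```

Falling back to an empty dict when `result` is missing means a malformed response fails validation instead of raising an exception.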

Copilot AI and others added 2 commits April 15, 2026 12:24

Add validate_output to all benchmarks and integrate into regression t…

276c436

…esting Agent-Logs-Url: https://github.com/spcl/serverless-benchmarks/sessions/5e147d20-e219-4d68-82c9-95b122fe8d05 Co-authored-by: mcopik <720619+mcopik@users.noreply.github.com>

Improve PUT /cart validation to check for empty dict response

b061bee

Agent-Logs-Url: https://github.com/spcl/serverless-benchmarks/sessions/5e147d20-e219-4d68-82c9-95b122fe8d05 Co-authored-by: mcopik <720619+mcopik@users.noreply.github.com>

Copilot AI assigned Copilot and mcopik Apr 15, 2026

Copilot created this pull request from a session on behalf of mcopik on April 15, 2026 at 15:56.

Copilot wants to merge 2 commits into `master` from `copilot/issue-267-benchmark-validation`
