
Benchmark Regression Test Suite #647

@AdamGleave

Description


Problem

imitation's testing is currently limited to static analysis (type checking, linting, etc.) and unit tests. There are no automated, end-to-end tests of algorithm training performance. This is problematic because small implementation details in reward and imitation learning can have large impacts on performance.

Solution

We do, however, already have tuned hyperparameters and some initial results in https://github.com/HumanCompatibleAI/imitation/tree/master/benchmarking thanks to @taufeeque9 (a PR with the code used for hyperparameter tuning should be forthcoming soon as well). This suggests we could simply have a test suite that trains all the algorithms using these existing configs and records their performance. If performance drops by more than some threshold, a warning or error could be issued.
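The threshold check described above could be sketched as follows. This is a minimal illustration, not imitation's actual API: the baseline numbers, the `BASELINES` table, and the `check_regression` helper are all hypothetical, and in practice the baselines would come from the tuned configs and results in `benchmarking/`.

```python
# Hypothetical baseline mean returns per (algorithm, environment).
# Real values would be recorded from runs with the tuned configs.
BASELINES = {
    ("bc", "seals/CartPole-v0"): 500.0,
    ("gail", "seals/HalfCheetah-v0"): 1500.0,
}

# Tolerated relative drop before we flag a regression.
THRESHOLD = 0.1


def check_regression(algo: str, env: str, measured_return: float) -> bool:
    """Return True if measured_return regressed past the threshold."""
    baseline = BASELINES[(algo, env)]
    return measured_return < baseline * (1 - THRESHOLD)
```

A CI job could then train each algorithm, call `check_regression` on the measured return, and emit a warning or fail the build on a `True` result.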

Training end-to-end is far too slow to do on every commit, but it's something we could afford to do before each (significant) release or before merging PRs that we're worried might cause regressions.

There are already some tools designed to track metrics over time, such as airspeed velocity (asv). Integrating with one of these might make sense, but I don't yet know how well their features line up with our needs.
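For concreteness, asv tracks the return value of any `track_*` method in a benchmark class over commits, so training performance could plausibly be wrapped like this. The `run_training_and_get_return` helper is a hypothetical stand-in for invoking imitation's training scripts; only the `track_*` naming convention and `timeout` attribute are real asv conventions.

```python
def run_training_and_get_return(algo: str, env: str) -> float:
    # Placeholder: would train with the tuned config from
    # benchmarking/ and return the learned policy's mean return.
    return 0.0


class TrackImitationReturn:
    # asv would re-run this per commit and plot the series over time.
    timeout = 3600  # end-to-end training is slow

    def track_bc_cartpole(self):
        return run_training_and_get_return("bc", "seals/CartPole-v0")
```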

Possible alternative solutions

We could also just add a handful of end-to-end training tests to pytest, marked as "expensive", skipped by default, and runnable on demand, which assert that reward stays above some threshold. This might give us a non-trivial fraction of the benefit with much less work.
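A sketch of this alternative, assuming an environment-variable gate for the skip-by-default behavior (a registered custom marker selected with `-m` would work equally well). The `train_expert_return` helper and the specific threshold are hypothetical placeholders for a real end-to-end training run.

```python
import os

import pytest

# Skipped unless explicitly requested, e.g. before a release:
#   RUN_EXPENSIVE_TESTS=1 pytest tests/test_regression.py
RUN_EXPENSIVE = os.environ.get("RUN_EXPENSIVE_TESTS") == "1"


def train_expert_return() -> float:
    # Placeholder: would run end-to-end training with a tuned
    # config and return the learned policy's mean episode return.
    return 490.0


@pytest.mark.skipif(not RUN_EXPENSIVE, reason="expensive end-to-end test")
def test_bc_cartpole_reward():
    mean_return = train_expert_return()
    assert mean_return >= 450.0, "performance regressed past threshold"
```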
