A Differential Testing Framework to Identify Critical AV Failures Leveraging Arbitrary Inputs

Purpose

Requested badge: Functional

This repository contains the code for the ICSE'25 paper "A Differential Testing Framework to Identify Critical AV Failures Leveraging Arbitrary Inputs". As described below, the repository contains the code for replicating the experimental analysis performed in the paper, including generating all figures and tables. It also contains information on how to repeat the full experiment using user-provided datasets and systems.

Provenance

This repository is available on GitHub and archived on Software Heritage. A preprint of the paper is available in the repository.

Data

The main experiment consists of providing video input to 5 different AV systems and recording their steering angles in response to this video. These responses are then analyzed using the differential testing approach proposed in the paper and implemented in /3_Process/OutlierDetection.py.

⭐ Replicating the figures and data from the experiment

The usage information in the setup and reproduction sections below describes how to use the provided scripts to reproduce the data from the paper.

The steering angle outputs of the 5 AV systems are available in /3_Process/cache/*.

The input videos used in the experiment cannot be directly included in this repository due to licensing limitations. See the datasets readme for more information.

Replicating the full pipeline

To replicate the full experiment:

1. First, install the 5 SUTs following the process described in 0_Setup.
2. Then, obtain the datasets used in the experiment as explained in 1_Datasets; note: due to licensing limitations these cannot be directly included and must be obtained from their original sources.
3. Next, preprocess these datasets into a common format as explained in 2_TransformVideos.
4. Once the videos have been processed, follow the instructions for each of the different SUTs in 0_Setup to run each version of OpenPilot on the different videos.
5. Finally, follow the instructions for replicating the figures and data to use the scripts in 3_Process to generate the figures.

Replicating the full pipeline for user-supplied data

To replicate the pipeline on user-supplied videos, repeat the process above, but replace step 2 with adding user-supplied videos. These videos will still need to be preprocessed as described in step 3.

To replicate the pipeline for other SUTs, the user must extract the steering angle readings from the SUT based on the video. The steering angles can then be processed directly by 3_Process/OutlierDetection.py to identify failures as described in the paper.
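Before steering angles from a new SUT can be compared with the others, the readings have to line up frame by frame. The following is a minimal sketch of that alignment step, assuming (hypothetically) that each SUT's readings are already available as a Python list with one angle per video frame. The function name `align_by_frame` and the dictionary layout are illustrative only; the input format actually expected by 3_Process/OutlierDetection.py may differ.

```python
from typing import Dict, List, Tuple


def align_by_frame(
    readings: Dict[str, List[float]],
) -> Tuple[List[str], List[Tuple[float, ...]]]:
    """Group per-SUT steering angles into one tuple per frame.

    `readings` maps a SUT name to its list of per-frame angles (an assumed
    layout, not the repository's format). Frames beyond the shortest
    recording are dropped so every tuple has one value per SUT.
    """
    if not readings:
        return [], []
    names = sorted(readings)  # fixed column order
    n_frames = min(len(readings[name]) for name in names)
    frames = [
        tuple(readings[name][i] for name in names) for i in range(n_frames)
    ]
    return names, frames
```

Each tuple in the returned list then holds the responses of all systems to the same frame, which is the unit a per-frame outlier test operates on.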

Setup

Running in Docker

A Dockerfile is provided to simplify replicating the figures and results from the provided cached data. First, build the Docker image:

docker build -t difftest .

Running Locally

If running locally (outside of Docker), first set up the Python environment. With conda installed, run the following:

source create_env.sh

This will create the difftest conda environment and install all relevant dependencies.

Usage

We first describe the structure of the repository and then describe how to utilize the scripts to reproduce the experimental analysis from the paper.

Repository Structure

Folder Structure:

0_Setup - Information on setting up and running the SUTs used in the experiment
1_Datasets - Placeholder for datasets - omitted for licensing; see the datasets readme.
2_TransformVideos - Scripts to normalize data in 1_Datasets
⭐ 3_Process - Scripts to execute the experiment
- 📋 cache - Raw performance data from the SUTs evaluated on all videos.
- 🧰 🌟 OutlierDetection.py - Code to perform the statistical analysis of DiffTest4AV. This implementation uses Dixon's Q test for outlier detection (dixon).
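
For readers unfamiliar with Dixon's Q test, the sketch below illustrates the idea on the steering angles of several systems for a single frame. It is not taken from OutlierDetection.py: the names `dixon_q_outlier` and `Q_CRIT_95` are hypothetical, and the critical values are the commonly tabulated two-sided 95% values for small samples, which may not be the confidence level used in the paper.

```python
from typing import Optional, Sequence

# Commonly tabulated two-sided critical values for Dixon's Q at 95%
# confidence, keyed by sample size (assumption: the paper may use another level).
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568}


def dixon_q_outlier(
    values: Sequence[float], q_crit: Optional[float] = None
) -> Optional[int]:
    """Return the index of the value flagged by Dixon's Q test, or None.

    Q is the gap between the most extreme value and its nearest neighbour,
    divided by the full range of the sample. Only the lowest and highest
    values are candidates.
    """
    n = len(values)
    if q_crit is None:
        q_crit = Q_CRIT_95[n]  # raises KeyError for sample sizes not in the table
    order = sorted(range(n), key=lambda i: values[i])
    lowest, highest = values[order[0]], values[order[-1]]
    spread = highest - lowest
    if spread == 0:
        return None  # all values identical, nothing to flag
    q_low = (values[order[1]] - lowest) / spread
    q_high = (highest - values[order[-2]]) / spread
    if max(q_low, q_high) <= q_crit:
        return None
    return order[0] if q_low > q_high else order[-1]
```

For example, `dixon_q_outlier([1.0, 1.2, 0.9, 1.1, 9.0])` flags index 4, because the gap of 7.8 over a range of 8.1 gives Q ≈ 0.96, which exceeds 0.710.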

Reproducing the results in the paper

The following was tested on a fresh install of Ubuntu 22.04 using miniconda.

Reproducing figures and results through Docker

docker build -t difftest .  # if not run during setup above
docker run -it --rm -v "$(pwd)/:/difftest" difftest /bin/bash
source generate_figures.sh

Reproducing figures and results locally

With conda installed, run the following:

source create_env.sh  # if not run during setup above
source generate_figures.sh

Expected Results

This will launch all of the scripts in succession to compute all of the figures and tables used in the paper. The scripts are heavily parallelized and will run for ~20 minutes on a machine with 32 cores; runtimes will vary based on available hardware.

All figures will be saved in 3_Process/gen_figures/. A version of these figures has been bundled with this repository; running the script will overwrite the included files. All png files generated should be an exact binary match with the original files bundled with the repository. The pdf versions of the images may differ at the byte level due to system variations, but the rendered image is the same.
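
One way to check the binary-match claim for the png files is to record a hash of each file before running the script and compare afterwards, since the script overwrites the bundled copies. The sketch below is not part of the repository; the helper names `sha256_of`, `png_digests`, and `changed_pngs` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large figures are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def png_digests(root: Path) -> Dict[str, str]:
    """Map each png under `root` (by relative path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*.png"))
    }


def changed_pngs(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List pngs that were added, removed, or whose bytes differ."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

A possible usage: call `before = png_digests(Path("3_Process/gen_figures"))` in a Python session before running generate_figures.sh, then evaluate `changed_pngs(before, png_digests(Path("3_Process/gen_figures")))` afterwards. An empty list indicates that every png matched byte for byte.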

The following table describes how to find the figures used in the paper. NOTE: figures that show video frames in the paper (e.g., Figures 1, 3, 4, 5, 6, and 9) will appear as blank images with steering angles only, since the videos are not included.

Paper Figure	Generated file
Fig 1.	image link
Fig 3.	image link
Table 1	table link
Fig 4.	image link
Table 2	table link
Fig 5.	image link
Fig 6.	image link
Fig 7.	image link
Fig 8a.	image link
Fig 8b.	image link
Fig 8c.	image link
Table V	table link
Fig 9a.	image link
Fig 9b.	image link
Fig 9c.	image link
Fig 9d.	image link
Fig 9e.	image link
Table VI	table link
