NeuraL-Coverage

Research Artifact of ICSE 2023 Paper: Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion

Preprint: https://arxiv.org/pdf/2112.01955.pdf

Implementations

This repo implements the NLC proposed in our paper as well as previous neuron coverage criteria (optimized where possible), including NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, and MDSC [1-5].

Each criterion is implemented as one Python class in coverage.py.

[1] DeepXplore: Automated whitebox testing of deep learning systems, SOSP 2017.
[2] DeepGauge: Comprehensive and multi granularity testing criteria for gauging the robustness of deep learning systems, ASE 2018.
[3] TensorFuzz: Debugging neural networks with coverage-guided fuzzing, ICML 2019.
[4] Guiding deep learning system testing using surprise adequacy, ICSE 2019.
[5] Reducing DNN labelling cost using surprise adequacy: An industrial case study for autonomous driving, FSE Industry Track 2020.

Installation

Build from source code

git clone https://github.com/Yuanyuan-Yuan/NeuraL-Coverage
cd NeuraL-Coverage
pip install -r requirements.txt

Model & Dataset

Pretrained models: please see MODEL.
Datasets: please see DATASET.

Download pretrained_models, datasets, and adversarial_examples folders here.

Getting Started

import torch
# The criteria are implemented in PyTorch.

import tool
import coverage

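# 0. Get the output size of each layer in the model.
# `model`, `device`, `image_channel`, and `image_size` are assumed to be defined by the caller.
input_size = (1, image_channel, image_size, image_size)
random_input = torch.randn(input_size).to(device)
layer_size_dict = tool.get_layer_output_sizes(model, random_input)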

# 1. Initialization
# `hyper` denotes the hyper-parameter of a criterion;
# set `hyper` to None if a criterion is hyper-parameter free (e.g., NLC).
criterion = coverage.NLC(model, layer_size_dict, hyper=None)
# KMNC/NBC/SNAC/LSC/DSC/MDSC require training data statistics of the tested model,
# which are computed in `build`. `train_loader` can be a PyTorch DataLoader object or a list of data samples.
# For the other criteria, the `build` function is empty.
criterion.build(train_loader)

# 2. Calculation
# `test_loader` stores all test inputs; it can be a PyTorch DataLoader object or a list of data samples.
criterion.assess(test_loader)
# If test inputs arrive gradually from a data stream (e.g., in fuzzing), update the coverage incrementally as follows.
for data in data_stream:
    criterion.step(data)

# 3. Result
# The following statement assigns the current coverage value to `cov`.
cov = criterion.current
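To make the `build` / `assess` / `step` / `current` workflow concrete, here is a minimal, self-contained sketch of a toy criterion that exposes the same method names. It is not the code in coverage.py: the class `ToyNC`, its `num_neurons` argument, and the use of plain lists of activation values in place of a PyTorch model are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional


class ToyNC:
    """Toy neuron-coverage tracker over precomputed activation vectors.

    Hypothetical illustration only; the real criteria in coverage.py
    take a PyTorch model and a layer-size dictionary.
    """

    def __init__(self, num_neurons: int, hyper: Optional[float] = 0.5):
        # `hyper` plays the role of the activation threshold here.
        self.threshold = 0.0 if hyper is None else hyper
        self.covered = [False] * num_neurons
        self.current = 0.0  # fraction of neurons activated so far

    def build(self, train_data: Iterable[List[float]]) -> None:
        # This toy criterion needs no training statistics, so build is a no-op.
        pass

    def step(self, activations: List[float]) -> None:
        # Mark every neuron whose activation exceeds the threshold.
        for i, value in enumerate(activations):
            if value > self.threshold:
                self.covered[i] = True
        self.current = sum(self.covered) / len(self.covered)

    def assess(self, test_data: Iterable[List[float]]) -> None:
        # Assessing a whole test suite is the same as stepping through it.
        for activations in test_data:
            self.step(activations)


criterion = ToyNC(num_neurons=4, hyper=0.75)
criterion.build([])
criterion.assess([[0.9, 0.1, 0.2, 0.3], [0.1, 0.8, 0.2, 0.3]])
cov = criterion.current
```

In this sketch `assess` is simply a loop over `step`, which is why the batch and streaming modes yield the same coverage value for the same inputs.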

Experiments

After preparing all data and pretrained models, first set their paths in constants.py.

Diversity of Test Suites

Discriminative (Image) Model

python eval_diversity_image.py --model resnet50 --dataset CIFAR10 --criterion NC --hyper 0.75

--model - The tested DNN.
choices = [resnet50, vgg16_bn, mobilenet_v2]
--dataset - Training dataset of the tested DNN. Test suites are generated using the test split of this dataset.
choices = [CIFAR10, ImageNet]
--criterion - The used coverage criterion.
choices = [NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC]
--hyper - The hyper-parameter of the criterion. Use None if the criterion has no hyper-parameter (i.e., NLC, SNAC, NBC).
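The options above could be parsed along the following lines. This is only a sketch based on the listed choices, not the parser in eval_diversity_image.py; in particular, the helper `parse_hyper`, the default values, and the handling of the literal string "None" are assumptions.

```python
import argparse


def parse_hyper(text: str):
    # Assumption: accept the literal string "None" for hyper-parameter-free criteria.
    return None if text == "None" else float(text)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical option parser")
    parser.add_argument("--model", default="resnet50",
                        choices=["resnet50", "vgg16_bn", "mobilenet_v2"])
    parser.add_argument("--dataset", default="CIFAR10",
                        choices=["CIFAR10", "ImageNet"])
    parser.add_argument("--criterion", default="NC",
                        choices=["NC", "KMNC", "NBC", "SNAC", "TKNC", "TKNP",
                                 "CC", "LSC", "DSC", "MDSC", "NLC"])
    # Omitting --hyper leaves it as None.
    parser.add_argument("--hyper", type=parse_hyper, default=None)
    return parser


args = build_parser().parse_args(
    ["--model", "resnet50", "--dataset", "CIFAR10",
     "--criterion", "NC", "--hyper", "0.75"])
print(args.model, args.dataset, args.criterion, args.hyper)
```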

Discriminative (Text) Model

python eval_diversity_text.py --criterion NC --hyper 0.75

--criterion - The used coverage criterion.
choices = [NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC]
--hyper - The hyper-parameter of the criterion. Use None if the criterion has no hyper-parameter (i.e., NLC, SNAC, NBC).

Generative Model

Our tested generative model is BigGAN. We reuse the codebase of the official implementation and hardcode some parameters; see BigGAN-projects/CIFAR10 and BigGAN-projects/ImageNet.

Because we insert the BigGAN project path directly into the system path, command-line arguments passed to eval_diversity_gen.py conflict with those of the BigGAN projects. We therefore recommend setting the following arguments inside eval_diversity_gen.py and then running python eval_diversity_gen.py.

Of course, this should be implemented in a more elegant way...🫠 I will do it later.

--criterion - The used coverage criterion.
choices = [NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC]
--hyper - The hyper-parameter of the criterion. Use None if the criterion has no hyper-parameter (i.e., NLC, SNAC, NBC).
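One possible way around the argument conflict described above is to parse the script's own options first and then hide them from code that reads `sys.argv` later. The sketch below shows the general technique with the standard library only; it has not been tested against the BigGAN codebase, and `isolated_argv`, `parse_own_args`, and the `--batch_size` option are hypothetical names used for illustration.

```python
import argparse
import contextlib
import sys


@contextlib.contextmanager
def isolated_argv(argv):
    """Temporarily replace sys.argv so imported code sees only `argv`."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


def parse_own_args(argv):
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--criterion", default="NC")
    parser.add_argument("--hyper", default=None)
    # parse_known_args returns unrecognized options instead of failing on them.
    return parser.parse_known_args(argv)


argv_before = list(sys.argv)
own, remaining = parse_own_args(["--criterion", "NLC", "--batch_size", "32"])
with isolated_argv(["eval_diversity_gen.py"] + remaining):
    # Code that parses sys.argv here would see only the remaining options.
    seen_inside = list(sys.argv)
```

The context manager restores the original `sys.argv` on exit, so the rest of the script is unaffected.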

Fault-Revealing Capability of Test Suites

python eval_fault_revealing.py --dataset CIFAR10 --model resnet50 --criterion NC --hyper 0.75 --AE PGD --split test

--AE - The adversarial example (AE) generation algorithm.
choices = [PGD, CW]
--split - The dataset split from which AEs are generated.
choices = [train, test]
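To run every AE algorithm and split combination, the command lines can be generated programmatically. The sketch below only builds and prints the commands using the flags documented above; the function name `fault_revealing_commands` and its default arguments are assumptions, and nothing is executed.

```python
import itertools
import shlex

AE_CHOICES = ["PGD", "CW"]
SPLIT_CHOICES = ["train", "test"]


def fault_revealing_commands(dataset="CIFAR10", model="resnet50",
                             criterion="NC", hyper="0.75"):
    """Return one command (as an argument list) per AE/split combination."""
    commands = []
    for ae, split in itertools.product(AE_CHOICES, SPLIT_CHOICES):
        commands.append([
            "python", "eval_fault_revealing.py",
            "--dataset", dataset, "--model", model,
            "--criterion", criterion, "--hyper", hyper,
            "--AE", ae, "--split", split,
        ])
    return commands


commands = fault_revealing_commands()
for command in commands:
    # shlex.join quotes arguments safely for pasting into a shell.
    print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run` directly if the runs should be launched from Python.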

Guiding Input Mutation in DNN Testing

python fuzz.py --dataset CIFAR10 --model resnet50 --criterion NC

For random mutation (i.e., without any criterion as objective), run

python fuzz_rand.py --dataset CIFAR10 --model resnet50
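The difference between criterion-guided and random mutation can be illustrated with a toy loop. This is a sketch of a common coverage-guided scheme (keep a mutant only if it increases coverage), not the algorithm in fuzz.py or fuzz_rand.py, which may differ; `mutate`, `covered_buckets`, and `fuzz` are hypothetical helpers, and inputs are plain lists of floats rather than images.

```python
import math
import random


def mutate(seed, rng):
    # Toy mutation: perturb one element of the input vector.
    mutant = list(seed)
    index = rng.randrange(len(mutant))
    mutant[index] += rng.uniform(-1.0, 1.0)
    return mutant


def covered_buckets(inputs):
    # Toy stand-in for a coverage criterion: (position, integer bucket) pairs hit.
    return {(i, math.floor(value)) for vec in inputs for i, value in enumerate(vec)}


def fuzz(seeds, steps, guided, rng):
    corpus = [list(s) for s in seeds]
    coverage = covered_buckets(corpus)
    for _ in range(steps):
        mutant = mutate(rng.choice(corpus), rng)
        gained = covered_buckets([mutant]) - coverage
        # Guided mode keeps a mutant only if it adds coverage;
        # random mode keeps every mutant.
        if gained or not guided:
            corpus.append(mutant)
            coverage |= gained
    return corpus, coverage


guided_corpus, guided_cov = fuzz([[0.0, 0.0]], 50, True, random.Random(0))
random_corpus, random_cov = fuzz([[0.0, 0.0]], 50, False, random.Random(0))
print(len(guided_corpus), len(guided_cov), len(random_corpus), len(random_cov))
```

In guided mode every retained mutant contributes at least one new bucket, so the corpus stays small relative to the coverage it achieves; in random mode the corpus grows by one input per step regardless of coverage.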
