MigrationBench


1. 📖 Overview

MigrationBench provides an automated and robust framework for evaluating code migration success.

1.1 MigrationBench: Dataset and Evaluation Framework

The name MigrationBench refers to both the dataset and the evaluation framework for code migration success:

  1. 🤗 MigrationBench is a large-scale, repository-level code migration benchmark dataset spanning multiple programming languages.
    • The current (initial) release, as of May 2025, includes Java 8 repositories that use the Maven build system.
  2. MigrationBench (this GitHub package) is the evaluation framework for assessing code migration success from Java 8 to Java 17 or to any other long-term support (LTS) version.

The evaluation approximates functional equivalence by checking the following:

  1. The repository builds and passes all tests
  2. The major versions of the compiled classes are consistent with the target Java version
    • 52 and 61 for Java 8 and 17, respectively
  3. Test methods are invariant after code migration
  4. The number of test cases does not decrease after code migration
  5. The repository's dependency libraries are at their latest major versions
    • Optional for minimal migration, by definition
    • Required for maximal migration

1.2 JavaMigration: Migration with LLMs

JavaMigration is a separate GitHub package that performs code migration with LLMs as a baseline solution; it relies on this package for the final evaluation.

2. Datasets

There are three datasets in 🤗 MigrationBench:

  • All repositories included in the datasets are available on GitHub under the MIT or Apache-2.0 license.

| Index | Dataset | Size (repositories) | Notes |
|---|---|---|---|
| 1 | 🤗 AmazonScience/migration-bench-java-full | 5,102 | Each repository has a test directory or at least one test case |
| 2 | 🤗 AmazonScience/migration-bench-java-selected | 300 | A subset of 🤗 migration-bench-java-full |
| 3 | 🤗 AmazonScience/migration-bench-java-utg | 4,814 | The unit test generation (utg) dataset, disjoint from 🤗 migration-bench-java-full |
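The subset and disjointness relations stated in the table can be verified with a short sketch. It assumes that each dataset exposes its repositories in a column named `repo` and has a split named `test`; both names are assumptions, so check them against the Hugging Face dataset cards. The loader uses `load_dataset` from the `datasets` library and needs network access; the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def dataset_relations(
    full: Iterable[str], selected: Iterable[str], utg: Iterable[str]
) -> Dict[str, bool]:
    """Check the subset/disjointness relations stated in the dataset table."""
    full_set, selected_set, utg_set = set(full), set(selected), set(utg)
    return {
        "selected_subset_of_full": selected_set <= full_set,
        "utg_disjoint_from_full": full_set.isdisjoint(utg_set),
    }


def load_repo_ids(name: str, split: str = "test", column: str = "repo") -> List[str]:
    # Assumed split and column names; check the dataset card for the real schema.
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    return list(load_dataset(name, split=split)[column])


# Example (needs network access):
# relations = dataset_relations(
#     load_repo_ids("AmazonScience/migration-bench-java-full"),
#     load_repo_ids("AmazonScience/migration-bench-java-selected"),
#     load_repo_ids("AmazonScience/migration-bench-java-utg"),
# )
```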

3. Code Migration Evaluation

MigrationBench supports two evaluation modes:

  1. Docker Mode (Recommended): Runs evaluations in isolated Docker containers. No need to install Java or Maven locally.
  2. Local Mode: Runs evaluations directly on your machine. Requires Java 17 and Maven 3.9.6 to be installed locally.

3.1 Docker Mode (Recommended)

Docker mode provides a consistent evaluation environment without requiring a local Java/Maven installation. Each evaluation runs in an isolated container, which makes it well suited to batch processing and reproducible results.

Benefits:

  • ✅ No local Java/Maven installation needed
  • ✅ Consistent environment across different machines
  • ✅ Parallel execution (multiple containers)
  • ✅ Easy setup and onboarding

3.1.1 Setup

1. Install Docker:

Follow the official Docker installation guide:

2. Verify Docker:

docker --version

3. Install MigrationBench:

git clone https://github.com/amazon-science/MigrationBench.git
cd MigrationBench

pip install -r requirements.txt -e .

That's it! The Docker image will be built automatically on first run.
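Before the first run, a quick preflight check can confirm that Docker is actually usable. The helper below is a hypothetical sketch, not part of MigrationBench: it checks that the `docker` CLI is on `PATH` and that `docker info` succeeds, which requires a reachable Docker daemon.

```python
import shutil
import subprocess


def docker_ready(timeout: float = 30.0) -> bool:
    """Return True if the docker CLI is on PATH and can reach a Docker daemon."""
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` queries the daemon and exits non-zero if it cannot connect.
        result = subprocess.run(
            ["docker", "info"], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker is ready" if docker_ready() else "Docker is not available")
```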

3.1.2 Single Repository Evaluation

To run a single repository evaluation in Docker:

GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=/path/to/java-xid.diff

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE --use_docker

Evaluate with a migrated repository directory:

MIGRATED_DIR=/path/to/migrated/repo
python run_eval.py --github_url $GITHUB_URL --migrated_root_dir $MIGRATED_DIR --use_docker

Force a rebuild of the Docker image:

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE --use_docker --build_docker_image
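When scripting many runs, it can help to assemble the `run_eval.py` command programmatically. The sketch below uses only the flags shown in this README; the helper names are hypothetical, and the rule that a single-repository run takes exactly one of `--git_diff_filename` or `--migrated_root_dir` is an assumption based on the examples above. Pass `use_docker=False` for Local mode.

```python
import subprocess
import sys
from typing import List, Optional


def build_eval_command(
    github_url: Optional[str] = None,
    git_diff_filename: Optional[str] = None,
    migrated_root_dir: Optional[str] = None,
    predictions_filename: Optional[str] = None,
    use_docker: bool = True,
    build_docker_image: bool = False,
    max_workers: Optional[int] = None,
) -> List[str]:
    """Assemble a run_eval.py command line from the flags documented in this README."""
    cmd = [sys.executable, "run_eval.py"]
    if predictions_filename is not None:
        # Batch mode: the predictions file carries the per-repository inputs.
        cmd += ["--predictions_filename", predictions_filename]
    else:
        # Single-repository mode (assumed): one GitHub URL plus exactly one migration source.
        if github_url is None:
            raise ValueError("github_url is required without a predictions file")
        if (git_diff_filename is None) == (migrated_root_dir is None):
            raise ValueError("pass exactly one of git_diff_filename or migrated_root_dir")
        cmd += ["--github_url", github_url]
        if git_diff_filename is not None:
            cmd += ["--git_diff_filename", git_diff_filename]
        else:
            cmd += ["--migrated_root_dir", migrated_root_dir]
    if use_docker:
        cmd.append("--use_docker")
    if build_docker_image:
        cmd.append("--build_docker_image")
    if max_workers is not None:
        cmd += ["--max_workers", str(max_workers)]
    return cmd


def run_eval(repo_root: str, **kwargs) -> int:
    """Run the assembled command from the MigrationBench checkout; return its exit code."""
    return subprocess.run(build_eval_command(**kwargs), cwd=repo_root).returncode
```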

3.1.3 Batch Evaluation

To run batch evaluations in Docker:

PREDICTIONS=predictions.json

# Run batch evaluation in Docker
python run_eval.py --predictions_filename $PREDICTIONS --use_docker

With parallel processing (recommended for large batches):

# Run 8 Docker containers in parallel (each evaluates one repo)
python run_eval.py --predictions_filename $PREDICTIONS --use_docker --max_workers 8

Important Notes:

  • File paths in the predictions file should be absolute paths on your host system
  • The Docker container will automatically mount these paths as read-only volumes
  • Each repository is evaluated in its own isolated container
  • Docker mode automatically handles environment isolation and cleanup
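Because file paths in the predictions file should be absolute host paths, a small preprocessing step can resolve relative ones before a batch run. This hypothetical helper resolves relative `git_diff_file` and `migrated_root_dir` values against the directory that contains the predictions file; that choice of base directory is an assumption, not documented MigrationBench behavior.

```python
import json
import os
from typing import Dict, List

# Prediction fields that hold filesystem paths (see the predictions file format).
PATH_KEYS = ("git_diff_file", "migrated_root_dir")


def absolutize_paths(entries: List[Dict[str, str]], base_dir: str) -> List[Dict[str, str]]:
    """Return copies of the entries with relative path fields resolved against base_dir."""
    fixed = []
    for entry in entries:
        entry = dict(entry)  # copy so the caller's data is left untouched
        for key in PATH_KEYS:
            if key in entry and not os.path.isabs(entry[key]):
                entry[key] = os.path.abspath(os.path.join(base_dir, entry[key]))
        fixed.append(entry)
    return fixed


def rewrite_predictions_file(path: str) -> None:
    """Rewrite a predictions file in place, resolving paths relative to its own directory."""
    base_dir = os.path.dirname(os.path.abspath(path))
    with open(path) as f:
        entries = json.load(f)
    with open(path, "w") as f:
        json.dump(absolutize_paths(entries, base_dir), f, indent=2)
```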

3.2 Local Mode

Local mode runs evaluations directly on your machine. It requires Java 17 and Maven 3.9.6 to be installed locally.

3.2.1 Install Java and Maven

Install Java 17:

  • macOS: brew install openjdk@17
  • Ubuntu/Debian: sudo apt-get install openjdk-17-jdk
  • Windows: Download from https://adoptium.net/

Install Maven 3.9.6:

Verify installations:

java -version  # Should show version 17
mvn -version   # Should show version 3.9.6
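The two version checks can also be automated, for example in CI. This sketch parses the output of `java -version` (which prints to stderr) and `mvn -version`; the function names are hypothetical, and the regular expressions assume the usual OpenJDK and Apache Maven output formats, including the legacy `1.8.0_xxx` Java version scheme.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        raise ValueError("could not find a Java version string")
    first, second = int(match.group(1)), match.group(2)
    # Legacy scheme "1.8.0_392" means Java 8; modern scheme "17.0.9" means Java 17.
    return int(second) if first == 1 and second else first


def parse_maven_version(version_output: str) -> str:
    """Extract the Maven version (e.g. "3.9.6") from `mvn -version` output."""
    match = re.search(r"Apache Maven (\d+\.\d+\.\d+)", version_output)
    if not match:
        raise ValueError("could not find a Maven version string")
    return match.group(1)


def check_local_toolchain() -> None:
    # `java -version` writes to stderr; `mvn -version` writes to stdout.
    java = subprocess.run(["java", "-version"], capture_output=True, text=True)
    mvn = subprocess.run(["mvn", "-version"], capture_output=True, text=True)
    java_major = parse_java_major(java.stderr + java.stdout)
    maven = parse_maven_version(mvn.stdout + mvn.stderr)
    if java_major != 17 or maven != "3.9.6":
        raise RuntimeError(
            f"expected Java 17 and Maven 3.9.6, found Java {java_major} and Maven {maven}"
        )
```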

3.2.2 Install MigrationBench

git clone https://github.com/amazon-science/MigrationBench.git
cd MigrationBench

pip install -r requirements.txt -e .

3.2.3 Single Repository Evaluation

GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=/path/to/java-xid.diff

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE

3.2.4 Batch Evaluation

PREDICTIONS=predictions.json

# Sequential processing
python run_eval.py --predictions_filename $PREDICTIONS

# Parallel processing (8 workers)
python run_eval.py --predictions_filename $PREDICTIONS --max_workers 8

3.3 Predictions File Format

For batch evaluation (both Docker and Local modes), provide a predictions file in JSON format.

For each repository, specify the GitHub URL and one of the following:

  1. git_diff_file: Path to a file containing the git diff
  2. git_diff: Git diff content as a string (can be empty for no changes)
  3. migrated_root_dir: Absolute path to the migrated repository directory

Example predictions.json:

[
  {
    "github_url": "https://github.com/0xShamil/java-xid",
    "git_diff_file": "/absolute/path/to/java-xid.diff"
  },
  {
    "github_url": "https://github.com/0xShamil/java-xid",
    "git_diff": "diff --git a/pom.xml b/pom.xml\n--- a/pom.xml\n+++ b/pom.xml\n..."
  },
  {
    "github_url": "https://github.com/0xShamil/java-xid",
    "migrated_root_dir": "/absolute/path/to/migrated/java-xid"
  }
]
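A predictions file can be sanity-checked before a long batch run. The validator below is a hypothetical sketch: it assumes each entry must carry `github_url` plus exactly one of the three source keys, and it flags relative `git_diff_file` or `migrated_root_dir` paths because Docker mode expects absolute host paths. It does not check that the paths exist or that the diffs apply cleanly.

```python
import json
import os
from typing import Any, List

# The three ways to specify a migrated repository in a predictions entry.
SOURCE_KEYS = ("git_diff_file", "git_diff", "migrated_root_dir")


def validate_predictions(entries: Any) -> List[str]:
    """Return a list of problems found; an empty list means the entries look well-formed."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue
        if not entry.get("github_url"):
            problems.append(f"entry {i}: missing github_url")
        # Presence, not truthiness: an empty git_diff string is allowed.
        present = [k for k in SOURCE_KEYS if k in entry]
        if len(present) != 1:
            problems.append(f"entry {i}: expected exactly one of {SOURCE_KEYS}, found {present}")
        for key in ("git_diff_file", "migrated_root_dir"):
            if key in entry and not os.path.isabs(entry[key]):
                problems.append(f"entry {i}: {key} should be an absolute path")
    return problems


def validate_predictions_file(path: str) -> List[str]:
    with open(path) as f:
        return validate_predictions(json.load(f))
```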

4. 📚 Citation

@misc{liu2025migrationbenchrepositorylevelcodemigration,
      title={MigrationBench: Repository-Level Code Migration Benchmark from Java 8},
      author={Linbo Liu and Xinle Liu and Qiang Zhou and Lin Chen and Yihan Liu and Hoan Nguyen and Behrooz Omidvar-Tehrani and Xi Shen and Jun Huan and Omer Tripp and Anoop Deoras},
      year={2025},
      eprint={2505.09569},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2505.09569},
}
