Rewrite benchmark README: CI-first with detailed methodology
Restructured to lead with the GitHub Actions CI benchmark,
explaining why CI cold-start measurements matter more than
local benchmarks. Detailed explanation of the three-job
workflow design and why Java AOT wins in CI environments.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The most important benchmark runs on GitHub Actions because it measures performance in the environment where the generator actually executes — CI. The [Benchmark Generator](https://github.com/javaevolved/javaevolved.github.io/actions/workflows/benchmark.yml) workflow is manually triggered and runs across **Ubuntu**, **Windows**, and **macOS**.
### Why CI benchmarks matter
On a developer machine, repeated runs benefit from warm OS file caches — the operating system keeps recently read files in RAM, making subsequent reads nearly instant. This masks real-world performance differences. Python also benefits from `__pycache__/` bytecode that persists between runs.
In CI, **every workflow run starts on a fresh runner**. There is no `__pycache__/`, no warm OS cache, no JBang compilation cache. This is the environment where the deploy workflow runs, so these numbers reflect actual production performance.
### How the CI benchmark works
The workflow has three jobs:
1. **`benchmark`** — Runs Phase 1 (training/build costs) and Phase 2 (steady-state execution) on each OS. All tools are installed in the same job, so this measures raw execution speed after setup.

2. **`build-jar`** — Builds the fat JAR and AOT cache on each OS, then uploads them as workflow artifacts. This simulates what the `build-generator.yml` workflow does weekly: produce the JAR and AOT cache and store them in the GitHub Actions cache.

3. **`ci-cold-start`** — The key benchmark. Runs on a **completely fresh runner** that has never executed Java or Python in the current job. It downloads the JAR and AOT artifacts (simulating the `actions/cache/restore` step in the deploy workflow), then measures a single cold run of each method. This is the closest simulation of what happens when the deploy workflow runs:

   - **Python** has no `__pycache__/` — it must interpret every `.py` file from scratch
   - **Fat JAR** must load and link all classes on a cold JVM
   - **Fat JAR + AOT** loads pre-linked classes from the `.aot` file, skipping class loading entirely
The `setup-java` and `setup-python` actions are required to provide the runtimes, but they don't warm up the generator code. The first invocation of `java` or `python3` in this job is the benchmark measurement itself.
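Expressed as shell, the cold-start job boils down to one timed, uncached invocation per method after the artifacts are restored. A sketch only: the file names below are illustrative stand-ins, not the actual artifact names from the workflow.

```shell
# One cold run per method on a fresh runner (paths are hypothetical):
time python3 generate.py                                  # no __pycache__/ yet
time java -jar generator.jar                              # cold JVM, full class loading and linking
time java -XX:AOTCache=generator.aot -jar generator.jar   # pre-linked classes from the AOT cache
```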
### Why Java AOT wins in CI
Java's AOT cache (JEP 483) snapshots the result of class loading and linking from a training run into a `.aot` file. This file is platform-specific and ~21 MB. When restored from the actions cache, the JVM skips the expensive class discovery, verification, and linking steps that normally happen on first run.
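Under JEP 483 the cache is produced in two steps and consumed with a single flag. A sketch of that workflow, assuming the fat JAR is named `generator.jar` (the real file names may differ):

```shell
# Training run: record class loading and linking decisions (JEP 483, JDK 24+)
java -XX:AOTMode=record -XX:AOTConfiguration=generator.aotconf -jar generator.jar

# Assembly: produce the AOT cache from the recorded configuration
# (the application itself is not executed in this step)
java -XX:AOTMode=create -XX:AOTConfiguration=generator.aotconf -XX:AOTCache=generator.aot -jar generator.jar

# Production run: the JVM maps pre-linked classes straight from the cache
java -XX:AOTCache=generator.aot -jar generator.jar
```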
Python's `__pycache__/` serves a similar purpose — it caches compiled bytecode so Python doesn't re-parse `.py` files. But `__pycache__/` is not committed to git or stored in CI caches, so **Python always pays full interpretation cost in CI**. Java AOT, by contrast, is stored in the actions cache and restored before each deploy.
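The local-only nature of `__pycache__/` is easy to see with a throwaway module (the file names below are made up for the demonstration):

```shell
# CPython writes bytecode caches for *imported* modules, not the entry script.
printf 'X = 1\n' > mod.py
printf 'import mod\nprint(mod.X)\n' > main.py
python3 main.py        # first run: parses mod.py, writes __pycache__/mod.cpython-*.pyc
ls __pycache__/        # the cache exists now, but it never leaves this machine
rm -rf __pycache__/    # a fresh CI runner starts in exactly this state
python3 main.py        # pays the parse/compile cost for mod.py again
```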
## Local Benchmark
The local benchmark script runs all three phases on your development machine. Local results will differ from CI because of OS file caching and warm `__pycache__/`.
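A steady-state average like Phase 2's can be collected with a simple timing loop. A minimal sketch, not the actual contents of `run.sh`; it uses a trivial stand-in command and assumes GNU `date` with nanosecond precision, so it is Linux-oriented:

```shell
runs=5; total=0
for i in $(seq "$runs"); do
  start=$(date +%s%N)            # nanoseconds since epoch (GNU date)
  python3 -c 'pass'              # stand-in for the generator invocation
  end=$(date +%s%N)
  total=$(( total + (end - start) ))
done
echo "avg: $(( total / runs / 1000000 )) ms"
```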
### Phase 1: Training / Build Cost (one-time)
These are one-time setup costs, comparable across languages.
| Step | Time | Notes |
|------|------|-------|
| JBang export | 2.19s | Compiles source + bundles dependencies into fat JAR |
| AOT training run | 2.92s | Runs JAR once to record class loading, produces `.aot` cache |
### Phase 2: Steady-State Execution (avg of 5 runs)
After one-time setup, these are the per-run execution times.
| Method | Time | Notes |
|--------|------|-------|
| **JBang** | 1.08s | Includes JBang launcher overhead |
### Phase 3: CI Cold Start (simulated locally)
Clears `__pycache__/` and JBang cache, then measures a single run. On a local machine the OS file cache still helps, so these numbers are faster than true CI.
| Method | Time | Notes |
|--------|------|-------|
| **JBang** | 3.25s | Must compile source before running |
| **Python** | 0.16s | No `__pycache__`; full interpretation |
### How each method works
- **Python** caches compiled bytecode in `__pycache__/` after the first run, similar to how Java's AOT cache works. But this cache is local-only and not available in CI.
- **Java AOT** (JEP 483) snapshots ~3,300 pre-loaded classes from a training run into a `.aot` file, eliminating class loading overhead on subsequent runs. The `.aot` file is stored in the GitHub Actions cache.
- **JBang** compiles and caches internally but adds launcher overhead on every invocation.
- **Fat JAR** (`java -jar`) loads and links all classes from scratch each time.
```sh
./html-generators/benchmark/run.sh          # print results to stdout
./html-generators/benchmark/run.sh --update # also update local results in this file
```
### CI Benchmark
The [Benchmark Generator](https://github.com/javaevolved/javaevolved.github.io/actions/workflows/benchmark.yml) workflow runs cross-platform benchmarks (Ubuntu, Windows, macOS) on GitHub Actions. It includes a CI cold-start phase on a fresh runner to measure true first-run performance. Trigger it manually from the Actions tab.
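With the GitHub CLI installed and authenticated, the manual trigger can also be issued from a terminal, equivalent to pressing the "Run workflow" button:

```shell
gh workflow run benchmark.yml   # dispatch the Benchmark Generator workflow
gh run watch                    # follow a run interactively (prompts to pick one)
```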