
Commit 96992d6

brunoborges and Copilot committed

Rewrite benchmark README: CI-first with detailed methodology

Restructured to lead with the GitHub Actions CI benchmark, explaining why CI cold-start measurements matter more than local benchmarks. Detailed explanation of the three-job workflow design and why Java AOT wins in CI environments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent 253bcc7 commit 96992d6

File tree

1 file changed: +48 -17 lines

html-generators/benchmark/README.md

Lines changed: 48 additions & 17 deletions
@@ -2,7 +2,44 @@
 
 Performance comparison of execution methods for the HTML generator, measured on 95 snippets across 10 categories.
 
-## Phase 1: Training / Build Cost (one-time)
+## CI Benchmark (GitHub Actions)
+
+[![Benchmark Generator](https://github.com/javaevolved/javaevolved.github.io/actions/workflows/benchmark.yml/badge.svg)](https://github.com/javaevolved/javaevolved.github.io/actions/workflows/benchmark.yml)
+
+The most important benchmark runs on GitHub Actions because it measures performance in the environment where the generator actually executes — CI. The [Benchmark Generator](https://github.com/javaevolved/javaevolved.github.io/actions/workflows/benchmark.yml) workflow is manually triggered and runs across **Ubuntu**, **Windows**, and **macOS**.
+
+### Why CI benchmarks matter
+
+On a developer machine, repeated runs benefit from warm OS file caches — the operating system keeps recently read files in RAM, making subsequent reads nearly instant. This masks real-world performance differences. Python also benefits from `__pycache__/` bytecode that persists between runs.
+
+In CI, **every workflow run starts on a fresh runner**. There is no `__pycache__/`, no warm OS cache, no JBang compilation cache. This is the environment where the deploy workflow runs, so these numbers reflect actual production performance.
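Python's bytecode cache is easy to observe directly. A minimal sketch against a scratch module (nothing here touches the generator's real sources):

```shell
# Create a scratch module and compile it the way CPython does on first import.
workdir=$(mktemp -d)
cat > "$workdir/hello.py" <<'EOF'
def greet():
    return "hi"
EOF

# compileall performs the same bytecode compilation a first import would trigger.
python3 -m compileall -q "$workdir"
ls "$workdir/__pycache__"    # hello.cpython-*.pyc

# A fresh CI runner is equivalent to this cache never existing:
# rm -rf "$workdir/__pycache__"
```

On a developer machine that `__pycache__/` directory survives between runs; on a fresh runner it must be rebuilt from scratch.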
+
+### How the CI benchmark works
+
+The workflow has three jobs:
+
+1. **`benchmark`** — Runs Phase 1 (training/build costs) and Phase 2 (steady-state execution) on each OS. All tools are installed in the same job, so this measures raw execution speed after setup.
+
+2. **`build-jar`** — Builds the fat JAR and AOT cache on each OS, then uploads them as workflow artifacts. This simulates what the `build-generator.yml` workflow does weekly: produce the JAR and AOT cache and store them in the GitHub Actions cache.
+
+3. **`ci-cold-start`** — The key benchmark. Runs on a **completely fresh runner** that has never executed Java or Python in the current job. It downloads the JAR and AOT artifacts (simulating the `actions/cache/restore` step in the deploy workflow), then measures a single cold run of each method. This is the closest simulation of what happens when the deploy workflow runs:
+   - **Python** has no `__pycache__/` — it must interpret every `.py` file from scratch
+   - **Fat JAR** must load and link all classes on a cold JVM
+   - **Fat JAR + AOT** loads pre-linked classes from the `.aot` file, skipping class loading entirely
+
+The `setup-java` and `setup-python` actions are required to provide the runtimes, but they don't warm up the generator code. The first invocation of `java` or `python3` in this job is the benchmark measurement itself.
+
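The three-job layout can be pictured as a workflow fragment. This is an illustrative sketch only, not the real `.github/workflows/benchmark.yml`; the artifact name and the Python entry point below are hypothetical:

```yaml
# Sketch of the ci-cold-start job (hypothetical details; see benchmark.yml).
ci-cold-start:
  needs: build-jar
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-java@v4
      with: { distribution: temurin, java-version: '24' }   # JEP 483 needs JDK 24+
    - uses: actions/setup-python@v5
      with: { python-version: '3.14' }
    - uses: actions/download-artifact@v4      # stands in for actions/cache/restore
      with: { name: generator-jar }           # hypothetical artifact name
    - name: Cold run (first java/python3 invocation in this job)
      run: |
        time java -XX:AOTCache=html-generators/generate.aot -jar html-generators/generate.jar
        time python3 html-generators/generate.py    # hypothetical entry point
```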
+### Why Java AOT wins in CI
+
+Java's AOT cache (JEP 483) snapshots the result of class loading and linking from a training run into a `.aot` file. This file is platform-specific and ~21 MB. When restored from the actions cache, the JVM skips the expensive class discovery, verification, and linking steps that normally happen on first run.
+
+Python's `__pycache__/` serves a similar purpose — it caches compiled bytecode so Python doesn't re-parse `.py` files. But `__pycache__/` is not committed to git or stored in CI caches, so **Python always pays full interpretation cost in CI**. Java AOT, by contrast, is stored in the actions cache and restored before each deploy.
+
+## Local Benchmark
+
+The local benchmark script runs all three phases on your development machine. Local results will differ from CI because of OS file caching and warm `__pycache__/`.
+
+### Phase 1: Training / Build Cost (one-time)
 
 These are one-time setup costs, comparable across languages.
 
@@ -12,7 +49,7 @@ These are one-time setup costs, comparable across languages.
 | JBang export | 2.19s | Compiles source + bundles dependencies into fat JAR |
 | AOT training run | 2.92s | Runs JAR once to record class loading, produces `.aot` cache |
 
-## Phase 2: Steady-State Execution (avg of 5 runs)
+### Phase 2: Steady-State Execution (avg of 5 runs)
 
 After one-time setup, these are the per-run execution times.
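A multi-run average like this can be taken with a plain shell loop. This sketch times `python3 -c pass` as a stand-in for the real generator command (the timer subprocesses add a little overhead of their own):

```shell
# Average wall-clock time over 5 runs of a command (stand-in workload shown).
runs=5
total_ms=0
for i in $(seq "$runs"); do
  start=$(python3 -c 'import time; print(time.time_ns())')
  python3 -c 'pass'    # stand-in; substitute the method being measured
  end=$(python3 -c 'import time; print(time.time_ns())')
  total_ms=$(( total_ms + (end - start) / 1000000 ))
done
echo "avg over $runs runs: $(( total_ms / runs )) ms"
```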

@@ -23,11 +60,9 @@ After one-time setup, these are the per-run execution times.
 | **JBang** | 1.08s | Includes JBang launcher overhead |
 | **Python** | 1.26s | Uses cached `__pycache__` bytecode |
 
-## Phase 3: CI Cold Start (fresh runner, no caches)
+### Phase 3: CI Cold Start (simulated locally)
 
-Simulates a CI environment where every run is the first run.
-Python has no `__pycache__`, JBang has no compilation cache.
-Java AOT benefits from the pre-built `.aot` file restored from actions cache.
+Clears `__pycache__/` and JBang cache, then measures a single run. On a local machine the OS file cache still helps, so these numbers are faster than true CI.
 
 | Method | Time | Notes |
 |--------|------|-------|
@@ -36,14 +71,14 @@ Java AOT benefits from the pre-built `.aot` file restored from actions cache.
 | **JBang** | 3.25s | Must compile source before running |
 | **Python** | 0.16s | No `__pycache__`; full interpretation |
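Locally, the cold-start simulation amounts to deleting these caches before timing a run. A sketch against a scratch directory; `jbang cache clear` is JBang's own subcommand for dropping its compilation cache:

```shell
# Simulate the cache-clearing step against a scratch tree (not the real repo).
project=$(mktemp -d)
mkdir -p "$project/pkg/__pycache__"
touch "$project/pkg/__pycache__/mod.cpython-314.pyc"

# Drop every Python bytecode cache under the tree.
find "$project" -type d -name '__pycache__' -exec rm -rf {} +

# JBang's compilation cache is cleared separately:
# jbang cache clear
```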

-## How It Works
+### How each method works
 
-- **Python** caches compiled bytecode in `__pycache__/` after the first run, similar to how Java's AOT cache works.
-- **Java AOT** (JEP 483) snapshots ~3,300 pre-loaded classes from a training run into a `.aot` file, eliminating class loading overhead on subsequent runs.
+- **Python** caches compiled bytecode in `__pycache__/` after the first run, similar to how Java's AOT cache works. But this cache is local-only and not available in CI.
+- **Java AOT** (JEP 483) snapshots ~3,300 pre-loaded classes from a training run into a `.aot` file, eliminating class loading overhead on subsequent runs. The `.aot` file is stored in the GitHub Actions cache.
 - **JBang** compiles and caches internally but adds launcher overhead on every invocation.
 - **Fat JAR** (`java -jar`) loads and links all classes from scratch each time.
 
-## AOT Cache Setup
+### AOT Cache Setup
 
 ```bash
 # One-time: build the fat JAR
@@ -56,7 +91,7 @@ java -XX:AOTCacheOutput=html-generators/generate.aot -jar html-generators/genera
 java -XX:AOTCache=html-generators/generate.aot -jar html-generators/generate.jar
 ```
 
-## Environment
+### Environment
 
 | | |
 |---|---|
@@ -67,13 +102,9 @@ java -XX:AOTCache=html-generators/generate.aot -jar html-generators/generate.jar
 | **Python** | 3.14.3 |
 | **OS** | Darwin |
 
-## Reproduce
+### Reproduce
 
 ```bash
 ./html-generators/benchmark/run.sh           # print results to stdout
-./html-generators/benchmark/run.sh --update  # also update this file
+./html-generators/benchmark/run.sh --update  # also update local results in this file
 ```
-
-### CI Benchmark
-
-The [Benchmark Generator](https://github.com/javaevolved/javaevolved.github.io/actions/workflows/benchmark.yml) workflow runs cross-platform benchmarks (Ubuntu, Windows, macOS) on GitHub Actions. It includes a CI cold-start phase on a fresh runner to measure true first-run performance. Trigger it manually from the Actions tab.
