cicd: move lint to ubuntu-latest with caching, tune pytest #924
Draft
thomashebrard wants to merge 1 commit into
The lint workflow now runs on GitHub-hosted `ubuntu-latest` (free for public repos) instead of self-hosted Azure runners. ruff/pyright/mypy are single-threaded, so the 2-core `ubuntu-latest` runner matches what the Azure D8 lint runner was giving anyway, and the install and mypy caches make lint about 3× faster.

Changes:

* `.github/workflows/lint-check.yml`: all jobs move to `runs-on: ubuntu-latest`. Adds `astral-sh/setup-uv@v3` with `enable-cache` (keyed on `uv.lock`), so the previously ~2-minute install drops to ~8s on a warm cache. Adds `actions/cache@v4` for `.mypy_cache`, keyed on `uv.lock` + `pyproject.toml` + the matrix `python-version`, with three-tier `restore-keys` for graceful fallback. Expected: mypy 2 min → ~25–30s on incremental runs.
* `.github/workflows/lint-fresh-check.yml`: new workflow. Runs the full typecheck matrix on push to `dev`/`main` and on a weekly cron, with NO mypy cache restore. This is the safety net for the cached PR checks: if mypy's incremental mode ever drifts from a fresh run, this catches it within minutes of merge instead of letting bad state accumulate.
* `Makefile` `gha-tests` target: tightens the pytest invocation with `--dist=worksteal` (newer xdist load balancing for uneven test durations; 5–15% wall-time win), `--tb=line` (less stdout I/O on failures across workers), `-p no:cacheprovider` (no `.pytest_cache` writes, since each runner is ephemeral) and `--no-header` (cosmetic). No behavioral change to which tests run or pass.

The tests workflow itself (`tests-check.yml`) is unchanged: tests stay on self-hosted Azure runners, where multi-core pytest-xdist parallelism matters. Azure-side improvements (Consumption profile + Azure Files cache mount for uv) ship in a separate infra repo change.
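For reviewers, a minimal sketch of what the cached `lint-typecheck` job in `lint-check.yml` could look like. It is not the PR's actual diff: the Python versions, the `uv run mypy .` command, and the exact key and restore-key layout are assumptions; only the action names, `enable-cache`/`cache-dependency-glob`, and the cache inputs (`uv.lock`, `pyproject.toml`, matrix `python-version`) come from the description.

```yaml
jobs:
  lint-typecheck:
    runs-on: ubuntu-latest  # was: [aca-runner-lint]
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]  # assumed versions
    steps:
      - uses: actions/checkout@v4

      # uv download cache, invalidated whenever uv.lock changes.
      - uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true
          cache-dependency-glob: uv.lock

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      # Persist .mypy_cache across runs. The exact key is tried first;
      # otherwise each restore-key is tried in order as a looser prefix match.
      - uses: actions/cache@v4
        with:
          path: .mypy_cache
          key: mypy-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('uv.lock') }}-${{ hashFiles('pyproject.toml') }}
          restore-keys: |
            mypy-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('uv.lock') }}-
            mypy-${{ runner.os }}-py${{ matrix.python-version }}-
            mypy-${{ runner.os }}-

      - run: make install   # does `uv sync --all-extras`
      - run: uv run mypy .  # hypothetical invocation; the real job may use a make target
```

Two design notes. `actions/cache` never overwrites an existing key and skips saving on an exact hit, so with this key the `.mypy_cache` snapshot only refreshes when `uv.lock` or `pyproject.toml` changes; appending `-${{ github.sha }}` to `key` (and adding the full dependency-hash prefix as the first restore key) would save a fresh snapshot every run at the cost of more cache storage. Also, GitHub scopes caches by branch: a PR run can restore caches from its own branch, the base branch, or the default branch, so warm caches across PRs depend on this job also running on pushes to the base branch.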
Why
CI currently costs ~€2000/month and takes 10–13 min per PR. The lint workflow runs on self-hosted Azure runners, which are paid by the minute even though ruff/pyright/mypy are single-threaded and don't benefit from the multi-core machines. The mypy step alone takes ~2 min per matrix entry because `.mypy_cache` is thrown away on every run.

Public OSS projects (NumPy, pandas, Pydantic, FastAPI, mypy itself) follow the same general pattern: lint on `ubuntu-latest` with caches such as `.mypy_cache` persisted via `actions/cache`, and run tests where the cores actually help. We're not doing anything novel here, just applying the standard pattern.

What changed
`lint-check.yml`: all jobs to `ubuntu-latest`, with caching

* `runs-on: [aca-runner-lint]` → `runs-on: ubuntu-latest` for all three lint jobs (`lint-ruff-plxt`, `lint-typecheck`, `lint-config-sync`).
* `astral-sh/setup-uv@v3` with `enable-cache: true` and `cache-dependency-glob: uv.lock`. uv's `~/.cache/uv` is restored from prior runs, keyed on `uv.lock`. Install: ~2 min → ~8s on a warm cache.
* `actions/cache@v4` for `.mypy_cache` in the typecheck job. The key includes the matrix `python-version` + `uv.lock` + `pyproject.toml`, and three-tier `restore-keys` give graceful fallback when those change. mypy: ~2 min → ~25–30s on incremental runs.

`lint-fresh-check.yml`: new safety-net workflow

mypy's incremental mode is designed to give the same result as a fresh run, but it has historically had soundness bugs around plugin state, complex generics, and stub-version drift. To address that risk, this workflow:

* runs the full typecheck matrix with no `.mypy_cache` restore;
* triggers on `push` to `dev`/`main` and on a weekly cron (Monday 04:00 UTC).

Pattern: cache on PRs (fast, probably right), fresh on main (slower, definitely right). This is a standard setup in mature OSS Python projects.
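A minimal sketch of how `lint-fresh-check.yml` could be laid out. The triggers match the description (push to `dev`/`main`, Monday 04:00 UTC); the job name, Python versions, and `uv run mypy .` command are hypothetical, and keeping the uv download cache (it only speeds up installs and cannot change mypy's results) is an assumption rather than something stated here. The essential property is that no step restores `.mypy_cache`.

```yaml
name: lint-fresh-check

on:
  push:
    branches: [dev, main]
  schedule:
    - cron: "0 4 * * 1"  # Mondays at 04:00 (GitHub schedules run in UTC)

jobs:
  lint-typecheck-fresh:  # hypothetical job name
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false  # report drift for every Python version, not just the first failure
      matrix:
        python-version: ["3.10", "3.11", "3.12"]  # assumed; keep in sync with lint-check.yml
    steps:
      - uses: actions/checkout@v4

      # Dependency cache only; it affects install speed, not type-check results.
      - uses: astral-sh/setup-uv@v3
        with:
          enable-cache: true
          cache-dependency-glob: uv.lock

      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - run: make install

      # Deliberately no actions/cache step for .mypy_cache: the runner starts
      # with an empty workspace, so mypy performs a full, non-incremental check.
      - run: uv run mypy .  # hypothetical invocation
```

Because GitHub-hosted runners start from a clean workspace, simply omitting the cache step is enough to get a fresh run; mypy's `--no-incremental` flag would be redundant here.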
`Makefile`: pytest tuning in the `gha-tests` target

Pure performance tuning, with no behavioral change to which tests run or pass:

* `--dist=worksteal`: better xdist load balancing for uneven test durations
* `--tb=line`: one-line failure tracebacks, less stdout I/O across workers
* `-p no:cacheprovider`: skip `.pytest_cache` writes (each runner is ephemeral)
* `--no-header`: cosmetic, skips the session banner

Expected: 5–15% pytest wall-time reduction, depending on test-duration variance.
What didn't change (and why)
* `tests-check.yml` keeps `runs-on: [aca-runner-test]`. pytest with `-n auto` is genuinely CPU-bound and benefits from 8 cores; that's the one place self-hosting pays for itself. Azure-side cost cuts ship as a separate infra change (ACA Consumption profile + Azure Files cache mount; ~€1200/mo → ~€150/mo).
* The Boot test smoke step stays in `tests-check.yml`. It adds ~27s but catches a broken install fast.

Expected impact
Not in scope (future work)
* `ci-test` extra in `pyproject.toml`: `.venv` is currently 1.4 GB because `make install` does `uv sync --all-extras`, which pulls in torch (355 MB), opencv (119 MB), and transformers (54 MB) for the `docling` extra. None of that is exercised on PRs: the tests that need it (the `extract`/`img_gen`/`search`/`pipelex_api` markers) are excluded from PR runs. A targeted `ci-test` extra could cut the install ~3×. It needs a careful audit of which extras have imports that load on PR, so it belongs in a separate PR.
* `pytest-split` sharding: would get tests to ~2m45s wall time. Requires the `pytest-split` dependency, a committed `.test_durations` file, and a 5×4 matrix expansion. Separate PR if/when wanted.
* `Pipelex.make()` fixture scope: the autouse, module-scoped `reset_pipelex_config_fixture` runs once per test module. If `Pipelex.make()` is expensive (worth profiling), promoting it to session scope where safe could shave wall time.

How to verify
`lint-fresh-check.yml` fires and runs the full fresh matrix in ~5 min; that's the definitive check.

🤖 Generated with Claude Code
Summary by cubic
Move lint to GitHub-hosted `ubuntu-latest` with caching and add a fresh typecheck safety net; also tune `pytest` flags. This cuts lint time ~3× and drops Azure lint cost to €0, with tests unchanged.

Refactors
* Lint runs on `ubuntu-latest` with the `astral-sh/setup-uv@v3` cache (keyed on `uv.lock`) and `actions/cache@v4` for `.mypy_cache`.
* `mypy` ~25–30s incremental; lint ~1m50s per matrix entry.
* `Makefile`: `pytest` tuned with `--dist=worksteal`, `--tb=line`, `-p no:cacheprovider`, `--no-header`.

New Features
* `lint-fresh-check.yml`: runs on push to `dev`/`main` and on a weekly cron.
* Does not restore `.mypy_cache`, to verify incremental results against a fresh run.

Written for commit f9964ed.