Commit 5d5ce0f

feat(doccano-django): keploy compat lane sample + Python line coverage gate (#101)
* feat: add doccano-django sample (keploy postgres-v3 simple-Query bind regression)

  Minimum reproducer for the polymorphic-resourcetype failure that motivated keploy/integrations#177. Wraps doccano v1.8.5 + django-rest-polymorphic + postgres 13.3-alpine — the same shape the bug originally surfaced on (keploy/enterprise PRs #1889 / #1964, pipelines 3556 / 3572).

  Per the keploy-ci-debug skill, the sample owns ALL orchestration the lane scripts in keploy/integrations and keploy/enterprise need: the docker-compose, the admin-bootstrap flow, the API traffic loop, the noise filter (via keploy.yml.template), and a coverage-report helper. Future lanes that exercise the same backend re-use this directory; they don't redefine compose / bootstrap / traffic in their own scripts.

  The intent is to migrate enterprise/.ci/scripts/doccano-linux.sh from its current ~400-line inlined-everything shape down to a thin "clone sample → wrap in keploy → assert" wrapper in a follow-up PR.

  Layout:

  * `Dockerfile` — `FROM doccano/doccano:backend`. Wrapper exists so a future doccano patch (or a backport of an upstream fix that changes the bug-triggering shape) is a one-line edit here, not scattered across lane scripts.
  * `docker-compose.yml` — postgres + doccano backend on a fixed subnet, every name fully env-driven (DOCCANO_BACKEND_CONTAINER / DOCCANO_DB_CONTAINER / DOCCANO_APP_PORT / DOCCANO_DB_IP / DOCCANO_NETWORK_SUBNET). Lane scripts running multiple matrix cells in parallel pass per-cell values so the cells don't collide on container names. Two-phase boot (DOCCANO_SKIP_BOOTSTRAP=0 → migrations + admin; named volume retained; DOCCANO_SKIP_BOOTSTRAP=1 → gunicorn-only against the populated volume) so record/replay see a deterministic state.
  * `flow.sh` — four subcommands:

      bootstrap — log in as admin, install the deterministic authtoken_token row so record-time and replay-time Authorization headers match.

      record-traffic — drive the API: 16-call /v1/me warmup hammer (gunicorn worker contenttypes-cache warmup, necessary for the SIGINT-driven shutdown pattern lanes use), POST a polymorphic TextClassificationProject, GET / PATCH it, plus dependent category-types / examples / categories / metrics reads that exercise the multi-bind django_content_type lookups the fix targets. Fire-and-forget; keploy is the assertion layer at replay.

      coverage — walk the running backend's URL resolver (introspecting actual served methods, not Django's permissive http_method_names default) and the just-recorded keploy/test-set-* tests; emit a (method, path) coverage percentage for the v1/projects + accessory surface.

      list-routes — print the route table the coverage report uses as its denominator (diagnostic).

  * `keploy.yml.template` — globalNoise filter for the inherently non-deterministic fields (Date/Expires headers, created_at/updated_at body fields). Centralised here so a future doccano version that adds another auto-timestamp field is one edit rather than a fan-out across lane scripts. Lane scripts envsubst this template into the per-cell run dir.
  * `README.md` — bug shape, local-run instructions, lane pointers.

  Sample is keploy-independent: `docker compose up && bash flow.sh bootstrap && bash flow.sh record-traffic` works against bare doccano.

  Verified locally: 25/25 calls return expected status, polymorphic resourcetype is `TextClassificationProject` end-to-end. The route walker emits 144 (method, path) pairs for the v1/projects + /v1/me + /v1/users + /v1/health + /v1/auth surface; coverage matching against synthetic recorded tests rounds correctly.

  Lanes that pin to this sample (pinned to the feat/doccano-django-sample branch via --branch until this PR merges):

  * keploy/integrations `.woodpecker/doccano-postgres.yml` — three-way matrix (record-build × replay-build, record-latest × replay-build, record-build × replay-latest); depends_on prepare-and-run.
  * keploy/enterprise `.woodpecker/doccano-linux.yml` — being migrated to consume this sample in a follow-up PR; today still uses inline compose generation.

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* fix(doccano-django): gate record-traffic on a real readiness signal

  Pipeline 3597 / 909 (post-compose-render fix) failed at:

      Container ... Error dependency postgres failed to start

  Misleading. Real cause: doccano_record_traffic fired its very first POST /v1/projects against a backend whose port was open but gunicorn was still booting; the 5xx response failed `curl -fsS`, set -e killed the script silently, the lane saw a zero-second "traffic done", SIGINTed keploy ~3s later, and the recording captured nothing. The "dependency postgres failed" line in the log is downstream noise from the SIGINT compose-down.

  Fix: gate doccano_record_traffic on doccano_wait_for_fixed_token before any curl fires. /v1/me with the fixed Authorization header is a stronger readiness signal than wait_for_port: it proves gunicorn is past boot, auth is wired, the named-volume token is loaded, and the DB is responsive — all four guarantees the first POST needs.

  Lane scripts can keep their own port-level wait (wait_for_port), but the sample's flow.sh now refuses to fire traffic until the backend is genuinely serving. Local smoke-test pattern is unchanged: bootstrap + record-traffic still work standalone.

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* ci: doccano-django sample coverage gate (build vs release)

  Adds .github/workflows/doccano-django.yml — runs ONLY on changes under doccano-django/ (or this workflow file) so unrelated samples in this repo don't pay the doccano runtime cost.

  Three jobs:

  * `build-coverage` — checks out the PR's HEAD ref, brings up the sample's compose, drives flow.sh bootstrap + record-traffic with a per-call audit log enabled, runs flow.sh coverage. Captures the percentage as a job output.
  * `release-coverage` — same end-to-end against github.event.pull_request.base.ref (typically main) so we have a baseline to compare against. Skipped on direct push events to main (no baseline to diff against — main IS the baseline).
  * `coverage-gate` — fails the PR if build's coverage drops more than COVERAGE_THRESHOLD pp below release. COVERAGE_THRESHOLD defaults to 1.0pp; override with the `DOCCANO_COVERAGE_THRESHOLD` actions variable per-repo. Sticky-comments the PR with the diff via marocchino/sticky-pull-request-comment so reviewers see the delta inline.

  The two measurement jobs share their body via .github/workflows/scripts/run-and-measure.sh — same script, different ref. Lifting it out of the YAML keeps the YAML focused on orchestration (matrix / outputs / artifacts) and the bash on the actual workflow logic.

  Coverage source uses flow.sh's per-call audit log (DOCCANO_FIRED_ROUTES_FILE). That makes the measurement genuinely keploy-independent: the workflow doesn't run keploy at all, doesn't compare against recorded test sets, just measures what the sample's flow.sh ACTUALLY exercises against doccano's URL resolver. Lane scripts in keploy/integrations and keploy/enterprise consume the same flow.sh but use the keploy/test-set-*/tests/*.yaml tree as their numerator (authoritative — only calls keploy actually captured count). Both modes are wired into flow.sh::doccano_list_recorded_routes via the DOCCANO_FIRED_ROUTES_FILE fallback.

  Sample-side changes:

  * flow.sh::doccano_wait_for_fixed_token extracted as its own function (was inlined into doccano_bootstrap_token, broke doccano_record_traffic's forward reference and silently fail-fasted under set -e).
  * flow.sh::doccano_record_traffic gates on doccano_wait_for_fixed_token before any curl fires — port-open isn't a sufficient readiness signal under SIGINT-driven shutdown, the very first curl -fsS POST would 5xx on a still-booting gunicorn and silently kill the script.
  * flow.sh::log_fired writes (METHOD, URL) to DOCCANO_FIRED_ROUTES_FILE before each curl in doccano_record_traffic. Cheap, optional (no-op when env var unset), and keeps the audit log adjacent to the curl that produces it so future contributors can't add a curl without also adding the log entry.
  * flow.sh::doccano_list_recorded_routes falls back to the audit log when no keploy/test-set-*/tests/*.yaml exists — the standalone-mode numerator the workflow needs.

  Verified locally: workflow body (`run-and-measure.sh`) runs end-to-end against bare doccano in ~3 minutes, captures 16 unique (method, path) pairs, emits coverage=11.1% to GITHUB_OUTPUT. The gate logic itself is plain bash + python3 arithmetic; no codecov/coveralls dependency, no hosted service needed.

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* ci(doccano-django): graceful bootstrap when base ref lacks the sample

  Run 25196349264 (the very PR introducing doccano-django/) failed in release-coverage with:

      An error occurred trying to start process '/usr/bin/bash' with working directory '.../doccano-django'. No such file or directory

  Expected: the workflow checks out the PR's base ref to compute the baseline coverage, but on the introducing PR there's no baseline — `doccano-django/` doesn't exist on main yet.

  Fix: a `detect` step inspects whether `doccano-django/flow.sh` exists on the checked-out base ref. If yes, the measurement runs as before. If no (first-PR-bootstrap case), an `empty-baseline` step emits coverage=0.0 onto the job output, the measurement step is skipped via `if:`, and the upload-artifact step is also skipped (so we don't claim a non-existent report file). The job's `outputs.coverage` falls back through `||` so the gate sees 0.0 either way.

  Net effect on the introducing PR: build's coverage (currently ~11%) is compared against 0%, gate trivially passes. After this PR merges and a future PR edits doccano-django/, the detect step finds the sample on main, real measurement runs, real diff applies.

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* feat(doccano-django): real Python line coverage via coverage.py overlay

  Replaces the prior API-route-surface "coverage" (which counted fired routes / known routes — a proxy that read like real coverage but didn't measure code execution) with actual line coverage via coverage.py 7.6.1.

  Architecture:

  - `Dockerfile.coverage` extends `doccano/doccano:backend` to install coverage[toml] and drop a `coverage_subprocess.pth` file into site-packages, so every gunicorn worker that forks auto-starts `coverage.process_startup()`.
  - `.coveragerc` runs in parallel mode (one .coverage.<pid> per worker) with sigterm = true so flushing happens on graceful shutdown.
  - `docker-compose.coverage.yml` is an OVERLAY: the GH Actions coverage workflow applies it via `-f docker-compose.yml -f docker-compose.coverage.yml`. The base `Dockerfile` and `docker-compose.yml` are untouched, so keploy/integrations and keploy/enterprise CI lanes consume the base compose and pay zero coverage-instrumentation cost.
  - `flow.sh::doccano_report_coverage` shells into the running backend, runs `coverage combine` + `coverage report --format=total`, emits `Covered N/M (XX.X%)` matching the helper script's regex. When called against the base image (no overlay) it prints "INFO: ... uninstrumented" and exits 0 so enterprise lanes' `flow.sh coverage || true` informational calls keep working.

  Removed:

  - `doccano_list_routes` (the Django URL-resolver walk).
  - `doccano_list_recorded_routes` (the keploy-tests / fired-routes reader).
  - The legacy route-surface `doccano_report_coverage` body.
  - `list-routes` subcommand (was diagnostic only for the surface metric).

  Validated locally: e2e run produced `coverage=59.0` to GITHUB_OUTPUT against a clean stack (gunicorn 4 workers, traffic loop fired, SIGTERM flush, combine+report inside container). 59% reflects bootstrap + the sample's small traffic surface; adding curls to flow.sh::doccano_record_traffic moves the number up.

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* ci(doccano-django): drop trailing prose from sticky comment

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* docs(doccano-django): split run section into smoke / coverage / keploy modes

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* fix(doccano-django): own skip-bootstrap replay mode

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

* fix(doccano-django): nest globalNoise schema so the filter actually applies

  keploy's GlobalNoise type is map[string]map[string][]string — outer key is the response section ("header" / "body"), inner key is the field name. Flat dotted keys like `body.created_at: []` get put into the outer map as literal key "body.created_at" and never match any section, so the noise is silently dropped. The template's drift-suppression list was a no-op; only Date got ignored at compare time because keploy auto-stamps Date as per-test noise on every recording, and everything else slipped through.

  Same shape of fix landed in samples-typescript/umami-postgres in 93bbdae; documenting the gotcha in this template's comments so the next sample doesn't repeat it.

  Validation pending — doccano cells haven't run yet on the active keploy/enterprise PR (#1889). The matrix-cell collision fix landed in keploy/enterprise#1889 (commit 84cd64b1) opens up the lane enough for the noise filter to actually be exercised against real drift.

  Signed-off-by: Akash Kumar <meakash7902@gmail.com>

---------

Signed-off-by: Akash Kumar <meakash7902@gmail.com>
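
The globalNoise gotcha described in the last fix can be illustrated with a small model. This is only a sketch, assuming the noise map is consulted section first and field second, as the `map[string]map[string][]string` type suggests. `is_noisy` is a hypothetical helper, not keploy's matcher, and the flat entry is modelled with an empty inner map.

```python
from typing import Dict, List

# Simplified stand-in for a section -> field -> patterns noise map.
GlobalNoise = Dict[str, Dict[str, List[str]]]


def is_noisy(noise: GlobalNoise, section: str, field: str) -> bool:
    """Return True when `field` is listed under `section` (hypothetical helper)."""
    return field in noise.get(section, {})


# Flat dotted key: ends up as a section literally named "body.created_at".
flat: GlobalNoise = {"body.created_at": {}}

# Nested form: outer key is the response section, inner key is the field.
nested: GlobalNoise = {
    "header": {"Date": [], "Expires": []},
    "body": {"created_at": [], "updated_at": []},
}

print(is_noisy(flat, "body", "created_at"))    # False
print(is_noisy(nested, "body", "created_at"))  # True
```

In this model the flat form never matches because no lookup ever asks for a section called "body.created_at", which is consistent with the noise being dropped silently.
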
1 parent 57856de commit 5d5ce0f

12 files changed

Lines changed: 1027 additions & 0 deletions

.github/workflows/doccano-django.yml

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
+# doccano-django sample CI — keploy-independent end-to-end smoke +
+# coverage gate.
+#
+# Triggers ONLY on changes under doccano-django/ (or this workflow
+# file). Other samples in this repo have their own orthogonal CI;
+# gating the whole repo on every doccano change would slow them
+# all down for no benefit.
+#
+# What it gates:
+#   * `release-coverage` — checks out the PR's base branch (main)
+#     and runs the sample end-to-end: docker compose up, bootstrap
+#     admin token, drive flow.sh record-traffic with the per-call
+#     audit log enabled, capture the route-coverage percentage from
+#     `flow.sh coverage`. This is the baseline.
+#   * `build-coverage` — same end-to-end against the PR's HEAD ref.
+#   * `coverage-gate` — fails the PR if `build`'s coverage drops
+#     more than COVERAGE_THRESHOLD percentage points below
+#     `release`. Default threshold is 1.0pp; override via repo
+#     variable `DOCCANO_COVERAGE_THRESHOLD` for a tighter or
+#     looser bar.
+#
+# On push to main, only `build-coverage` runs (no baseline to
+# compare against — main IS the baseline).
+#
+# Standards-aligned choices:
+#   * `paths:` filter on both push and pull_request triggers — the
+#     canonical GH Actions way to scope a workflow to one
+#     subdirectory.
+#   * Job outputs (steps.<id>.outputs.coverage → needs.<job>.outputs)
+#     to thread the captured percentage between jobs.
+#   * `concurrency:` cancel-in-progress on the same ref so a stale
+#     run doesn't waste runner minutes.
+#   * actions/upload-artifact for the human-readable
+#     coverage_report.txt — reviewers can inspect missing routes
+#     directly from the PR's "checks" tab.
+#   * marocchino/sticky-pull-request-comment for the PR-side diff
+#     comment. Pinned-by-header so successive runs update the same
+#     comment instead of fanning out.
+#   * The compare step is plain bash + python3 (no external
+#     coverage service). For full Python coverage.py XMLs you'd
+#     want diff-cover or codecov, but the sample's coverage is
+#     API-route-based (single percentage), so the gate is a 3-line
+#     subtraction.
+#
+# Sample is genuinely keploy-independent here: the workflow uses
+# flow.sh's $DOCCANO_FIRED_ROUTES_FILE per-call audit log as its
+# numerator source, not a keploy recording. The lane scripts in
+# keploy/integrations and keploy/enterprise consume the same
+# flow.sh, but use the keploy/test-set-*/tests/*.yaml tree as
+# their numerator (authoritative — only calls keploy actually
+# CAPTURED count). Both modes are wired into
+# `flow.sh::doccano_list_recorded_routes`.
+name: doccano-django sample
+
+on:
+  pull_request:
+    paths:
+      - 'doccano-django/**'
+      - '.github/workflows/doccano-django.yml'
+  push:
+    branches: [main]
+    paths:
+      - 'doccano-django/**'
+      - '.github/workflows/doccano-django.yml'
+  workflow_dispatch: {}
+
+concurrency:
+  group: doccano-django-${{ github.ref }}
+  cancel-in-progress: true
+
+env:
+  COVERAGE_THRESHOLD: ${{ vars.DOCCANO_COVERAGE_THRESHOLD || '1.0' }}
+
+jobs:
+  build-coverage:
+    name: build (current ref) coverage
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    outputs:
+      coverage: ${{ steps.measure.outputs.coverage }}
+    steps:
+      - uses: actions/checkout@v4
+      - id: measure
+        name: Run sample end-to-end + measure coverage
+        working-directory: doccano-django
+        env:
+          DOCCANO_FIRED_ROUTES_FILE: ${{ runner.temp }}/fired-routes-build.log
+          DOCCANO_PHASE: ci-build
+        run: ../.github/workflows/scripts/run-and-measure.sh
+
+      - name: Upload coverage report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-build
+          path: doccano-django/coverage_report.txt
+          if-no-files-found: warn
+
+  release-coverage:
+    if: github.event_name == 'pull_request'
+    name: release (base ref) coverage
+    runs-on: ubuntu-latest
+    timeout-minutes: 20
+    outputs:
+      coverage: ${{ steps.measure.outputs.coverage || steps.empty-baseline.outputs.coverage }}
+      sample-existed: ${{ steps.detect.outputs.sample-existed }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.base.ref }}
+
+      # First-PR bootstrap escape hatch: the very PR that
+      # introduces the doccano-django/ sample has no baseline
+      # (doccano-django/ doesn't exist on the base ref). Detect
+      # that and short-circuit to coverage=0; the gate then
+      # treats build's coverage as the new baseline and trivially
+      # passes for any percentage > 0. After the introducing PR
+      # merges, every subsequent PR has a real baseline to diff
+      # against.
+      - id: detect
+        name: Detect baseline presence
+        run: |
+          if [ -d doccano-django ] && [ -x doccano-django/flow.sh ]; then
+            echo "sample-existed=true" >>"$GITHUB_OUTPUT"
+            echo "Sample exists on base ref — running full measurement."
+          else
+            echo "sample-existed=false" >>"$GITHUB_OUTPUT"
+            echo "No doccano-django/ on base ref — first-PR bootstrap; baseline coverage treated as 0%."
+          fi
+
+      - id: measure
+        name: Run sample end-to-end + measure coverage
+        if: steps.detect.outputs.sample-existed == 'true'
+        working-directory: doccano-django
+        env:
+          DOCCANO_FIRED_ROUTES_FILE: ${{ runner.temp }}/fired-routes-release.log
+          DOCCANO_PHASE: ci-release
+        run: ../.github/workflows/scripts/run-and-measure.sh
+
+      - id: empty-baseline
+        name: Emit zero baseline (first-PR bootstrap)
+        if: steps.detect.outputs.sample-existed != 'true'
+        run: echo "coverage=0.0" >>"$GITHUB_OUTPUT"
+
+      - name: Upload coverage report
+        if: always() && steps.detect.outputs.sample-existed == 'true'
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-release
+          path: doccano-django/coverage_report.txt
+          if-no-files-found: warn
+
+  coverage-gate:
+    if: github.event_name == 'pull_request'
+    name: coverage gate
+    needs: [build-coverage, release-coverage]
+    runs-on: ubuntu-latest
+    steps:
+      - name: Compare build vs release
+        env:
+          BUILD: ${{ needs.build-coverage.outputs.coverage }}
+          RELEASE: ${{ needs.release-coverage.outputs.coverage }}
+          THRESHOLD: ${{ env.COVERAGE_THRESHOLD }}
+          BASE_REF: ${{ github.event.pull_request.base.ref }}
+        run: |
+          set -Eeuo pipefail
+          if [ -z "${BUILD:-}" ] || [ -z "${RELEASE:-}" ]; then
+            echo "::error::missing coverage outputs — build='${BUILD:-}' release='${RELEASE:-}'"
+            exit 1
+          fi
+          drop=$(python3 -c "print(round(${RELEASE} - ${BUILD}, 2))")
+          echo "Release (${BASE_REF}): ${RELEASE}%"
+          echo "Build (this PR): ${BUILD}%"
+          echo "Drop: ${drop}pp (threshold ${THRESHOLD}pp)"
+          if python3 -c "import sys; sys.exit(0 if (${RELEASE} - ${BUILD}) > ${THRESHOLD} else 1)"; then
+            echo "::error::doccano-django coverage dropped from ${RELEASE}% → ${BUILD}% (-${drop}pp), exceeding the ${THRESHOLD}pp threshold."
+            echo "Suggested actions:"
+            echo " * Add curl(s) to flow.sh::doccano_record_traffic that exercise the new code paths."
+            echo " * Or extend the .coveragerc 'omit' list if the new module is not part of the runtime backend (migrations, management commands, tests)."
+            exit 1
+          fi
+          echo "OK — coverage delta within ${THRESHOLD}pp threshold."
+
+      - name: Sticky PR comment
+        if: ${{ !cancelled() }}
+        uses: marocchino/sticky-pull-request-comment@v2
+        with:
+          header: doccano-django-coverage
+          message: |
+            ### doccano-django sample coverage
+
+            | ref | coverage |
+            |---|---|
+            | base (`${{ github.event.pull_request.base.ref }}`) | **${{ needs.release-coverage.outputs.coverage }}%** |
+            | this PR | **${{ needs.build-coverage.outputs.coverage }}%** |
+
+            Threshold: PR may not drop coverage by more than **${{ env.COVERAGE_THRESHOLD }}pp**. Override per-repo via the `DOCCANO_COVERAGE_THRESHOLD` actions variable.
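
The arithmetic in the `coverage-gate` job can be read as the following sketch. It is not part of the commit: `coverage_gate` is a hypothetical name, and the inputs are taken as strings on the assumption that they arrive the way the job's `BUILD`, `RELEASE` and `THRESHOLD` env values do.

```python
from typing import Tuple


def coverage_gate(build: str, release: str, threshold: str = "1.0") -> Tuple[float, bool]:
    """Return (drop in percentage points, passed).

    Mirrors the workflow's comparison: the gate fails only when
    release - build is strictly greater than the threshold.
    """
    build_pct = float(build)
    release_pct = float(release)
    drop = round(release_pct - build_pct, 2)
    passed = (release_pct - build_pct) <= float(threshold)
    return drop, passed


# First-PR bootstrap: baseline is 0.0, so any positive build coverage passes.
print(coverage_gate("11.1", "0.0"))   # (-11.1, True)
# A 1.5pp drop exceeds the default 1.0pp threshold.
print(coverage_gate("57.5", "59.0"))  # (1.5, False)
```

Parsing with `float()` rather than interpolating the values into a `python3 -c` string also rejects a malformed output with a `ValueError` instead of evaluating it as code.
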

.github/workflows/scripts/run-and-measure.sh

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+#!/usr/bin/env bash
+#
+# run-and-measure.sh — bring doccano up under the coverage overlay,
+# run flow.sh bootstrap + record-traffic, flush coverage from each
+# gunicorn worker, run flow.sh coverage to combine + report, and
+# emit `coverage=PCT` onto $GITHUB_OUTPUT for the downstream
+# coverage-gate job.
+#
+# Called from .github/workflows/doccano-django.yml's build-coverage
+# and release-coverage jobs (one per ref under comparison). Both
+# jobs source the same script so the measurement is identical
+# across refs — any drift in the numerator definition would
+# otherwise produce a misleading delta.
+#
+# Coverage isolation contract:
+#   * Base `Dockerfile` and `docker-compose.yml` are untouched.
+#   * The overlay `Dockerfile.coverage` + `docker-compose.coverage.yml`
+#     adds coverage.py + the auto-start .pth file. ONLY this script
+#     applies the overlay; the keploy/integrations and
+#     keploy/enterprise CI lanes consume the base compose and pay
+#     zero coverage-instrumentation cost.
+#
+# Inputs (from the workflow env):
+#   DOCCANO_PHASE — label spliced into the project name so
+#                   build vs release runs don't collide.
+#   GITHUB_OUTPUT — standard GH Actions sink for step outputs.
+set -Eeuo pipefail
+
+export DOCCANO_BACKEND_CONTAINER="${DOCCANO_BACKEND_CONTAINER:-doccano_backend}"
+export DOCCANO_DB_CONTAINER="${DOCCANO_DB_CONTAINER:-doccano_db}"
+export DOCCANO_APP_PORT="${DOCCANO_APP_PORT:-18080}"
+export DOCCANO_FIXED_TOKEN="${DOCCANO_FIXED_TOKEN:-ac38262065f0ae1476b6a707d9d697a101764a6b}"
+
+mkdir -p coverage
+chmod 777 coverage  # worker UID inside container differs from runner UID
+sudo rm -rf coverage/.coverage* 2>/dev/null || rm -rf coverage/.coverage* 2>/dev/null || true
+
+COMPOSE=(docker compose -f docker-compose.yml -f docker-compose.coverage.yml)
+
+# Stage 1: bring up doccano with bootstrap so the schema migrations
+# and the admin user persist into the named DB volume. The overlay
+# image runs gunicorn with coverage.process_startup() auto-armed in
+# every forked worker.
+DOCCANO_SKIP_BOOTSTRAP=0 "${COMPOSE[@]}" up -d --build
+
+# Wait for the backend to start serving (cold doccano boot runs
+# Django migrations + admin user create — on a GH runner this can
+# hit 90-120s).
+for i in $(seq 1 120); do
+  code=$(curl -sS -o /dev/null -w '%{http_code}' \
+    "http://127.0.0.1:${DOCCANO_APP_PORT}/v1/health/" 2>/dev/null || echo "")
+  if [ -n "$code" ] && [ "$code" != "000" ]; then break; fi
+  sleep 2
+done
+
+bash flow.sh bootstrap 240
+"${COMPOSE[@]}" down --remove-orphans
+
+# Stage 2: re-launch in skip-bootstrap mode against the populated
+# volume; same shape the keploy lanes use. The overlay layer is
+# preserved across compose-down (only `down -v` would wipe the
+# named volume), so coverage tooling is still wired in.
+DOCCANO_SKIP_BOOTSTRAP=1 "${COMPOSE[@]}" up -d
+
+# flow.sh::doccano_record_traffic gates on doccano_wait_for_fixed_token
+# internally, so this won't fire curls at a half-booted backend.
+bash flow.sh record-traffic
+
+# Flush coverage from each gunicorn worker. coverage.py with
+# sigterm = true writes the in-flight per-worker .coverage.<pid>
+# data file to /coverage on SIGTERM; `compose kill -s SIGTERM`
+# delivers it to the container's main process which propagates to
+# its workers via gunicorn's graceful shutdown.
+"${COMPOSE[@]}" kill -s SIGTERM backend
+# coverage.py's sigterm hook is synchronous but the OS-level
+# write+fsync needs a moment.
+sleep 3
+
+# Bring backend back up so `flow.sh coverage` can docker-exec
+# `coverage combine` + `coverage report` inside.
+"${COMPOSE[@]}" up -d backend
+for i in $(seq 1 60); do
+  if docker exec "$DOCCANO_BACKEND_CONTAINER" sh -c 'ls /coverage/.coverage.* >/dev/null 2>&1'; then
+    break
+  fi
+  sleep 1
+done
+
+COVERAGE_REPORT_FILE="$PWD/coverage_report.txt" bash flow.sh coverage
+
+# Parse `Covered N/M (XX.X%)` — anchored on the parenthesised form
+# so a future report-prose change doesn't break the parse.
+pct=$(grep -oE '\([0-9]+\.[0-9]+%\)' coverage_report.txt | head -1 | tr -d '()%')
+if [ -z "$pct" ]; then
+  echo "::error::Could not parse coverage percentage from coverage_report.txt"
+  cat coverage_report.txt || true
+  exit 1
+fi
+echo "coverage=${pct}" >>"$GITHUB_OUTPUT"
+echo "coverage: ${pct}% (Python line coverage via coverage.py)"
+
+"${COMPOSE[@]}" down -v --remove-orphans

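The `grep | head -1 | tr` pipeline that extracts the percentage from `coverage_report.txt` can be expressed as a small function. The sketch below is an equivalent written for illustration, not code from the commit. `parse_coverage` is a hypothetical name and the `590/1000` figures in the example are invented sample data.

```python
import re
from typing import Optional

# Same shape as the script's grep pattern: a parenthesised "NN.N%".
PCT_RE = re.compile(r"\(([0-9]+\.[0-9]+)%\)")


def parse_coverage(report_text: str) -> Optional[str]:
    """Return the first parenthesised percentage, without "()" or "%"."""
    match = PCT_RE.search(report_text)
    return match.group(1) if match else None


print(parse_coverage("Covered 590/1000 (59.0%)"))  # 59.0
print(parse_coverage("Covered 59%"))               # None
```

As in the script, a report line that lacks the decimal, parenthesised form yields no match, which is the case the `::error::` branch handles.
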
doccano-django/.coveragerc

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+[run]
+# Per-process line coverage of the backend Django code.
+#
+# parallel + sigterm: gunicorn forks WORKERS subprocesses; each
+# writes its own .coverage.<host>.<pid> file under /coverage.
+# `combine` merges them at report time. `sigterm = true` flushes
+# the in-flight data on SIGTERM so the reaper from the workflow
+# captures it.
+parallel = true
+sigterm = true
+branch = false
+data_file = /coverage/.coverage
+source = /backend
+
+omit =
+    */tests/*
+    */migrations/*
+    */__pycache__/*
+    /backend/manage.py
+    /backend/config/wsgi.py
+    /backend/config/asgi.py
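
To get a feel for which files the `omit` list excludes, the patterns can be tried against a few paths. This is an approximation only: it uses `fnmatch` from the standard library, whereas coverage.py applies its own matcher whose rules differ in some details. `is_omitted` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase

# The omit patterns from the .coveragerc above.
OMIT_PATTERNS = [
    "*/tests/*",
    "*/migrations/*",
    "*/__pycache__/*",
    "/backend/manage.py",
    "/backend/config/wsgi.py",
    "/backend/config/asgi.py",
]


def is_omitted(path: str) -> bool:
    """Approximate check: does `path` match any omit pattern?"""
    return any(fnmatchcase(path, pattern) for pattern in OMIT_PATTERNS)


# Hypothetical paths under the /backend source root.
print(is_omitted("/backend/projects/tests/test_views.py"))  # True
print(is_omitted("/backend/manage.py"))                     # True
print(is_omitted("/backend/projects/models.py"))            # False
```
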

doccano-django/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+coverage/
+coverage_report.txt

doccano-django/Dockerfile

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Thin wrapper around doccano's official backend image at the version
+# this sample tracks. Pinning here (rather than in each lane script
+# under keploy/integrations / keploy/enterprise) means a future
+# doccano release that changes the bug-triggering shape is a one-line
+# retag in this repo, not a hunt across the CI tree.
+#
+# Upstream tag: doccano/doccano:backend (the rolling backend tag)
+# Source pin:   doccano/doccano @ v1.8.5
+#               https://github.com/doccano/doccano/releases/tag/v1.8.5
+#
+# v1.8.5 was the version exercised on keploy/enterprise pipeline 3556
+# (PR #1889) and pipeline 3572 (PR #1964 minimal repro) where the
+# bug originally manifested.
+FROM doccano/doccano:backend
+
+USER root
+COPY doccano-entrypoint.sh /opt/bin/doccano-keploy-entrypoint.sh
+RUN chmod +x /opt/bin/doccano-keploy-entrypoint.sh
+USER doccano
+
+ENTRYPOINT ["/opt/bin/doccano-keploy-entrypoint.sh"]

doccano-django/Dockerfile.coverage

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# Coverage-instrumented variant of the doccano backend image.
+#
+# Base `Dockerfile` (and `docker-compose.yml`) are deliberately
+# untouched so the keploy enterprise / integrations lanes — which
+# consume them as-is — pay zero coverage-instrumentation cost. This
+# overlay image is built and run ONLY by the standalone GitHub
+# Actions workflow under `.github/workflows/doccano-django.yml`,
+# wired in via `docker-compose.coverage.yml`.
+#
+# What the overlay adds:
+#   * `coverage` (Python coverage.py) installed into the same
+#     site-packages as gunicorn / Django.
+#   * `.coveragerc` placed at /backend/.coveragerc — the working
+#     directory the upstream image starts gunicorn from. With
+#     `COVERAGE_PROCESS_START=/backend/.coveragerc` exported into
+#     the container env (set in the compose overlay), every
+#     gunicorn worker that imports `coverage.process_startup` via
+#     site-packages will pick the rcfile up; combined with `parallel
+#     = true` and `sigterm = true` in the rcfile, this gives us
+#     real per-worker line coverage that flushes on SIGTERM.
+FROM doccano/doccano:backend
+
+USER root
+RUN pip install --no-cache-dir 'coverage[toml]==7.6.1'
+
+# Subprocess auto-start: a .pth file in site-packages is processed
+# at every Python startup, so each gunicorn worker that forks calls
+# coverage.process_startup() before any Django code runs. This is
+# the canonical way coverage.py instruments forked subprocesses
+# (see "Measuring sub-processes" in the coverage.py docs).
+RUN echo 'import coverage; coverage.process_startup()' \
+    > /usr/local/lib/python3.10/site-packages/coverage_subprocess.pth
+
+COPY .coveragerc /backend/.coveragerc
+RUN mkdir -p /coverage \
+    && chown -R doccano:doccano /coverage /backend/.coveragerc
+USER doccano
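
The `.pth` auto-start relies on the standard `site` module executing any line in a `.pth` file that begins with `import`. The sketch below demonstrates that mechanism in isolation, using a temporary directory and a harmless environment-variable side effect in place of `coverage.process_startup()`. `demo_pth_hook`, `demo_hook.pth` and `DEMO_PTH_HOOK` are made-up names for the illustration.

```python
import os
import site
import sys
import tempfile


def demo_pth_hook() -> str:
    """Write a .pth file, let `site` process it, and report the side effect."""
    with tempfile.TemporaryDirectory() as site_dir:
        pth_path = os.path.join(site_dir, "demo_hook.pth")
        with open(pth_path, "w", encoding="utf-8") as handle:
            # Lines starting with "import" are executed, not added to sys.path.
            handle.write("import os; os.environ['DEMO_PTH_HOOK'] = 'ran'\n")
        # Processes every .pth file in site_dir, as interpreter startup
        # does for the real site-packages directory.
        site.addsitedir(site_dir)
        # Undo the sys.path entry added for the temporary directory.
        if site_dir in sys.path:
            sys.path.remove(site_dir)
    return os.environ.get("DEMO_PTH_HOOK", "")


print(demo_pth_hook())  # ran
```

In the overlay image the same hook runs at interpreter startup, and `coverage.process_startup()` only begins measuring when `COVERAGE_PROCESS_START` is set, which is why the base image is unaffected.
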
