
Commit 1e68a2d

report_progress: add User-Agent header (Cloudflare WAF was blocking)
The Cloudflare WAF in front of githubusers.archivebox.io was returning HTTP 403 error 1010 to urllib's default User-Agent. Sending a clear User-Agent lets the POST through. Without this every report_progress call was silently 403'd by Cloudflare before reaching the Worker.
1 parent c7cba7e commit 1e68a2d

2 files changed

Lines changed: 16 additions & 5 deletions


cloudflare/worker/index.ts

Lines changed: 10 additions & 5 deletions
@@ -188,11 +188,12 @@ async function handleStatus(
     return json({ error: "invalid user" }, 400);
   }
   const repo = env.GH_REPO ?? "ArchiveBox/githubusers";
-  // Fetch the most recent workflow_dispatch run. Since concurrency.group
-  // serializes mines, the latest in_progress (or most recent overall)
-  // is most likely the one for this user.
+  // Fetch the most recent workflow run regardless of event (dispatch /
+  // push / schedule) — they all run the same mining job, and the
+  // concurrency.group serializes them so the latest run is always the
+  // most relevant.
   const r = await fetch(
-    `https://api.github.com/repos/${repo}/actions/runs?per_page=5&event=workflow_dispatch`,
+    `https://api.github.com/repos/${repo}/actions/runs?per_page=5`,
     {
       headers: {
         Authorization: `Bearer ${env.GH_DISPATCH_TOKEN}`,
@@ -205,7 +206,11 @@ async function handleStatus(
     return json({ error: "gh api failed", status: r.status }, 502);
   }
   const data = await r.json() as any;
-  const run = (data.workflow_runs ?? [])[0];
+  // Prefer an in_progress / queued run; fall back to most recent overall.
+  const runs = data.workflow_runs ?? [];
+  const run = runs.find((x: any) => x.status === "in_progress")
+    ?? runs.find((x: any) => x.status === "queued")
+    ?? runs[0];
   if (!run) {
     return json({ ok: false, status: "no_runs" });
   }
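
The selection order in the second hunk (an in_progress run first, then a queued one, then whatever is first in the list) can be checked against sample data. The sketch below is a Python transliteration of that logic, not the Worker's code; the function name `pick_run` and the sample run records are hypothetical, and it assumes the list is ordered newest first.

```python
from typing import Any, Optional


def pick_run(runs: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Hypothetical helper mirroring the diff's priority order."""
    # First pass looks for an in_progress run, second pass for a queued one.
    for wanted in ("in_progress", "queued"):
        for run in runs:
            if run.get("status") == wanted:
                return run
    # Neither found: fall back to the first entry, or None for an empty list.
    return runs[0] if runs else None


# Illustrative records only; real workflow run objects carry many more fields.
sample = [
    {"id": 3, "status": "completed"},
    {"id": 2, "status": "queued"},
    {"id": 1, "status": "in_progress"},
]

chosen = pick_run(sample)
```

With this sample, the in_progress run is chosen even though two other runs precede it in the list; dropping it from the list would select the queued run instead.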

generate_stats.py

Lines changed: 6 additions & 0 deletions
@@ -2395,6 +2395,12 @@ def report_progress(phase: str, message: str = "", **extra) -> None:
             headers={
                 "Content-Type": "application/json",
                 "Authorization": f"Bearer {PROGRESS_TOKEN}",
+                # Cloudflare WAF blocks empty User-Agents — give it a
+                # real-looking one.
+                "User-Agent": (
+                    "generate_stats.py "
+                    "(github.com/ArchiveBox/githubusers)"
+                ),
             },
         )
         urllib.request.urlopen(req, timeout=5).read()
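
For context on the header added above: when no User-Agent is supplied, urllib does not send an empty one; its opener adds a default of the form `Python-urllib/<version>`, which is the value the commit message says the WAF rejected. The sketch below builds a request the same way without sending it and compares the two values. The URL and token are placeholders, and `build_progress_request` is a hypothetical name, not a function from `generate_stats.py`.

```python
import urllib.request

# Placeholders for illustration; the real endpoint and token are not shown in the diff.
PROGRESS_URL = "https://example.invalid/progress"
PROGRESS_TOKEN = "dummy-token"

USER_AGENT = "generate_stats.py (github.com/ArchiveBox/githubusers)"


def build_progress_request(payload: bytes) -> urllib.request.Request:
    """Hypothetical helper: build the POST with an explicit User-Agent."""
    return urllib.request.Request(
        PROGRESS_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {PROGRESS_TOKEN}",
            "User-Agent": USER_AGENT,
        },
    )


# The header a default opener would add if the request did not set its own.
default_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]

req = build_progress_request(b'{"phase": "start"}')

# Request normalizes header names with str.capitalize(), hence "User-agent".
explicit_agent = req.get_header("User-agent")
```

Nothing here touches the network, so it only confirms which header value would be sent, not how the WAF responds to it.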
