fix workflow system so scheduled triggers run on latest code #2706
shagun-singh-inkeep wants to merge 21 commits into main from
Conversation
🦋 Changeset detected — latest commit: feb29ce. The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages.
TL;DR — Replaces the per-trigger daisy-chaining workflow architecture with a single centralized scheduler workflow that polls every 60 seconds, dispatches due triggers as independent one-shot workflows, and stores `next_run_at` on the manage table.

Key changes
Summary | 34 files | 9 commits | base:

Centralized scheduler workflow and trigger dispatcher
The dispatcher atomically advances each due trigger's `next_run_at` and starts a one-shot runner workflow for it.
| Migration | DB | Change |
|---|---|---|
| 0013_lumpy_apocalypse.sql | manage (Doltgres) | `ALTER TABLE scheduled_triggers ADD COLUMN next_run_at timestamptz` |
| 0023_broad_sharon_ventura.sql | runtime (Postgres) | `CREATE TABLE scheduler_state` (singleton) |
manage-schema.ts · computeNextRunAt.ts · scheduledTriggers.ts (DAL)
One-shot scheduledTriggerRunnerWorkflow
Before: The runner was a long-lived daisy-chaining workflow: check trigger → compute next time → create invocation → sleep → post-sleep re-check → execute → chain to next iteration. It handled adoption, supersession, and cancellation mid-sleep.
After: The runner is a stateless one-shot: check trigger enabled → create idempotent invocation → retry loop with cancellation checks → mark completed or failed → exit.
All scheduling concerns (when to run, what's due) are now the dispatcher's responsibility. The runner only needs a TriggerPayload with scheduledFor and focuses purely on execution.
scheduledTriggerRunner.ts · scheduledTriggerSteps.ts
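The idempotent invocation the one-shot runner relies on can be modeled with a deterministic key derived from the trigger id and scheduled time. A minimal sketch — the helper name and key scheme here are hypothetical, not code from this PR:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable idempotency key so that retries of
// the same (trigger, scheduledFor) pair create at most one invocation row.
// A unique constraint on this value makes invocation creation idempotent.
function invocationIdempotencyKey(triggerId: string, scheduledFor: string): string {
  return createHash("sha256").update(`${triggerId}:${scheduledFor}`).digest("hex");
}

// Same inputs always yield the same key, so a retried runner re-uses the
// existing invocation instead of creating a duplicate.
const key = invocationIdempotencyKey("trig_123", "2026-03-07T07:30:00.000Z");
```

Because the key is a pure function of the payload, the runner needs no scheduling state of its own — consistent with the "stateless one-shot" design above.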
Deploy restart endpoint and CI integration
Before: No mechanism to move scheduled trigger workflows to a new deployment; they continued running on the old instance until they naturally chained.
After: `POST /api/deploy/restart-scheduler` starts a fresh scheduler workflow, superseding the old one. The Vercel production workflow calls this after promote + deploy.
The endpoint uses constant-time comparison of INKEEP_AGENTS_RUN_API_BYPASS_SECRET for auth and is registered with noAuth() / security: [] in the OpenAPI spec.
restartScheduler.ts · vercel-production.yml
Significant architectural improvement — moving from per-trigger daisy-chaining workflows to a centralized scheduler with next_run_at on the manage table is a cleaner model. The supersession mechanism and one-shot runner design are well thought out.
There are a few issues to address before merging, roughly in priority order:
- Reconciliation check is broken — `listEnabledScheduledTriggers` only selects `{ id, name }`, so `(t as any).nextRunAt` is always `undefined` and every enabled trigger will be flagged as missing.
- `agentFull` and `projectFull` create paths don't set `nextRunAt` — triggers created through these bulk routes will sit dormant until reconciliation detects them.
- Crash between advance-and-dispatch loses one-time triggers — if the process dies after `advanceScheduledTriggerNextRunAt` commits but before `start(workflow)` executes, one-time triggers are permanently disabled with no execution.
- `as any` casts — `nextRunAt` is omitted from `ScheduledTriggerInsertSchema` but passed at the call site, causing multiple `as any` casts. Clean fix: make `nextRunAt` an accepted (optional) field in the insert schema.
- Security nits on the deploy endpoint — timing-safe comparison leaks secret length; error responses expose `err.message`.
```ts
const missingWorkflows = enabledTriggers
  .filter((t) => !workflowsByTriggerId.has(t.id))
  .filter((t) => !(t as any).nextRunAt)
```
Bug: listEnabledScheduledTriggers (in audit-queries.ts) only selects { id, name } — nextRunAt is never present on the returned objects. This means !(t as any).nextRunAt is always true, and every enabled trigger will be reported as missing.
Fix: add nextRunAt: scheduledTriggers.nextRunAt to the select() in listEnabledScheduledTriggers, then remove this as any cast.
```ts
orphanedWorkflows: [],
staleWorkflows: [],
deadWorkflows: [],
verificationFailures: [],
```
These four fields (orphanedWorkflows, staleWorkflows, deadWorkflows, verificationFailures) are now hardcoded empty arrays. Consider updating ScheduledTriggerAuditResult to remove or mark them optional — returning dead-letter fields that can never be populated adds noise.
```ts
  nextRunAt,
} as any);
```
The as any cast is needed because nextRunAt is omitted from ScheduledTriggerInsertSchemaBase. Since the create route now always computes and passes nextRunAt, the insert type should accept it.
Fix: remove nextRunAt from the .omit() in ScheduledTriggerInsertSchemaBase (or add it as .optional()) so the DAL function accepts it without a cast.
```ts
const mergedEnabled = body.enabled !== undefined ? body.enabled : existing.enabled;
const enabledChanged = body.enabled !== undefined && body.enabled !== existing.enabled;

let nextRunAt: string | null | undefined;
if (!mergedEnabled) {
  nextRunAt = null;
} else if (scheduleChanged || enabledChanged) {
  const mergedCron =
    body.cronExpression !== undefined ? body.cronExpression : existing.cronExpression;
  const mergedTimezone =
    body.cronTimezone !== undefined ? body.cronTimezone : existing.cronTimezone;
  const mergedRunAt = body.runAt !== undefined ? body.runAt : existing.runAt;
  nextRunAt = computeNextRunAt({
    cronExpression: mergedCron,
    cronTimezone: mergedTimezone,
    runAt: mergedRunAt,
  });
}
```
nextRunAt is only recomputed when scheduleChanged || enabledChanged. If only payload or messageTemplate changes, nextRunAt is left unchanged — that's correct.
However, the enabled → disabled transition sets nextRunAt = null here, but onTriggerUpdated no longer cancels pending invocations for that case (the old Case 2 was removed from ScheduledTriggerService.ts). Already-queued pending or running invocations will continue executing even though the user disabled the trigger. Consider adding cancelPendingInvocationsForTrigger to the enabled→disabled path, either here or in onTriggerUpdated.
```ts
await withRef(
  manageDbPool,
  resolvedRef,
  (db) =>
    advanceScheduledTriggerNextRunAt(db)({
      scopes: { tenantId, projectId, agentId },
      scheduledTriggerId,
      nextRunAt,
      enabled: isOneTime ? false : undefined,
    }),
  { commit: true, commitMessage: `Advance next_run_at for trigger ${scheduledTriggerId}` }
);
```
Risk: crash-window between advance and dispatch. If the process dies after this advanceScheduledTriggerNextRunAt commit but before start(scheduledTriggerRunnerWorkflow) on line 100, nextRunAt is already advanced but no workflow was started. The rollback on line 102 only runs if start() throws, not on a process crash.
- Cron triggers: miss one execution (the next tick computes a new `nextRunAt`) — acceptable.
- One-time triggers: permanently disabled (`enabled = false`, `nextRunAt = null`) with no execution — data loss.
Consider reversing the order: start the workflow first (idempotent via createInvocationIdempotentStep), then advance nextRunAt. If advance fails, the trigger is picked up again next tick; the idempotency key prevents double-execution.
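The suggested ordering can be sketched with in-memory stand-ins for the workflow start and the `next_run_at` advance. All names here are hypothetical; this is a model of the failure mode and its recovery, not the PR's code:

```typescript
// Minimal model of "start first, advance second". If the advance step fails
// (or the process dies between the two), the trigger is still due on the next
// tick, and the idempotency key suppresses a duplicate runner start.
type Dispatched = { idempotencyKey: string };

const started = new Map<string, Dispatched>();

function startRunnerIdempotent(key: string): boolean {
  if (started.has(key)) return false; // duplicate start suppressed
  started.set(key, { idempotencyKey: key });
  return true;
}

function dispatchThenAdvance(
  triggerId: string,
  scheduledFor: string,
  advanceNextRunAt: () => void
): void {
  const key = `${triggerId}:${scheduledFor}`;
  startRunnerIdempotent(key); // 1) start the one-shot runner first
  advanceNextRunAt();         // 2) only then advance next_run_at
}

// Simulate a crash between the two steps: the advance throws, the next tick
// retries the same (trigger, scheduledFor), and the runner starts exactly once.
let advanced = 0;
try {
  dispatchThenAdvance("trig_1", "2026-01-01T00:00:00.000Z", () => {
    throw new Error("crash before advance");
  });
} catch {}
dispatchThenAdvance("trig_1", "2026-01-01T00:00:00.000Z", () => {
  advanced += 1;
});
```

After the simulated crash and retry, exactly one runner was started and the advance eventually succeeded — the property that makes the reversed ordering safe for one-time triggers.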
```ts
await upsertSchedulerState(runDbClient)({
  currentRunId: run.runId,
  deploymentId: getDeploymentId(),
});
```
Minor race: this upsertSchedulerState call is redundant with registerSchedulerStep inside the workflow. Both write currentRunId to the same singleton row. If the deploy endpoint is called twice in quick succession (e.g. CI retry), you can get:
1. Call 1: `start()` → run C
2. Call 2: `start()` → run D
3. Call 1: `upsertSchedulerState(C)`
4. Call 2: `upsertSchedulerState(D)`
5. Workflow C: `registerSchedulerStep(C)` — overwrites D
Now both C and D think they're current. Consider removing this outer upsertSchedulerState and relying solely on registerSchedulerStep inside the workflow.
```ts
function constantTimeEqual(a: string, b: string): boolean {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  if (bufA.length !== bufB.length) return false;
  return timingSafeEqual(bufA, bufB);
```
The early return false on length mismatch exits faster than timingSafeEqual, letting an attacker binary-search the secret's byte length via response timing. Low-severity in practice (network jitter dominates), but cheap to fix:
```ts
import { createHash, timingSafeEqual } from 'node:crypto';

function constantTimeEqual(a: string, b: string): boolean {
  const hash = (s: string) => createHash('sha256').update(s).digest();
  return timingSafeEqual(hash(a), hash(b));
}
```

```ts
  { error: err instanceof Error ? err.message : String(err) },
  'Failed to restart scheduler workflow'
);
return c.json({ error: err instanceof Error ? err.message : 'Internal error' }, 500);
```
`err.message` is returned to the caller, potentially leaking internal details (DB connection strings, file paths). Since the caller is CI, return a generic message and rely on the server-side log (line 53) for debugging:

```ts
return c.json({ error: 'Failed to restart scheduler workflow' }, 500);
```

```ts
const rows = await db.execute(
  sql`SELECT id, tenant_id, project_id, agent_id,
        cron_expression, cron_timezone, run_at,
        next_run_at, enabled
      FROM scheduled_triggers AS OF ${sql.raw(`'${branchName}'`)}
```
branchName is interpolated unescaped into the query via sql.raw. It's constructed from tenantId/projectId via getProjectScopedRef (simple concatenation, no sanitization). The values come from the runtime DB so they're trusted internal data — not a regression since this pattern exists elsewhere in the dolt module — but worth hardening. Consider a shared helper that validates branch names for AS OF clauses (e.g., reject values containing ').
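A possible shape for that shared helper, as a sketch — the function name and the allowed character set are assumptions, not code from this PR:

```typescript
// Hypothetical guard for values interpolated into `AS OF '<branch>'` clauses.
// Allows only characters expected in project-scoped refs and rejects anything
// (quotes, spaces, semicolons) that could terminate the quoted literal.
function assertSafeBranchName(branchName: string): string {
  if (!/^[A-Za-z0-9._\/-]+$/.test(branchName)) {
    throw new Error(`Unsafe branch name for AS OF clause: ${branchName}`);
  }
  return branchName;
}
```

Calling this once at the `sql.raw` site turns a silent injection hazard into a loud failure, without changing behavior for any well-formed ref.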
```ts
}): Promise<DueScheduledTrigger[]> => {
  const allDue: DueScheduledTrigger[] = [];

  for (const project of params.projects) {
```
This loops over every project one-at-a-time with a separate SQL query per project. Fine for small deployments, but could become a bottleneck at scale (N round-trips to Doltgres). Consider adding a log/metric for the iteration count so you can detect when this becomes slow.
TL;DR — Replaces the per-trigger daisy-chaining workflow architecture with a single centralized scheduler workflow that ticks every 60 seconds, queries the new `next_run_at` column for due triggers, and dispatches them as independent one-shot workflows.

Key changes
Summary | 41 files | 16 commits | base:

Centralized scheduler workflow and trigger dispatcher
| Migration | Target | Change |
|---|---|---|
| 0013_lumpy_apocalypse.sql | manage | `ALTER TABLE scheduled_triggers ADD COLUMN next_run_at timestamptz` |
| 0023_broad_sharon_ventura.sql | runtime | `CREATE TABLE scheduler_state` (singleton with `current_run_id`, `deployment_id`) |
manage-schema.ts · runtime-schema.ts · compute-next-run-at.ts
Post-deploy restart endpoint and CI integration
Before: No mechanism to move scheduled triggers onto a new deployment. Old workflows continued running on stale instances until they naturally terminated.
After: `POST /api/deploy/restart-scheduler` (authenticated via `INKEEP_AGENTS_RUN_API_BYPASS_SECRET`) starts a new scheduler workflow, superseding the old one. A new `restart-scheduler` CI job in `vercel-production.yml` calls this endpoint after deploy.
restartScheduler.ts · vercel-production.yml
ScheduledTriggerService and reconciliation simplification
Before: `ScheduledTriggerService` (~340 lines) managed the full lifecycle — start/stop/restart workflows on create/update/delete. The reconciliation handler checked for orphaned/stale/dead workflows by cross-referencing workflow run statuses.
After: The service is reduced to ~70 lines — only `onTriggerUpdated` survives, for cancelling stale invocations on re-enable or schedule change. Reconciliation checks only for enabled triggers missing a `next_run_at` value.
ScheduledTriggerService.ts · scheduled-triggers.ts (reconciliation)
PR Review Summary
Total issues: 1 | Risk: High
This is a well-designed architectural improvement — moving from per-trigger daisy-chaining workflows to a centralized scheduler is cleaner and more maintainable. The supersession mechanism, one-shot runner design, and upsertScheduledTrigger fix (computing nextRunAt at the DAL layer) are all well thought out.
✅ Issues Fixed Since Prior Reviews
| Prior Issue | Status | Evidence |
|---|---|---|
| Reconciliation check broken (`nextRunAt` missing in select) | ✅ Fixed | audit-queries.ts:18 now includes `nextRunAt` |
| Bulk routes (agentFull/projectFull) don't set `nextRunAt` | ✅ Fixed | `upsertScheduledTrigger()` at scheduledTriggers.ts:154-160 now computes `nextRunAt` internally — all callers benefit |
| Crash between advance-and-dispatch loses one-time triggers | ✅ Fixed | triggerDispatcher.ts:84 now starts the workflow before advancing |
| Timing-safe comparison leaks secret length | ✅ Fixed | restartScheduler.ts:11-14 uses SHA256 hash comparison |
| Error response exposes `err.message` | ✅ Fixed | restartScheduler.ts:54 returns a generic error |
| No test coverage for `computeNextRunAt` | ✅ Fixed | 136 lines of tests added |
| No test coverage for `triggerDispatcher` | ✅ Fixed | 240 lines of tests added |
🔴❗ Critical (1) ❗🔴
🔴 1) 0013_lumpy_apocalypse.sql:1 Missing data migration for existing enabled triggers
Issue: The migration adds a nullable next_run_at column but does NOT backfill existing enabled triggers. All currently-enabled triggers will have next_run_at = NULL after migration.
Why: The scheduler at findDueScheduledTriggersAcrossProjects only dispatches triggers where next_run_at IS NOT NULL AND next_run_at <= now(). Existing enabled triggers will silently stop running after deploy — a production outage for customers relying on scheduled triggers. This is a one-way door that's very difficult to diagnose in production.
This issue has been raised in all 4 prior reviews and remains the sole blocker.
Fix: Add a backfill to the migration:
```sql
ALTER TABLE "scheduled_triggers" ADD COLUMN "next_run_at" timestamp with time zone;

UPDATE scheduled_triggers
SET next_run_at = COALESCE(run_at, NOW())
WHERE enabled = true;
```
Inline Comments:
- 🔴 Critical: `0013_lumpy_apocalypse.sql:1` — Missing data migration backfill
🟠⚠️ Major (1) 🟠⚠️
Inline Comments:
- 🟠 Major: `triggerDispatcher.ts:44` — Unbounded concurrent dispatches
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor: `triggerDispatcher.ts:102-105` — Error log missing correlation context
💭 Consider (1) 💭
Inline Comments:
- 💭 Consider: `computeNextRunAt.test.ts:104-126` — DST tests use weak assertions
🕐 Pending Recommendations (2)
Prior issues from pullfrog and earlier claude reviews that remain relevant (lower priority than the critical blocker):
- 🟠 Unbounded concurrent dispatches — dispatches all due triggers in parallel with no concurrency limit
- 🟡 DST tests use weak assertions — DST tests don't verify exact expected times
🚫 REQUEST CHANGES
Summary: Excellent progress on this PR — 7 of 8 prior issues have been addressed, including critical fixes to bulk route handling, dispatch ordering, test coverage, and security. The architecture is sound.
However, the data migration backfill remains the critical blocker. Existing customers with scheduled triggers will experience a silent production outage after deploy because their triggers will have next_run_at = NULL and won't be picked up by the scheduler.
This is straightforward to fix with a few lines of SQL added to the migration file. Once that's addressed, this PR is ready to merge.
Discarded (5)
| Location | Issue | Reason Discarded |
|---|---|---|
| schedulerWorkflow.ts | 60-second polling could miss sub-minute schedules | Intentional design — cron granularity is minutes |
| restartScheduler.ts | No rate limiting on restart endpoint | Acceptable — endpoint is protected by secret and only called by CI |
| schedulerSteps.ts | Supersession check returns false when state missing | Edge case — scheduler re-registers on next tick anyway |
| scheduled-triggers.ts | Removed onCreated/onDeleted handlers | Intentional — scheduler polls instead of event-driven |
| triggerDispatcher.ts | Success log emitted when advance fails | Acceptable — log says "dispatched", which is accurate; advance failure is logged separately |
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator (manual analysis) | 8 | 1 | 1 | 0 | 4 | 2 | 5 |
| Total | 8 | 1 | 1 | 0 | 4 | 2 | 5 |
Note: Subagent reviewers could not be dispatched due to permissions. Manual analysis performed by orchestrator using comprehensive codebase exploration.
```sql
ALTER TABLE "scheduled_triggers" ADD COLUMN "next_run_at" timestamp with time zone;
```
🔴 CRITICAL: Missing data migration for existing enabled triggers
Issue: This migration adds a nullable next_run_at column but does NOT backfill existing enabled triggers. All currently-enabled triggers will have next_run_at = NULL after migration.
Why: The scheduler at findDueScheduledTriggersAcrossProjects only dispatches triggers where next_run_at IS NOT NULL AND next_run_at <= now() (line 246). Existing enabled triggers will silently stop running after deploy — a production outage for customers relying on scheduled triggers.
This issue has been raised in all 4 prior reviews and remains the sole blocker.
Fix options:

1. Add backfill to this migration (recommended):

```sql
ALTER TABLE "scheduled_triggers" ADD COLUMN "next_run_at" timestamp with time zone;
-- Backfill: set next_run_at = NOW() for all enabled triggers
-- so the scheduler picks them up on next tick
UPDATE scheduled_triggers
SET next_run_at = COALESCE(run_at, NOW())
WHERE enabled = true;
```

2. Add startup reconciliation in `SchedulerService.startSchedulerWorkflow()` that calls `computeNextRunAt` for any enabled trigger with a NULL `next_run_at` before the first dispatch tick.
```ts
logger.info({ dueCount: dueTriggers.length }, 'Found due triggers');

const results = await Promise.allSettled(
```
🟠 MAJOR: Unbounded concurrent dispatch could overwhelm the system
Issue: All due triggers are dispatched in parallel via Promise.allSettled with no concurrency limit. If many triggers become due simultaneously (scheduler outage recovery, popular cron times like midnight UTC), this could spawn hundreds of concurrent workflow starts.
Why: Each dispatchSingleTrigger performs DB writes and workflow starts. Unbounded parallelism risks DB connection pool exhaustion, workflow queue backpressure, and cascading failures during recovery scenarios.
Fix: Add a concurrency limit using a semaphore:

```ts
import pLimit from 'p-limit';

const limit = pLimit(10); // max 10 concurrent dispatches
const results = await Promise.allSettled(
  dueTriggers.map((trigger) => limit(() => dispatchSingleTrigger(trigger)))
);
```

Refs:
- p-limit — popular concurrency limiter
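If taking on the `p-limit` dependency is undesirable, a small dependency-free limiter achieves the same effect. This is a sketch of an alternative, not the PR's implementation; the worker-pool pattern below assumes dispatch functions are plain async functions:

```typescript
// Dependency-free alternative to p-limit: run tasks with at most
// `concurrency` in flight at once, preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS is single-threaded

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }

  // Spawn up to `concurrency` workers that pull from the shared cursor.
  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Usage would mirror the `p-limit` version, e.g. `mapWithConcurrency(dueTriggers, 10, dispatchSingleTrigger)`; wrapping `fn` to catch per-item errors recovers `Promise.allSettled` semantics.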
```ts
logger.error(
  { scheduledTriggerId, err },
  'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
);
```
🟡 Minor: Error log missing correlation context for incident debugging
Issue: This log includes scheduledTriggerId and err but is missing tenantId, projectId, agentId, and scheduledFor — all of which are available in scope from the trigger object on line 64.
Why: During an incident with multiple failing triggers, operators would need to manually correlate trigger IDs back to tenant/project context. Including all identifiers makes debugging significantly easier and costs nothing.
Fix:

```ts
// Before
logger.error(
  { scheduledTriggerId, err },
  'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
);

// After
logger.error(
  { scheduledTriggerId, tenantId, projectId, agentId, scheduledFor: trigger.nextRunAt, err },
  'Failed to advance next_run_at after workflow start; next tick will retry (idempotent)'
);
```
```ts
it('handles DST spring-forward transition', () => {
  const result = computeNextRunAt({
    cronExpression: '30 2 * * *',
    cronTimezone: 'America/New_York',
    lastScheduledFor: '2026-03-07T07:30:00.000Z',
  });

  expect(result).toBeDefined();
  const nextDate = new Date(result as string);
  expect(nextDate.getTime()).toBeGreaterThan(new Date('2026-03-07T07:30:00.000Z').getTime());
});

it('handles DST fall-back transition', () => {
  const result = computeNextRunAt({
    cronExpression: '30 1 * * *',
    cronTimezone: 'America/New_York',
    lastScheduledFor: '2026-10-31T05:30:00.000Z',
  });

  expect(result).toBeDefined();
  const nextDate = new Date(result as string);
  expect(nextDate.getTime()).toBeGreaterThan(new Date('2026-10-31T05:30:00.000Z').getTime());
});
```
💭 Consider: DST tests use weak assertions
Issue: These DST tests only assert that the result is defined and later than the input. They don't verify the exact expected next run time, meaning bugs in DST handling (e.g., skipping to the wrong day, firing twice during fall-back) would pass.
Why: DST bugs in scheduler systems are notoriously hard to debug in production. Spring-forward and fall-back transitions can cause triggers to fire at unexpected times or not at all.
Fix: Consider strengthening assertions to verify exact expected times. For the spring-forward test (March 8, 2026 in America/New_York), 2:30 AM doesn't exist during spring-forward, so the next valid occurrence should be verified explicitly:

```ts
// Example: assert the exact expected time after the DST transition
expect(result).toBe('2026-03-09T07:30:00.000Z'); // or similar
```
PR Review Summary
(1) Total Issues | Risk: High
This is a delta review covering 2 commits since the last automated review (19ce6ddac87f). The delta is small and focused:
- Removed unused `workflowProcess.ts` endpoint
- Refactored `findDueScheduledTriggersAcrossProjects` from sequential to parallel queries
- Cleaned up unused type fields in `ScheduledTriggerAuditResult`
✅ Delta Changes Analyzed
| File | Change | Assessment |
|---|---|---|
| createApp.ts | Remove unused endpoint mounting | ✅ Clean |
| workflowProcess.ts | Deleted (unused) | ✅ Clean |
| scheduledTriggers.ts | Sequential → parallel query refactor | 🟡 See inline comment |
| types.ts | Remove unused type fields | ✅ Clean |
The parallel query refactor in scheduledTriggers.ts is a good performance improvement for deployments with many projects, but loses project context in error logging (see inline comment).
🔴❗ Critical (1) ❗🔴
🔴 1) 0013_lumpy_apocalypse.sql:1 Missing data migration for existing enabled triggers
Issue: The migration adds a nullable next_run_at column but does NOT backfill existing enabled triggers. All currently-enabled triggers will have next_run_at = NULL after migration.
Why: The scheduler at findDueScheduledTriggersAcrossProjects only dispatches triggers where next_run_at IS NOT NULL AND next_run_at <= now(). Existing enabled triggers will silently stop running after deploy — a production outage for customers relying on scheduled triggers.
This issue has been raised in 5 prior reviews and remains the sole blocker.
Fix: Add a backfill to the migration:
```sql
ALTER TABLE "scheduled_triggers" ADD COLUMN "next_run_at" timestamp with time zone;
-- Backfill: set next_run_at for all enabled triggers so the scheduler picks them up
UPDATE scheduled_triggers
SET next_run_at = COALESCE(run_at, NOW())
WHERE enabled = true;
```
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor: `scheduledTriggers.ts:268` — Error log lost project context after parallel refactor
🕐 Pending Recommendations (2)
Prior issues that were noted but are lower priority than the critical blocker:
- 🟠 Unbounded concurrent dispatches — `triggerDispatcher.ts:44` dispatches all due triggers in parallel with no concurrency limit. Consider adding `p-limit` for large-scale deployments.
- 🟡 DST tests use weak assertions — DST tests only verify `result > input` rather than exact expected times.
🚫 REQUEST CHANGES
Summary: The delta is clean with one minor issue (error logging context). However, the critical data migration issue remains unresolved and continues to block this PR.
All existing scheduled triggers will have next_run_at = NULL after migration and will silently stop running. This is a production outage waiting to happen. Please add a backfill mechanism before merging.
The architecture is solid and this PR is very close to being ready. This one remaining issue is straightforward to fix with a few lines of SQL added to the migration file.
Discarded (0)
No findings were discarded in this delta review.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator (direct analysis) | 2 | 1 | 0 | 0 | 1 | 2 | 0 |
| Total | 2 | 1 | 0 | 0 | 1 | 2 | 0 |
Note: This was a focused delta review of a small changeset. Subagent reviewers were not dispatched as the delta contained only 4 file changes with narrow scope.
```ts
    allDue.push(...result.value);
  } else {
    logger.warn(
      { err: result.reason },
```
🟡 Minor: Error log lost project context after parallel refactor
Issue: The refactor from sequential to parallel queries removed tenantId and projectId from the error log. Previously logged { tenantId, projectId, err }, now only logs { err: result.reason }.
Why: When a project branch query fails, operators won't be able to identify which project failed without correlating timestamps with other logs. This makes debugging harder during incidents affecting specific tenants.
Fix: Capture project context in the rejection or wrap the async function to preserve it:
```ts
const results = await Promise.allSettled(
  params.projects.map(async (project) => {
    const branchName = getProjectScopedRef(project.tenantId, project.projectId, 'main');
    try {
      const rows = await db.execute(
        sql`SELECT id, tenant_id, project_id, agent_id,
              cron_expression, cron_timezone, run_at,
              next_run_at, enabled
            FROM scheduled_triggers AS OF ${sql.raw(`'${branchName}'`)}
            WHERE enabled = true
              AND next_run_at IS NOT NULL
              AND next_run_at <= ${params.asOf}`
      );
      return rows.rows.map((row) => ({
        id: row.id as string,
        tenantId: row.tenant_id as string,
        projectId: row.project_id as string,
        agentId: row.agent_id as string,
        cronExpression: row.cron_expression as string | null,
        cronTimezone: row.cron_timezone as string | null,
        runAt: row.run_at as string | null,
        nextRunAt: row.next_run_at as string | null,
        enabled: row.enabled as boolean,
      }));
    } catch (err) {
      logger.warn(
        { tenantId: project.tenantId, projectId: project.projectId, err },
        'Failed to query due triggers for project branch, skipping'
      );
      return [];
    }
  })
);

return results
  .filter((r): r is PromiseFulfilledResult<DueScheduledTrigger[]> => r.status === 'fulfilled')
  .flatMap((r) => r.value);
```