feat: update alert & implement alert.destination.disabled by alexluong · Pull Request #672 · hookdeck/outpost

alexluong · 2026-02-03T11:59:38Z

Changes

Alert Payload Schema

Restructure consecutive_failures into a nested object with current, max, threshold
Remove will_disable field (threshold == 100 implies disable)
Remove custom MarshalJSON methods in favor of standard json.Marshal
Add tenant_id to top-level of alert data
Expand AlertDestination with filter, metadata, updated_at
Simplify DeliveryAttempt struct: pass Event, Destination, Attempt directly
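
For reviewers who want to see the shape of the schema change, here is a minimal sketch of the nested object encoded with the standard encoder. The type names, the trimmed-down data struct and the `willDisable` helper are assumptions made for illustration; they are not copied from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConsecutiveFailures mirrors the nested object described above.
// The Go names are illustrative, not taken from the PR.
type ConsecutiveFailures struct {
	Current   int `json:"current"`
	Max       int `json:"max"`
	Threshold int `json:"threshold"`
}

// ConsecutiveFailureData is a trimmed, hypothetical version of the alert data.
// event, attempt and destination are omitted to keep the sketch short.
type ConsecutiveFailureData struct {
	TenantID            string              `json:"tenant_id"`
	ConsecutiveFailures ConsecutiveFailures `json:"consecutive_failures"`
}

// willDisable derives what the removed will_disable field used to state.
func willDisable(cf ConsecutiveFailures) bool {
	return cf.Threshold == 100
}

// encode relies on struct tags only; no custom MarshalJSON is involved.
func encode(d ConsecutiveFailureData) (string, error) {
	b, err := json.Marshal(d)
	return string(b), err
}

func main() {
	d := ConsecutiveFailureData{
		TenantID:            "tenant_123",
		ConsecutiveFailures: ConsecutiveFailures{Current: 10, Max: 20, Threshold: 50},
	}
	out, err := encode(d)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	fmt.Println("will disable:", willDisable(d.ConsecutiveFailures))
}
```

Consumers that previously read will_disable can compare threshold to 100 instead, as the helper does.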

New `alert.destination.disabled` Callback

Sent when destination is auto-disabled after consecutive failures
Invariant check ensures DisabledAt is set on returned destination
Uses a reason field (e.g., "consecutive_failure") instead of embedding trigger-specific data, keeping the payload extensible for future disable mechanisms (e.g., error rate)
When the threshold is reached, both alert.destination.disabled and alert.destination.consecutive_failure (with threshold: 100) are sent. The disabled alert is sent first, but arrival order is not guaranteed, so consumers should not rely on it.

Consecutive Failure Alert at Threshold 100

The destination in the consecutive failure alert reflects the post-disable state (includes disabled_at)

Error Handling

Notifications are best-effort (logged, not propagated)
DestinationDisabler returns disabled destination for timestamp consistency

Notification Delivery

Alert notifications are sent synchronously via HTTP within HandleAttempt, which itself is called asynchronously from the delivery pipeline (go h.handleAlertAttempt(...)). This means notifications don't block event delivery, but a slow notification (e.g., the disabled alert) will delay subsequent notifications (e.g., the consecutive failure alert) within the same handler call.

Future consideration: If notification latency becomes an issue, we could decouple the two notifications — either by sending them concurrently or by introducing a notification queue. For now, the synchronous approach keeps the implementation simple and the best-effort error handling makes the current behavior acceptable.
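
As a rough illustration of the "send them concurrently" option, the sketch below fans the notifications out to goroutines and collects errors for logging only. `Notifier`, `notifyAll` and the `failOn` test double are hypothetical; this is not what the PR currently does.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Alert is a hypothetical stand-in for an alert payload.
type Alert struct{ Topic string }

// Notifier is a hypothetical interface; the real notifier posts over HTTP.
type Notifier interface {
	Notify(a Alert) error
}

// notifyAll sends each alert in its own goroutine so a slow notification
// does not delay the others. Errors are returned per alert so the caller
// can log them; they are never meant to be propagated further.
func notifyAll(n Notifier, alerts []Alert) []error {
	errs := make([]error, len(alerts))
	var wg sync.WaitGroup
	for i, a := range alerts {
		wg.Add(1)
		go func(i int, a Alert) {
			defer wg.Done()
			errs[i] = n.Notify(a) // each goroutine writes its own slot
		}(i, a)
	}
	wg.Wait()
	return errs
}

// failOn is a test double that fails for a single topic.
type failOn struct{ topic string }

func (f failOn) Notify(a Alert) error {
	if a.Topic == f.topic {
		return errors.New("notify failed: " + a.Topic)
	}
	return nil
}

func main() {
	alerts := []Alert{
		{Topic: "alert.destination.disabled"},
		{Topic: "alert.destination.consecutive_failure"},
	}
	for i, err := range notifyAll(failOn{topic: "alert.destination.disabled"}, alerts) {
		if err != nil {
			// Best-effort: log and move on.
			fmt.Printf("failed to send alert topic=%s err=%v\n", alerts[i].Topic, err)
		}
	}
}
```

Note that sending concurrently gives up even the "disabled first" send order, which matters little if consumers already cannot rely on ordering.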

Alert Payloads

event is a full models.Event and attempt is a full models.Attempt.

`alert.destination.consecutive_failure`

{
  "topic": "alert.destination.consecutive_failure",
  "timestamp": "2025-01-15T10:30:00Z",
  "data": {
    "tenant_id": "tenant_123",
    "event": { ... },
    "attempt": { ... },
    "consecutive_failures": {
      "current": 10,
      "max": 20,
      "threshold": 50
    },
    "destination": {
      "id": "dest_xyz",
      "tenant_id": "tenant_123",
      "type": "webhook",
      "topics": ["*"],
      "filter": {},
      "config": {},
      "metadata": {},
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-01T00:00:00Z",
      "disabled_at": null
    }
  }
}

Note: At threshold: 100, destination.disabled_at will be set (reflects post-disable state).

`alert.destination.disabled`

{
  "topic": "alert.destination.disabled",
  "timestamp": "2025-01-15T10:30:00Z",
  "data": {
    "tenant_id": "tenant_123",
    "disabled_at": "2025-01-15T10:30:00Z",
    "reason": "consecutive_failure",
    "event": { ... },
    "attempt": { ... },
    "destination": {
      "id": "dest_xyz",
      "tenant_id": "tenant_123",
      "type": "webhook",
      "topics": ["*"],
      "filter": {},
      "config": {},
      "metadata": {},
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-15T10:30:00Z",
      "disabled_at": "2025-01-15T10:30:00Z"
    }
  }
}
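
On the receiving side, a consumer could dispatch on topic along these lines. This is only a sketch: the struct names and the `describe` helper are hypothetical, and event, attempt and destination are ignored for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope holds the fields shared by both payloads above.
// Data stays raw so it can be decoded per topic.
type Envelope struct {
	Topic     string          `json:"topic"`
	Timestamp string          `json:"timestamp"`
	Data      json.RawMessage `json:"data"`
}

type disabledData struct {
	TenantID   string `json:"tenant_id"`
	DisabledAt string `json:"disabled_at"`
	Reason     string `json:"reason"`
}

type failureData struct {
	TenantID            string `json:"tenant_id"`
	ConsecutiveFailures struct {
		Current   int `json:"current"`
		Max       int `json:"max"`
		Threshold int `json:"threshold"`
	} `json:"consecutive_failures"`
}

// describe decodes an alert body and returns a one-line summary.
func describe(body []byte) (string, error) {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", err
	}
	switch env.Topic {
	case "alert.destination.disabled":
		var d disabledData
		if err := json.Unmarshal(env.Data, &d); err != nil {
			return "", err
		}
		return fmt.Sprintf("tenant %s: destination disabled (%s)", d.TenantID, d.Reason), nil
	case "alert.destination.consecutive_failure":
		var d failureData
		if err := json.Unmarshal(env.Data, &d); err != nil {
			return "", err
		}
		cf := d.ConsecutiveFailures
		return fmt.Sprintf("tenant %s: %d/%d failures, threshold %d", d.TenantID, cf.Current, cf.Max, cf.Threshold), nil
	default:
		return "", fmt.Errorf("unknown topic %q", env.Topic)
	}
}

func main() {
	body := []byte(`{"topic":"alert.destination.disabled","timestamp":"2025-01-15T10:30:00Z","data":{"tenant_id":"tenant_123","disabled_at":"2025-01-15T10:30:00Z","reason":"consecutive_failure"}}`)
	msg, err := describe(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```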

alexbouchardd · 2026-02-03T13:12:51Z

I think we should rename alert.consecutive_failure to something like alert.destination.consecutive_failure or alert.destination.failure.

One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.

Should we include attempt instead of event?

Yes, your proposition makes more sense

2

No strong opinion, but the current payload does lack progress, which is the threshold that was triggered

alexluong · 2026-02-03T14:31:43Z

One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.

yes, let's do alert.destination.consecutive_failure in case we want to expand in the future?

but the current payload does lack progress

that's just the computed value of current / max tho, right? Or you want the threshold itself, so 50/70/90/100?

also do you know if current can be higher than max? In that case, would progress be fixed at 100 (or 1) or would we continue adding it up?

alexbouchardd · 2026-02-04T14:57:04Z

I would represent the threshold itself which may not line up with the current / max exactly.

alexluong · 2026-02-06T10:21:42Z

@alexbouchardd updated, the PR description is up-to-date if you want to review the schema, etc.

alexbouchardd

Changes to the docs seem to be missing

…bject Replace flat fields (ConsecutiveFailures, MaxConsecutiveFailures, Progress, WillDisable) with a scoped ConsecutiveFailures struct containing Current, Max, and Threshold. This produces a cleaner JSON payload structure and removes the redundant WillDisable field (threshold == 100 implies disable). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace ConsecutiveFailures struct in DestinationDisabledData with a Reason string field. This decouples the disabled alert from the specific trigger mechanism, allowing future disable reasons (e.g., error rate) without restructuring the payload. Also update e2e types and assertions for the nested ConsecutiveFailures struct introduced in the previous commit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…d 100 After disabling a destination, update the consecutive failure alert's destination to include DisabledAt, so consumers see the post-disable state. Add test asserting this behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

alexbouchardd · 2026-03-27T00:00:02Z

Following the user conversation, I think we need to revisit the alert semantics altogether. Really, these are "Operator Events" and could span different topics, such as alerts. What I would propose is the following changes:

Introduce a config/mechanism to subscribe to operator event topics such as attempt.failed and tenant.updated
Introduce the ability to publish these events to one of the supported operator MQ like SQS/PubSub
Rework the alert logic / destination logic to more clearly set the trigger conditions for the relevant operator events

The list of initial events would be:

tenant.updated
destination.disabled
attempt.failed
attempt.succeeded
attempt.exhausted-retries
alert.destination.consecutive-failures
alert.destination.exhausted-retries
alert.destination.failure-rate

The new alert configs would look something like

ALERT_TRIGGER_CONSECUTIVE_FAILURES
ALERT_TRIGGER_EXHAUSTED_RETRIES_COUNT
ALERT_TRIGGER_FAILURE_RATE

What do you think?

alexluong · 2026-03-27T08:26:51Z

I think the overall idea makes sense. There are some details that I may want to bring up for clarity, but nothing critical. The only high level topic I want to discuss is around publishing these Operator Events

Introduce the ability to publish these events to one of the supported operator MQ like SQS/PubSub

I assume your intention in relying on an MQ like SQS/PubSub, instead of HTTP, is reliability? With that said, delivery is still not fully guaranteed because there could be issues on the Outpost side. So basically we cannot guarantee at-least-once, only at-most-once, for these events. Would that be the right take?

I'm thinking, if we want to provide stronger guarantees for these events, we can consider reusing the deliverymq mechanism in some form? Treat it like a system-level destination somehow. I have an approach in mind that's not super huge in scope, just want to make sure we're on the same page first.

alexbouchardd · 2026-03-27T12:57:13Z

Good consideration. While I considered reusing the system-level destination, my thinking is that it would leave a bunch of edge cases around having to hardcode some exceptions for this "internal" destination, while we already have a pattern for operators to bring their own queue.

In terms of guarantees, we may be able to get there. Those events can be emitted by the log service when pulling attempts, and the message would get nacked in the log queue if the event can't be published. That would imply re-inserting into storage, which should already be idempotent.
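
A rough sketch of that nack-on-publish-failure idea, to check we mean the same thing. Everything here is hypothetical (`Message`, `Store`, `Publisher`, `handle`, and the in-memory doubles); it only illustrates why an idempotent upsert makes redelivery safe.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical log-queue message that records ack/nack.
type Message struct {
	AttemptID string
	Acked     bool
	Nacked    bool
}

// Store and Publisher are hypothetical interfaces.
type Store interface{ Upsert(attemptID string) error }
type Publisher interface{ Publish(topic, attemptID string) error }

// handle persists the attempt, then publishes the operator event.
// Any failure nacks the message so the queue redelivers it.
func handle(m *Message, s Store, p Publisher) error {
	// Upsert is assumed idempotent, so a redelivered message is harmless.
	if err := s.Upsert(m.AttemptID); err != nil {
		m.Nacked = true
		return err
	}
	if err := p.Publish("attempt.failed", m.AttemptID); err != nil {
		m.Nacked = true
		return err
	}
	m.Acked = true
	return nil
}

// memStore is an in-memory idempotent store keyed by attempt ID.
type memStore map[string]bool

func (s memStore) Upsert(id string) error { s[id] = true; return nil }

// flakyPublisher fails a fixed number of times before succeeding.
type flakyPublisher struct {
	failures  int
	published []string
}

func (p *flakyPublisher) Publish(topic, id string) error {
	if p.failures > 0 {
		p.failures--
		return errors.New("queue unavailable")
	}
	p.published = append(p.published, topic+":"+id)
	return nil
}

func main() {
	store, pub := memStore{}, &flakyPublisher{failures: 1}
	for i := 1; i <= 2; i++ {
		m := &Message{AttemptID: "att_1"}
		err := handle(m, store, pub)
		fmt.Printf("delivery %d: acked=%v nacked=%v err=%v\n", i, m.Acked, m.Nacked, err)
	}
}
```

This gives at-least-once semantics for the operator event, so consumers of the operator MQ would need to tolerate duplicates.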

alexluong force-pushed the alert branch from 7db63e9 to e778007 Compare February 6, 2026 07:28

alexluong force-pushed the alert branch from 5016fbd to be93419 Compare February 6, 2026 10:02

alexluong force-pushed the alert branch from be93419 to 3f54536 Compare February 6, 2026 10:05

alexluong force-pushed the alert branch from 3f54536 to a454523 Compare February 6, 2026 10:06

alexluong force-pushed the alert branch from a454523 to b3534d9 Compare February 6, 2026 10:08

alexbouchardd reviewed Feb 9, 2026

alexluong and others added 10 commits February 12, 2026 21:44

refactor: update alert payload schema

07ced21

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

test: add failing tests for alert.destination.disabled callback

cea1127

TDD setup - tests will pass once feature is implemented. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

feat: add alert.destination.disabled callback

419d6fb

Send alert when destination is auto-disabled after reaching consecutive failure threshold. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore: error handling

0bc231e

chore: unify alert log messages with topic field

04d2648

Use generic "failed to send alert" / "alert sent" messages with a topic field instead of alert-type-specific message strings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

chore: use AttemptFactory in SendsDestinationDisabledAlert test

49cb329

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: update alert payload schema and add destination disabled alert

fa48115

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

alexluong force-pushed the alert branch from c659fa1 to fa48115 Compare February 12, 2026 14:48

alexbouchardd marked this pull request as draft March 13, 2026 15:18

alexluong wants to merge 10 commits into main from alert