docs: add Triggering On-Call Pages section to Nagios README #22991
base: master
@@ -105,6 +105,160 @@ The check watches the Nagios events log for log lines containing these strings,

The Nagios check does not include any service checks.

## Trigger on-call pages

Configure Nagios notification commands to call the [Datadog On-Call Paging API][11] directly, without going through the Datadog Agent. A notification script creates a page when Nagios sends a `PROBLEM` notification and resolves that page when Nagios sends the matching `RECOVERY` notification.

### How on-call pages work

- `PROBLEM` notifications for the `CRITICAL` or `WARNING` service states, or for the `DOWN` host state, create a page targeting the configured On-Call team.
- `RECOVERY` notifications resolve the page created for the same host or service.
- `UNKNOWN` service notifications and `UNREACHABLE` host notifications are ignored.
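
The rules above amount to a small decision table. The following sketch restates it as a standalone Bash function so the behavior can be checked without calling Datadog; the function name `oncall_action` and its `create`/`resolve`/`ignore` outputs are hypothetical. It assumes, as the introduction states, that only `PROBLEM` notifications open pages, so other notification types such as `ACKNOWLEDGEMENT` are ignored.

```bash
#!/bin/bash
# Hypothetical helper mirroring the rules above: prints create, resolve, or ignore.
# Usage: oncall_action NOTIFICATIONTYPE STATE
oncall_action() {
  local notif_type="$1" state="$2"
  if [ "$notif_type" = "RECOVERY" ] || [ "$state" = "OK" ] || [ "$state" = "UP" ]; then
    echo "resolve"
  elif [ "$notif_type" = "PROBLEM" ]; then
    case "$state" in
      CRITICAL|WARNING|DOWN) echo "create" ;;
      *) echo "ignore" ;;  # UNKNOWN (service) or UNREACHABLE (host)
    esac
  else
    echo "ignore"          # for example ACKNOWLEDGEMENT or FLAPPINGSTART
  fi
}

oncall_action PROBLEM CRITICAL        # prints: create
oncall_action PROBLEM UNREACHABLE     # prints: ignore
oncall_action RECOVERY UP             # prints: resolve
oncall_action ACKNOWLEDGEMENT DOWN    # prints: ignore
```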

### Setup

#### Create the notification script

Create `/usr/local/nagios/libexec/notify_datadog_oncall.sh` with the following content, replacing the placeholder API key, application key, and `DD_SITE` value with your own:

```bash
#!/bin/bash
set -u

DD_API_KEY="<YOUR_DATADOG_API_KEY>"
DD_APP_KEY="<YOUR_DATADOG_APP_KEY>"
DD_SITE="datadoghq.com" # Change to your Datadog site

NOTIF_TYPE="${1}"   # PROBLEM, RECOVERY, ACKNOWLEDGEMENT, ...
HOSTNAME="${2}"
SERVICEDESC="${3}"  # "Host" for host notifications
STATE="${4}"        # CRITICAL, WARNING, OK, UNKNOWN, UP, DOWN, UNREACHABLE
ONCALL_TEAM="${5}"  # Datadog On-Call team handle, e.g. "ops"
OUTPUT="${6}"

# Map DD_SITE to On-Call API endpoint
case "$DD_SITE" in
  datadoghq.com)     ONCALL_URL="https://navy.oncall.datadoghq.com" ;;
  datadoghq.eu)      ONCALL_URL="https://beige.oncall.datadoghq.eu" ;;
  us3.datadoghq.com) ONCALL_URL="https://teal.oncall.datadoghq.com" ;;
  us5.datadoghq.com) ONCALL_URL="https://coral.oncall.datadoghq.com" ;;
  ap1.datadoghq.com) ONCALL_URL="https://saffron.oncall.datadoghq.com" ;;
  ap2.datadoghq.com) ONCALL_URL="https://lava.oncall.datadoghq.com" ;;
  ddog-gov.com)      ONCALL_URL="https://navy.oncall.datadoghq.com" ;;
  *) echo "Unknown DD_SITE: $DD_SITE" >&2; exit 1 ;;
esac

# Track open pages with one state file per host/service pair. Characters other
# than letters, digits, ".", "_", and "-" (such as "/" or spaces) are replaced
# so the key is always a single valid file name.
STATE_DIR="/var/tmp/nagios_dd_oncall"
mkdir -p "$STATE_DIR"
PAGE_KEY=$(printf '%s-%s' "$HOSTNAME" "$SERVICEDESC" | tr -c 'A-Za-z0-9._-' '_')
PAGE_FILE="${STATE_DIR}/${PAGE_KEY}"

# Escape backslashes, double quotes, tabs, and line breaks for safe JSON embedding
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\t'/\\t}
  s=${s//$'\r'/\\r}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}

if [ "$NOTIF_TYPE" = "RECOVERY" ] || [ "$STATE" = "OK" ] || [ "$STATE" = "UP" ]; then
  # Resolve the page opened for this host/service pair, if any
  if [ -f "$PAGE_FILE" ]; then
    PAGE_ID=$(cat "$PAGE_FILE")
    curl -s -m 15 -X POST \
      "${ONCALL_URL}/api/v2/on-call/pages/${PAGE_ID}/resolve" \
      -H "DD-API-KEY: ${DD_API_KEY}" \
      -H "DD-APPLICATION-KEY: ${DD_APP_KEY}"
    rm -f "$PAGE_FILE"
  fi
elif [ "$NOTIF_TYPE" = "PROBLEM" ] && [ ! -f "$PAGE_FILE" ] && \
     { [ "$STATE" = "CRITICAL" ] || [ "$STATE" = "DOWN" ] || [ "$STATE" = "WARNING" ]; }; then
  # Only the first PROBLEM notification creates a page. Re-notifications and
  # state changes (for example, WARNING to CRITICAL) keep the existing page.
  # If a RECOVERY is ever missed, delete the stale state file by hand.
  JSON_HOST=$(json_escape "$HOSTNAME")
  JSON_SERVICE=$(json_escape "$SERVICEDESC")
  JSON_TEAM=$(json_escape "$ONCALL_TEAM")
  JSON_OUTPUT=$(json_escape "$OUTPUT")

  # Create page
  RESPONSE=$(curl -s -m 15 -X POST \
    "${ONCALL_URL}/api/v2/on-call/pages" \
    -H "DD-API-KEY: ${DD_API_KEY}" \
    -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
    -H "Content-Type: application/json" \
    -d "{
      \"data\": {
        \"type\": \"pages\",
        \"attributes\": {
          \"title\": \"Nagios: ${JSON_HOST} / ${JSON_SERVICE} is ${STATE}\",
          \"description\": \"${JSON_OUTPUT}\",
          \"urgency\": \"high\",
          \"tags\": [\"integration:nagios\", \"service:${JSON_SERVICE}\", \"host:${JSON_HOST}\"],
          \"target\": {
            \"identifier\": \"${JSON_TEAM}\",
            \"type\": \"team_handle\"
          }
        }
      }
    }")

  # Save the page ID (the first "id" field in the response) for later resolution
  PAGE_ID=$(printf '%s' "$RESPONSE" \
    | grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/')
  if [ -n "$PAGE_ID" ]; then
    printf '%s' "$PAGE_ID" > "$PAGE_FILE"
  fi
fi
```
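
The script pairs a `RECOVERY` with its earlier `PROBLEM` through a small state file per host/service pair that holds the page ID. The sketch below isolates that bookkeeping so it can be exercised without calling the API. The function names `page_key`, `remember_page`, and `take_page` are hypothetical, and a temporary directory stands in for `/var/tmp/nagios_dd_oncall`. The key replaces characters other than letters, digits, `.`, `_`, and `-`, because a service description containing `/` would otherwise point into a nonexistent subdirectory.

```bash
#!/bin/bash
set -u

# Stand-in for /var/tmp/nagios_dd_oncall so the sketch does not touch real state
STATE_DIR=$(mktemp -d)
trap 'rm -rf "$STATE_DIR"' EXIT

# Hypothetical helper: build a file-name-safe key from host and service
page_key() {
  printf '%s-%s' "$1" "$2" | tr -c 'A-Za-z0-9._-' '_'
}

# Hypothetical helper: remember the page ID returned by the create call
remember_page() {  # usage: remember_page HOST SERVICE PAGE_ID
  printf '%s\n' "$3" > "${STATE_DIR}/$(page_key "$1" "$2")"
}

# Hypothetical helper: print and forget the stored page ID; fails if none is open
take_page() {  # usage: take_page HOST SERVICE
  local file
  file="${STATE_DIR}/$(page_key "$1" "$2")"
  [ -f "$file" ] || return 1
  cat "$file"
  rm -f "$file"
}

# PROBLEM: the create call returned a page ID, so store it
remember_page "webserver-01" "HTTP/HTTPS check" "abc-123"
# RECOVERY: look up the ID to resolve, then forget it
take_page "webserver-01" "HTTP/HTTPS check"                          # prints: abc-123
# A second RECOVERY finds nothing to resolve
take_page "webserver-01" "HTTP/HTTPS check" || echo "no open page"   # prints: no open page
```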
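
Make the script executable. Because it contains your API and application keys, restrict access to `root` and the group the Nagios daemon runs as (assumed here to be `nagios`) instead of making it world-readable:

```shell
sudo chown root:nagios /usr/local/nagios/libexec/notify_datadog_oncall.sh
sudo chmod 750 /usr/local/nagios/libexec/notify_datadog_oncall.sh
```

#### Define the Nagios commands
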
Add the following commands to `commands.cfg`. Service and host notifications use separate commands because Nagios provides different macros for each: the service command passes `$SERVICEDESC$`, `$SERVICESTATE$`, and `$SERVICEOUTPUT$`, while the host command passes the literal service description `Host` together with `$HOSTSTATE$` and `$HOSTOUTPUT$`.

```nagios
define command {
    command_name    notify-datadog-oncall-service
    command_line    /usr/local/nagios/libexec/notify_datadog_oncall.sh "$NOTIFICATIONTYPE$" "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$_CONTACTONCALL_TEAM$" "$SERVICEOUTPUT$"
}

define command {
    command_name    notify-datadog-oncall-host
    command_line    /usr/local/nagios/libexec/notify_datadog_oncall.sh "$NOTIFICATIONTYPE$" "$HOSTNAME$" "Host" "$HOSTSTATE$" "$_CONTACTONCALL_TEAM$" "$HOSTOUTPUT$"
}
```

#### Create contacts with the On-Call team handle

The custom variable `_oncall_team` sets the Datadog On-Call team handle for each contact, and the notification commands read it through the `$_CONTACTONCALL_TEAM$` macro. Add contacts to `contacts.cfg`:

```nagios
define contact {
    contact_name                    datadog-ops
    alias                           Ops Team On-Call
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-datadog-oncall-service
    host_notification_commands      notify-datadog-oncall-host
    _oncall_team                    ops
}
```

The `_oncall_team` value (for example, `ops`) must match the team handle configured in [Datadog On-Call][12].
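
Nagios exposes an object's custom variables as macros named `$_<OBJECT><VARNAME>$`, where `<VARNAME>` is the variable name without its leading underscore, in uppercase; this is why a contact's `_oncall_team` is referenced as `$_CONTACTONCALL_TEAM$`. The hypothetical helper below applies that naming rule, which can help when passing further custom variables to the commands.

```bash
#!/bin/bash
# Hypothetical helper: print the Nagios macro name for a custom variable.
# Usage: custom_var_macro OBJECT_TYPE VARIABLE   (OBJECT_TYPE: host, service, contact)
custom_var_macro() {
  local object var
  object=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  var=$(printf '%s' "${2#_}" | tr '[:lower:]' '[:upper:]')  # drop one leading "_"
  printf '$_%s%s$\n' "$object" "$var"
}

custom_var_macro contact _oncall_team   # prints: $_CONTACTONCALL_TEAM$
```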

#### Assign the contact to services or hosts

```nagios
define service {
    use                     generic-service
    host_name               webserver-01
    service_description     HTTP_Service
    check_command           check_http
    contacts                datadog-ops
    notification_options    w,u,c,r
}
```

To page on host `DOWN` states as well, add the same `contacts` directive to the host definition.

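#### Reload Nagios

Check the configuration for errors, then reload Nagios:

```shell
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
sudo systemctl reload nagios
```

After the next `PROBLEM` notification, verify that the page appears under **On-Call > Pages** in Datadog.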
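
If a page does not appear, or a recovery does not resolve it, check which pages the script still tracks as open. Each open page is a file in the state directory that holds the page ID. The sketch below lists them; the function name `list_open_pages` is hypothetical, and the default directory assumes the `/var/tmp/nagios_dd_oncall` path used by the script. Run it as a user that can read that directory, such as the Nagios user.

```bash
#!/bin/bash
# Hypothetical helper: list pages the notification script still tracks as open.
# Each state file is named after a host/service pair and contains one page ID.
list_open_pages() {
  local dir="${1:-/var/tmp/nagios_dd_oncall}" file
  [ -d "$dir" ] || return 0
  for file in "$dir"/*; do
    [ -f "$file" ] || continue   # skip the unexpanded glob when the directory is empty
    printf '%s\t%s\n' "$(basename "$file")" "$(cat "$file")"
  done
}

list_open_pages
```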

## Troubleshooting

Need help? Contact [Datadog support][9].

@@ -123,3 +277,5 @@ Need help? Contact [Datadog support][9].
[8]: https://docs.datadoghq.com/agent/guide/agent-commands/#agent-status-and-information
[9]: https://docs.datadoghq.com/help/
[10]: https://www.datadoghq.com/blog/nagios-monitoring
[11]: https://docs.datadoghq.com/api/latest/on-call-paging/
[12]: https://docs.datadoghq.com/service_management/on-call/