144 changes: 144 additions & 0 deletions docs/Collecting Metrics/Collectors/Networking/SNMP devices.mdx
@@ -49,6 +49,34 @@ This collector discovers and monitors any SNMP-enabled network device.

> This table highlights common vendors—the **full library includes many more**.

**SNMP BGP monitoring**

Netdata ships BGP monitoring profiles for generic `BGP4-MIB` devices and vendor MIBs including Cisco, Juniper, Nokia SR OS, Huawei, Arista, and Dell.

The operator-facing BGP charts are normalized under:

- `snmp.bgp.peers.*`
- `snmp.bgp.peer_families.*`
- `snmp.bgp.devices.peer_counts`
- `snmp.bgp.devices.peer_states`

Rich per-peer diagnostics such as previous state, last error, graceful-restart state, and vendor unavailability reasons are exposed through the **Live** function `snmp:bgp-peers` instead of being charted as regular time-series.

This SNMP BGP surface is designed for:

- peer/session availability and FSM state
- established uptime
- BGP UPDATE and message traffic
- route-count monitoring where the vendor MIB exposes reliable counts
- stock alerts for peer down, update churn, transition anomalies, and accepted-prefix drift

**Important limits**

- Standard `BGP4-MIB` provides peer health and message counters, but **not** full route-count coverage.
- Some route counters are **current gauges**, while others are **cumulative totals**. Netdata charts them separately instead of merging values with different semantics into a single misleading chart.
- Huawei contributes to device-level **peer/session counts**, but not device-level **peer state counts** in this SNMP batch.
- SNMP does **not** provide a live per-route inventory. If you need “all routes to and from a peer in real time”, use BMP (BGP Monitoring Protocol); that is outside the scope of this integration.


:::info

@@ -368,6 +396,30 @@ jobs:
```
</details>

###### BGP router with forced profile

Use `manual_profiles` when auto-detection cannot safely distinguish the device, or when you want to force a specific vendor BGP profile during testing.

This example targets a Cisco ASR router and keeps the optional ICMP latency charts enabled.


<details open>
<summary>Config</summary>

```yaml
jobs:
  - name: edge-router
    update_every: 10
    hostname: 192.0.2.10
    community: public
    manual_profiles:
      - cisco-asr
    options:
      version: 2

```
</details>



## Alerts
@@ -384,6 +436,13 @@ The following alerts are available:
| [ snmp_license_state_warning ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp.conf) | snmp.license.state | One or more monitored licenses on this device are degraded, in grace, or otherwise in warning state. |
| [ snmp_license_state_critical ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp.conf) | snmp.license.state | One or more monitored licenses on this device are expired, invalid, unauthorized, or otherwise in critical state. |
| [ snmp_license_usage_high ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp.conf) | snmp.license.usage_percent | The most constrained monitored license pool on this device is nearing exhaustion. |
| [ snmp_bgp_peer_down ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peers.availability | BGP peer is administratively enabled but is not in the Established state. |
| [ snmp_bgp_peer_family_down ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peer_families.availability | BGP peer-family is administratively enabled but is not in the Established state. |
| [ snmp_bgp_peer_transitions_anomaly ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peers.established_transitions | ML anomaly detection on per-peer established transition activity. |
| [ snmp_bgp_peer_family_transitions_anomaly ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peer_families.established_transitions | ML anomaly detection on per-peer-family established transition activity. |
| [ snmp_bgp_peer_updates_anomaly ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peers.update_traffic | ML anomaly detection on per-peer BGP UPDATE traffic. |
| [ snmp_bgp_peer_family_updates_anomaly ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peer_families.update_traffic | ML anomaly detection on per-peer-family BGP UPDATE traffic. |
| [ snmp_bgp_peer_family_prefixes_accepted_anomaly ](https://github.com/netdata/netdata/blob/master/src/health/health.d/snmp_bgp.conf) | snmp.bgp.peer_families.route_counts.current | ML anomaly detection on accepted-prefix gauges where the vendor MIB exposes them. |


## Metrics
@@ -427,6 +486,32 @@ To understand the structure of these profiles (metrics, tags, virtual metrics, e

If `ping.enabled` is true, ICMP latency/packet-loss charts are also provided (or exclusively, when `ping_only: true`).

**For BGP-capable profiles, the public chart contract is:**

- `snmp.bgp.peers.*` for one BGP peer/session per chart instance
- `snmp.bgp.peer_families.*` for one peer plus AFI/SAFI per chart instance
- `snmp.bgp.devices.peer_counts` for device-level peer/session counts
- `snmp.bgp.devices.peer_states` for device-level peer-state summaries where the source MIB exposes canonical peer rows
- Rich peer diagnostics are exposed through the Live function `snmp:bgp-peers`, not as charted time-series
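
A device-level peer-state summary can be pictured as a count of peer rows per FSM state. The sketch below is illustrative Python, not collector code: `summarize_peer_states` and its input shape are hypothetical, and only the six state names are taken from RFC 4271.

```python
from collections import Counter

# The six BGP FSM states defined by RFC 4271.
BGP_STATES = ("idle", "connect", "active", "opensent", "openconfirm", "established")


def summarize_peer_states(peer_states):
    """Count peers per FSM state, reporting zero for states with no peers."""
    counts = Counter(peer_states)
    unknown = set(counts) - set(BGP_STATES)
    if unknown:
        raise ValueError(f"non-canonical BGP states: {sorted(unknown)}")
    # Counter returns 0 for missing keys, so every state gets a dimension.
    return {state: counts[state] for state in BGP_STATES}


summary = summarize_peer_states(["established", "established", "idle", "active"])
print(summary["established"], summary["idle"], summary["connect"])  # prints: 2 1 0
```

Reporting a zero for every canonical state keeps the summary shape stable between polls, which is why a source that cannot represent all states is treated separately.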

**BGP capability notes**

| Vendor / MIB surface | Peer charts | Peer-family charts | Device peer counts | Device peer states | Route counts |
|----------------------|-------------|--------------------|--------------------|--------------------|--------------|
| Standard `BGP4-MIB` | Yes | No | Yes | Yes | No |
| Cisco ASR | Yes | Yes | Yes | Yes | Yes |
| Juniper MX | Yes | Yes | Yes | Yes | Yes |
| Nokia SR OS | Yes | Yes | Yes | Yes | Yes |
| Arista | Yes | Yes | Yes | Yes | Yes |
| Dell OS10 | Yes | Yes | Yes | Yes | Yes |
| Huawei | Partial | Yes | Yes | No | Totals only |

**Interpretation guidance**

- `route_counts.current` contains current gauges such as received, accepted, advertised, active, suppressed, or withdrawn prefixes when the vendor MIB exposes them.
- `route_totals` contains cumulative counters where the vendor MIB only exposes totals.
- When the source model is peer-family scoped, alerts and chart labels include AFI/SAFI so operators can distinguish otherwise similar peers.
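
The gauge versus cumulative distinction matters when you post-process these values. The following is a minimal Python sketch under stated assumptions, not collector code: `rate_from_totals` is a hypothetical helper, and the 32-bit counter width and single-wrap handling are assumptions about the source counter.

```python
def rate_from_totals(prev: int, curr: int, interval_s: float, width_bits: int = 32) -> float:
    """Per-second rate between two samples of a cumulative counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        # Assumption: the counter wrapped exactly once between the two polls.
        # A device restart also produces a negative delta and needs separate handling.
        delta += 1 << width_bits
    return delta / interval_s


# A gauge (route_counts.current style) is already a point-in-time value.
accepted_prefixes_now = 850_000

# A cumulative total (route_totals style) is only meaningful as a delta or rate.
print(rate_from_totals(prev=1_000, curr=1_600, interval_s=10))  # prints: 60.0
```

Plotting both kinds of value on one chart would mix a level with an ever-growing counter, which is the situation the separate contexts avoid.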


### Per device licensing

@@ -514,6 +599,65 @@ Network interface metrics from cached SNMP data, including traffic rates, packet
| Multicast In | float | packets/s | hidden | Rate of multicast packets (destined for a group) received per second. Common in video streaming, multicast applications, and routing protocols. |
| Multicast Out | float | packets/s | hidden | Rate of multicast packets transmitted per second. |

### BGP Peers

Provides detailed current BGP peer and peer-family state from cached SNMP data.

This function uses the normalized BGP surface produced during regular SNMP polling and presents it as a sortable, filterable troubleshooting table. It is designed for details that are useful operationally but should not be charted as regular time-series, such as previous state, last error, last down reason, graceful restart state, and vendor-specific unavailability reasons.

Use cases:
- Identify exactly which peer or peer-family is unhealthy right now
- See the most recent BGP NOTIFICATION error as human-readable text
- Inspect peer identity, AFI/SAFI scope, prefix gauges, and current troubleshooting context in one view

Data is sourced from the last successful SNMP collection cycle. No additional SNMP requests are triggered when calling this function.


| Aspect | Description |
|:-------|:------------|
| Name | `Snmp:bgp-peers` |
| Require Cloud | no |
| Performance | Uses cached normalized SNMP data only; no additional SNMP requests are triggered:<br/>• Responses are served from the in-memory cache<br/>• Large devices with many peers or peer-families may return many rows |
| Security | Exposes current BGP control-plane state and identifiers only:<br/>• No authentication credentials are exposed<br/>• No device configuration changes are triggered<br/>• No packet payloads or full route inventory are exposed |
| Availability | Available when:<br/>• The collector has completed at least one successful BGP-capable SNMP collection cycle<br/>• BGP peer data exists for the matched profile(s)<br/>• Returns HTTP 503 if no BGP rows are available yet |

#### Prerequisites

No additional configuration is required.

#### Parameters

| Parameter | Type | Description | Required | Default | Options |
|:---------|:-----|:------------|:--------:|:--------|:--------|
| View | select | Choose whether to show peer rows, peer-family rows, or both. | yes | peers | Peers (default), Peer Families, All |

#### Returns

Current BGP peer and peer-family details from cached normalized SNMP data. Each row represents either one peer or one peer plus AFI/SAFI, depending on the selected view. Additional hidden columns provide raw codes, message totals, and threshold fields for deeper inspection in the UI.

| Column | Type | Unit | Visibility | Description |
|:-------|:-----|:-----|:-----------|:------------|
| Scope | string | | | Whether the row represents a peer or a peer-family. |
| Routing Instance | string | | | Routing-instance / VRF identifier when exposed by the source MIB. |
| Neighbor | string | | | Remote peer address. |
| Local Address | string | | | Local address used for the BGP session when exposed by the source MIB. |
| Remote AS | string | | | Remote Autonomous System number. |
| Peer Description | string | | | Peer description or label when exposed by the source MIB. |
| Family | string | | | AFI/SAFI scope for peer-family rows. |
| Admin Status | string | | | Whether the peer is administratively enabled. |
| Connection State | string | | | Current BGP FSM state. |
| Previous State | string | | | Previous FSM state when the source MIB exposes it. |
| Established Uptime | integer | seconds | | Time spent in the Established state. |
| Last Update Age | integer | seconds | | Time since the last received UPDATE. |
| Updates Received | integer | updates | | Current received UPDATE counter from the latest poll. |
| Updates Sent | integer | updates | | Current sent UPDATE counter from the latest poll. |
| Prefixes Accepted | integer | prefixes | | Current accepted-prefix gauge where the source MIB exposes it. |
| Prefixes Advertised | integer | prefixes | | Current advertised-prefix gauge where the source MIB exposes it. |
| Last Error | string | | | Human-readable BGP last-error text derived from the code/subcode pair when available. |
| Down Reason | string | | | Last peer-down reason when the source MIB exposes it. |
| GR State | string | | | Graceful-restart state for peer-family scoped rows when exposed by the source MIB. |
| Unavailability Reason | string | | | Vendor-specific unavailability reason for peer-family scoped rows when exposed by the source MIB. |
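
To illustrate what "derived from the code/subcode pair" can look like, here is a hedged Python sketch. It assumes the two-octet layout used by `bgpPeerLastError` in the standard `BGP4-MIB` (first octet is the error code, second is the subcode); `describe_last_error` is a hypothetical name, and the exact text Netdata renders may differ.

```python
# Top-level NOTIFICATION error codes from RFC 4271.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Cease subcodes from RFC 4486.
CEASE_SUBCODES = {
    1: "Maximum Number of Prefixes Reached",
    2: "Administrative Shutdown",
    3: "Peer De-configured",
    4: "Administrative Reset",
    5: "Connection Rejected",
    6: "Other Configuration Change",
    7: "Connection Collision Resolution",
    8: "Out of Resources",
}


def describe_last_error(raw: bytes) -> str:
    """Render a two-octet last-error value as human-readable text."""
    if len(raw) != 2:
        raise ValueError("expected exactly two octets: code and subcode")
    code, subcode = raw[0], raw[1]
    if code == 0:
        return "No error"
    name = ERROR_CODES.get(code, f"Unknown error code {code}")
    if code == 6 and subcode in CEASE_SUBCODES:
        return f"{name}: {CEASE_SUBCODES[subcode]}"
    if subcode:
        # Subcodes for the other error codes are left numeric in this sketch.
        return f"{name} (subcode {subcode})"
    return name


print(describe_last_error(bytes([6, 2])))  # prints: Cease: Administrative Shutdown
```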

### Network Topology

Provides the agent-wide SNMP topology view built from all currently running topology-enabled SNMP jobs.
121 changes: 120 additions & 1 deletion docs/Collecting Metrics/SNMP Profile Format.mdx
@@ -163,6 +163,8 @@ extends: <base profiles to include>
metadata: <device information>
metrics: <what to collect>
topology: <what to collect for topology>
bgp: <what to collect for typed BGP monitoring>
licensing: <what to collect for typed licensing>
metric_tags: <global tags>
static_tags: <static tags>
virtual_metrics: <calculated metrics>
@@ -175,6 +177,8 @@ virtual_metrics: <calculated metrics>
| [**metadata**](#3-metadata) | Collects device-level information (host labels). |
| [**metrics**](#4-metrics) | Defines which OIDs to collect and how to chart them. |
| [**topology**](#41-topology) | Defines SNMP topology rows and their topology kind. |
| [**bgp**](#bgp-rows) | Defines typed BGP device, peer, and peer-family rows. |
| [**licensing**](#licensing-rows) | Defines typed license rows. |
| [**metric_tags**](#5-metric_tags) | Defines global dynamic tags collected once per device and attached to all metrics. |
| [**static_tags**](#6-static_tags) | Defines fixed tags applied to all metrics. |
| [**virtual_metrics**](#7-virtual_metrics) | Defines calculated or aggregated metrics based on others. |
@@ -1246,7 +1250,7 @@ metrics:

**Important behavior**:

- `lookup_symbol` is used only for **cross-table tags**
- `lookup_symbol` is used for **cross-table tags** and typed BGP value fields
- it works together with `index_transform`, not instead of it
- if no row matches, the tag lookup fails for that row
- if one row matches, the collector reads the requested tag from that row
@@ -1355,6 +1359,23 @@ The two index extraction mechanisms use different counting bases:
So `index: 1` and `index_transform: [{start: 0, end: 0}]` both extract the
first index component.

Typed BGP value fields also support `index_from_end: N`, where `N` is
1-based from the right side of the row index. Use it only when the target
INDEX component is a trailing component after a variable-length field, such as
AFI/SAFI after an `InetAddress` peer address.

Use exactly one row-index selector per typed BGP value: `index`,
`index_from_end`, or `index_transform`. Profile validation rejects typed BGP
values that set more than one of these selectors.
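
The three selectors and the one-selector rule can be sketched in Python as follows. This is an illustration of the counting bases described above, not the collector's implementation: `check_single_selector`, `select_component`, and the sample row indexes (address type, address length, address octets, AFI, SAFI) are hypothetical.

```python
SELECTORS = ("index", "index_from_end", "index_transform")


def check_single_selector(value: dict) -> None:
    """Reject a value definition that sets more than one row-index selector."""
    used = [name for name in SELECTORS if name in value]
    if len(used) > 1:
        raise ValueError(f"conflicting row-index selectors: {used}")


def select_component(row_index: str, value: dict) -> str:
    """Extract index components from a dotted row index."""
    check_single_selector(value)
    parts = row_index.split(".")
    if "index" in value:  # 1-based from the left
        n = value["index"]
        if not 1 <= n <= len(parts):
            raise IndexError(f"index {n} out of range")
        return parts[n - 1]
    if "index_from_end" in value:  # 1-based from the right
        n = value["index_from_end"]
        if not 1 <= n <= len(parts):
            raise IndexError(f"index_from_end {n} out of range")
        return parts[-n]
    if "index_transform" in value:  # 0-based, inclusive ranges
        picked = []
        for rng in value["index_transform"]:
            picked.extend(parts[rng["start"]: rng["end"] + 1])
        return ".".join(picked)
    raise ValueError("no row-index selector set")


# Hypothetical rows: type, length, address octets, then AFI and SAFI.
ipv4_row = "1.4.192.0.2.1.1.128"
ipv6_row = "2.16.32.1.13.184.0.0.0.0.0.0.0.0.0.0.0.1.2.1"

# The address length differs, but AFI/SAFI stay at fixed offsets from the end.
print(select_component(ipv4_row, {"index_from_end": 2}), select_component(ipv4_row, {"index_from_end": 1}))
print(select_component(ipv6_row, {"index_from_end": 2}), select_component(ipv6_row, {"index_from_end": 1}))
```

Counting from the right is what makes the same profile entry work for both rows, since a left-based `index` would land on a different component once the address length changes.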

Typed BGP cross-table value fields can also use `lookup_symbol` with
`table:` and `index_transform:`. This is needed when a BGP peer-family table is
indexed by a compact peer ID, but peer identity fields such as neighbor and
remote AS live in a peer table keyed by a different composite index. The
collector extracts the lookup value from the current row index, finds the row
in the referenced table whose `lookup_symbol` column has that value, and then
reads the requested typed value symbol from the matched row.
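
The join described above can be pictured with a short Python sketch. It is a simplified model, not the collector's code: `lookup_value`, the table contents, and the `vendorPeerId` / `vendorPeerRemoteAs` column names are hypothetical, and `index_transform` ranges are treated as 0-based and inclusive.

```python
def extract_key(row_index: str, index_transform: list) -> str:
    """Apply 0-based, inclusive index_transform ranges to a dotted row index."""
    parts = row_index.split(".")
    picked = []
    for rng in index_transform:
        picked.extend(parts[rng["start"]: rng["end"] + 1])
    return ".".join(picked)


def lookup_value(row_index, index_transform, table_rows, lookup_symbol, value_symbol):
    """Find the row whose lookup_symbol column matches the extracted key."""
    key = extract_key(row_index, index_transform)
    for row in table_rows.values():
        if str(row.get(lookup_symbol)) == key:
            return row.get(value_symbol)
    return None  # no matching row: the lookup fails for this row


# Hypothetical peer table keyed by a composite index (address type, length, octets).
peer_table = {
    "1.4.192.0.2.1": {"vendorPeerId": 7, "vendorPeerRemoteAs": 65001},
    "1.4.192.0.2.2": {"vendorPeerId": 8, "vendorPeerRemoteAs": 65002},
}

# Hypothetical peer-family row index: compact peer ID, then AFI and SAFI.
remote_as = lookup_value(
    "7.1.1", [{"start": 0, "end": 0}], peer_table, "vendorPeerId", "vendorPeerRemoteAs"
)
print(remote_as)  # prints: 65001
```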

Examples:

- `Q-BRIDGE-MIB::dot1qTpFdbAddress` is `not-accessible` and is part of the
@@ -2198,6 +2219,104 @@ What this does
- Each `as` becomes a **dimension** (`in_ucast`, `out_ucast`, `in_mcast`, …).
- No `per_row`/`group_by` → totals aggregated across all interfaces.

## BGP rows

The SNMP collector ships a shared BGP pipeline that turns vendor-specific BGP
MIB rows into typed device, peer, and peer-family rows. Profiles describe this
telemetry in a top-level `bgp:` section. The collector emits typed BGP rows from
that section; underscore-prefixed helper tags and `virtual_metrics:` aliases
are legacy migration mechanisms, not the preferred BGP transport.

### Authoring contract

BGP row `kind` accepts only the following values:

- `device` — device-level BGP summaries with no peer identity.
- `peer` — peer-level rows identified by `neighbor` and `remote_as`.
- `peer_family` — address-family rows identified by `neighbor`, `remote_as`,
`address_family`, and `subsequent_address_family`.

Peer-state mappings must use the six RFC 4271 state names: `idle`, `connect`,
`active`, `opensent`, `openconfirm`, and `established`. A complete source must
map all six states. If a source MIB is intentionally partial, set
`partial: true` and use `partial_states: [...]` to record which canonical
states the source can represent.
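
A validation rule of this kind can be sketched in Python as shown below. This is an assumption-laden illustration rather than the actual profile validator: `validate_state_mapping` is a hypothetical name, and the rule that a partial source must map every state listed in `partial_states` is an assumption.

```python
CANONICAL_STATES = frozenset(
    {"idle", "connect", "active", "opensent", "openconfirm", "established"}
)


def validate_state_mapping(mapping: dict, partial: bool = False, partial_states=()) -> None:
    """Check a raw-value to state-name mapping against the six canonical states."""
    mapped = set(mapping.values())
    unknown = mapped - CANONICAL_STATES
    if unknown:
        raise ValueError(f"non-canonical state names: {sorted(unknown)}")
    if partial:
        expected = set(partial_states)
        if not expected or not expected <= CANONICAL_STATES:
            raise ValueError("partial_states must list canonical state names")
    else:
        expected = set(CANONICAL_STATES)
    missing = expected - mapped
    if missing:
        raise ValueError(f"unmapped states: {sorted(missing)}")


# A complete source maps all six states.
validate_state_mapping(
    {1: "idle", 2: "connect", 3: "active", 4: "opensent", 5: "openconfirm", 6: "established"}
)

# A source that only distinguishes two states declares itself partial.
validate_state_mapping(
    {0: "idle", 1: "established"}, partial=True, partial_states=["idle", "established"]
)
```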

When a peer or peer-family row does not provide a routing instance, the public
chart/function label defaults to `default`. Profiles may still set
`identity.routing_instance` explicitly when a vendor MIB exposes VRF or routing
instance identity.

Device rows support `device_counts.peers`, `device_counts.ibgp_peers`,
`device_counts.ebgp_peers`, and per-state counters under
`device_counts.states`. Peer and peer-family rows support typed groups such as
`admin`, `state`, `connection`, `traffic`, `transitions`, `timers`,
`last_error`, `last_notifications`, `reasons`, `graceful_restart`, `routes`,
and `route_limits`.

For table-backed rows, readable columns use `symbol:`. `not-accessible` index
objects must be derived from the row index with `index`, `index_from_end`, or
`index_transform`. Values from related tables use the first-class `table:`
field on the BGP value, optionally with `lookup_symbol:` when the current row
must join to another table by value.

Example device-level counts:

```yaml
bgp:
  - id: vendor-bgp-device-counts
    MIB: VENDOR-BGP-MIB
    kind: device
    device_counts:
      peers:
        symbol: { OID: 1.3.6.1.4.1.99999.1, name: vendor.bgpPeerSessionNum }
      ibgp_peers:
        symbol: { OID: 1.3.6.1.4.1.99999.2, name: vendor.iBgpPeerSessionNum }
      ebgp_peers:
        symbol: { OID: 1.3.6.1.4.1.99999.3, name: vendor.eBgpPeerSessionNum }
```

Example peer-family row:

```yaml
bgp:
  - id: vendor-bgp-peer-family
    MIB: VENDOR-BGP-MIB
    kind: peer_family
    table:
      OID: 1.3.6.1.4.1.99999.10
      name: vendorBgpPeerTable
    identity:
      routing_instance: { value: default }
      neighbor:
        symbol: { OID: 1.3.6.1.4.1.99999.10.1.4, name: vendorBgpPeerRemoteAddr, format: ip_address }
      remote_as:
        symbol: { OID: 1.3.6.1.4.1.99999.10.1.5, name: vendorBgpPeerRemoteAs, format: uint32 }
      address_family:
        index: 1
        mapping: { 1: ipv4, 2: ipv6, 25: l2vpn }
      subsequent_address_family:
        index: 2
        mapping: { 1: unicast, 128: vpn }
    state:
      symbol:
        OID: 1.3.6.1.4.1.99999.10.1.6
        name: vendorBgpPeerState
      mapping:
        1: idle
        2: connect
        3: active
        4: opensent
        5: openconfirm
        6: established
    traffic:
      updates:
        received:
          symbol: { OID: 1.3.6.1.4.1.99999.10.1.7, name: vendorBgpPeerInUpdates }
        sent:
          symbol: { OID: 1.3.6.1.4.1.99999.10.1.8, name: vendorBgpPeerOutUpdates }
```

## Licensing rows

The SNMP collector ships a **shared device-level licensing pipeline** that