2 changes: 1 addition & 1 deletion pages/validators/_meta.json
@@ -1,6 +1,6 @@
{
"setup-guide": "Setup Guide",
"monitoring": "Monitoring",
"monitoring": "Monitoring & Telemetry",
"system-requirements": "System Requirements",
"genvm-configuration": "GenVM Configuration",
"upgrade": "Upgrade Guide",
180 changes: 133 additions & 47 deletions pages/validators/monitoring.mdx
@@ -1,6 +1,6 @@
import { Callout } from "nextra-theme-docs";

# Monitoring Your Validator
# Monitoring & Telemetry

GenLayer validators expose comprehensive metrics that are ready for consumption by Prometheus and other monitoring tools. This allows you to monitor your validator's performance, health, and resource usage.
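
As a minimal illustration of consuming these metrics, a standalone Prometheus server could scrape a validator with a job like the one below (a sketch only: the port and 15-second interval follow the defaults used elsewhere in this guide, and the job name is an arbitrary choice):

```yaml
# prometheus.yml (sketch): scrape a local GenLayer validator's ops endpoint
scrape_configs:
  - job_name: "genlayer-node"          # arbitrary name for this example
    scrape_interval: 15s               # matches the default interval used in this guide
    static_configs:
      - targets: ["localhost:9153"]    # default ops/metrics port of the node
```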

@@ -92,37 +92,53 @@ To contribute your node's metrics and logs to the centralized GenLayer Foundatio
- Metrics enabled in `config.yaml` (`endpoints.metrics: true` — default in recent versions).
- Ops port 9153 exposed in docker-compose (`ports: - "9153:9153"`).
- Credentials from the Foundation team (ask in #testnet-asimov):
- `CENTRAL_MONITORING_URL` — Prometheus remote write base URL (e.g., `https://prometheus-prod-XX.grafana.net`)
- `CENTRAL_LOKI_URL` — Loki push base URL (e.g., `https://logs-prod-XX.grafana.net`)
- `MONITORING_USERNAME` — Instance ID (a number)
- `MONITORING_PASSWORD` — Grafana Cloud API Key (with write permissions for metrics and logs)
- `CENTRAL_MONITORING_URL` — Prometheus remote write URL
- `CENTRAL_LOKI_URL` — Loki push URL
- `CENTRAL_MONITORING_USERNAME` / `CENTRAL_MONITORING_PASSWORD` — Metrics (Prometheus) credentials
- `CENTRAL_LOKI_USERNAME` / `CENTRAL_LOKI_PASSWORD` — Logs (Loki) credentials
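
Before continuing, you can sanity-check the first two prerequisites from the host running the node (a quick spot check; it assumes the default port mapping listed above):

```shell
# The ops endpoint should answer with Prometheus-formatted metrics
curl -s http://localhost:9153/metrics | head -n 5

# The node service should list 9153:9153 under its published ports
docker compose ps
```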

**Steps**

1. **Create or update .env** (next to your docker-compose.yaml):

```env
# Grafana Cloud credentials (request from Foundation team in Discord)
CENTRAL_MONITORING_URL=https://prometheus-prod-...grafana.net
CENTRAL_LOKI_URL=https://logs-prod-...grafana.net
MONITORING_USERNAME=1234567890 # your instance ID
MONITORING_PASSWORD=glc_xxxxxxxxxxxxxxxxxxxxxxxxxxxx # API key

# Your node labels (customize for easy filtering in dashboards)
NODE_ID=0xYourValidatorAddressOrCustomID
VALIDATOR_NAME=validatorname
# Central monitoring server endpoints for GenLayer Foundation
CENTRAL_MONITORING_URL=https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push
CENTRAL_LOKI_URL=https://logs-prod-042.grafana.net/loki/api/v1/push

# Authentication for central monitoring
# Metrics (Prometheus) credentials
CENTRAL_MONITORING_USERNAME=your-metrics-username
CENTRAL_MONITORING_PASSWORD=your-metrics-password
# Logs (Loki) credentials
CENTRAL_LOKI_USERNAME=your-logs-username
CENTRAL_LOKI_PASSWORD=your-logs-password

# Node identification
NODE_ID=validator-001
VALIDATOR_NAME=MyValidator

# Usually defaults are fine
NODE_METRICS_ENDPOINT=localhost:9153
NODE_METRICS_ENDPOINT=host.docker.internal:9153
LOG_FILE_PATTERN=/var/log/genlayer/node*.log
METRICS_SCRAPE_INTERVAL=15s
```
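
Once the Alloy service from the next step is in place, you can confirm that Docker Compose picks these values up by rendering the resolved configuration (a spot check; run it from the directory containing docker-compose.yaml and .env):

```shell
# Print the resolved compose file with environment variables substituted,
# then show the central-monitoring settings it ended up with
docker compose --profile monitoring config | grep -i -A 2 "CENTRAL_"
```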

2. **Add or verify the Alloy service in docker-compose.yaml (copy if missing):
2. **Add or verify the Alloy service in docker-compose.yaml** (copy if missing):

```yaml
# Grafana Alloy for both logs and metrics forwarding
# Supports both single node and multi-node configurations
#
# Single Node Mode:
# Set NODE_ID, VALIDATOR_NAME, NODE_METRICS_ENDPOINT in .env
# docker compose --profile monitoring up -d
#
# Multi-Node Mode:
# Set SCRAPE_TARGETS_JSON in .env
# docker compose --profile monitoring up -d
alloy:
image: grafana/alloy:latest
image: grafana/alloy:v1.12.0
container_name: genlayer-node-alloy
command:
- run
@@ -134,25 +150,45 @@ alloy:
- ${NODE_LOGS_PATH:-./data/node/logs}:/var/log/genlayer:ro
- alloy_data:/var/lib/alloy
environment:
- CENTRAL_LOKI_URL=${CENTRAL_LOKI_URL}
- CENTRAL_MONITORING_URL=${CENTRAL_MONITORING_URL}
- MONITORING_USERNAME=${MONITORING_USERNAME}
- MONITORING_PASSWORD=${MONITORING_PASSWORD}
- NODE_ID=${NODE_ID}
- VALIDATOR_NAME=${VALIDATOR_NAME}
- NODE_METRICS_ENDPOINT=${NODE_METRICS_ENDPOINT}
# Central monitoring endpoints
- CENTRAL_LOKI_URL=${CENTRAL_LOKI_URL:-https://logs-prod-042.grafana.net/loki/api/v1/push}
- CENTRAL_MONITORING_URL=${CENTRAL_MONITORING_URL:-https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push}

# Metrics (Prometheus) authentication
- CENTRAL_MONITORING_USERNAME=${CENTRAL_MONITORING_USERNAME:-telemetric}
- CENTRAL_MONITORING_PASSWORD=${CENTRAL_MONITORING_PASSWORD:-12345678}

# Logs (Loki) authentication
- CENTRAL_LOKI_USERNAME=${CENTRAL_LOKI_USERNAME:-telemetric}
- CENTRAL_LOKI_PASSWORD=${CENTRAL_LOKI_PASSWORD:-12345678}

# Single node configuration
- NODE_ID=${NODE_ID:-local}
- VALIDATOR_NAME=${VALIDATOR_NAME:-default}
- NODE_METRICS_ENDPOINT=${NODE_METRICS_ENDPOINT:-host.docker.internal:9153}

# Multi-node configuration
# When set, overrides single node config above
- SCRAPE_TARGETS_JSON=${SCRAPE_TARGETS_JSON:-}

# Scraping configuration
- METRICS_SCRAPE_INTERVAL=${METRICS_SCRAPE_INTERVAL:-15s}
- METRICS_SCRAPE_TIMEOUT=${METRICS_SCRAPE_TIMEOUT:-10s}
- ALLOY_SELF_MONITORING_INTERVAL=${ALLOY_SELF_MONITORING_INTERVAL:-60s}

# Log collection configuration
- LOG_FILE_PATTERN=${LOG_FILE_PATTERN:-/var/log/genlayer/node*.log}

# Log batching configuration
- LOKI_BATCH_SIZE=${LOKI_BATCH_SIZE:-1MiB}
- LOKI_BATCH_WAIT=${LOKI_BATCH_WAIT:-1s}
ports:
- "12345:12345" # Alloy UI for debugging
restart: unless-stopped
profiles:
- monitoring
extra_hosts:
- "host.docker.internal:host-gateway"

volumes:
alloy_data:
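
# Usage sketch (assumes the service, profile, and volume names above; adjust if yours differ):
#   docker compose --profile monitoring up -d    # start the telemetry forwarder
#   docker compose logs -f alloy                 # follow its logs to confirm startup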
@@ -168,82 +204,128 @@ volumes:
// Log Collection and Forwarding
// ==========================================

// Discovery component to find log files using local.file_match
// Supports different log file patterns:
// - Single node: "/var/log/genlayer/node.log"
// - Multi-node: "/var/log/genlayer/*/logs/node.log" (each node in subdirectory)
// - Custom pattern via LOG_FILE_PATTERN env var
local.file_match "genlayer_logs" {
path_targets = [{
__path__ = coalesce(env("LOG_FILE_PATTERN"), "/var/log/genlayer/node*.log"),
__path__ = coalesce(sys.env("LOG_FILE_PATTERN"), "/var/log/genlayer/node*.log"),
}]
}

// Relabel to add metadata labels to log entries
discovery.relabel "add_labels" {
targets = local.file_match.genlayer_logs.targets

// Add instance label from environment variable
rule {
target_label = "instance"
replacement = env("NODE_ID")
replacement = sys.env("NODE_ID")
}

// Add validator_name label from environment variable
rule {
target_label = "validator_name"
replacement = env("VALIDATOR_NAME")
replacement = sys.env("VALIDATOR_NAME")
}

// Add component label
rule {
target_label = "component"
replacement = "alloy"
}

// Add job label
rule {
target_label = "job"
replacement = "genlayer-node"
}
}

// Source component to read log files
loki.source.file "genlayer" {
targets = discovery.relabel.add_labels.output
forward_to = [loki.write.central.receiver]

// Tail from end to avoid ingesting entire log history on startup
tail_from_end = true
}

// Write logs to central Loki instance
loki.write "central" {
endpoint {
url = env("CENTRAL_LOKI_URL") + "/loki/api/v1/push"
url = sys.env("CENTRAL_LOKI_URL")

// HTTP Basic Authentication
basic_auth {
username = env("MONITORING_USERNAME")
password = env("MONITORING_PASSWORD")
username = sys.env("CENTRAL_LOKI_USERNAME")
password = sys.env("CENTRAL_LOKI_PASSWORD")
}
batch_size = coalesce(env("LOKI_BATCH_SIZE"), "1MiB")
batch_wait = coalesce(env("LOKI_BATCH_WAIT"), "1s")

// Configurable batch settings for efficient log sending
batch_size = coalesce(sys.env("LOKI_BATCH_SIZE"), "1MiB")
batch_wait = coalesce(sys.env("LOKI_BATCH_WAIT"), "1s")
}
}

// ==========================================
// Prometheus Metrics Collection and Forwarding
// ==========================================

// Scrape metrics from GenLayer node(s)
// Supports both single node and multi-node configurations
//
// Single Node Mode:
// Set NODE_METRICS_ENDPOINT, NODE_ID, VALIDATOR_NAME
//
// Multi-Node Mode:
// Set SCRAPE_TARGETS_JSON with JSON array of target objects
// Example: [{"__address__":"host.docker.internal:9250","instance":"0x...","validator_name":"node-1"}]
prometheus.scrape "genlayer_node" {
targets = json_decode(coalesce(env("SCRAPE_TARGETS_JSON"), format("[{\"__address__\":\"%s\",\"instance\":\"%s\",\"validator_name\":\"%s\"}]", coalesce(env("NODE_METRICS_ENDPOINT"), "localhost:9153"), coalesce(env("NODE_ID"), "local"), coalesce(env("VALIDATOR_NAME"), "default"))))
// Dynamic targets based on environment variable
// If SCRAPE_TARGETS_JSON is set, use it (multi-node mode)
// Otherwise, build single target from individual env vars (single node mode)
targets = encoding.from_json(coalesce(sys.env("SCRAPE_TARGETS_JSON"), string.format("[{\"__address__\":\"%s\",\"instance\":\"%s\",\"validator_name\":\"%s\"}]", coalesce(sys.env("NODE_METRICS_ENDPOINT"), "host.docker.internal:9153"), coalesce(sys.env("NODE_ID"), "local"), coalesce(sys.env("VALIDATOR_NAME"), "default"))))

forward_to = [prometheus.relabel.metrics.receiver]
scrape_interval = coalesce(env("METRICS_SCRAPE_INTERVAL"), "15s")
scrape_timeout = coalesce(env("METRICS_SCRAPE_TIMEOUT"), "10s")

// Configurable scrape intervals
scrape_interval = coalesce(sys.env("METRICS_SCRAPE_INTERVAL"), "15s")
scrape_timeout = coalesce(sys.env("METRICS_SCRAPE_TIMEOUT"), "10s")
}

// Relabel metrics to filter before forwarding
prometheus.relabel "metrics" {
forward_to = [prometheus.remote_write.central.receiver]
// Optional: filter only GenLayer metrics to save bandwidth
// rule {
// source_labels = ["__name__"]
// regex = "genlayer_.*"
// action = "keep"
// }

// Option 1: Forward all metrics (default)
// Currently forwarding all metrics from the node.

// Option 2: Only keep genlayer_node_* metrics to reduce bandwidth (recommended)
// To enable filtering and reduce bandwidth, uncomment the following rule:
/*
rule {
source_labels = ["__name__"]
regex = "genlayer_node_.*"
action = "keep"
}
*/
}

// Remote write configuration for sending metrics to central Prometheus
prometheus.remote_write "central" {
endpoint {
url = env("CENTRAL_MONITORING_URL") + "/api/v1/write"
url = sys.env("CENTRAL_MONITORING_URL")

// HTTP Basic Authentication
basic_auth {
username = env("MONITORING_USERNAME")
password = env("MONITORING_PASSWORD")
username = sys.env("CENTRAL_MONITORING_USERNAME")
password = sys.env("CENTRAL_MONITORING_PASSWORD")
}

// Queue configuration for reliability
queue_config {
capacity = 10000
max_shards = 5
@@ -257,12 +339,16 @@ prometheus.remote_write "central" {
// Alloy Self-Monitoring
// ==========================================

// Alloy internal exporter for health monitoring
prometheus.exporter.self "alloy" {}

// Expose Alloy's own metrics on the HTTP server
prometheus.scrape "alloy" {
targets = prometheus.exporter.self.alloy.targets
forward_to = []
scrape_interval = coalesce(env("ALLOY_SELF_MONITORING_INTERVAL"), "60s")
forward_to = [] // Not forwarding Alloy metrics to reduce noise

// Configurable scrape interval for Alloy's internal health monitoring
scrape_interval = coalesce(sys.env("ALLOY_SELF_MONITORING_INTERVAL"), "60s")
}
```
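
After saving the configuration, one way to apply it and confirm forwarding works is sketched below; the container name, profile, and debugging port are taken from the compose example above and may differ in your setup:

```shell
# Recreate the forwarder so it picks up the new config.alloy
docker compose --profile monitoring up -d --force-recreate alloy

# Watch for scrape or remote-write errors
docker logs -f genlayer-node-alloy

# The Alloy debugging UI is published on port 12345 in the compose example;
# open http://localhost:12345 in a browser to inspect component health
```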

@@ -298,7 +384,7 @@ curl http://localhost:9153/metrics
```

— it should return Prometheus-formatted data.
Authentication errors (401/403): Double-check MONITORING_USERNAME and MONITORING_PASSWORD in .env.
Authentication errors (401/403): Double-check CENTRAL_MONITORING_USERNAME, CENTRAL_MONITORING_PASSWORD, CENTRAL_LOKI_USERNAME, and CENTRAL_LOKI_PASSWORD in .env.
No data pushed: Ensure URLs in .env have no trailing slash.
Help: Share Alloy logs
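
For example, recent forwarder output can be captured like this (container name taken from the compose snippet above):

```shell
# Save the last few hundred lines of Alloy output to attach when asking for help
docker logs --tail 300 genlayer-node-alloy > alloy-logs.txt 2>&1
```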

26 changes: 16 additions & 10 deletions pages/validators/setup-guide.mdx
@@ -626,22 +626,28 @@ source .env && docker run --rm --env-file ./.env \
source .env && docker compose --profile node up -d
```

### Quick Start with Docker Compose
### Telemetry

For a quick setup using Docker Compose:

1. **Configure your `.env` file** with the required variables for monitoring:

```env copy
# Grafana Cloud credentials (request from Foundation team in Discord #testnet-asimov)
CENTRAL_MONITORING_URL=https://prometheus-prod-...grafana.net
CENTRAL_LOKI_URL=https://logs-prod-...grafana.net
MONITORING_USERNAME=1234567890 # your instance ID
MONITORING_PASSWORD=glc_xxxxxxxxxxxxxxxxxxxxxxxxxxxx # API key

# Your node labels (customize for easy filtering in dashboards)
NODE_ID=0xYourValidatorAddressOrCustomID
VALIDATOR_NAME=validatorname
# Central monitoring server endpoints for GenLayer Foundation
CENTRAL_MONITORING_URL=https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push
CENTRAL_LOKI_URL=https://logs-prod-042.grafana.net/loki/api/v1/push

# Authentication for central monitoring
# Metrics (Prometheus) credentials
CENTRAL_MONITORING_USERNAME=your-metrics-username
CENTRAL_MONITORING_PASSWORD=your-metrics-password
# Logs (Loki) credentials
CENTRAL_LOKI_USERNAME=your-logs-username
CENTRAL_LOKI_PASSWORD=your-logs-password

# Node identification
NODE_ID=validator-001
VALIDATOR_NAME=MyValidator
```

Comment on lines +629 to +650 (Contributor)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for Alloy config files that use CENTRAL_* variables
fd -type f -name "*.alloy" -o -name "*alloy*" -type f | head -20

Repository: genlayerlabs/genlayer-docs

Length of output: 240


🏁 Script executed:

# Search for mentions of CENTRAL_MONITORING_URL and CENTRAL_LOKI_URL in codebase
rg "CENTRAL_MONITORING_URL|CENTRAL_LOKI_URL" -A 3 -B 3

Repository: genlayerlabs/genlayer-docs

Length of output: 3527


🏁 Script executed:

# Search for monitoring-related documentation
fd -type f \( -name "*monitoring*" -o -name "*telemetry*" \) | grep -E "\.(md|mdx)$"

Repository: genlayerlabs/genlayer-docs

Length of output: 240


🏁 Script executed:

# Look for docker-compose files that may show how these env vars are used
fd -type f -name "docker-compose*.yml" -o -name "docker-compose*.yaml"

Repository: genlayerlabs/genlayer-docs

Length of output: 240


Fix double-path bug in CENTRAL_* URLs — remove path components from env values.

Lines 635-638 show full endpoint paths, but the Alloy config in pages/validators/monitoring.mdx appends additional paths:

  • CENTRAL_LOKI_URL + /loki/api/v1/push → malformed double path
  • CENTRAL_MONITORING_URL + /api/v1/write → wrong endpoint path

The .env values must be base URLs only:

-CENTRAL_MONITORING_URL=https://prometheus-prod-66-prod-us-east-3.grafana.net/api/prom/push
-CENTRAL_LOKI_URL=https://logs-prod-042.grafana.net/loki/api/v1/push
+CENTRAL_MONITORING_URL=https://prometheus-prod-66-prod-us-east-3.grafana.net
+CENTRAL_LOKI_URL=https://logs-prod-042.grafana.net

Apply the same fix to pages/validators/monitoring.mdx for consistency.

🤖 Prompt for AI Agents
In `@pages/validators/setup-guide.mdx` around lines 629 - 650, The
CENTRAL_MONITORING_URL and CENTRAL_LOKI_URL env values currently include full
endpoint paths, causing duplicated/malformed URLs when the Alloy config appends
its own paths; change those env variables (CENTRAL_MONITORING_URL,
CENTRAL_LOKI_URL) to be base URLs only (e.g.,
https://prometheus-prod-66-prod-us-east-3.grafana.net and
https://logs-prod-042.grafana.net) and remove any trailing path segments so the
monitoring config can safely append /api/v1/write or /loki/api/v1/push; apply
the same change to the corresponding examples used by the Alloy monitoring
config to keep both docs consistent.


2. **Run docker compose**:
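
A likely invocation, assuming the monitoring profile from the compose example above (verify against your own compose file):

```shell
# Start the stack together with the telemetry forwarder
docker compose --profile monitoring up -d
```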