---
sidebar_position: 7
title: API Reference
---

The aggregator exposes two servers:

- gRPC (default :50051): agent data ingestion and programmatic queries
- HTTP (default :9090): health checks, metrics, REST API for the web UI


| Method | Path | Description |
|---|---|---|
| GET | /healthz | Liveness probe (returns ok) |
| GET | /readyz | Readiness probe (checks buffer) |
| GET | /metrics | Prometheus metrics (OpenMetrics text) |


| Method | Path | Description |
|---|---|---|
| POST | /api/aggregate | Aggregate profiling events |
| POST | /api/diff | Differential profiling (requires storage) |
| GET | /api/batches | List recent ingested batches |
| GET | /api/health | UI health info (buffer, push stats, ClickHouse) |


| Method | Path | Description |
|---|---|---|
| GET | /api/alerts | List all alert rules |
| POST | /api/alerts | Create a new alert rule |
| DELETE | /api/alerts/:id | Delete an alert rule |
| POST | /api/alerts/:id/toggle | Enable/disable an alert rule |
| GET | /api/alerts/history | List fired alert events |
| POST | /api/alerts/evaluate | Evaluate rules against current metrics |


| Method | Path | Description |
|---|---|---|
| GET | /api/export/json | Download aggregated profile as JSON |
| GET | /api/export/collapsed | Download CPU stacks in collapsed format |


POST /api/aggregate aggregates profiling events from the buffer or ClickHouse storage.
Request body (JSON):
{
"agent_id": "optional-agent-filter",
"time_start_ns": 1700000000000000000,
"time_end_ns": 1700000060000000000,
"limit": 100,
"event_type": "cpu"
}

- event_type: "cpu", "lock", "syscall", or omit for all
- limit: max batches to aggregate (capped at 100)
- All fields are optional

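
As a minimal sketch of how a client might assemble this request, the snippet below builds (but does not send) a POST to /api/aggregate using only the Python standard library. The base URL, the `build_aggregate_request` helper, and the `Content-Type` header are assumptions made for illustration; only the path and field names come from this page.

```python
import json
import urllib.request

# Assumed base URL: the HTTP server's default port on a local aggregator.
BASE_URL = "http://localhost:9090"


def build_aggregate_request(base_url=BASE_URL, agent_id=None, time_start_ns=None,
                            time_end_ns=None, limit=None, event_type=None):
    """Build (but do not send) a POST /api/aggregate request (hypothetical helper)."""
    if event_type not in (None, "cpu", "lock", "syscall"):
        raise ValueError(f"unknown event_type: {event_type!r}")
    fields = {
        "agent_id": agent_id,
        "time_start_ns": time_start_ns,
        "time_end_ns": time_end_ns,
        "limit": limit,
        "event_type": event_type,
    }
    # All fields are optional, so drop the ones the caller did not set.
    body = {key: value for key, value in fields.items() if value is not None}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/aggregate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not confirmed here
        method="POST",
    )


req = build_aggregate_request(event_type="cpu", limit=50)
print(req.get_method(), req.full_url, req.data.decode())
```

Sending the request is then a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON body of the reply.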
Response:
{
"cpu": {
"start_time": 1700000000000000000,
"end_time": 1700000060000000000,
"total_samples": 5000,
"sample_period_ns": 10000000,
"stacks": [
{
"stack": {
"frames": [
{ "ip": 4194304, "function": "main", "module": "myapp" }
]
},
"count": 150
}
]
},
"lock": { "..." : "..." },
"syscall": { "..." : "..." },
"total_events": 12000,
"skipped_batches": 0
}

POST /api/diff compares two time windows or agent profiles. Requires ClickHouse storage.
Request body:
{
"baseline_agent_id": "agent-1",
"baseline_start_ns": 1700000000000000000,
"baseline_end_ns": 1700000030000000000,
"comparison_agent_id": "agent-1",
"comparison_start_ns": 1700000030000000000,
"comparison_end_ns": 1700000060000000000,
"event_type": "cpu",
"limit": 100
}

Response:
{
"result_json": "{\"baseline_total\":2500,\"comparison_total\":3000,\"stacks\":[...]}",
"error": ""
}

GET /api/batches query parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| agent_id | string | (all) | Filter by agent |
| limit | number | 100 | Max results |

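
The query parameters above map directly onto a URL. The following sketch shows one way to build it; the `batches_url` helper and the localhost base URL are hypothetical, and unset parameters are simply omitted so the server defaults apply.

```python
from urllib.parse import urlencode


def batches_url(base_url="http://localhost:9090", agent_id=None, limit=None):
    """Build the GET /api/batches URL (hypothetical helper, assumed base URL)."""
    params = {}
    if agent_id is not None:
        params["agent_id"] = agent_id
    if limit is not None:
        params["limit"] = limit
    url = base_url.rstrip("/") + "/api/batches"
    query = urlencode(params)
    # Leave off the "?" entirely when no filter is set.
    return f"{url}?{query}" if query else url


print(batches_url(agent_id="agent-abc123", limit=10))
```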
Response:
{
"batches": [
{
"agent_id": "agent-abc123",
"sequence": 42,
"event_count": 500,
"received_at_ns": 1700000000000000000
}
],
"error": ""
}

GET /api/health response:
{
"status": "healthy",
"buffer_batches": 150,
"buffer_utilization": 0.015,
"storage_enabled": true,
"push_total_ok": 1500,
"push_total_error": 0,
"push_events_total": 75000,
"clickhouse_flush_ok": 30,
"clickhouse_flush_error": 0,
"clickhouse_pending_rows": 0
}

POST /api/alerts creates a new alert rule.
Request body:
{
"name": "High buffer usage",
"metric": "buffer_utilization",
"operator": "gt",
"threshold": 0.9,
"severity": "warning"
}

- Available metrics: buffer_utilization, push_error_rate, push_errors_total, clickhouse_flush_errors, clickhouse_pending_rows, event_throughput
- Operators: gt, gte, lt, lte, eq
- Severities: info, warning, critical

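
Because the accepted metrics, operators, and severities are fixed lists, a client can check a rule before posting it. The sketch below does that with a hypothetical `make_alert_rule` helper; the allowed values are copied from the lists above, while the validation itself is an illustration and says nothing about what the server checks.

```python
import json

# Allowed values as listed on this page.
METRICS = {
    "buffer_utilization", "push_error_rate", "push_errors_total",
    "clickhouse_flush_errors", "clickhouse_pending_rows", "event_throughput",
}
OPERATORS = {"gt", "gte", "lt", "lte", "eq"}
SEVERITIES = {"info", "warning", "critical"}


def make_alert_rule(name, metric, operator, threshold, severity):
    """Validate the fields client-side and return the request body as a dict."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return {
        "name": name,
        "metric": metric,
        "operator": operator,
        "threshold": float(threshold),
        "severity": severity,
    }


rule = make_alert_rule("High buffer usage", "buffer_utilization", "gt", 0.9, "warning")
print(json.dumps(rule, indent=2))
```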
Response:
{ "id": "alert-1" }

POST /api/alerts/evaluate evaluates all enabled rules against current aggregator metrics.
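
To make the rule semantics concrete, here is a rough local model of a single comparison. It assumes that gt, gte, lt, lte, and eq carry their usual meanings and that the metric value is compared against the threshold in that order; the `rule_fires` function is hypothetical and is not the aggregator's implementation.

```python
import operator

# Assumed mapping from rule operators to comparisons.
COMPARATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
}


def rule_fires(rule, snapshot):
    """Return True if the snapshot value for rule["metric"] satisfies the rule."""
    value = snapshot[rule["metric"]]
    compare = COMPARATORS[rule["operator"]]
    # The metric value goes on the left: "gt" means value > threshold.
    return compare(value, rule["threshold"])


rule = {"metric": "buffer_utilization", "operator": "gt", "threshold": 0.9}
print(rule_fires(rule, {"buffer_utilization": 0.95}))
print(rule_fires(rule, {"buffer_utilization": 0.50}))
```

Note that eq on floating-point metrics is an exact comparison in this sketch, so it only matches when the two values are identical.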
Response:
{
"fired": [
{
"rule_id": "alert-1",
"rule_name": "High buffer usage",
"severity": "warning",
"metric": "buffer_utilization",
"value": 0.95,
"threshold": 0.9,
"operator": "gt",
"message": "High buffer usage: Buffer Utilization > 0.9 (current: 0.95)",
"fired_at": 1700000000
}
],
"snapshot": {
"buffer_utilization": 0.95,
"push_error_rate": 0.0,
"push_errors_total": 0.0,
"clickhouse_flush_errors": 0.0,
"clickhouse_pending_rows": 0.0,
"event_throughput": 75000.0
}
}

GET /api/export/json downloads the aggregated profile as a JSON file.
Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| event_type | string | (all) | cpu, lock, or syscall |
| limit | number | 100 | Max batches |

The response sets Content-Disposition: attachment so that browsers download the file.

GET /api/export/collapsed downloads CPU stacks in Brendan Gregg's collapsed format.
Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | number | 100 | Max batches |

Output format (one line per unique stack):
main;handle_request;process_data;compute 150
main;handle_request;db_query 95
Compatible with: flamegraph.pl, speedscope, Grafana Pyroscope, pprof tools.
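
The collapsed format is also easy to consume directly. As a small sketch, the hypothetical `parse_collapsed` function below splits each line into its semicolon-separated frames and the trailing sample count, then prints each stack's leaf frame with its share of the total. The sample text is the two-line example above.

```python
def parse_collapsed(text):
    """Parse collapsed-stack text into a list of (frames, count) tuples."""
    stacks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field; frame names may
        # themselves contain spaces, so split from the right.
        stack, _, count = line.rpartition(" ")
        stacks.append((stack.split(";"), int(count)))
    return stacks


sample = (
    "main;handle_request;process_data;compute 150\n"
    "main;handle_request;db_query 95\n"
)
stacks = parse_collapsed(sample)
total = sum(count for _, count in stacks)
for frames, count in stacks:
    print(f"{frames[-1]:<12} {count:>6} {count / total:.1%}")
```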
Proto file: aggregator/proto/aperture.proto
| RPC | Request | Response | Description |
|---|---|---|---|
| Push | PushRequest | PushResponse | Ingest agent data |
| Query | QueryRequest | QueryResponse | Query in-memory buffer |
| QueryStorage | QueryStorageRequest | QueryResponse | Query persistent storage |
| Aggregate | AggregateRequest | AggregateResponse | Server-side aggregation |
| Diff | DiffRequest | DiffResponse | Differential profiling |
Set APERTURE_AUTH_TOKEN on the aggregator. Agents send it as a Bearer token in the authorization gRPC metadata.
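
On the client side, gRPC metadata in Python is a sequence of (key, value) pairs with lowercase keys. A minimal sketch of building the authorization entry follows; the `auth_metadata` helper is hypothetical, and the exact `Bearer <token>` value layout is assumed from the description above.

```python
def auth_metadata(token):
    """Return gRPC call metadata carrying the token as a Bearer credential."""
    if not token:
        # No token configured: send no authorization entry at all.
        return []
    return [("authorization", f"Bearer {token}")]


metadata = auth_metadata("example-token")
print(metadata)
# With grpcio, this list would be passed as the metadata= argument of a call
# on a stub generated from the proto file.
```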
Scrape at http://<aggregator>:9090/metrics.

| Metric | Type | Labels | Description |
|---|---|---|---|
| aperture_push_total | counter | status=ok\|error | Push RPCs received |
| aperture_push_events_total | counter | - | Total events ingested |
| aperture_push_duration_seconds | histogram | - | Push RPC latency |
| aperture_buffer_batches | gauge | - | Batches in buffer |
| aperture_buffer_drops_total | counter | - | Batches dropped (capacity) |
| aperture_clickhouse_flush_total | counter | status=ok\|error | ClickHouse flush attempts |
| aperture_clickhouse_flush_rows_total | counter | - | Rows flushed |
| aperture_clickhouse_flush_duration_seconds | histogram | - | Flush latency |
| aperture_clickhouse_pending_rows | gauge | - | Pending rows |
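
For quick checks without a Prometheus server, the text exposition can be parsed by hand. The sketch below pulls the aperture_push_total samples out of a scrape and computes the share of pushes labelled status="error". The sample text is made up for illustration, and both helper functions are hypothetical.

```python
import re

# Matches samples such as: aperture_push_total{status="ok"} 1500
SAMPLE_RE = re.compile(r'^aperture_push_total\{([^}]*)\}\s+(\S+)')
STATUS_RE = re.compile(r'status="([^"]*)"')


def push_counts(metrics_text):
    """Return {status: value} summed over every aperture_push_total sample."""
    counts = {}
    for line in metrics_text.splitlines():
        sample = SAMPLE_RE.match(line)
        if sample is None:
            continue  # comments, blank lines, and other metrics
        status = STATUS_RE.search(sample.group(1))
        if status is not None:
            key = status.group(1)
            counts[key] = counts.get(key, 0.0) + float(sample.group(2))
    return counts


def push_error_ratio(counts):
    """Fraction of pushes with status="error", or 0.0 when nothing was pushed."""
    total = sum(counts.values())
    return counts.get("error", 0.0) / total if total else 0.0


# Hypothetical scrape output, for illustration only.
example = """# TYPE aperture_push_total counter
aperture_push_total{status="ok"} 1500
aperture_push_total{status="error"} 25
aperture_push_events_total 75000
"""
counts = push_counts(example)
print(counts, push_error_ratio(counts))
```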