Commit 934f7f8

cdr-robot and claude committed
docs: document new logs and prometheus metrics
Document recent enhancements to logs and Prometheus metrics including:
- Enhanced request logs with auth and DB context
- New labels for API metrics to track by endpoint and method
- Deprecated metrics that have been replaced
- Add monitoring best practices and example queries

Closes #16

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent aacb11d commit 934f7f8

3 files changed

Lines changed: 70 additions & 0 deletions

docs/admin/integrations/prometheus.md

Lines changed: 21 additions & 0 deletions
@@ -102,6 +102,27 @@ You must first enable `coderd_agentstats_*` with the flag
`CODER_PROMETHEUS_COLLECT_AGENT_STATS` before they can be retrieved from the
deployment. They will always be available from the agent.

### Recent Metrics Enhancements

Several metrics have been enhanced with additional labels to provide more detailed monitoring:

- **`coderd_api_concurrent_requests`**: Now includes `method` and `path` labels to track concurrent requests per API endpoint and HTTP method
- **`coderd_api_concurrent_websockets`**: Now includes a `path` label to track concurrent websocket connections per API endpoint
- **`coderd_api_requests_processed_total`**: Includes special handling for route patterns:
  - Routes not matching a specific pattern are labeled as "UNKNOWN"
  - Static file requests are labeled as "STATIC"

These enhancements allow for more granular monitoring and better insight into API performance by endpoint.
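
As a rough illustration of how the new labels might be used, the Python sketch below builds instant-query URLs for the Prometheus HTTP API (`/api/v1/query`) that break the metrics down by `method` and `path`. The server address `http://localhost:9090` and the helper name `instant_query_url` are assumptions for illustration and are not part of Coder. The last query also assumes that "UNKNOWN" appears as a value of the `path` label, which the text above does not state explicitly.

```python
from urllib.parse import urlencode

# Assumed Prometheus address; replace with your own server.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Concurrent requests broken down by the new `method` and `path` labels.
REQUESTS_BY_ENDPOINT = "sum by (method, path) (coderd_api_concurrent_requests)"

# Concurrent websocket connections broken down by the new `path` label.
WEBSOCKETS_BY_PATH = "sum by (path) (coderd_api_concurrent_websockets)"

# Request rate for routes that matched no specific pattern.
# Assumption: the "UNKNOWN" marker is stored in the `path` label.
UNKNOWN_ROUTE_RATE = (
    'sum(rate(coderd_api_requests_processed_total{path="UNKNOWN"}[5m]))'
)
```

Calling `instant_query_url(REQUESTS_BY_ENDPOINT)` returns a URL that can be fetched with any HTTP client; the same query strings can also be pasted directly into the Prometheus expression browser.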

### Deprecated Metrics

The following metrics have been deprecated and replaced with metrics that follow Prometheus naming conventions:

- **`coderd_api_workspace_latest_build_total`**: Replaced by `coderd_api_workspace_latest_build` (gauge metrics should avoid the `_total` suffix)
- **`coderd_oauth2_external_requests_rate_limit_total`**: Replaced by `coderd_oauth2_external_requests_rate_limit`

Please migrate to the new metric names in your dashboards and alerts.
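
One way to approach the migration is to rewrite the deprecated names in exported dashboard or alert-rule text. The Python sketch below is a minimal example under that assumption; the helper name `migrate_metric_names` is hypothetical, and the mapping contains only the two renames listed above.

```python
import re

# Deprecated name -> replacement, taken from the list above.
RENAMED_METRICS = {
    "coderd_api_workspace_latest_build_total": "coderd_api_workspace_latest_build",
    "coderd_oauth2_external_requests_rate_limit_total": "coderd_oauth2_external_requests_rate_limit",
}

# Match whole metric names only, so longer identifiers that merely
# contain a deprecated name are left untouched.
_DEPRECATED = re.compile(
    r"(?<![A-Za-z0-9_:])("
    + "|".join(re.escape(name) for name in RENAMED_METRICS)
    + r")(?![A-Za-z0-9_:])"
)


def migrate_metric_names(text: str) -> str:
    """Return `text` with each deprecated metric name replaced by its successor."""
    return _DEPRECATED.sub(lambda match: RENAMED_METRICS[match.group(1)], text)
```

Because the new names are never matched by the pattern, running the helper a second time on already-migrated text leaves it unchanged.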

<!-- Code generated by 'make docs/admin/integrations/prometheus.md'. DO NOT EDIT -->

| Name | Type | Description | Labels |

docs/admin/monitoring/logs.md

Lines changed: 10 additions & 0 deletions
@@ -22,6 +22,16 @@ machine/VM.
Events such as server errors, audit logs, user activities, and SSO & OpenID
Connect logs are all captured in the `coderd` logs.

### Request Logs

Request logs include detailed information to help with troubleshooting and auditing:

- **Authentication Context**: Requestor ID, name, and email for user subjects
- **RBAC Subject Types**: Identification of system services (provisioners, autostart, etc.)
- **Route Parameters**: Workspace and agent name parameters for better traceability
- **URL Parameters**: Query parameters, included with the prefix `params_`
- **Database Context**: Information about database operations related to the request
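
To show how the `params_` prefix might be used when post-processing logs, the Python sketch below pulls the query parameters back out of a parsed log entry. It assumes the logs are emitted as one JSON object per line with the fields at the top level; the sample entry, its field names other than the `params_` prefix, and the helper `extract_query_params` are hypothetical and only illustrate the idea.

```python
import json
from typing import Any, Dict

PARAMS_PREFIX = "params_"


def extract_query_params(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Collect fields carrying the `params_` prefix, with the prefix removed."""
    return {
        key[len(PARAMS_PREFIX):]: value
        for key, value in fields.items()
        if key.startswith(PARAMS_PREFIX)
    }


# Hypothetical log line; real entries will contain different fields.
SAMPLE_LINE = (
    '{"msg": "GET", "path": "/api/v2/workspaces", '
    '"params_q": "owner:me", "params_limit": "25"}'
)

query_params = extract_query_params(json.loads(SAMPLE_LINE))
```

If your deployment nests request fields under another key, pass that nested object to the helper instead of the whole entry.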

## `provisionerd` Logs

Logs for [external provisioners](../provisioners/index.md) are structured

docs/admin/monitoring/metrics.md

Lines changed: 39 additions & 0 deletions
@@ -20,3 +20,42 @@ links point to relevant sections there.
(Kubernetes users only)
- [Configure Prometheus to scrape Coder metrics](../integrations/prometheus.md#prometheus-configuration)
- [See the list of available metrics](../integrations/prometheus.md#available-metrics)

## Monitoring Best Practices

### Key Metrics to Monitor

Here are some of the most important metrics to track for operational health:

1. **API Performance**
   - `coderd_api_request_latencies_seconds`: Track latency by endpoint to identify slow APIs
   - `coderd_api_concurrent_requests`: Monitor by path and method to identify bottlenecks
   - `coderd_api_requests_processed_total`: Track error rates by status code

2. **Workspace Status**
   - `coderd_workspace_latest_build_status`: Monitor workspace build success rates
   - `coderd_agents_connections`: Track agent connectivity

3. **User Activity**
   - `coderd_api_active_users_duration_hour`: Monitor active user count
   - `coderd_insights_templates_active_users`: Track template usage

4. **System Health**
   - `go_memstats_alloc_bytes`: Monitor memory usage
   - `process_cpu_seconds_total`: Track CPU utilization

### Example Prometheus Queries

These queries can help monitor common scenarios:

```
# API Error Rate (5xx errors as percentage of total)
sum(rate(coderd_api_requests_processed_total{code=~"5.."}[5m])) /
sum(rate(coderd_api_requests_processed_total[5m])) * 100

# Slow API Endpoints (95th percentile latency > 1s)
histogram_quantile(0.95, sum(rate(coderd_api_request_latencies_seconds_bucket[5m])) by (path, le)) > 1

# Failed Workspace Builds (last 24h)
sum(increase(coderd_workspace_builds_total{status="failed"}[24h])) by (template_name)
```
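
The arithmetic behind the first query can be checked by hand. The Python sketch below mirrors it on plain numbers: it takes per-status-code request rates and returns the share of 5xx responses as a percentage. The function name `error_rate_percent` and the sample rates are hypothetical, and the zero-traffic case is handled by returning `0.0`, which is a choice made for this sketch and not what Prometheus itself returns.

```python
from typing import Mapping


def error_rate_percent(rates_by_code: Mapping[str, float]) -> float:
    """Share of 5xx responses, as a percentage of all requests.

    `rates_by_code` maps an HTTP status code (as a string, like the `code`
    label) to its per-second request rate.
    """
    total = sum(rates_by_code.values())
    if total == 0:
        # No traffic at all: report 0% in this sketch instead of dividing by zero.
        return 0.0
    # Same selection as the anchored regex code=~"5..": three characters, first is "5".
    errors = sum(
        rate
        for code, rate in rates_by_code.items()
        if len(code) == 3 and code.startswith("5")
    )
    return errors / total * 100


# Hypothetical rates: 3 successful requests/s and 1 server error/s.
sample_percent = error_rate_percent({"200": 3.0, "500": 1.0})
```

With these sample rates, one request in four fails, so the result is 25 percent.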
