21 changes: 21 additions & 0 deletions docs/admin/integrations/prometheus.md
@@ -102,6 +102,27 @@ You must first enable `coderd_agentstats_*` with the flag
`CODER_PROMETHEUS_COLLECT_AGENT_STATS` before they can be retrieved from the
deployment. They will always be available from the agent.

### Recent Metrics Enhancements

Several metrics have been enhanced with additional labels to provide more detailed monitoring:

- **`coderd_api_concurrent_requests`**: Now includes `method` and `path` labels to track concurrent requests per API endpoint and HTTP method
- **`coderd_api_concurrent_websockets`**: Now includes a `path` label to track concurrent websocket connections per API endpoint
- **`coderd_api_requests_processed_total`**: Includes special handling for route patterns:
  - Routes not matching a specific pattern are labeled as "UNKNOWN"
  - Static file requests are labeled as "STATIC"

These enhancements allow for more granular monitoring and better insight into API performance by endpoint.
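
The labeling rule described above can be sketched as a small function. This is an illustration only, not the `coderd` implementation: the helper name `path_label`, its parameters, and the choice to check for static files before checking the route pattern are all assumptions.

```python
from typing import Optional


def path_label(route_pattern: Optional[str], is_static: bool = False) -> str:
    """Return a value for the `path` label (hypothetical helper).

    Assumption: static file requests are checked first, then unmatched routes.
    """
    if is_static:
        # Static file requests share a single label value.
        return "STATIC"
    if not route_pattern:
        # No specific route pattern matched this request.
        return "UNKNOWN"
    # Otherwise the route pattern itself is used as the label value.
    return route_pattern
```

Labeling by route pattern rather than by raw URL keeps the number of distinct `path` values bounded, which is why unmatched and static requests collapse into two fixed values.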

### Deprecated Metrics

The following metrics have been deprecated and replaced with metrics that follow Prometheus naming conventions:

- **`coderd_api_workspace_latest_build_total`**: Replaced by `coderd_api_workspace_latest_build` (gauge metrics should avoid the `_total` suffix)
- **`coderd_oauth2_external_requests_rate_limit_total`**: Replaced by `coderd_oauth2_external_requests_rate_limit`

Please migrate to the new metric names in your dashboards and alerts.
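
If you keep dashboard or alert expressions in files, a short script can apply the renames in bulk. The following is a minimal sketch that only assumes the two renames listed above; the function name `migrate_query` is made up for this example, and you should review the rewritten expressions before deploying them.

```python
import re

# Old metric name -> new metric name, taken from the list above.
RENAMES = {
    "coderd_api_workspace_latest_build_total": "coderd_api_workspace_latest_build",
    "coderd_oauth2_external_requests_rate_limit_total": "coderd_oauth2_external_requests_rate_limit",
}

# Match an old name only when it is a complete metric name, i.e. not
# surrounded by other characters that are valid in metric names.
_OLD_NAME = re.compile(
    r"(?<![A-Za-z0-9_:])("
    + "|".join(re.escape(name) for name in RENAMES)
    + r")(?![A-Za-z0-9_:])"
)


def migrate_query(expr: str) -> str:
    """Replace deprecated metric names in a PromQL expression string."""
    return _OLD_NAME.sub(lambda m: RENAMES[m.group(1)], expr)
```

For example, `migrate_query("sum(coderd_api_workspace_latest_build_total)")` returns `"sum(coderd_api_workspace_latest_build)"`, while expressions that already use the new names are returned unchanged.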

<!-- Code generated by 'make docs/admin/integrations/prometheus.md'. DO NOT EDIT -->

| Name | Type | Description | Labels |
10 changes: 10 additions & 0 deletions docs/admin/monitoring/logs.md
@@ -22,6 +22,16 @@ machine/VM.
Events such as server errors, audit logs, user activities, and SSO & OpenID
Connect logs are all captured in the `coderd` logs.

### Request Logs

Request logs include detailed information to help with troubleshooting and auditing:

- **Authentication Context**: Logs include requestor ID, name, and email for user subjects
- **RBAC Subject Types**: Clearly identifies system services (provisioners, autostart, etc.)
- **Route Parameters**: Includes workspace and agent name parameters for better traceability
- **URL Parameters**: Query parameters are included with the prefix `params_`
- **Database Context**: Information about database operations related to the request
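
To illustrate the `params_` prefix, the sketch below maps a request URL's query parameters to log field names. It is not the `coderd` logging code; the function name `query_param_fields` and the handling of repeated keys are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlsplit


def query_param_fields(url: str) -> dict[str, str]:
    """Map each query parameter to a log field named `params_<key>`."""
    fields: dict[str, str] = {}
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        # Assumption: if a key repeats, the last value wins.
        fields[f"params_{key}"] = value
    return fields
```

Under these assumptions, a request to `/api/v2/workspaces?q=owner:me&limit=25` would carry the fields `params_q` and `params_limit`, which is the naming to search for when filtering request logs by query parameter.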

## `provisionerd` Logs

Logs for [external provisioners](../provisioners/index.md) are structured
39 changes: 39 additions & 0 deletions docs/admin/monitoring/metrics.md
@@ -20,3 +20,42 @@ links point to relevant sections there.
(Kubernetes users only)
- [Configure Prometheus to scrape Coder metrics](../integrations/prometheus.md#prometheus-configuration)
- [See the list of available metrics](../integrations/prometheus.md#available-metrics)

## Monitoring Best Practices

### Key Metrics to Monitor

Here are some of the most important metrics to track for operational health:

1. **API Performance**
   - `coderd_api_request_latencies_seconds`: Track latency by endpoint to identify slow APIs
   - `coderd_api_concurrent_requests`: Monitor by path and method to identify bottlenecks
   - `coderd_api_requests_processed_total`: Track error rates by status code

2. **Workspace Status**
   - `coderd_workspace_latest_build_status`: Monitor workspace build success rates
   - `coderd_agents_connections`: Track agent connectivity

3. **User Activity**
   - `coderd_api_active_users_duration_hour`: Monitor active user count
   - `coderd_insights_templates_active_users`: Track template usage

4. **System Health**
   - `go_memstats_alloc_bytes`: Monitor memory usage
   - `process_cpu_seconds_total`: Track CPU utilization

### Example Prometheus Queries

These queries can help monitor common scenarios:

```promql
# API Error Rate (5xx errors as percentage of total)
sum(rate(coderd_api_requests_processed_total{code=~"5.."}[5m])) /
sum(rate(coderd_api_requests_processed_total[5m])) * 100

# Slow API Endpoints (95th percentile latency > 1s)
histogram_quantile(0.95, sum(rate(coderd_api_request_latencies_seconds_bucket[5m])) by (path, le)) > 1

# Failed Workspace Builds (last 24h)
sum(increase(coderd_workspace_builds_total{status="failed"}[24h])) by (template_name)
```
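
To run one of these queries from a script rather than the Prometheus UI, you can call the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch that only builds the request URL; the base URL `http://localhost:9090` and the helper name `instant_query_url` are assumptions, so substitute the address of your own Prometheus server.

```python
from urllib.parse import urlencode

# Same expression as the "API Error Rate" query above.
ERROR_RATE_QUERY = (
    'sum(rate(coderd_api_requests_processed_total{code=~"5.."}[5m])) / '
    'sum(rate(coderd_api_requests_processed_total[5m])) * 100'
)


def instant_query_url(base_url: str, query: str) -> str:
    """Build a URL for the Prometheus instant-query endpoint."""
    # urlencode escapes braces, quotes, and spaces in the PromQL expression.
    params = urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"
```

Calling `instant_query_url("http://localhost:9090", ERROR_RATE_QUERY)` yields a URL that can be fetched with any HTTP client. The response is JSON, with the query result under the `data` key.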