RFC: Structured Audit Log System
Phase: 1 — Security Hardening & Enterprise Foundation
Priority: P0 — Critical
Estimated Effort: Medium
Problem Statement
Authorizer only has webhook-based event delivery (7 event types: user.login, user.created, user.signup, user.access_revoked, user.access_enabled, user.deleted, user.deactivated). There is no queryable audit trail. Webhooks are fire-and-forget: if the endpoint is down, events are lost.
Audit logs are required for SOC 2, HIPAA, and GDPR compliance. Every competitor (WorkOS, Keycloak, Clerk) has structured audit logs.
Current Architecture Context
- Webhook events defined in internal/constants/webhook_event.go (7 event types)
- Event dispatch in internal/events/events.go: sends HTTP POST to webhook endpoints with 30s timeout
- Webhook logs (schemas.WebhookLog) store HTTP status + request/response bodies
- Events are triggered from GraphQL mutation handlers (internal/graphql/)
- Storage provider interface in internal/storage/provider.go: all 13+ DB providers implement it
- No structured audit log schema or query API exists
Proposed Solution
1. AuditLog Schema
New schema: internal/storage/schemas/audit_log.go
type AuditLog struct {
	ID string `json:"id" gorm:"primaryKey;type:char(36)"`
	// Timestamp is indexed on its own and as the second column of the org-scoped composite index
	Timestamp int64 `json:"timestamp" gorm:"index:idx_audit_timestamp;index:idx_audit_org,priority:2;autoCreateTime"`
	// Who performed the action
	ActorID string `json:"actor_id" gorm:"type:char(36);index:idx_audit_actor"`
	ActorType string `json:"actor_type" gorm:"type:varchar(30)"` // user | admin | system | service_account
	ActorEmail string `json:"actor_email" gorm:"type:varchar(256)"` // denormalized for query convenience
	// What happened
	Action string `json:"action" gorm:"type:varchar(100);index:idx_audit_action"`
	// What was affected (resource_type and resource_id share one composite index)
	ResourceType string `json:"resource_type" gorm:"type:varchar(50);index:idx_audit_resource,priority:1"` // user | session | token | webhook | config | role | permission
	ResourceID string `json:"resource_id" gorm:"type:char(36);index:idx_audit_resource,priority:2"`
	// Request context
	IPAddress string `json:"ip_address" gorm:"type:varchar(45)"`
	UserAgent string `json:"user_agent" gorm:"type:text"`
	// Additional context (JSON)
	Metadata string `json:"metadata" gorm:"type:text"` // JSON string: auth method, changed fields, etc.
	// Multi-tenancy
	OrganizationID string `json:"organization_id" gorm:"type:char(36);index:idx_audit_org,priority:1"`
}
Indexes for query performance:
- (timestamp): time-range queries, retention cleanup
- (actor_id): "what did this user do?"
- (action): "show all login failures"
- (resource_type, resource_id): "what happened to this resource?"
- (organization_id, timestamp): org-scoped audit views
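To illustrate how the query patterns above might map onto these indexes, here is a minimal sketch of a filter-to-SQL builder. The `AuditFilter` struct and `buildAuditQuery` function are hypothetical names introduced for illustration only. The `audit_logs` table name comes from the retention section of this RFC, and the `?` placeholder style is an assumption that would differ per database driver.

```go
package main

import (
	"fmt"
	"strings"
)

// AuditFilter is a hypothetical filter type. Field names mirror the
// AuditLog columns; zero values mean "no constraint".
type AuditFilter struct {
	ActorID        string
	Action         string
	ResourceType   string
	ResourceID     string
	OrganizationID string
	FromTimestamp  int64
	ToTimestamp    int64
}

// buildAuditQuery returns a parameterized SQL statement and its arguments.
// Each equality condition corresponds to the leading column of one of the
// proposed indexes; the timestamp bounds form a range scan.
func buildAuditQuery(f AuditFilter, limit int) (string, []any) {
	var conds []string
	var args []any
	add := func(cond string, arg any) {
		conds = append(conds, cond)
		args = append(args, arg)
	}
	if f.OrganizationID != "" {
		add("organization_id = ?", f.OrganizationID)
	}
	if f.ActorID != "" {
		add("actor_id = ?", f.ActorID)
	}
	if f.Action != "" {
		add("action = ?", f.Action)
	}
	if f.ResourceType != "" {
		add("resource_type = ?", f.ResourceType)
	}
	if f.ResourceID != "" {
		add("resource_id = ?", f.ResourceID)
	}
	if f.FromTimestamp > 0 {
		add("timestamp >= ?", f.FromTimestamp)
	}
	if f.ToTimestamp > 0 {
		add("timestamp <= ?", f.ToTimestamp)
	}
	q := "SELECT * FROM audit_logs"
	if len(conds) > 0 {
		q += " WHERE " + strings.Join(conds, " AND ")
	}
	q += " ORDER BY timestamp DESC LIMIT ?"
	args = append(args, limit)
	return q, args
}

func main() {
	// Org-scoped view over a time range: served by (organization_id, timestamp).
	q, args := buildAuditQuery(AuditFilter{OrganizationID: "org-1", FromTimestamp: 1700000000}, 50)
	fmt.Println(q)
	fmt.Println(args)
}
```

Non-SQL providers would need their own translation of the same filter; this sketch only covers the relational case.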
2. Comprehensive Event Types
Expanding from 7 webhook events to 25+ audit event types:
Authentication events:
| Action | Actor Type | Resource Type | When |
|---|---|---|---|
| user.login_success | user | session | Successful login (any method) |
| user.login_failed | system | user | Failed login attempt |
| user.signup | user | user | New user registration |
| user.logout | user | session | User logout |
| user.password_changed | user | user | Password change |
| user.password_reset | user | user | Password reset via token |
| user.email_verified | user | user | Email verification completed |
| user.phone_verified | user | user | Phone verification completed |
| user.mfa_enabled | user | user | MFA/TOTP enabled |
| user.mfa_disabled | user | user | MFA/TOTP disabled |
| user.deactivated | user | user | Self-deactivation |
Admin events:
| Action | Actor Type | Resource Type | When |
|---|---|---|---|
| admin.user_created | admin | user | Admin creates user |
| admin.user_updated | admin | user | Admin updates user |
| admin.user_deleted | admin | user | Admin deletes user |
| admin.access_revoked | admin | user | Admin revokes access |
| admin.access_enabled | admin | user | Admin enables access |
| admin.user_unlocked | admin | user | Admin unlocks locked account |
| admin.role_assigned | admin | user | Admin assigns role |
| admin.role_removed | admin | user | Admin removes role |
| admin.config_changed | admin | config | Admin updates env/config |
| admin.webhook_created | admin | webhook | Webhook created |
| admin.webhook_updated | admin | webhook | Webhook modified |
| admin.webhook_deleted | admin | webhook | Webhook removed |
Token events:
| Action | Actor Type | Resource Type | When |
|---|---|---|---|
| token.issued | system | token | Access/refresh token issued |
| token.refreshed | system | token | Token refreshed |
| token.revoked | user/admin | token | Token explicitly revoked |
Session events:
| Action | Actor Type | Resource Type | When |
|---|---|---|---|
| session.created | system | session | New session created |
| session.terminated | user/admin | session | Session ended |
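One way to keep these action names consistent across handlers is to declare them as constants with a lookup for the resource type each one targets. The sketch below covers only a handful of the actions from the tables; the constant names, the `defaultResourceType` map, and `ResourceTypeFor` are hypothetical and not taken from the existing codebase.

```go
package main

import "fmt"

// Hypothetical constant names; the string values are the action names
// listed in the tables above.
const (
	ActionUserLoginSuccess = "user.login_success"
	ActionUserLoginFailed  = "user.login_failed"
	ActionAdminUserDeleted = "admin.user_deleted"
	ActionTokenRevoked     = "token.revoked"
	ActionSessionCreated   = "session.created"
)

// defaultResourceType maps each action to the resource type shown for it
// in the event tables.
var defaultResourceType = map[string]string{
	ActionUserLoginSuccess: "session",
	ActionUserLoginFailed:  "user",
	ActionAdminUserDeleted: "user",
	ActionTokenRevoked:     "token",
	ActionSessionCreated:   "session",
}

// ResourceTypeFor reports the resource type for a known action.
// The boolean is false for actions that are not registered, which lets
// callers reject typos instead of writing unknown action strings.
func ResourceTypeFor(action string) (string, bool) {
	rt, ok := defaultResourceType[action]
	return rt, ok
}

func main() {
	for _, a := range []string{ActionUserLoginFailed, "user.login_sucess"} {
		if rt, ok := ResourceTypeFor(a); ok {
			fmt.Printf("%s targets resource type %s\n", a, rt)
		} else {
			fmt.Printf("%s is not a registered audit action\n", a)
		}
	}
}
```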
3. Audit Logger Service
New package: internal/audit/
type Dependencies struct {
	Log *zerolog.Logger
	Store storage.Provider
	Config *config.Config
}
type Provider interface {
	// Log records an audit event
	Log(ctx context.Context, event AuditEvent) error
	// Query retrieves audit logs with filters
	Query(ctx context.Context, filter AuditFilter, pagination *model.Pagination) ([]*schemas.AuditLog, *model.Pagination, error)
}
type AuditEvent struct {
	ActorID string
	ActorType string // user | admin | system | service_account
	ActorEmail string
	Action string
	ResourceType string
	ResourceID string
	IPAddress string
	UserAgent string
	Metadata map[string]interface{} // serialized to JSON
	OrganizationID string
}
Non-blocking write: Audit logging must not block the request path. Use a buffered channel with a background goroutine flushing to the database:
type auditProvider struct {
	eventChan chan AuditEvent // buffered channel, size 1000
	store storage.Provider
	// ...
}
func (a *auditProvider) Log(ctx context.Context, event AuditEvent) error {
	select {
	case a.eventChan <- event:
		return nil
	default:
		// Channel full: log a warning, don't block the request
		a.log.Warn().Msg("audit log buffer full, event dropped")
		return fmt.Errorf("audit buffer full")
	}
}
// Background goroutine batches writes
func (a *auditProvider) flushLoop() {
	batch := make([]AuditEvent, 0, 100)
	ticker := time.NewTicker(1 * time.Second)
	for {
		select {
		case event := <-a.eventChan:
			batch = append(batch, event)
			if len(batch) >= 100 {
				a.writeBatch(batch)
				batch = batch[:0]
			}
		case <-ticker.C:
			if len(batch) > 0 {
				a.writeBatch(batch)
				batch = batch[:0]
			}
		}
	}
}
4. Integration Points
Wrap existing event dispatch — internal/events/events.go currently fires webhooks. Extend to also write audit logs:
// In each GraphQL handler, after the action:
auditProvider.Log(ctx, audit.AuditEvent{
	ActorID: user.ID,
	ActorType: "user",
	ActorEmail: user.Email,
	Action: "user.login_success",
	ResourceType: "session",
	ResourceID: sessionID,
	IPAddress: ginCtx.ClientIP(),
	UserAgent: ginCtx.Request.UserAgent(),
	Metadata: map[string]interface{}{"method": "password", "mfa": false},
})
Helper to extract request context, to reduce boilerplate:
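func EventFromContext(ctx context.Context, action string, resourceType string, resourceID string) AuditEvent {
	// Extract IP, user agent, actor from Gin context (extraction not shown here)
	return AuditEvent{Action: action, ResourceType: resourceType, ResourceID: resourceID}
}
5. Storage Interface Methods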
// AddAuditLog writes an audit log entry
AddAuditLog(ctx context.Context, log *schemas.AuditLog) error
// AddAuditLogs batch-writes audit log entries
AddAuditLogs(ctx context.Context, logs []*schemas.AuditLog) error
// ListAuditLogs queries audit logs with filters and pagination
ListAuditLogs(ctx context.Context, filter map[string]interface{}, pagination *model.Pagination) ([]*schemas.AuditLog, *model.Pagination, error)
// DeleteAuditLogsBefore removes logs older than a timestamp (retention)
DeleteAuditLogsBefore(ctx context.Context, before int64) error
6. GraphQL Query API
type AuditLog {
  id: ID!
  timestamp: Int64!
  actor_id: String
  actor_type: String!
  actor_email: String
  action: String!
  resource_type: String
  resource_id: String
  ip_address: String
  user_agent: String
  metadata: Map
  organization_id: String
}
input AuditLogFilter {
  actor_id: String
  actor_type: String
  action: String
  resource_type: String
  resource_id: String
  ip_address: String
  organization_id: String
  from_timestamp: Int64
  to_timestamp: Int64
}
type AuditLogs {
  audit_logs: [AuditLog!]!
  pagination: Pagination!
}
type Query {
  # Admin-only query
  _audit_logs(params: AuditLogFilter, pagination: PaginatedInput): AuditLogs!
}
7. Immutability & Retention
- Immutable: No update/delete mutations exposed via API. Logs can only be removed by the retention policy.
- Retention: --audit-log-retention-days=90 (default 90 days)
- Background goroutine runs daily: DELETE FROM audit_logs WHERE timestamp < (now - retention_days)
- Same cleanup pattern as LoginAttempt retention from RFC: Rate Limiting & Brute Force Protection #501
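The retention cutoff could be computed along the following lines. This is a sketch under two assumptions: timestamps are stored as Unix seconds, and a retention value of 0 means "keep forever" as described for the CLI flag. `retentionCutoff`, `cleanupOnce`, and `deleteBeforeFunc` are hypothetical names; the callback has the same shape as the proposed `DeleteAuditLogsBefore` storage method.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

const secondsPerDay = 24 * 60 * 60

// retentionCutoff returns the Unix timestamp (seconds, by assumption)
// before which audit logs are eligible for deletion. The boolean is false
// when retentionDays is 0 or negative, meaning nothing should be deleted.
func retentionCutoff(nowUnix int64, retentionDays int) (int64, bool) {
	if retentionDays <= 0 {
		return 0, false
	}
	return nowUnix - int64(retentionDays)*secondsPerDay, true
}

// deleteBeforeFunc matches the shape of the proposed DeleteAuditLogsBefore
// storage method.
type deleteBeforeFunc func(ctx context.Context, before int64) error

// cleanupOnce performs a single retention pass. A daily ticker in a
// background goroutine could call it repeatedly.
func cleanupOnce(ctx context.Context, nowUnix int64, retentionDays int, del deleteBeforeFunc) error {
	cutoff, ok := retentionCutoff(nowUnix, retentionDays)
	if !ok {
		return nil // retention disabled: keep everything
	}
	return del(ctx, cutoff)
}

func main() {
	// Stand-in for the storage provider: prints the statement it would run.
	del := func(ctx context.Context, before int64) error {
		fmt.Printf("DELETE FROM audit_logs WHERE timestamp < %d\n", before)
		return nil
	}
	if err := cleanupOnce(context.Background(), time.Now().Unix(), 90, del); err != nil {
		fmt.Println("cleanup failed:", err)
	}
}
```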
CLI Configuration Flags
--enable-audit-log=true # Enable/disable audit logging
--audit-log-retention-days=90 # Days to retain audit logs (0 = forever)
--audit-log-buffer-size=1000 # Internal buffer size for async writes
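As a rough illustration of how these three flags might be parsed and validated, the sketch below uses Go's standard `flag` package as a stand-in; the project's real CLI wiring may use a different library. `AuditConfig` and `parseAuditFlags` are hypothetical names, and treating `true` as the default for `--enable-audit-log` is an assumption based on the value shown above.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// AuditConfig is a hypothetical holder for the audit-related flags.
type AuditConfig struct {
	Enabled       bool
	RetentionDays int
	BufferSize    int
}

// parseAuditFlags parses the three audit flags from args and rejects
// values that the rest of the system could not act on.
func parseAuditFlags(args []string) (AuditConfig, error) {
	var c AuditConfig
	fs := flag.NewFlagSet("audit", flag.ContinueOnError)
	fs.BoolVar(&c.Enabled, "enable-audit-log", true, "enable/disable audit logging")
	fs.IntVar(&c.RetentionDays, "audit-log-retention-days", 90, "days to retain audit logs (0 = forever)")
	fs.IntVar(&c.BufferSize, "audit-log-buffer-size", 1000, "internal buffer size for async writes")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.RetentionDays < 0 {
		return c, errors.New("audit-log-retention-days must be >= 0")
	}
	if c.BufferSize <= 0 {
		return c, errors.New("audit-log-buffer-size must be > 0")
	}
	return c, nil
}

func main() {
	c, err := parseAuditFlags([]string{"--audit-log-retention-days=0", "--audit-log-buffer-size=500"})
	if err != nil {
		fmt.Println("invalid flags:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```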
Migration Strategy
- Create audit_logs table/collection across all 13+ DB providers
- Add composite indexes for all query patterns
- Wire audit provider into cmd/root.go initialization (after storage, before HTTP handlers)
- Inject into all GraphQL mutation handlers and HTTP handlers
- Existing webhook system continues unchanged; audit logs are additive
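When the audit provider is wired into process initialization, it likely also needs a shutdown path so buffered events are flushed before exit. The following self-contained sketch shows one possible lifecycle: start the flush goroutine on construction, then drain the channel on `Close`. The `Logger` and `Event` types and the `write` callback are simplified stand-ins, not the proposed `auditProvider`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Event is a simplified stand-in for AuditEvent.
type Event struct{ Action string }

// Logger buffers events and writes them in batches from one goroutine.
type Logger struct {
	events chan Event
	done   chan struct{}
	wg     sync.WaitGroup
	write  func([]Event) // stand-in for a batch write to storage
}

func NewLogger(bufSize, batchSize int, interval time.Duration, write func([]Event)) *Logger {
	l := &Logger{events: make(chan Event, bufSize), done: make(chan struct{}), write: write}
	l.wg.Add(1)
	go l.flushLoop(batchSize, interval)
	return l
}

// Log enqueues without blocking; it reports false if the buffer is full.
func (l *Logger) Log(e Event) bool {
	select {
	case l.events <- e:
		return true
	default:
		return false
	}
}

func (l *Logger) flushLoop(batchSize int, interval time.Duration) {
	defer l.wg.Done()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	batch := make([]Event, 0, batchSize)
	flush := func() {
		if len(batch) > 0 {
			// Hand the writer a copy so it may retain the slice safely.
			l.write(append([]Event(nil), batch...))
			batch = batch[:0]
		}
	}
	for {
		select {
		case e := <-l.events:
			batch = append(batch, e)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		case <-l.done:
			// Drain whatever is still buffered, then exit.
			for {
				select {
				case e := <-l.events:
					batch = append(batch, e)
					if len(batch) >= batchSize {
						flush()
					}
				default:
					flush()
					return
				}
			}
		}
	}
}

// Close stops the flush goroutine after draining. Callers must stop
// calling Log before Close, or late events may never be written.
func (l *Logger) Close() {
	close(l.done)
	l.wg.Wait()
}

func main() {
	written := 0
	l := NewLogger(1000, 100, time.Second, func(b []Event) { written += len(b) })
	for i := 0; i < 250; i++ {
		l.Log(Event{Action: "user.login_success"})
	}
	l.Close() // waits for the goroutine, so reading written is safe
	fmt.Println("events written:", written)
}
```

In this sketch, `Close` would be called after the HTTP server has stopped accepting requests, mirroring the "after storage, before HTTP handlers" ordering in reverse.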
Testing Plan
- Unit tests for audit logger buffering and batch flush
- Integration tests: verify audit log written for each event type
- Integration tests: query API with various filters
- Test retention cleanup correctly removes old entries
- Test immutability — verify no update/delete API exists
- Load test: verify non-blocking behavior under high event volume
- Test buffer-full scenario — verify graceful degradation (warning log, not crash)
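The buffer-full case can be exercised without a database by filling a small channel with no consumer attached. The sketch below is one possible shape for such a check; `bufferedLog`, `ErrBufferFull`, and the dropped-event counter are hypothetical additions for illustration, not part of the proposed API.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrBufferFull is a hypothetical sentinel error for dropped events.
var ErrBufferFull = errors.New("audit buffer full")

// bufferedLog is a minimal stand-in for the audit provider's enqueue path.
type bufferedLog struct {
	events  chan string
	dropped atomic.Int64 // count of events rejected because the buffer was full
}

func newBufferedLog(size int) *bufferedLog {
	return &bufferedLog{events: make(chan string, size)}
}

// Log never blocks: it either enqueues the action or records a drop.
func (b *bufferedLog) Log(action string) error {
	select {
	case b.events <- action:
		return nil
	default:
		b.dropped.Add(1)
		return ErrBufferFull
	}
}

func main() {
	// No consumer is running, so the third call must hit the default branch.
	b := newBufferedLog(2)
	for i := 0; i < 3; i++ {
		if err := b.Log("user.login_failed"); err != nil {
			fmt.Println("degraded gracefully:", err)
		}
	}
	fmt.Println("queued:", len(b.events), "dropped:", b.dropped.Load())
}
```

Exposing the drop count as a metric would make lost events visible beyond a warning in the logs.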