RFC: Advanced Bot Protection (Radar-style Risk Engine)

## RFC: Advanced Bot Protection (Radar-style Risk Engine)

**Phase**: 5 — Advanced Security & Enterprise  
**Priority**: P3 — Medium  
**Estimated Effort**: High  
**Depends on**: Audit Logs (#505), Rate Limiting (#501)

---

### Problem Statement

Basic rate limiting and CAPTCHA (#501, #504) stop simple attacks, but sophisticated attackers rotate IPs, drive headless browsers, and mimic legitimate traffic. WorkOS Radar addresses this with device fingerprinting, risk scoring, and behavioral analysis. This RFC proposes a comparable risk engine that combines multiple signals into a single score, so that evading one check (for example, by rotating IPs) is not enough to pass as a legitimate login.

---

### Proposed Solution

#### 1. Device Fingerprinting

**Client-side**: Lightweight JavaScript fingerprinting in `web/app/` that collects stable browser/device signals:

```javascript
// Signals collected (privacy-conscious — no invasive tracking):
const fingerprint = {
    userAgent: navigator.userAgent,
    language: navigator.language,
    languages: navigator.languages,
    platform: navigator.platform,
    screenResolution: `${screen.width}x${screen.height}`,
    colorDepth: screen.colorDepth,
    timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
    touchSupport: navigator.maxTouchPoints > 0,
    hardwareConcurrency: navigator.hardwareConcurrency,
    // Canvas and WebGL fingerprints — optional, configurable
};

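// Hash client-side, then send the hex digest + raw signals to the server.
// digest() resolves to an ArrayBuffer, so convert it to a 64-character hex
// string that fits the varchar(64) fingerprint_hash column.
const digest = await crypto.subtle.digest('SHA-256',
    new TextEncoder().encode(JSON.stringify(fingerprint)));
const fingerprintHash = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');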
```

**Server-side schema**: `internal/storage/schemas/device_fingerprint.go`

```go
type DeviceFingerprint struct {
    ID              string `json:"id" gorm:"primaryKey;type:char(36)"`
    UserID          string `json:"user_id" gorm:"type:char(36);index:idx_device_user"`
    FingerprintHash string `json:"fingerprint_hash" gorm:"type:varchar(64);index:idx_device_hash"`
    RawSignals      string `json:"raw_signals" gorm:"type:text"`           // JSON of collected signals
    TrustLevel      string `json:"trust_level" gorm:"type:varchar(20)"`    // trusted | neutral | suspicious
    FirstSeenAt     int64  `json:"first_seen_at" gorm:"autoCreateTime"`
    LastSeenAt      int64  `json:"last_seen_at"`
    LoginCount      int64  `json:"login_count" gorm:"default:0"`
}
```
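
Because the hash is computed in the browser, the server should probably not trust it blindly. A minimal sketch of a server-side check follows, assuming the client sends the exact JSON string it hashed alongside the lowercase hex digest. The helper names `hashSignals` and `fingerprintMatches` are hypothetical and not part of this RFC.

```go
package main

import (
    "crypto/sha256"
    "crypto/subtle"
    "encoding/hex"
)

// hashSignals returns the lowercase hex SHA-256 digest of the raw signals
// payload, hashed byte-for-byte as received (no re-serialization).
func hashSignals(rawSignals string) string {
    sum := sha256.Sum256([]byte(rawSignals))
    return hex.EncodeToString(sum[:])
}

// fingerprintMatches reports whether the client-supplied hash equals the
// digest the server computes from the raw signals. Hypothetical helper.
func fingerprintMatches(rawSignals, claimedHash string) bool {
    expected := hashSignals(rawSignals)
    return subtle.ConstantTimeCompare([]byte(expected), []byte(claimedHash)) == 1
}
```

A mismatch does not prove an attack, since a client that re-serializes the signals differently would also fail, but it could reasonably feed the "no fingerprint provided" branch of the device signal.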

#### 2. Risk Scoring Engine

**New package**: `internal/risk/`

```go
type RiskSignal struct {
    Name   string
    Score  float64 // 0.0 (safe) to 1.0 (dangerous)
    Weight float64 // importance multiplier
    Reason string
}

type RiskAssessment struct {
    Score    float64      // weighted average: 0.0-1.0
    Level    string       // low | medium | high | critical
    Signals  []RiskSignal
    Decision string       // allow | challenge | block
}

func (e *RiskEngine) Assess(ctx context.Context, req RiskRequest) *RiskAssessment {
    signals := []RiskSignal{}
    
    // 1. IP Reputation (weight: 0.2)
    // - Known VPN/proxy/Tor exit node → +0.3
    // - IP country matches user's usual country → 0.0
    // - IP country is new for this user → +0.4
    // - IP has high failed login rate across accounts → +0.8
    
    // 2. Device Fingerprint (weight: 0.25)
    // - Known/trusted device → 0.0
    // - New device for this user → +0.3
    // - Device seen with many failed logins → +0.7
    // - No fingerprint provided (possible bot) → +0.5
    
    // 3. Login Velocity (weight: 0.2)
    // - Normal frequency → 0.0
    // - Multiple logins in short period → +0.4
    // - Failed attempts within window (#501) → scaled by count
    
    // 4. Time-of-Day Pattern (weight: 0.1)
    // - Login during user's usual hours → 0.0
    // - Login at unusual hour → +0.3
    
    // 5. Geographic Anomaly (weight: 0.15)
    // - "Impossible travel": login from NYC, then Tokyo 30 min later → +0.9
    
    // 6. User Agent Anomaly (weight: 0.1)
    // - Common browser → 0.0
    // - Headless browser / curl / bot signature → +0.6
    // - UA doesn't match device fingerprint signals → +0.5
    
    // Compute weighted score
    totalScore := weightedAverage(signals)
    
    // Map to decision
    decision := "allow"
    switch {
    case totalScore > blockThreshold:
        decision = "block"
    case totalScore > challengeThreshold:
        decision = "challenge" // require MFA or CAPTCHA
    }
    
    return &RiskAssessment{Score: totalScore, Signals: signals, Decision: decision}
}
```
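
`Assess` calls `weightedAverage` without defining it. One possible sketch is shown below. It assumes that per-signal scores are clamped to the 0.0-1.0 range (the additive increments in the comments above could otherwise exceed 1.0) and that the result is normalized by the weights of the signals actually present. Both choices are assumptions, not decisions made by this RFC.

```go
package main

import "math"

// RiskSignal mirrors the struct defined in this section.
type RiskSignal struct {
    Name   string
    Score  float64 // 0.0 (safe) to 1.0 (dangerous)
    Weight float64 // importance multiplier
    Reason string
}

// weightedAverage returns sum(score*weight) / sum(weight) over all signals
// with a positive weight. It returns 0 when no such signal exists.
func weightedAverage(signals []RiskSignal) float64 {
    var weighted, totalWeight float64
    for _, s := range signals {
        if s.Weight <= 0 {
            continue // ignore disabled or misconfigured signals
        }
        score := math.Min(math.Max(s.Score, 0), 1) // clamp to [0, 1]
        weighted += score * s.Weight
        totalWeight += s.Weight
    }
    if totalWeight == 0 {
        return 0
    }
    return weighted / totalWeight
}
```

Normalizing by the weights present means a request with only two signals can still reach the block threshold. Dividing by a fixed 1.0 instead would make missing signals count as safe, which may or may not be the intended behavior.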

**Default thresholds** (configurable):
- `<= 0.3` → Allow
- `> 0.3` and `<= 0.7` → Challenge (require MFA or CAPTCHA)
- `> 0.7` → Block

#### 3. Integration with Auth Flow

In login handlers, after credential verification but before issuing tokens:

```go
risk := riskEngine.Assess(ctx, risk.RiskRequest{
    UserID:            user.ID,
    Email:             user.Email,
    IPAddress:         clientIP,
    UserAgent:         userAgent,
    DeviceFingerprint: fingerprintHash,
    LoginMethod:       "password",
})

switch risk.Decision {
case "block":
    // Reject with generic error (don't reveal risk analysis)
    auditLog("user.login_blocked_risk", risk)
    return errors.New("login_blocked") // needs the standard "errors" package
    
case "challenge":
    // Step-up authentication
    if user.IsMultiFactorAuthEnabled {
        return requireMFA(user)  // existing MFA flow
    } else if captchaProvider != nil {
        return requireCAPTCHA()  // from #504
    }
    // If no challenge method available, allow with warning
    auditLog("user.login_risk_warning", risk)
    
case "allow":
    // Proceed normally
}
```

#### 4. Credential Stuffing Detection

Building on the `LoginAttempt` table from #501:

```go
// Detect credential stuffing: high failed rate from one IP across MANY accounts
func detectCredentialStuffing(ctx context.Context, ip string, window time.Duration) bool {
    // Count unique emails with failed logins from this IP
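    uniqueEmails, err := store.CountUniqueFailedEmailsByIP(ctx, ip, window)
    if err != nil {
        return false // fail open: a storage error must not block logins
    }
    totalFailed, err := store.CountFailedAttemptsByIP(ctx, ip, window)
    if err != nil {
        return false
    }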
    
    // If one IP is hitting many different accounts → credential stuffing
    if uniqueEmails > 10 && totalFailed > 20 {
        // Auto-block IP temporarily
        store.AddIPRule(ctx, &schemas.IPRule{
            IP: ip, Type: "block", Reason: "credential_stuffing_detected",
            ExpiresAt: time.Now().Add(1 * time.Hour).Unix(),
        })
        return true
    }
    return false
}
```
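
The two store queries above are not specified in this RFC. As an illustration of what they would need to count, here is an in-memory sketch. The `LoginAttempt` fields shown are assumptions (the real table comes from #501), timestamps are Unix seconds as in the schema above, and both function names are hypothetical.

```go
package main

import "strings"

// LoginAttempt is an assumed shape for the table from #501.
type LoginAttempt struct {
    Email     string
    IP        string
    Succeeded bool
    At        int64 // Unix seconds
}

// failedStats counts failed attempts from ip within the last windowSeconds,
// returning the number of distinct emails targeted and the total failures.
func failedStats(attempts []LoginAttempt, ip string, now, windowSeconds int64) (uniqueEmails, totalFailed int) {
    cutoff := now - windowSeconds
    seen := make(map[string]struct{})
    for _, a := range attempts {
        if a.Succeeded || a.IP != ip || a.At < cutoff {
            continue
        }
        totalFailed++
        seen[strings.ToLower(a.Email)] = struct{}{} // emails compared case-insensitively
    }
    return len(seen), totalFailed
}

// looksLikeCredentialStuffing applies the same limits as the function above:
// more than 10 distinct emails and more than 20 failures.
func looksLikeCredentialStuffing(uniqueEmails, totalFailed int) bool {
    return uniqueEmails > 10 && totalFailed > 20
}
```

In a real store these would likely be a `COUNT(DISTINCT email)` and a `COUNT(*)` over the same filtered rows, which could be fetched in a single query.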

#### 5. New Device Alerts

When the risk assessment flags a new device, notify the user by email:

```go
if risk.HasSignal("new_device") && cfg.EnableNewDeviceAlerts {
    emailProvider.SendNewDeviceAlert(user.Email, DeviceAlertData{
        DeviceName:  parseUserAgent(userAgent),
        IPAddress:   clientIP,
        Location:    geolocate(clientIP),
        Time:        time.Now(),
        RiskLevel:   risk.Level,
    })
}
```
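
`HasSignal` is used above but is not defined on `RiskAssessment`. A minimal sketch follows, assuming signals are matched by `Name` and that a signal only counts as present when its score is above zero. The signal name `new_device` is taken from the snippet above; the matching rule is an assumption.

```go
package main

// RiskSignal and RiskAssessment mirror the structs from the risk engine section.
type RiskSignal struct {
    Name   string
    Score  float64
    Weight float64
    Reason string
}

type RiskAssessment struct {
    Score    float64
    Level    string
    Signals  []RiskSignal
    Decision string
}

// HasSignal reports whether a signal with the given name contributed a
// non-zero score. It is safe to call on a nil assessment.
func (a *RiskAssessment) HasSignal(name string) bool {
    if a == nil {
        return false
    }
    for _, s := range a.Signals {
        if s.Name == name && s.Score > 0 {
            return true
        }
    }
    return false
}
```

Requiring a non-zero score avoids sending an alert when the device signal was evaluated but found a known, trusted device.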

---

### CLI Configuration Flags

```
--enable-risk-engine=false                 # Enable risk scoring
--risk-challenge-threshold=0.3             # Score above which to challenge
--risk-block-threshold=0.7                 # Score above which to block
--enable-device-fingerprinting=false       # Enable client-side fingerprinting
--enable-new-device-alerts=true            # Email on new device login
--enable-impossible-travel-detection=false # Geographic anomaly detection
```
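
The `--enable-impossible-travel-detection` flag gates the geographic anomaly signal from the risk engine. One way to sketch that check is to compare the great-circle distance between two consecutive logins with the time elapsed between them. The `GeoLogin` type, the 100 km noise floor, and the caller-supplied speed limit are all assumptions for illustration, and coordinates are presumed to come from an IP geolocation lookup that this RFC does not specify.

```go
package main

import "math"

const (
    earthRadiusKm = 6371.0
    minDistanceKm = 100.0 // assumed noise floor: IP geolocation is imprecise
)

// GeoLogin is a hypothetical record of where and when a login happened.
type GeoLogin struct {
    Lat, Lon float64 // degrees
    At       int64   // Unix seconds
}

// haversineKm returns the great-circle distance between two points in km.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
    toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
    dLat := toRad(lat2 - lat1)
    dLon := toRad(lon2 - lon1)
    a := math.Sin(dLat/2)*math.Sin(dLat/2) +
        math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
    return 2 * earthRadiusKm * math.Asin(math.Min(1, math.Sqrt(a)))
}

// impossibleTravel reports whether moving from prev to cur would require a
// speed above maxSpeedKmh. Short distances are never flagged.
func impossibleTravel(prev, cur GeoLogin, maxSpeedKmh float64) bool {
    distance := haversineKm(prev.Lat, prev.Lon, cur.Lat, cur.Lon)
    if distance < minDistanceKm {
        return false
    }
    elapsedHours := float64(cur.At-prev.At) / 3600
    if elapsedHours <= 0 {
        return true // far apart with no time elapsed
    }
    return distance/elapsedHours > maxSpeedKmh
}
```

With a limit of 1000 km/h, the NYC then Tokyo example from the risk engine comments is flagged when the logins are 30 minutes apart, but not when they are a day apart.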

---

### Testing Plan

- Unit tests for each risk signal calculator
- Unit tests for weighted score computation
- Integration test: new device triggers challenge
- Integration test: impossible travel triggers block
- Test credential stuffing detection auto-blocks IP
- Test risk decisions (allow/challenge/block) at threshold boundaries
- Test with various user agent strings (legitimate vs. bot)

---

### References

- [WorkOS Radar](https://workos.com/docs/radar)
- [OWASP Credential Stuffing Prevention](https://cheatsheetseries.owasp.org/cheatsheets/Credential_Stuffing_Prevention_Cheat_Sheet.html)
- [Device Fingerprinting Best Practices](https://fingerprintjs.com/blog/browser-fingerprinting-techniques/)

RFC: Advanced Bot Protection (Radar-style Risk Engine) #521