Skip to content

Optimize Gossip with Latency-Based Peer Scoring #24

@mgazza

Description

@mgazza

Problem

CLSet currently syncs with all connected peers equally, regardless of network locality. This causes:

  1. High cross-region traffic: Remote peers synced as frequently as local peers
  2. Increased latency: Sync operations wait for slow remote peers
  3. Unnecessary bandwidth: Redundant syncs with distant peers

CLSet uses libp2p's GossipSub for peer discovery and summary broadcasts (p2p_sync.go), but doesn't configure any peer scoring.

Solution

Use libp2p GossipSub's built-in peer scoring to prefer low-latency peers in the gossip mesh.

GossipSub Peer Scoring

GossipSub has PeerScoreParams with an AppSpecificScore function:

type PeerScoreParams struct {
    // P5: Application-specific peer scoring
    AppSpecificScore  func(p peer.ID) float64
    AppSpecificWeight float64
    
    // ... other params
}

How it works:

  • Score function called for each peer
  • Higher scores = more likely to be in gossip mesh
  • Mesh peers receive gossip messages
  • Lower-scored peers still reachable, just less preferred

Latency-Based Scoring

Use libp2p's built-in latency tracking:

func latencyBasedScore(h host.Host) func(peer.ID) float64 {
    return func(p peer.ID) float64 {
        latency := h.Peerstore().LatencyEWMA(p)
        
        if latency == 0 {
            return 0.0  // Unknown latency
        }
        
        // Convert latency to score
        // Lower latency = higher score
        // Example thresholds:
        //   <10ms:  score = 100 (same datacenter)
        //   <50ms:  score = 50  (same region)
        //   <100ms: score = 20  (cross-region)
        //   >200ms: score = 1   (slow/distant)
        
        if latency < 10*time.Millisecond {
            return 100.0
        } else if latency < 50*time.Millisecond {
            return 50.0
        } else if latency < 100*time.Millisecond {
            return 20.0
        } else if latency < 200*time.Millisecond {
            return 5.0
        } else {
            return 1.0
        }
    }
}

Implementation

1. Add Peer Scoring Config

Update p2p_sync.go:

type PeerConfig struct {
    PeriodicSyncInterval  time.Duration
    ScheduledSyncInterval time.Duration
    SummaryCacheTTL       time.Duration
    MinSummaryInterval    time.Duration
    
    // NEW: Peer scoring config
    EnablePeerScoring     bool
    PeerScoreWeight       float64
}

// NEW: Option for peer scoring
func WithPeerScoring(weight float64) PeerOption {
    return func(c *PeerConfig) {
        c.EnablePeerScoring = true
        c.PeerScoreWeight = weight
    }
}

2. Configure GossipSub with Scoring

Update NewPeer() in p2p_sync.go:

func NewPeer(
    crdt *CRDT,
    ctx context.Context,
    h host.Host,
    opts ...PeerOption,
) (*Peer, error) {
    config := DefaultPeerConfig()
    for _, opt := range opts {
        opt(&config)
    }

    // Configure GossipSub with peer scoring
    var pubsubOpts []pubsub.Option
    
    if config.EnablePeerScoring {
        scoreParams := &pubsub.PeerScoreParams{
            AppSpecificScore:  latencyBasedScore(h),
            AppSpecificWeight: config.PeerScoreWeight,
            DecayInterval:     time.Minute,
            DecayToZero:       0.01,
        }
        
        thresholds := &pubsub.PeerScoreThresholds{
            GossipThreshold:             -100,
            PublishThreshold:            -500,
            GraylistThreshold:           -1000,
            AcceptPXThreshold:           0,
            OpportunisticGraftThreshold: 1,
        }
        
        pubsubOpts = append(pubsubOpts, pubsub.WithPeerScore(scoreParams, thresholds))
    }
    
    ps, err := pubsub.NewGossipSub(ctx, h, pubsubOpts...)
    if err != nil {
        return nil, err
    }
    
    // ... rest of initialization
}

3. Add Latency Score Function

Add to p2p_sync.go:

// latencyBasedScore returns a peer scoring function based on network latency.
// Lower latency peers receive higher scores, making them preferred in the gossip mesh.
func latencyBasedScore(h host.Host) func(peer.ID) float64 {
    return func(p peer.ID) float64 {
        latency := h.Peerstore().LatencyEWMA(p)
        
        if latency == 0 {
            return 0.0  // Unknown latency, neutral score
        }
        
        // Score based on latency buckets
        switch {
        case latency < 10*time.Millisecond:
            return 100.0  // Same datacenter
        case latency < 50*time.Millisecond:
            return 50.0   // Same region
        case latency < 100*time.Millisecond:
            return 20.0   // Cross-region
        case latency < 200*time.Millisecond:
            return 5.0    // Distant
        default:
            return 1.0    // Very slow
        }
    }
}

Expected Behavior

Before (No Peer Scoring)

GossipSub mesh selection: Random peers
us-east-1-a syncs with:
  - us-east-1-b (1ms latency)   ← local
  - us-west-1-a (80ms latency)  ← remote
  - eu-west-1-a (120ms latency) ← remote

All peers equally preferred in mesh
High cross-region traffic

After (Latency-Based Scoring)

GossipSub mesh selection: Prefers high-scoring (low-latency) peers
us-east-1-a syncs with:
  - us-east-1-b (1ms, score=100)   ← local, preferred
  - us-east-1-c (2ms, score=100)   ← local, preferred
  - us-west-1-a (80ms, score=20)   ← remote, less preferred

Local peers dominate mesh
Reduced cross-region traffic
Remote peers still reachable for redundancy

Usage

Applications using CLSet can enable peer scoring:

p2pSync, err := clset.NewPeer(
    crdt,
    ctx,
    host,
    clset.WithPeriodicSyncInterval(5*time.Second),
    clset.WithPeerScoring(1.0),  // Enable with weight 1.0
)

Benefits

  1. Reduced cross-region traffic: Local peers preferred in gossip mesh
  2. Lower latency: Sync operations complete faster with nearby peers
  3. Better bandwidth utilization: Less redundant syncs with distant peers
  4. Automatic adaptation: Scoring adjusts as latencies change
  5. Still resilient: Remote peers available if local peers fail

Trade-offs

  1. Gossip coverage: Slightly slower propagation to distant regions (acceptable for CRDT use case)
  2. Scoring overhead: Small CPU cost for score calculation (negligible)
  3. Mesh churn: Peer scores may cause mesh changes (normal GossipSub behavior)

Testing

Unit Tests

  • Test latency score calculation
  • Test score buckets (10ms, 50ms, 100ms, 200ms)
  • Test zero latency handling
  • Test WithPeerScoring option

Integration Tests

  • Test GossipSub mesh composition with scoring enabled
  • Test local peers preferred over remote
  • Test remote peers still reachable
  • Measure cross-region traffic reduction
  • Verify CRDT convergence not impacted

Simulation

  • Multi-region setup (3 regions, 3 nodes each)
  • Measure gossip message distribution
  • Compare with/without peer scoring
  • Monitor peer scores over time

Tasks

  • Add EnablePeerScoring and PeerScoreWeight to PeerConfig
  • Implement WithPeerScoring() option function
  • Implement latencyBasedScore() function
  • Update NewPeer() to configure GossipSub with scoring
  • Add unit tests for score calculation
  • Add integration tests for mesh composition
  • Document usage in README
  • Add example with peer scoring enabled

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions