|
| 1 | +# Scheduling Send Refactor Design |
| 2 | + |
| 3 | +## Problem Statement |
| 4 | + |
| 5 | +The current SMS scheduling logic has two issues: |
| 6 | + |
| 7 | +1. **No way to send at an exact time without scheduling interference.** When a user specifies a `SendTime`/`SendAt`, the system still applies rate-limiting and schedule window logic, which may shift the actual send time. |
| 8 | + |
| 9 | +2. **Bulk message contention.** When bulk messages (API or CSV) are sent, all events arrive at the Cloud Tasks queue near-simultaneously, causing DB serialization conflicts in `PhoneNotificationRepository.Schedule()` (which uses `SELECT ... ORDER BY scheduled_at DESC` in a transaction). The current workaround is a hardcoded 1-second spacing hack. |
| 10 | + |
| 11 | +## Proposed Solution |
| 12 | + |
| 13 | +### Core Principle |
| 14 | + |
| 15 | +- **Explicit `SendTime`** = send at exactly that time, bypass all scheduling logic. |
| 16 | +- **No `SendTime`** = apply full scheduling logic (rate-limit + schedule windows), with rate-based Cloud Task dispatch delay to prevent DB contention. |
| 17 | + |
| 18 | +### Design |
| 19 | + |
| 20 | +#### 1. ExactSendTime Flag (Transient — not persisted) |
| 21 | + |
| 22 | +A boolean `ExactSendTime` flows through the event system: |
| 23 | + |
| 24 | +``` |
| 25 | +Request → MessageSendParams → MessageAPISentPayload → PhoneNotificationScheduleParams |
| 26 | +``` |
| 27 | + |
| 28 | +When `true`, the notification scheduling layer sets `ScheduledAt` to the exact time and skips rate-limit + window logic. |
| 29 | + |
| 30 | +#### 2. Rate-Based Dispatch Delay |
| 31 | + |
| 32 | +For bulk messages without an explicit `SendTime`, instead of the `index * 1s` hack, the service computes: |
| 33 | + |
| 34 | +```go |
| 35 | +interval := time.Minute / time.Duration(messagesPerMinute) |
| 36 | +delay := time.Duration(index) * interval |
| 37 | +``` |
| 38 | + |
| 39 | +Where `index` is **per-phone** (not global across the batch). This spreads Cloud Task deliveries at the phone's actual send rate, eliminating DB contention naturally. Duration math avoids integer truncation issues for rates > 60/min or non-divisors of 60. |
| 40 | + |
| 41 | +#### 3. Per-Endpoint Behavior |
| 42 | + |
| 43 | +| Endpoint | `SendAt` provided | `SendAt` absent | |
| 44 | +| --------------------------------------- | ---------------------------------------------------- | --------------------------------------------------------- | |
| 45 | +| Single SMS API (`/v1/messages/send`) | `ExactSendTime=true`, delay = `time.Until(SendAt)` | `ExactSendTime=false`, delay = 0 | |
| 46 | +| Bulk SMS API (`/v1/messages/bulk-send`) | N/A (no SendAt field) | `ExactSendTime=false`, delay = `perPhoneIndex * interval` | |
| 47 | +| CSV Upload | `ExactSendTime=true`, delay = `time.Until(SendTime)` | `ExactSendTime=false`, delay = `perPhoneIndex * interval` | |
| 48 | + |
| 49 | +**Index is per-phone**: In a CSV with messages to multiple phones, each phone maintains its own index counter. Messages to Phone A get indices 0, 1, 2... and messages to Phone B get separate indices 0, 1, 2... This ensures correct rate-limiting per phone without over-throttling unrelated phones. |
| 50 | + |
| 51 | +#### 4. Notification Scheduling Bypass |
| 52 | + |
| 53 | +In `PhoneNotificationService.Schedule()`: |
| 54 | + |
| 55 | +```go |
| 56 | +if params.ExactSendTime && params.ScheduledSendTime != nil { |
| 57 | + notification.ScheduledAt = *params.ScheduledSendTime |
| 58 | + // Skip rate-limit and schedule window logic |
| 59 | + // Insert directly |
| 60 | +} else { |
| 61 | + // Existing logic: rate-limit + schedule window |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +### Changes by File |
| 66 | + |
| 67 | +| File | Change | |
| 68 | +| -------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | |
| 69 | +| `pkg/events/message_api_sent_event.go` | Add `ExactSendTime bool` field to `MessageAPISentPayload` | |
| 70 | +| `pkg/services/message_service.go` | Add `Index int` to `MessageSendParams`; update `getSendDelay()` to compute rate-based delay when `Index > 0` and `SendAt == nil`; set `ExactSendTime` on event payload when `SendAt != nil` | |
| 71 | +| `pkg/services/phone_notification_service.go` | Add `ExactSendTime bool` + `ScheduledSendTime *time.Time` to `PhoneNotificationScheduleParams`; add bypass path in `Schedule()` when `ExactSendTime && ScheduledSendTime != nil` — insert notification directly without transaction/rate logic | |
| 72 | +| `pkg/repositories/gorm_phone_notification_repository.go` | Add `ScheduleExact(ctx, notification)` method that inserts with a fixed `ScheduledAt` (no transaction, no rate query). Add unique constraint or dedupe check on `(message_id)` for pending notifications to ensure idempotency. | |
| 73 | +| `pkg/repositories/phone_notification_repository.go` | Add `ScheduleExact` to the repository interface | |
| 74 | +| `pkg/listeners/phone_notification_listener.go` | Pass `ExactSendTime` + `ScheduledSendTime` from event payload to service params | |
| 75 | +| `pkg/requests/message_bulk_send_request.go` | Remove per-index `SendAt` computation; add `Index` to each `MessageSendParams` | |
| 76 | +| `pkg/requests/bulk_message_request.go` | Propagate `Index` into params for CSV rows | |
| 77 | +| `pkg/handlers/message_handler.go` | Remove `index * 1s` hack in `BulkSend` handler | |
| 78 | +| `pkg/handlers/bulk_message_handler.go` | Compute per-phone index for CSV rows; remove any concurrent scheduling; ensure `Index` is passed to `MessageSendParams` | |
| 79 | + |
| 80 | +### Data Flow |
| 81 | + |
| 82 | +``` |
| 83 | +User sends request |
| 84 | + → Handler creates MessageSendParams (with Index for bulk, ExactSendTime derived from SendAt presence) |
| 85 | + → MessageService.SendMessage() |
| 86 | + → Computes dispatch delay: |
| 87 | + - ExactSendTime: time.Until(SendAt) |
| 88 | + - Bulk without SendAt: Index * (60/MessagesPerMinute)s |
| 89 | + - Single without SendAt: 0 |
| 90 | + → Sets ExactSendTime on MessageAPISentPayload |
| 91 | + → DispatchWithTimeout(event, delay) → Cloud Tasks |
| 92 | + → [delay elapses] → PhoneNotificationListener.onMessageAPISent() |
| 93 | + → PhoneNotificationService.Schedule(params with ExactSendTime) |
| 94 | + → If ExactSendTime: insert with exact ScheduledAt |
| 95 | + → Else: apply rate-limit + schedule window logic |
| 96 | +``` |
| 97 | + |
| 98 | +### Edge Cases |
| 99 | + |
| 100 | +- **SendAt in the past**: Send immediately (existing behavior preserved). |
| 101 | +- **MessagesPerMinute = 0**: No rate limiting; bulk messages dispatch immediately (existing behavior — `Schedule()` already handles this). Rate-based delay uses 0 when rate is 0. |
| 102 | +- **No schedule attached to phone**: Window logic returns current time unchanged (existing behavior). |
| 103 | +- **CSV with mixed rows**: Some rows have `SendTime`, others don't. Each row is processed independently — those with `SendTime` get exact dispatch, those without get rate-based delay. |
| 104 | +- **Cloud Task duplicate delivery**: `ScheduleExact` and `Schedule` use a dedupe check (unique active notification per `message_id`) to prevent duplicate notification creation on at-least-once delivery. |
| 105 | +- **Retries for exact-send messages**: When an exact-send message expires and triggers a retry, the retry does NOT preserve exact-send semantics — it falls through to standard scheduling. The explicit time was a one-shot intent. |
| 106 | + |
| 107 | +### Terminology Note |
| 108 | + |
| 109 | +"Send at exactly that time" means the system will not apply additional rate-limit or schedule-window adjustments. It does NOT guarantee precise handset delivery timing (which depends on Cloud Tasks delivery, FCM push, and device state). |
| 110 | + |
| 111 | +### What Does NOT Change |
| 112 | + |
| 113 | +- The `MessageSendSchedule` entity and its `ResolveScheduledAt()` logic |
| 114 | +- The `SendScheduleService` CRUD operations |
| 115 | +- The phone notification entity schema (no new DB columns) |
| 116 | +- The Android app behavior |
| 117 | +- The web frontend (models auto-generated from Swagger) |
0 commit comments