Skip to content

Commit 83844a5

Browse files
AchoArnoldCopilot
andcommitted
docs: add scheduling send refactor design spec
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 65f6365 commit 83844a5

1 file changed

Lines changed: 117 additions & 0 deletions

File tree

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
# Scheduling Send Refactor Design
2+
3+
## Problem Statement
4+
5+
The current SMS scheduling logic has two issues:
6+
7+
1. **No way to send at an exact time without scheduling interference.** When a user specifies a `SendTime`/`SendAt`, the system still applies rate-limiting and schedule window logic, which may shift the actual send time.
8+
9+
2. **Bulk message contention.** When bulk messages (API or CSV) are sent, all events arrive at the Cloud Tasks queue near-simultaneously, causing DB serialization conflicts in `PhoneNotificationRepository.Schedule()` (which uses `SELECT ... ORDER BY scheduled_at DESC` in a transaction). The current workaround is a hardcoded 1-second spacing hack.
10+
11+
## Proposed Solution
12+
13+
### Core Principle
14+
15+
- **Explicit `SendTime`** = send at exactly that time, bypass all scheduling logic.
16+
- **No `SendTime`** = apply full scheduling logic (rate-limit + schedule windows), with rate-based Cloud Task dispatch delay to prevent DB contention.
17+
18+
### Design
19+
20+
#### 1. ExactSendTime Flag (Transient — not persisted)
21+
22+
A boolean `ExactSendTime` flows through the event system:
23+
24+
```
25+
Request → MessageSendParams → MessageAPISentPayload → PhoneNotificationScheduleParams
26+
```
27+
28+
When `true`, the notification scheduling layer sets `ScheduledAt` to the exact time and skips rate-limit + window logic.
29+
30+
#### 2. Rate-Based Dispatch Delay
31+
32+
For bulk messages without an explicit `SendTime`, instead of the `index * 1s` hack, the service computes:
33+
34+
```go
35+
interval := time.Minute / time.Duration(messagesPerMinute)
36+
delay := time.Duration(index) * interval
37+
```
38+
39+
Where `index` is **per-phone** (not global across the batch). This spreads Cloud Task deliveries at the phone's actual send rate, eliminating DB contention naturally. Duration math avoids integer truncation issues for rates > 60/min or non-divisors of 60.
40+
41+
#### 3. Per-Endpoint Behavior
42+
43+
| Endpoint | `SendAt` provided | `SendAt` absent |
44+
| --------------------------------------- | ---------------------------------------------------- | --------------------------------------------------------- |
45+
| Single SMS API (`/v1/messages/send`) | `ExactSendTime=true`, delay = `time.Until(SendAt)` | `ExactSendTime=false`, delay = 0 |
46+
| Bulk SMS API (`/v1/messages/bulk-send`) | N/A (no SendAt field) | `ExactSendTime=false`, delay = `perPhoneIndex * interval` |
47+
| CSV Upload | `ExactSendTime=true`, delay = `time.Until(SendTime)` | `ExactSendTime=false`, delay = `perPhoneIndex * interval` |
48+
49+
**Index is per-phone**: In a CSV with messages to multiple phones, each phone maintains its own index counter. Messages to Phone A get indices 0, 1, 2... and messages to Phone B get separate indices 0, 1, 2... This ensures correct rate-limiting per phone without over-throttling unrelated phones.
50+
51+
#### 4. Notification Scheduling Bypass
52+
53+
In `PhoneNotificationService.Schedule()`:
54+
55+
```go
56+
if params.ExactSendTime && params.ScheduledSendTime != nil {
57+
notification.ScheduledAt = *params.ScheduledSendTime
58+
// Skip rate-limit and schedule window logic
59+
// Insert directly
60+
} else {
61+
// Existing logic: rate-limit + schedule window
62+
}
63+
```
64+
65+
### Changes by File
66+
67+
| File | Change |
68+
| -------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
69+
| `pkg/events/message_api_sent_event.go` | Add `ExactSendTime bool` field to `MessageAPISentPayload` |
70+
| `pkg/services/message_service.go` | Add `Index int` to `MessageSendParams`; update `getSendDelay()` to compute rate-based delay when `Index > 0` and `SendAt == nil`; set `ExactSendTime` on event payload when `SendAt != nil` |
71+
| `pkg/services/phone_notification_service.go` | Add `ExactSendTime bool` + `ScheduledSendTime *time.Time` to `PhoneNotificationScheduleParams`; add bypass path in `Schedule()` when `ExactSendTime && ScheduledSendTime != nil` — insert notification directly without transaction/rate logic |
72+
| `pkg/repositories/gorm_phone_notification_repository.go` | Add `ScheduleExact(ctx, notification)` method that inserts with a fixed `ScheduledAt` (no transaction, no rate query). Add unique constraint or dedupe check on `(message_id)` for pending notifications to ensure idempotency. |
73+
| `pkg/repositories/phone_notification_repository.go` | Add `ScheduleExact` to the repository interface |
74+
| `pkg/listeners/phone_notification_listener.go` | Pass `ExactSendTime` + `ScheduledSendTime` from event payload to service params |
75+
| `pkg/requests/message_bulk_send_request.go` | Remove per-index `SendAt` computation; add `Index` to each `MessageSendParams` |
76+
| `pkg/requests/bulk_message_request.go` | Propagate `Index` into params for CSV rows |
77+
| `pkg/handlers/message_handler.go` | Remove `index * 1s` hack in `BulkSend` handler |
78+
| `pkg/handlers/bulk_message_handler.go` | Compute per-phone index for CSV rows; remove any concurrent scheduling; ensure `Index` is passed to `MessageSendParams` |
79+
80+
### Data Flow
81+
82+
```
83+
User sends request
84+
→ Handler creates MessageSendParams (with Index for bulk, ExactSendTime derived from SendAt presence)
85+
→ MessageService.SendMessage()
86+
→ Computes dispatch delay:
87+
- ExactSendTime: time.Until(SendAt)
88+
- Bulk without SendAt: Index * (60/MessagesPerMinute)s
89+
- Single without SendAt: 0
90+
→ Sets ExactSendTime on MessageAPISentPayload
91+
→ DispatchWithTimeout(event, delay) → Cloud Tasks
92+
→ [delay elapses] → PhoneNotificationListener.onMessageAPISent()
93+
→ PhoneNotificationService.Schedule(params with ExactSendTime)
94+
→ If ExactSendTime: insert with exact ScheduledAt
95+
→ Else: apply rate-limit + schedule window logic
96+
```
97+
98+
### Edge Cases
99+
100+
- **SendAt in the past**: Send immediately (existing behavior preserved).
101+
- **MessagesPerMinute = 0**: No rate limiting; bulk messages dispatch immediately (existing behavior — `Schedule()` already handles this). Rate-based delay uses 0 when rate is 0.
102+
- **No schedule attached to phone**: Window logic returns current time unchanged (existing behavior).
103+
- **CSV with mixed rows**: Some rows have `SendTime`, others don't. Each row is processed independently — those with `SendTime` get exact dispatch, those without get rate-based delay.
104+
- **Cloud Task duplicate delivery**: `ScheduleExact` and `Schedule` use a dedupe check (unique active notification per `message_id`) to prevent duplicate notification creation on at-least-once delivery.
105+
- **Retries for exact-send messages**: When an exact-send message expires and triggers a retry, the retry does NOT preserve exact-send semantics — it falls through to standard scheduling. The explicit time was a one-shot intent.
106+
107+
### Terminology Note
108+
109+
"Send at exactly that time" means the system will not apply additional rate-limit or schedule-window adjustments. It does NOT guarantee precise handset delivery timing (which depends on Cloud Tasks delivery, FCM push, and device state).
110+
111+
### What Does NOT Change
112+
113+
- The `MessageSendSchedule` entity and its `ResolveScheduledAt()` logic
114+
- The `SendScheduleService` CRUD operations
115+
- The phone notification entity schema (no new DB columns)
116+
- The Android app behavior
117+
- The web frontend (models auto-generated from Swagger)

0 commit comments

Comments
 (0)