Store Durable Object alarms per-namespace on disk#6104
Store Durable Object alarms per-namespace on disk#6104threepointone wants to merge 2 commits intomainfrom
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
|
recheck |
|
the workers-sdk test failure seems to be just about fixing the file count on the workers-sdk side |
|
/bigbonk roast this PR. Hi Sunil. testing bonk |
|
/bigbonk roast this PR |
|
OK, I've seen enough. Time to roast. The Roast"(thanks opus)" -- The PR description is literally better written than most theses I've reviewed, and you're crediting it to a chatbot. Sunil, this is the equivalent of showing up to a potluck with a Michelin-star catered dish and going "oh yeah my microwave did this." The actual code though? Let's talk about it. The You check The
The test (line 2293+) sets an alarm for File count assertions -- you bumped
The deletion of The facets test fix (line 5031+) where you added Bottom line: The design is sound -- per-namespace alarm storage is clearly the right call, the PR description is immaculate (thanks, Opus), and the test coverage is genuinely good. But the implementation has the telltale signs of "an AI wrote the structure and a human glued it together" -- slightly redundant control flow, a few too many "trust me, this is fine" comments, and magic numbers scattered like confetti. 7/10 -- would merge after a cleanup pass. The architecture carries it. |
src/workerd/server/server.c++
Outdated
| this->actorStorage.emplace(dir.openSubdir( | ||
| kj::Path({d.uniqueKey}), kj::WriteMode::CREATE | kj::WriteMode::MODIFY)); | ||
|
|
||
| // Create per-namespace alarm scheduler backed by on-disk storage in the | ||
| // namespace directory, alongside the per-actor .sqlite files. | ||
| auto& as = KJ_ASSERT_NONNULL(this->actorStorage); |
There was a problem hiding this comment.
| this->actorStorage.emplace(dir.openSubdir( | |
| kj::Path({d.uniqueKey}), kj::WriteMode::CREATE | kj::WriteMode::MODIFY)); | |
| // Create per-namespace alarm scheduler backed by on-disk storage in the | |
| // namespace directory, alongside the per-actor .sqlite files. | |
| auto& as = KJ_ASSERT_NONNULL(this->actorStorage); | |
| auto& as = this->actorStorage.emplace(dir.openSubdir( | |
| kj::Path({d.uniqueKey}), kj::WriteMode::CREATE | kj::WriteMode::MODIFY)); | |
| // Create per-namespace alarm scheduler backed by on-disk storage in the | |
| // namespace directory, alongside the per-actor .sqlite files. |
src/workerd/server/server.c++
Outdated
|
|
||
| KJ_IF_SOME(d, config.tryGet<Durable>()) { | ||
| auto idFactory = kj::heap<ActorIdFactoryImpl>(d.uniqueKey); | ||
| sched->registerNamespace(d.uniqueKey, |
There was a problem hiding this comment.
So, AlarmScheduler is currently designed to handle multiple namespaces registered at once, and to that end the table it creates has a column called actor_unique_key which identifies the DO namespace.
But now that we store a separate database per namespace, we don't really want that anymore. It's just wasting bytes and potentially causing headaches if someone wants to change keys or something.
Seems like we should update the AlarmScheduler implementation to remove the redundant functionality.
This is something that's best to do now, not later after a bunch of these tables have already been written to disk...
Previously, all alarms were stored in a single global in-memory SQLite database, meaning they were lost on process restart. This made it impossible to test alarm resiliency scenarios. Move alarm scheduler ownership from Server into ActorNamespace, so each DO namespace gets its own AlarmScheduler backed by metadata.sqlite in the namespace's storage directory. On-disk namespaces get persistent alarms; in-memory namespaces get in-memory alarms. No new configuration is needed -- alarm storage follows the existing durableObjectStorage setting. Ref: #605 (comment) Co-authored-by: Cursor <cursoragent@cursor.com>
Address review feedback from kentonv: since each DO namespace now has its own separate SQLite database for alarm storage, the actor_unique_key column in the _cf_ALARM table is redundant. Previously, AlarmScheduler was designed to handle multiple namespaces in a single database, using actor_unique_key to distinguish between them. Now that we store a separate database per namespace, this column just wastes bytes and could cause headaches if someone wants to change keys later. As kentonv noted, this is best fixed now before any of these tables are written to disk in the wild. Changes: - Simplified ActorKey struct to contain only actorId (removed uniqueKey) - Simplified _cf_ALARM schema from (actor_unique_key, actor_id, scheduled_time) with a compound primary key to (actor_id, scheduled_time) with actor_id as the sole primary key - Removed registerNamespace() -- the GetActorFn is now passed directly to the AlarmScheduler constructor, since each scheduler serves exactly one namespace - Replaced the namespaces HashMap with a single GetActorFn member - Restructured ActorNamespace::link() in server.c++ to create the GetActorFn once and pass it to whichever storage path (on-disk or in-memory) is used, eliminating duplication - Removed unused setAlarmInMemory declaration and unused byte type alias
95b7ec1 to
a99f831
Compare
|
got the clanker to rebase and force push a rewrite, 🏓 |
(thanks opus)
Alarms in workerd are currently stored in a single global in-memory SQLite database. This means all scheduled alarms are lost on process restart, which prevents testing alarm resiliency scenarios and doesn't match production behavior.
This PR moves alarm scheduler ownership from
ServerintoActorNamespace, so each DO namespace gets its ownAlarmSchedulerbacked bymetadata.sqlitein the namespace's storage directory. On-disk namespaces get persistent alarms; in-memory namespaces get in-memory alarms.Follows up on the design discussion in #605 (comment).
Design decisions
Per-namespace, not global. Kenton's original feedback on #605 was that a single global alarm database is problematic: it decouples alarm storage from namespace data, it makes splitting/combining configs lossy, and it creates the confusing possibility of on-disk DOs with in-memory alarms (or vice versa). Each namespace now owns its scheduler and stores alarms alongside its actor
.sqlitefiles.No new configuration. Alarm storage mode is inferred from the existing
durableObjectStoragesetting. If the namespace useslocalDisk, alarms go on disk. If it'sinMemoryornone, alarms stay in memory. This was an explicit goal from the #605 discussion -- no new config knobs needed.metadata.sqliteas the filename. Named generically (rather thanalarms.sqlite) so it can hold other per-namespace metadata in the future, as Kenton suggested.AlarmSchedulerclass unchanged. The class already accepted a VFS + path in its constructor, so no API changes were needed. The only change is where schedulers are created and who owns them.Changes
server.h: Removed globalAlarmSchedulermember,startAlarmScheduler()declaration, and thealarm-scheduler.hinclude (moved to.c++).server.c++:ActorNamespacenow owns akj::Maybe<kj::Own<AlarmScheduler>>. Created inlink()using the namespace's own VFS (on-disk) or an in-memory VFS (fallback). The namespace self-registers with its scheduler at link time. Removed the global scheduler,LinkedIoChannels::alarmScheduler, and all related wiring.server-test.c++: Added a new "Durable Object alarm persistence (on disk)" test that sets an alarm, tears down the server, restarts with the same storage directory, and verifies the alarm survived. Updated two existing tests whose file-count assertions now includemetadata.sqlite.Notes for reviewers
alarm-scheduler.handalarm-scheduler.c++are completely untouched.kj::systemPreciseCalendarClock()call in theActorNamespaceconstructor is a pre-existing pattern -- the oldstartAlarmScheduler()called it the same way. Threading a calendar clock through theServerconstructor would be cleaner but significantly more churn for no immediate benefit.ActorNamespacematters:ownAlarmScheduleris declared beforeactorsso it outlives all actors that reference it (same constraint as the existingactorStoragefield).thiscapture in theregisterNamespacelambda is safe because the scheduler is owned by the namespace and destroyed first. This matches the original pattern which captured&actorNs.