Wait for disk utilization info cache to be populated on controllers during startup #17599

@mluvin-stripe

Problem

When using the “pause ingestion based on resource utilization” feature (docs), a controller that has just restarted has no cached server disk utilization information until the ResourceUtilizationChecker periodic task runs. Setting controller.resource.utilization.checker.initial.delay to zero seconds starts populating the cache immediately, but the controller can still begin serving requests before the checker finishes, because it does not wait for the checker before marking itself as ready.

This is a problem for minion-based offline segment generation (code) and offline segment uploads (new feature proposed in #17557): the disk utilization check returns UNDETERMINED if the controller’s disk utilization cache isn’t yet populated, so the segment creation/upload is allowed to proceed even if the disk threshold has already been breached.
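
To make the gap concrete, here is a minimal sketch of the behavior described above. All names in it (`UtilizationResult`, `DiskUtilizationCache`, `SegmentGate`, the `0.9` threshold) are hypothetical and chosen for illustration; this is not Pinot's actual code. It only models the fact that an empty cache yields UNDETERMINED and that UNDETERMINED does not block the request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; a simplified model, not Pinot's implementation.
enum UtilizationResult { PASS, FAIL, UNDETERMINED }

final class DiskUtilizationCache {
    // Server instance id -> fraction of disk used (0.0 to 1.0).
    private final Map<String, Double> usedFractionByServer = new ConcurrentHashMap<>();

    void put(String server, double usedFraction) {
        usedFractionByServer.put(server, usedFraction);
    }

    UtilizationResult check(double threshold) {
        if (usedFractionByServer.isEmpty()) {
            // Nothing cached yet (e.g. right after a restart): the check cannot decide.
            return UtilizationResult.UNDETERMINED;
        }
        for (double used : usedFractionByServer.values()) {
            if (used > threshold) {
                return UtilizationResult.FAIL;
            }
        }
        return UtilizationResult.PASS;
    }
}

final class SegmentGate {
    // Only an explicit FAIL blocks the request, so UNDETERMINED lets it through.
    static boolean allowSegmentUpload(DiskUtilizationCache cache, double threshold) {
        return cache.check(threshold) != UtilizationResult.FAIL;
    }
}
```

With an empty cache, `SegmentGate.allowSegmentUpload(cache, 0.9)` returns `true`; once an entry above the threshold is cached, it returns `false`. The window between those two states is the startup gap this issue is about.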

Solution

I propose adding an opt-in config controller.resource.utilization.checker.waitDuringStartup that ensures the disk utilization cache is populated before marking the controller as ready. This way, the controller is immediately ready to correctly reject segment creation/upload requests after starting up.

I was thinking of adding another serviceStatusCallback (like this one) that checks whether the disk utilization cache has been populated yet, and doesn’t return GOOD until it is.
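
A rough sketch of what such a callback could look like is below. The interface and enum are local stand-ins so the example is self-contained; Pinot's real callback interface and status values may differ. The `markCachePopulated()` hook and the class name `DiskCacheReadyCallback` are hypothetical, and the boolean passed to the constructor represents the value of the proposed controller.resource.utilization.checker.waitDuringStartup config.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-ins for illustration only; not Pinot's actual types.
enum StartupStatus { STARTING, GOOD }

interface StartupStatusCallback {
    StartupStatus getServiceStatus();
    String getStatusDescription();
}

final class DiskCacheReadyCallback implements StartupStatusCallback {
    // Value of the proposed opt-in config; false keeps today's behavior.
    private final boolean waitDuringStartup;
    private final AtomicBoolean cachePopulated = new AtomicBoolean(false);

    DiskCacheReadyCallback(boolean waitDuringStartup) {
        this.waitDuringStartup = waitDuringStartup;
    }

    // Hypothetical hook: the checker would call this after its first completed run.
    void markCachePopulated() {
        cachePopulated.set(true);
    }

    @Override
    public StartupStatus getServiceStatus() {
        // When the feature is off, never hold up startup.
        if (!waitDuringStartup || cachePopulated.get()) {
            return StartupStatus.GOOD;
        }
        return StartupStatus.STARTING;
    }

    @Override
    public String getStatusDescription() {
        return getServiceStatus() == StartupStatus.GOOD
            ? "Disk utilization cache ready"
            : "Waiting for disk utilization cache to be populated";
    }
}
```

Because the flag defaults to not waiting, existing deployments would see no change in startup behavior unless they opt in. An actual implementation would probably also need to decide what happens if the checker never completes, which this sketch does not address.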
