Introduce options for reverting or adjusting memory limits after OOM #7

@kitsuyaazuma

Description

Is your feature request related to a problem? Please describe.

Currently, broom raises the memory limit after a single out-of-memory (OOM) event, and the raised limit stays in place. If the OOM was caused by a temporary spike in memory usage, this immediate adjustment allocates more memory than the job normally needs, and the resulting over-provisioning increases costs.

Describe the solution you'd like

Introduce configurable strategies for managing memory limits post-OOM events:

  1. Revert to Original Limit: After a defined cooldown period or a specified number of successful job runs, automatically revert the memory limit to its original value.
  2. Statistical-Based Adjustment: Calculate the new memory limit from historical usage, for example the maximum memory usage observed over the last N runs multiplied by a safety factor (e.g., 120% of that peak), while ensuring it never falls below the original setting.
  3. Permanent Increase: Maintain the current behavior where the memory limit remains elevated after an OOM event.
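To make the three strategies concrete, here is a minimal Go sketch of how a controller might pick the next limit. Everything in it is hypothetical: the `Strategy`, `Policy`, and `State` types, the field names, and `NextLimit` are illustrative assumptions, not part of broom. Only the run-count variant of "Revert to Original Limit" is shown; a time-based cooldown would need an OOM timestamp as well. The safety factor is an integer percentage to avoid floating-point rounding on byte counts.

```go
package main

import "fmt"

// Strategy selects how the memory limit is managed after an OOM event.
// These names are hypothetical and do not exist in broom today.
type Strategy int

const (
	RevertToOriginal Strategy = iota
	StatisticalAdjustment
	PermanentIncrease
)

// Policy holds hypothetical tuning knobs for the strategies.
type Policy struct {
	Strategy        Strategy
	RevertAfterRuns int   // successful runs after an OOM before reverting
	Window          int   // recent runs considered; <= 0 means all history
	SafetyPercent   int64 // e.g. 120 means 120% of the observed peak
}

// State is the hypothetical per-job history a controller would track.
type State struct {
	OriginalLimit     int64   // bytes, as configured by the user
	CurrentLimit      int64   // bytes, possibly raised after an OOM
	SuccessesSinceOOM int     // successful runs since the last OOM
	PeakUsage         []int64 // peak memory per completed run, oldest first
}

// NextLimit returns the memory limit to apply to the next run.
func NextLimit(p Policy, s State) int64 {
	switch p.Strategy {
	case RevertToOriginal:
		if s.SuccessesSinceOOM >= p.RevertAfterRuns {
			return s.OriginalLimit
		}
		return s.CurrentLimit
	case StatisticalAdjustment:
		runs := s.PeakUsage
		if p.Window > 0 && len(runs) > p.Window {
			runs = runs[len(runs)-p.Window:] // keep only the last N runs
		}
		if len(runs) == 0 {
			return s.CurrentLimit // no history yet: keep the current limit
		}
		var peak int64
		for _, u := range runs {
			if u > peak {
				peak = u
			}
		}
		proposed := peak * p.SafetyPercent / 100
		if proposed < s.OriginalLimit {
			return s.OriginalLimit // never drop below the user's setting
		}
		return proposed
	default: // PermanentIncrease: keep the raised limit (current behavior)
		return s.CurrentLimit
	}
}

func main() {
	s := State{
		OriginalLimit:     512 << 20,
		CurrentLimit:      1024 << 20,
		SuccessesSinceOOM: 3,
		PeakUsage:         []int64{300 << 20, 450 << 20, 400 << 20},
	}
	fmt.Println(NextLimit(Policy{Strategy: RevertToOriginal, RevertAfterRuns: 3}, s))                // 536870912
	fmt.Println(NextLimit(Policy{Strategy: StatisticalAdjustment, Window: 3, SafetyPercent: 120}, s)) // 566231040
	fmt.Println(NextLimit(Policy{Strategy: PermanentIncrease}, s))                                     // 1073741824
}
```

In this sketch the statistical strategy lands between the original 512 MiB and the post-OOM 1 GiB, which is the cost saving the issue is after.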

Describe alternatives you've considered

Configurable Sidecar Container in Controller-Manager (Future Enhancement): Introduce the capability for users to specify a custom sidecar container image within the controller-manager. This sidecar can implement tailored logic for recommending memory limits, providing flexibility for organizations with unique requirements.

Additional context

N/A

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests