OCPBUGS-80952: perf: latency: compute memory resources dynamically by shajmakh · Pull Request #1517 · openshift/cluster-node-tuning-operator

shajmakh · 2026-05-15T14:14:53Z

When the CPU count is very high, the pod's fixed memory resources may
become too low to run the latency checks. Add an environment variable to
allow more flexibility while preserving the old behavior for backward
compatibility.
The new behavior is as follows:

1. If no env var is set, scale the memory amount per CPU, floored at
defaultTestMemory.
2. Otherwise, if it is set to an explicit memory value that is a valid
quantity, use that value in the latency pod; if not, return an error.

32Mi was picked as the per-CPU memory factor based on input from
consumers of the application. If the memory is still not enough, the
user can override the total memory with an explicit value.
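The selection logic described above can be sketched in Go roughly as follows. This is a stdlib-only illustration, not the PR's code: it assumes `defaultTestMemory` is `1Gi` (as the CodeRabbit walkthrough states) and a `32Mi` per-CPU factor, tracks sizes as integer MiB, and uses `parseMi`, a hypothetical and much simpler stand-in for `resource.ParseQuantity` that only accepts whole `Mi`/`Gi` values. The name `latencyTestMemoryMi` is also illustrative and is not the PR's `getLatencyTestMemory`.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// Assumed values: 1Gi default (per the PR) and a 32Mi per-CPU factor.
// Sizes are plain integers in MiB to keep the sketch stdlib-only.
const (
	defaultTestMemoryMi int64 = 1024 // 1Gi
	memoryPerCPUMi      int64 = 32
)

// parseMi is a hypothetical, simplified stand-in for resource.ParseQuantity:
// it only accepts whole numbers followed by "Mi" or "Gi".
func parseMi(val string) (int64, error) {
	mult, num := int64(1), val
	switch {
	case strings.HasSuffix(val, "Gi"):
		mult, num = 1024, strings.TrimSuffix(val, "Gi")
	case strings.HasSuffix(val, "Mi"):
		num = strings.TrimSuffix(val, "Mi")
	default:
		return 0, fmt.Errorf("unsupported memory quantity %q", val)
	}
	n, err := strconv.ParseInt(num, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid memory quantity %q: %w", val, err)
	}
	return n * mult, nil
}

// latencyTestMemoryMi sketches the final behavior: an unset value scales with
// the CPU count, floored at the default; an explicit value must parse.
func latencyTestMemoryMi(envVal string, cpus int) (int64, error) {
	if envVal == "" {
		scaled := int64(cpus) * memoryPerCPUMi
		if scaled < defaultTestMemoryMi {
			return defaultTestMemoryMi, nil
		}
		return scaled, nil
	}
	return parseMi(envVal)
}

func main() {
	mem, err := latencyTestMemoryMi(os.Getenv("LATENCY_TEST_MEMORY"), 64)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("latency pod memory limit: %dMi\n", mem)
}
```

With the variable unset, small nodes keep the old `1Gi` limit, and only large CPU counts raise it; an explicit value always wins.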

Summary by CodeRabbit

New Features
- Added configurable memory for latency tests via an environment variable, including a dynamic mode that scales memory based on CPU count.
- Invalid memory values are now validated and will surface as configuration errors.
Documentation
- Updated docs to describe the new memory configuration and dynamic scaling behavior.

coderabbitai · 2026-05-15T14:15:08Z

Walkthrough

Adds LATENCY_TEST_MEMORY support for latency e2e tests: introduces defaults and a dynamic mode, computes/validates memory (dynamic = max(32Mi * cpus, default)), and applies the computed quantity as the pod container memory limit.
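As a worked example of the dynamic formula, with $c$ the CPU count used for the latency pod and $1\,\mathrm{Gi} = 1024\,\mathrm{Mi}$, the per-CPU term only exceeds the `1Gi` floor once the pod uses more than 32 CPUs:

```latex
\begin{aligned}
M(c) &= \max\left(32\,\mathrm{Mi} \cdot c,\; 1\,\mathrm{Gi}\right) \\
32\,\mathrm{Mi} \cdot c \ge 1024\,\mathrm{Mi} &\iff c \ge 32 \\
M(16) &= \max(512\,\mathrm{Mi},\; 1024\,\mathrm{Mi}) = 1\,\mathrm{Gi} \\
M(96) &= \max(3072\,\mathrm{Mi},\; 1024\,\mathrm{Mi}) = 3\,\mathrm{Gi}
\end{aligned}
```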

Changes

Latency Test Memory Configuration

Layer	File(s)	Summary
Memory configuration constants and initialization	`test/e2e/performanceprofile/functests/4_latency/latency.go`	Defines `defaultTestMemory` (`1Gi`), `dynamicMemory` mode, per-CPU factors, initializes `latencyTestMemory`, and updates env var docs to include `LATENCY_TEST_MEMORY`.
Memory computation and validation logic	`test/e2e/performanceprofile/functests/4_latency/latency.go`	Adds `getLatencyTestMemory(cpus int)` which returns default when unset, supports `dynamic` mode computing `max(32Mi * cpus, defaultTestMemory)` with fallbacks when `cpus` is unset/invalid, and validates explicit quantities via `resource.ParseQuantity`.
Pod creation integration	`test/e2e/performanceprofile/functests/4_latency/latency.go`	`getLatencyTestPod` calls `getLatencyTestMemory(latencyTestCpus)` and applies `resource.MustParse(latencyTestMemory)` for the latency container memory limit instead of a hardcoded `1Gi`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	Line 384 assertion lacks error message. Expect(err).ToNot(HaveOccurred()) fails to provide context when getLatencyTestMemory fails, hindering diagnosis.	Add error message: Expect(err).ToNot(HaveOccurred(), "failed to compute latency test memory") to help diagnose memory configuration errors.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	Test names are static: Describe uses "[performance] Latency Test", Contexts use "with the oslat/cyclictest/hwlatdetect image", and It blocks use "should succeed". No dynamic values in titles.
Microshift Test Compatibility	✅ Passed	No new Ginkgo tests added. PR only modifies helper functions for memory configuration. Check applies to new tests, not infrastructure changes.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	New latency e2e tests are SNO-compatible. Tests use only one node, schedule pods by hostname, and skip gracefully if insufficient resources. No multi-node assumptions detected.
Topology-Aware Scheduling Compatibility	✅ Passed	Changes limited to test file resource config and error handling. No scheduling constraints introduced. Test pod uses hostname-based nodeSelector, compatible with all topologies.
Ote Binary Stdout Contract	✅ Passed	No process-level stdout violations. All code is in Ginkgo test functions. Errors use fmt.Errorf, not stdout. Logging configured to stderr in TestMain().
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No new Ginkgo tests added; only helper functions modified for configurable memory. No IPv4 assumptions, hardcoded addresses, or external connectivity requirements detected.
Title check	✅ Passed	The title directly and clearly summarizes the main change: adding dynamic memory resource computation for latency performance tests, which matches the core objective of the PR.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/performanceprofile/functests/4_latency/latency.go`:
- Around line 287-289: The check in getLatencyTestMemory that returns
defaultTestMemory when cpus == defaultTestCpus is unreachable given the current
call-site logic (latencyTestCpus is normalized before calling), so either remove
that branch to simplify getLatencyTestMemory (delete the if cpus ==
defaultTestCpus { return defaultTestMemory, nil } case) or keep it but add a
short comment on the cpus parameter explaining this is defensive for future
callers (mentioning defaultTestCpus and why it might still be passed) so readers
know the branch is intentional; locate getLatencyTestMemory and update
accordingly.
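One way to act on the second option (keep the branch and document it) is sketched below. This is an illustration under stated assumptions, not the PR's code: `defaultTestCpus` is assumed to be a `-1` "unset" sentinel, sizes are integer MiB, and `dynamicMemoryMi` is a hypothetical stdlib-only helper rather than `getLatencyTestMemory`.

```go
package main

import "fmt"

// Hypothetical values for illustration; the real constants live in latency.go.
const (
	defaultTestCpus           = -1   // assumed "CPU count not set" sentinel
	defaultTestMemoryMi int64 = 1024 // 1Gi, per the PR
	memoryPerCPUMi      int64 = 32
)

// dynamicMemoryMi returns max(32Mi * cpus, 1Gi) in MiB.
//
// cpus is expected to be normalized by the caller, so today it never equals
// defaultTestCpus. The explicit check is kept defensively for future callers
// that might pass the unset sentinel straight through; it documents intent,
// since the floor below would also catch a negative sentinel.
func dynamicMemoryMi(cpus int) int64 {
	if cpus == defaultTestCpus {
		return defaultTestMemoryMi
	}
	scaled := int64(cpus) * memoryPerCPUMi
	if scaled < defaultTestMemoryMi {
		return defaultTestMemoryMi
	}
	return scaled
}

func main() {
	for _, c := range []int{defaultTestCpus, 4, 32, 48} {
		fmt.Printf("cpus=%d -> %dMi\n", c, dynamicMemoryMi(c))
	}
}
```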


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c80f1a1c-34b6-48e4-95b1-f963938d2729

📥 Commits

Reviewing files that changed from the base of the PR and between 54a9ef7 and c9670ea.

📒 Files selected for processing (1)

test/e2e/performanceprofile/functests/4_latency/latency.go

openshift-ci-robot · 2026-05-15T14:26:16Z

@shajmakh: This pull request references Jira Issue OCPBUGS-80952, which is invalid:

expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

When the CPU count is very high, the pod's fixed memory resources may become too low to run the latency checks. Add an environment variable to allow more flexibility while preserving the old behavior for backward compatibility.
The new behavior is as follows:

If no env var is set, keep the old default behavior (1Gi).

Otherwise, if it is set to a specific memory value that is a valid quantity, use that value in the latency pod; if not, return an error. If the env var is set to `dynamic`, the test computes the memory as (number of computed CPUs * 16Mi).

Summary by CodeRabbit

New Features

Added configurable memory settings for latency tests via the LATENCY_TEST_MEMORY environment variable, enabling fine-tuned resource allocation.

Supports dynamic memory calculation mode that scales memory based on CPU configuration.

Documentation

Updated configuration documentation to include the new memory environment variable.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2026-05-15T15:05:28Z

@shajmakh: This pull request references Jira Issue OCPBUGS-80952, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Details

In response to this:

When the CPU count is very high, the pod's fixed memory resources may
become too low to run the latency checks. Add an environment variable to
allow more flexibility while preserving the old behavior for backward
compatibility.
The new behavior is as follows:

If no env var is set, keep the old default behavior (1Gi).

Otherwise, if it is set to a specific memory value that is a valid
quantity, use that value in the latency pod; if not, return an error.
If the env var is set to `dynamic`, the test computes the memory as
(number of computed CPUs * 32Mi).

32Mi was picked based on input from consumers of the application. If
the memory is still not enough, the user can override the total
memory with an explicit value.


yanirq · 2026-05-16T17:50:18Z

/retest

yanirq · 2026-05-17T08:33:09Z

/retest-required

yanirq · 2026-05-18T08:39:34Z

/retest-required

openshift-ci-robot · 2026-05-18T12:56:45Z

@shajmakh: This pull request references Jira Issue OCPBUGS-80952, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/performanceprofile/functests/4_latency/latency.go`:
- Around line 326-330: The code only parses the memory quantity string via
resource.ParseQuantity(val) but returns val without validating it is positive;
capture the parsed Quantity (e.g., q, err := resource.ParseQuantity(val)), then
check q.Sign() > 0 (or return an error if q.Sign() <= 0) before returning val,
so you avoid re-parsing and fail early for zero/negative memory; update the
error message to indicate non-positive values when rejecting.
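In the real code, the suggestion amounts to keeping the `resource.Quantity` returned by `resource.ParseQuantity(val)` and rejecting it when `q.Sign() <= 0`. The sketch below shows the same parse-once, reject-non-positive pattern using only the standard library; `parseQuantityMi` is a hypothetical stand-in for `resource.ParseQuantity` that understands only whole `Mi`/`Gi` values, and `validateExplicitMemory` is an illustrative name.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseQuantityMi is a hypothetical stand-in for resource.ParseQuantity that
// only understands whole numbers (sign allowed) with an "Mi" or "Gi" suffix.
func parseQuantityMi(val string) (int64, error) {
	units := map[string]int64{"Gi": 1024, "Mi": 1}
	for suffix, mult := range units {
		if strings.HasSuffix(val, suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(val, suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid memory quantity %q: %w", val, err)
			}
			return n * mult, nil
		}
	}
	return 0, fmt.Errorf("invalid memory quantity %q: expected Mi or Gi suffix", val)
}

// validateExplicitMemory parses once and rejects zero or negative values,
// mirroring the q.Sign() <= 0 check suggested in the review.
func validateExplicitMemory(val string) (int64, error) {
	q, err := parseQuantityMi(val)
	if err != nil {
		return 0, err
	}
	if q <= 0 {
		return 0, fmt.Errorf("LATENCY_TEST_MEMORY must be positive, got %q", val)
	}
	return q, nil
}

func main() {
	for _, v := range []string{"2Gi", "0Mi", "-1Gi", "abc"} {
		q, err := validateExplicitMemory(v)
		fmt.Println(v, q, err)
	}
}
```

Returning the parsed value also lets the caller skip a second parse such as `resource.MustParse`, which panics on invalid input.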


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 853417d4-2e8b-4e08-9c31-2e82c7121ccf

📥 Commits

Reviewing files that changed from the base of the PR and between c9670ea and 496d1ba.

📒 Files selected for processing (1)

test/e2e/performanceprofile/functests/4_latency/latency.go

When the CPU count is very high, the pod's fixed memory resources may become too low to run the latency checks. Add an environment variable to allow more flexibility while preserving the old behavior for backward compatibility. The new behavior is as follows:

1. If no env var is set, scale the memory amount per CPU, floored at defaultTestMemory.
2. Otherwise, if it is set to an explicit memory value that is a valid quantity, use that value in the latency pod; if not, return an error.

`32Mi` was picked as the per-CPU memory factor based on input from consumers of the application. If the memory is still not enough, the user can override the total memory with an explicit value.

Signed-off-by: Shereen Haj <shajmakh@redhat.com>

MarSik

/lgtm
/hold You might want to update the PR description too.

openshift-ci · 2026-05-18T15:28:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MarSik, shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [MarSik]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2026-05-18T15:42:13Z

@shajmakh: This pull request references Jira Issue OCPBUGS-80952, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (5.0.0) matches configured target version for branch (5.0.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


shajmakh · 2026-05-18T15:42:31Z

/unhold

shajmakh · 2026-05-18T15:43:12Z

/cherry-pick release-4.22

openshift-cherrypick-robot · 2026-05-18T15:43:16Z

@shajmakh: once the present PR merges, I will cherry-pick it on top of release-4.22 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-4.22

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

shajmakh · 2026-05-18T15:43:35Z

/cherry-pick release-4.21

openshift-cherrypick-robot · 2026-05-18T15:43:38Z

@shajmakh: once the present PR merges, I will cherry-pick it on top of release-4.21 in a new PR and assign it to you.


shajmakh · 2026-05-19T06:16:41Z

/retest

shajmakh · 2026-05-19T06:17:15Z

/verified later @mrniranjan

openshift-ci-robot · 2026-05-19T06:17:26Z

@shajmakh: This PR has been marked to be verified later by @mrniranjan.


openshift-ci · 2026-05-19T08:08:17Z

@shajmakh: all tests passed!

Full PR test history. Your PR dashboard.


openshift-ci-robot · 2026-05-19T08:12:15Z

@shajmakh: Jira Issue OCPBUGS-80952: All pull requests linked via external trackers have merged:

openshift/cluster-node-tuning-operator#1517

This pull request has the verified-later tag and will need to be manually moved to VERIFIED after testing. Jira Issue OCPBUGS-80952 has been moved to the MODIFIED state.


openshift-cherrypick-robot · 2026-05-19T08:13:05Z

@shajmakh: new pull request created: #1520


openshift-cherrypick-robot · 2026-05-19T08:13:48Z

@shajmakh: new pull request created: #1521


openshift-merge-robot · 2026-05-19T22:24:43Z

Fix included in release 5.0.0-0.nightly-2026-05-19-152900

openshift-ci Bot requested review from MarSik and yanirq May 15, 2026 14:16

coderabbitai Bot reviewed May 15, 2026


Comment thread test/e2e/performanceprofile/functests/4_latency/latency.go Outdated

shajmakh changed the title ~~perf: latenyc: compute memory resources dynamically~~ OCPBUGS-80952: perf: latenyc: compute memory resources dynamically May 15, 2026

shajmakh force-pushed the mem-for-latency branch from c9670ea to 75ba113 Compare May 15, 2026 14:47

MarSik reviewed May 15, 2026


Comment thread test/e2e/performanceprofile/functests/4_latency/latency.go Outdated

shajmakh force-pushed the mem-for-latency branch from 75ba113 to 3c4fe62 Compare May 15, 2026 15:04

openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels May 15, 2026

shajmakh force-pushed the mem-for-latency branch from 3c4fe62 to 496d1ba Compare May 18, 2026 12:55

coderabbitai Bot reviewed May 18, 2026


Comment thread test/e2e/performanceprofile/functests/4_latency/latency.go Outdated

shajmakh changed the title ~~OCPBUGS-80952: perf: latenyc: compute memory resources dynamically~~ OCPBUGS-80952: perf: latency: compute memory resources dynamically May 18, 2026

shajmakh force-pushed the mem-for-latency branch from 496d1ba to 3232c28 Compare May 18, 2026 13:26

shajmakh mentioned this pull request May 18, 2026

WIP: perf: latency: e2e: check dynamic memory computation #1519

Draft

shajmakh force-pushed the mem-for-latency branch 3 times, most recently from 7538632 to f8b96cb Compare May 18, 2026 15:22

shajmakh force-pushed the mem-for-latency branch from f8b96cb to 93e5080 Compare May 18, 2026 15:24

MarSik approved these changes May 18, 2026


openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2026

openshift-ci Bot assigned MarSik May 18, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2026

openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 18, 2026

openshift-ci-robot added the verified-later label May 19, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 19, 2026

openshift-merge-bot Bot merged commit 833d3e0 into openshift:main May 19, 2026
20 checks passed

openshift-cherrypick-robot mentioned this pull request May 19, 2026

[release-4.22] OCPBUGS-86071: perf: latency: compute memory resources dynamically #1520

Open

openshift-cherrypick-robot mentioned this pull request May 19, 2026

[release-4.21] OCPBUGS-86072: perf: latency: compute memory resources dynamically #1521

Open

6 participants
