Skip to content

detect resource constraints to set default parameters#4108

Open
dtrawins wants to merge 20 commits intomainfrom
limit-workers
Open

detect resource constraints to set default parameters#4108
dtrawins wants to merge 20 commits intomainfrom
limit-workers

Conversation

@dtrawins
Copy link
Copy Markdown
Collaborator

@dtrawins dtrawins commented Apr 1, 2026

🛠 Summary

Default number of REST workers based only on the number of CPUs might exceed the number of open files limit. It can happen for example in Kubernetes with Xeon CPU with almost 400 CPU cores and ulimit 1024 files.
This change make the default number of REST workers safe in such conditions. It will include open files limit as criteria to allow initialization and open connections. The ration between rest worker could assumes open files for drongon, mediapipe graphs and client connections.
When the number of REST workers is set explicitly, and required open files number exceeds the limit, a clear error will be raised and OVMS will exit without attempt to load the model.

This PR is also detecting CPU constraints inside docker containers. There is checked cpu affinity and quota. They are also used to set default number of threads for modules.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.
    ``

Copilot AI review requested due to automatic review settings April 1, 2026 23:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates OVMS REST worker configuration to account for Linux open-files limits, preventing overly large rest_workers values and adding regression tests.

Changes:

  • Compute a Linux-specific default for rest_workers based on RLIMIT_NOFILE and add a validation cap tied to open-files availability.
  • Add death tests validating behavior when the open-files limit is low.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/config.cpp Adds Linux open-files-based default/validation logic for rest_workers.
src/test/ovmsconfig_test.cpp Adds tests that manipulate RLIMIT_NOFILE to validate the new rest_workers constraints.

Comment thread src/config.cpp Outdated
Comment on lines +50 to +54
// We need to minimize the number of default drogon workers since this value is set for both: unary and streaming (making it always double)
// on linux, restrict also based on the max allowed number of open files
#ifdef __linux__
#include <sys/resource.h>
const uint64_t MAX_OPEN_FILES = []() {
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#include <sys/resource.h> is placed inside namespace ovms (and mid-file). Including system headers inside a namespace can unintentionally put all their declarations/types into that namespace and create subtle lookup/type issues. Move this include up with the other includes at file scope (outside the namespace) and keep only the constants inside ovms.

Copilot uses AI. Check for mistakes.
Comment thread src/config.cpp Outdated
Comment thread src/config.cpp Outdated
Comment on lines +326 to +329
if (restWorkers() > (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5) {
std::cerr << "rest_workers count cannot be larger than " << (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5 << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
return false;
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux, this check subtracts RESERVED_OPEN_FILES from MAX_OPEN_FILES without guarding MAX_OPEN_FILES <= RESERVED_OPEN_FILES. With unsigned types this can underflow and effectively disable the limit check. Add a guard (or use a saturating subtraction) before computing (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5, and consider emitting a clear error when the open-files limit is too low to run REST workers at all.

Suggested change
if (restWorkers() > (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5) {
std::cerr << "rest_workers count cannot be larger than " << (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5 << " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
return false;
}
if (MAX_OPEN_FILES <= RESERVED_OPEN_FILES) {
std::cerr << "Open files limit (" << MAX_OPEN_FILES
<< ") is too low to run REST workers. Increase the RLIMIT_NOFILE soft limit "
<< "or adjust the server configuration." << std::endl;
return false;
}
const uint64_t maxRestWorkersByFiles = (MAX_OPEN_FILES - RESERVED_OPEN_FILES) / 5;
if (restWorkers() > maxRestWorkersByFiles) {
std::cerr << "rest_workers count cannot be larger than " << maxRestWorkersByFiles
<< " due to open files limit. Current open files limit: " << MAX_OPEN_FILES << std::endl;
return false;
}

Copilot uses AI. Check for mistakes.
Comment thread src/test/ovmsconfig_test.cpp Outdated
Comment on lines +19 to +20
#include <regex>
#include <sys/resource.h>
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<sys/resource.h> (and the new getrlimit/setrlimit-based tests) are Linux-specific, but the include is unconditional. This will break builds on non-Linux platforms. Wrap the include and these two new tests in #ifdef __linux__ (or equivalent platform guard).

Suggested change
#include <regex>
#include <sys/resource.h>
#include <regex>
#ifdef __linux__
#include <sys/resource.h>
#endif

Copilot uses AI. Check for mistakes.
Comment on lines +206 to +212
TEST_F(OvmsConfigDeathTest, restWorkersDefaultReducedForOpenFilesLimit) {
// limit allowed number of open files to 1024 to make sure that rest_workers count is too large for the limit based on number of cpu cores alone
int cpu_cores = ovms::getCoreCount();
struct rlimit limit;
ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
struct rlimit newLimit = {static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max};
std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test comment says the open-files limit is set to 1024, but the code actually sets it to cpu_cores * 5. Please update the comment to match the behavior (or adjust the limit if 1024 is intended).

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +214
int cpu_cores = ovms::getCoreCount();
struct rlimit limit;
ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
struct rlimit newLimit = {static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max};
std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
ASSERT_EQ(setrlimit(RLIMIT_NOFILE, &newLimit), 0);

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setrlimit(RLIMIT_NOFILE, ...) can fail if newLimit.rlim_cur exceeds limit.rlim_max (or if the process lacks permission to change limits). As written, cpu_cores * 5 might be > rlim_max on some systems/containers, making this test flaky. Consider clamping rlim_cur to limit.rlim_max and/or skipping the test with a clear message when the limit cannot be adjusted.

Copilot uses AI. Check for mistakes.
dtrawins and others added 4 commits April 2, 2026 09:18
Comment thread src/config.cpp Outdated
Comment thread src/config.cpp Outdated
if (maxOpenFiles <= RESERVED_OPEN_FILES) {
return static_cast<uint64_t>(2); // minimum functional number
}
return std::min(static_cast<uint64_t>(AVAILABLE_CORES), (maxOpenFiles - RESERVED_OPEN_FILES) / 7); // 5x rest_workers to initialize ovms and 2x rest_workers for new connections
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add log that number of workers has been decreased due to limit of open files?

char* n_argv[] = {"ovms", "--config_path", "/path1", "--rest_port", "8080", "--port", "8081"};
int arg_count = 7;
ovms::Config::instance().parse(arg_count, n_argv);
EXPECT_TRUE(ovms::Config::instance().validate());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test doesnt check if number of workers has been changed

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind, it would require calculation how many cores the machine has

Comment thread src/config.cpp Outdated
Comment thread src/config.cpp Outdated
Comment on lines +58 to +64
static uint64_t getMaxOpenFilesLimit() {
struct rlimit limit;
if (getrlimit(RLIMIT_NOFILE, &limit) == 0) {
return limit.rlim_cur;
}
return std::numeric_limits<uint64_t>::max();
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like utility that could exist in systeminfo.hpp (already included) in L38.

Comment thread src/test/ovmsconfig_test.cpp Outdated
char* n_argv[] = {"ovms", "--config_path", "/path1", "--rest_port", "8080", "--port", "8081", "--rest_workers", "1000"};
int arg_count = 9;
EXPECT_EXIT(ovms::Config::instance().parse(arg_count, n_argv), ::testing::ExitedWithCode(OVMS_EX_USAGE), "rest_workers count cannot be larger than 168 due to open files limit. Current open files limit: 1024");
ASSERT_EQ(setrlimit(RLIMIT_NOFILE, &limit), 0);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If test exits earlier we won't restore original rlimit value. Need either unique_ptr with custom deleter to handle that or helper struct. The same is true for previous test.

@dtrawins dtrawins changed the title restrict rest workers based on open files limit detect resource constraints to set default parameters May 8, 2026
@dtrawins dtrawins requested a review from mzegla May 8, 2026 07:08
@dtrawins dtrawins added this to the 2026.2_rc milestone May 8, 2026
@atobiszei atobiszei requested a review from Copilot May 8, 2026 11:34
Comment thread src/test/systeminfo_test.cpp Outdated
Comment on lines +35 to +36
EXPECT_GE(cpuCount, 2);
EXPECT_LE(cpuCount, std::max(static_cast<uint16_t>(2), static_cast<uint16_t>(std::thread::hardware_concurrency())));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this part of change is needed?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Comment thread src/systeminfo.cpp Outdated
Comment thread src/systeminfo.cpp Outdated
Comment thread src/systeminfo.cpp
Comment on lines 35 to +49
uint16_t getCoreCount() {
return std::thread::hardware_concurrency();
uint16_t detectedCoreCount = static_cast<uint16_t>(std::thread::hardware_concurrency());
#ifdef __linux__
if (isRunningInDocker()) {
const uint16_t affinityCount = getCpuAffinityCount();
const uint16_t quotaCount = getDockerCpuQuota();
if (quotaCount > 0) {
detectedCoreCount = std::min(affinityCount, quotaCount);
} else {
detectedCoreCount = affinityCount;
}
}
#endif
return std::max(static_cast<uint16_t>(2), detectedCoreCount);
}
Comment thread src/config.cpp Outdated
Comment on lines +39 to +41
#ifdef __linux__
#include <sys/resource.h>
#endif
Comment thread src/config.cpp Outdated
Comment on lines +57 to +65
const uint64_t RESERVED_OPEN_FILES = 15; // we need to reserve some file descriptors for other operations, so we don't want to use all of them for drogon workers
uint64_t getDefaultRestWorkers() {
const uint64_t maxOpenFiles = getMaxOpenFilesLimit();
if (maxOpenFiles <= RESERVED_OPEN_FILES) {
return static_cast<uint64_t>(2); // minimum functional number
}

return std::min(static_cast<uint64_t>(AVAILABLE_CORES), (maxOpenFiles - RESERVED_OPEN_FILES) / 7); // 5x rest_workers to initialize ovms and 2x rest_workers for new connections
}
Comment thread src/config.cpp Outdated
Comment on lines +330 to +333
if (restWorkers() > (getMaxOpenFilesLimit() - RESERVED_OPEN_FILES) / 6) {
std::cerr << "rest_workers count cannot be larger than " << (getMaxOpenFilesLimit() - RESERVED_OPEN_FILES) / 6 << " due to open files limit. Current open files limit: " << getMaxOpenFilesLimit() << std::endl;
return false;
}
Comment thread src/config.cpp Outdated
Comment on lines +58 to +64
uint64_t getDefaultRestWorkers() {
const uint64_t maxOpenFiles = getMaxOpenFilesLimit();
if (maxOpenFiles <= RESERVED_OPEN_FILES) {
return static_cast<uint64_t>(2); // minimum functional number
}

return std::min(static_cast<uint64_t>(AVAILABLE_CORES), (maxOpenFiles - RESERVED_OPEN_FILES) / 7); // 5x rest_workers to initialize ovms and 2x rest_workers for new connections
Comment on lines +210 to +225
TEST_F(OvmsConfigDeathTest, restWorkersDefaultReducedForOpenFilesLimit) {
// limit allowed number of open files to value that enforce default rest_workers to be determined based on open files limit instead of number of cpu cores alone. This is to test that default rest_workers count is reduced when open files limit is low.
int cpu_cores = ovms::getCoreCount();
struct rlimit limit;
ASSERT_EQ(getrlimit(RLIMIT_NOFILE, &limit), 0);
struct rlimit newLimit = {std::min(static_cast<rlim_t>(cpu_cores * 5), limit.rlim_max), limit.rlim_max};
std::cout << "Setting open files limit to " << newLimit.rlim_cur << " to test that default rest_workers count is reduced based on open files limit" << std::endl;
ASSERT_EQ(setrlimit(RLIMIT_NOFILE, &newLimit), 0);

char* n_argv[] = {"ovms", "--config_path", "/path1", "--rest_port", "8080", "--port", "8081"};
int arg_count = 7;
ovms::Config::instance().parse(arg_count, n_argv);
EXPECT_TRUE(ovms::Config::instance().validate());

ASSERT_EQ(setrlimit(RLIMIT_NOFILE, &limit), 0); // revert ulimit to original value
}
Comment thread src/config.cpp Outdated
return static_cast<uint64_t>(2); // minimum functional number
}

return std::min(static_cast<uint64_t>(AVAILABLE_CORES), (maxOpenFiles - RESERVED_OPEN_FILES) / 7); // 5x rest_workers to initialize ovms and 2x rest_workers for new connections
Comment thread src/config.cpp Outdated
Comment on lines +398 to +414
if (this->serverSettings.restWorkers.has_value()) {
return this->serverSettings.restWorkers.value();
}

const uint64_t defaultRestWorkers = getDefaultRestWorkers();
#ifdef __linux__
const uint64_t maxOpenFiles = getMaxOpenFilesLimit();
const uint16_t detectedCores = getCoreCount();
SPDLOG_DEBUG("Detected cores: {} (may be constrained by Docker limits), max open files: {}, calculated default rest workers: {}", detectedCores, maxOpenFiles, defaultRestWorkers);
if (isRunningInDocker()) {
SPDLOG_DEBUG("Docker CPU quota detected: {}, CPU affinity count: {}", getDockerCpuQuota(), getCpuAffinityCount());
}
#else
const uint16_t detectedCores = getCoreCount();
SPDLOG_DEBUG("Detected cores: {}, calculated default rest workers: {}", detectedCores, defaultRestWorkers);
#endif
return static_cast<uint32_t>(defaultRestWorkers);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it - we have a method getDefaultRestWorkers, already handling part of the core count limitation logic (open files & core count). Why not put all the logic there and here we just stick to old implemetnation with minro adjustement:

return this->serverSettings.restWorkers.value_or(getDefaultRestWorkers());

Much clearer IMO

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for debugging purposes. logger is initialized in the config parser but not in systeminfo. For that reason the methods are called again.

Comment thread src/systeminfo.cpp Outdated
}
}
#endif
return std::max(static_cast<uint16_t>(2), detectedCoreCount);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why return 2 here? This is utility for detecting core count. If it returns 1 (when it fails?) we should change that in the explicit place that this >1 core is required as it is required by that specific place.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to avoid situation when only 1 thread is allocated in other modules. Even with fraction of cpu core, there should be at lease 2 threads for the service to function properly.

Comment thread src/test/systeminfo_test.cpp
dtrawins and others added 4 commits May 8, 2026 20:05
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants