overload error messages have been improved #30329
Pull request overview
This PR improves overload error messages by introducing a new TOverloadStatus struct that wraps the existing EOverloadStatus enum with descriptive error reasons. This allows the system to provide more informative feedback to users when writes are rejected due to various overload conditions.
Key Changes:
- Introduced the `TOverloadStatus` struct, which contains both the status enum and a descriptive reason string.
- Updated `CheckOverloadedImmediate()` and `ResourcesStatusToOverloadStatus()` to return the new struct with detailed error messages.
- Enhanced error messages for disk quota, in-flight writes, transaction limits, and reject probability scenarios.
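As a rough illustration of the first item, the new return type can be pictured as a plain aggregate that pairs the enum with a reason string. This is a sketch under assumptions: the enum values shown are a hypothetical subset, `std::string` stands in for YDB's `TString`, and the `IsOverloaded()` helper is hypothetical, not part of the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of EOverloadStatus; the real enum lives in the
// columnshard sources and may have more values.
enum class EOverloadStatus {
    None,
    Disk,
    ShardTxInFly,
    ShardWritesInFly,
    ShardWritesSizeInFly,
};

// Sketch of the struct: a machine-readable status plus a human-readable
// reason that can be forwarded to the client. std::string stands in for TString.
struct TOverloadStatus {
    EOverloadStatus Status = EOverloadStatus::None;
    std::string Reason;

    // Hypothetical convenience helper (not from the PR).
    bool IsOverloaded() const {
        return Status != EOverloadStatus::None;
    }
};
```

Because the struct is still an aggregate (C++14 or later), it supports the `TOverloadStatus{EOverloadStatus::Disk, "..."}` initialization style used in the diffs on this page.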
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ydb/core/tx/columnshard/counters/columnshard.h | Adds the TOverloadStatus struct definition with Status and Reason fields |
| ydb/core/tx/columnshard/columnshard_impl.h | Adds type alias for TOverloadStatus and updates method signatures for ResourcesStatusToOverloadStatus() and CheckOverloadedImmediate() |
| ydb/core/tx/columnshard/columnshard__write.cpp | Updates write handler to extract and use the status from the struct, and includes the reason in error responses |
| ydb/core/tx/columnshard/columnshard__overload.cpp | Implements the new return type with descriptive error messages for each overload scenario |
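The `columnshard__write.cpp` change (extract the status from the struct and include the reason in the error response) can be sketched roughly as below. `TWriteError` and `CheckBeforeWrite()` are hypothetical stand-ins for the real event types and write handler, which this page does not show.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class EOverloadStatus { None, Disk, ShardWritesInFly };

struct TOverloadStatus {
    EOverloadStatus Status = EOverloadStatus::None;
    std::string Reason;
};

// Hypothetical error payload returned to the writer on rejection.
struct TWriteError {
    EOverloadStatus Status;
    std::string Message;
};

// Hypothetical handler step: returns an error if the shard is overloaded,
// or std::nullopt if the write may proceed.
std::optional<TWriteError> CheckBeforeWrite(const TOverloadStatus& overload) {
    // Extract the enum for the accept/reject decision ...
    const EOverloadStatus status = overload.Status;
    if (status == EOverloadStatus::None) {
        return std::nullopt;
    }
    // ... and forward the human-readable reason to the client.
    return TWriteError{status, overload.Reason};
}
```

Keeping the enum separate from the text lets code keep branching on the status while users see the reason.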
*Please be aware that the difference is based on comparing your commit with the last completed post-commit build; check the comparison.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
```diff
+        return TOverloadStatus{EOverloadStatus::ShardWritesInFly, "The limit on the number of in-flight write requests to a shard has been exceeded. Please add more resources or reduce the database load."};
     case NOverload::EResourcesStatus::WritesSizeInFlyLimitReached:
-        return EOverloadStatus::ShardWritesSizeInFly;
+        return TOverloadStatus{EOverloadStatus::ShardWritesSizeInFly, "The limit on the total size of in-flight write requests to the shard has been exceeded. Please add more resources or reduce the database load."};
```
Copilot (AI), Dec 9, 2025
This switch statement is missing a default case. If an unexpected EResourcesStatus value is passed, control reaches the end of the function without returning a value, which is undefined behavior. Consider adding a default case that either asserts or returns an appropriate error status.
Suggested change:

```diff
         return TOverloadStatus{EOverloadStatus::ShardWritesSizeInFly, "The limit on the total size of in-flight write requests to the shard has been exceeded. Please add more resources or reduce the database load."};
+    default:
+        Y_FAIL("Unexpected EResourcesStatus value: %d", static_cast<int>(status));
+        return TOverloadStatus{EOverloadStatus::None, "Unknown resource status."};
```
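An alternative to the suggested `default:` branch, sketched here under assumptions, is to handle the fallback after the switch rather than inside it. With no `default:` label, GCC and Clang (`-Wswitch`, enabled by `-Wall`) can still warn when a new enumerator is added but not mapped, while out-of-range values still reach the fallback. The enum values are a hypothetical subset, the `Ok` mapping is assumed, and `assert` stands in for `Y_FAIL`.

```cpp
#include <cassert>
#include <string>

// Hypothetical subsets of NOverload::EResourcesStatus and EOverloadStatus.
enum class EResourcesStatus { Ok, WritesInFlyLimitReached, WritesSizeInFlyLimitReached };
enum class EOverloadStatus { None, ShardWritesInFly, ShardWritesSizeInFly };

struct TOverloadStatus {
    EOverloadStatus Status = EOverloadStatus::None;
    std::string Reason;
};

TOverloadStatus ResourcesStatusToOverloadStatus(EResourcesStatus status) {
    switch (status) {
        case EResourcesStatus::Ok:
            return TOverloadStatus{EOverloadStatus::None, ""};
        case EResourcesStatus::WritesInFlyLimitReached:
            return TOverloadStatus{EOverloadStatus::ShardWritesInFly,
                "The limit on the number of in-flight write requests to a shard has been exceeded. "
                "Please add more resources or reduce the database load."};
        case EResourcesStatus::WritesSizeInFlyLimitReached:
            return TOverloadStatus{EOverloadStatus::ShardWritesSizeInFly,
                "The limit on the total size of in-flight write requests to the shard has been exceeded. "
                "Please add more resources or reduce the database load."};
    }
    // Reached only for values outside the enumerators (e.g. a bad cast).
    // There is no `default:` above, so -Wswitch can still flag unmapped enumerators.
    assert(false && "Unexpected EResourcesStatus value");
    return TOverloadStatus{EOverloadStatus::None, "Unknown resource status."};
}
```

Every path now returns a value, so the undefined behavior the comment describes cannot occur even in builds where `assert` is compiled out.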
```diff
 TColumnShard::TOverloadStatus TColumnShard::CheckOverloadedImmediate(const TInternalPathId /* pathId */) const {
     if (IsAnyChannelYellowStop()) {
-        return EOverloadStatus::Disk;
+        return TOverloadStatus{EOverloadStatus::Disk, "Channels are overloaded (yellow), please rebalance groups or add new ones"};
```
Copilot (AI), Dec 9, 2025
Grammar issue: The comma splice creates a run-on sentence. Consider restructuring as: "Channels are overloaded (yellow). Please rebalance groups or add new ones." or "Channels are overloaded (yellow); please rebalance groups or add new ones."
Suggested change:

```diff
-        return TOverloadStatus{EOverloadStatus::Disk, "Channels are overloaded (yellow), please rebalance groups or add new ones"};
+        return TOverloadStatus{EOverloadStatus::Disk, "Channels are overloaded (yellow). Please rebalance groups or add new ones"};
```
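The early-return shape of `CheckOverloadedImmediate()` with the corrected message might look roughly like this. `TShardState` is a hypothetical snapshot standing in for the real `IsAnyChannelYellowStop()` call, and only the disk check is shown.

```cpp
#include <cassert>
#include <string>

enum class EOverloadStatus { None, Disk };

struct TOverloadStatus {
    EOverloadStatus Status = EOverloadStatus::None;
    std::string Reason;
};

// Hypothetical snapshot of the shard state the check reads.
struct TShardState {
    bool AnyChannelYellowStop = false;
};

TOverloadStatus CheckOverloadedImmediate(const TShardState& shard) {
    if (shard.AnyChannelYellowStop) {
        // Two sentences instead of a comma splice, matching the suggested change.
        return TOverloadStatus{EOverloadStatus::Disk,
            "Channels are overloaded (yellow). Please rebalance groups or add new ones"};
    }
    // Further checks (transactions in flight, writes in flight, ...) would follow here.
    return TOverloadStatus{};
}
```

The first failing check wins, so each rejection carries exactly one reason.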
```diff
         AFL_WARN(NKikimrServices::TX_COLUMNSHARD_WRITE)("event", "shard_overload")("reason", "tx_in_fly")("sum", Executor()->GetStats().TxInFly)(
             "limit", txLimit);
-        return EOverloadStatus::ShardTxInFly;
+        return TOverloadStatus{EOverloadStatus::ShardTxInFly, TStringBuilder{} << "The local transaction limit has been exceeded " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load."};
```
Copilot (AI), Dec 9, 2025
The formatting of this error message is awkward. The transaction counts appear mid-sentence without proper punctuation. Consider reformatting as: "The local transaction limit has been exceeded: " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load." (adding a colon after "exceeded")
Suggested change:

```diff
-        return TOverloadStatus{EOverloadStatus::ShardTxInFly, TStringBuilder{} << "The local transaction limit has been exceeded " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load."};
+        return TOverloadStatus{EOverloadStatus::ShardTxInFly, TStringBuilder{} << "The local transaction limit has been exceeded: " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load."};
```
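To see what the suggested wording produces, the message can be built in isolation. In this sketch `std::ostringstream` stands in for YDB's `TStringBuilder`, and `FormatTxInFlyReason()` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Builds the tx-in-flight reason with the suggested colon before the counts.
// Hypothetical helper; std::ostringstream stands in for TStringBuilder.
std::string FormatTxInFlyReason(std::uint64_t txInFly, std::uint64_t txLimit) {
    std::ostringstream out;
    out << "The local transaction limit has been exceeded: " << txInFly << " of " << txLimit
        << ". Please add more resources or reduce the database load.";
    return out.str();
}
```

For example, `FormatTxInFlyReason(12, 10)` yields `The local transaction limit has been exceeded: 12 of 10. Please add more resources or reduce the database load.`, where the colon makes it clear that the numbers are current usage against the limit.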
Changelog entry
...
Changelog category
Description for reviewers
...