Skip to content

Conversation

@dorooleg
Copy link
Collaborator

@dorooleg dorooleg commented Dec 8, 2025

Changelog entry

...

Changelog category

  • Not for changelog (changelog entry is not required)

Description for reviewers

...

@ydbot
Copy link
Collaborator

ydbot commented Dec 8, 2025

Run Extra Tests

Run additional tests for this PR. You can customize:

  • Test Size: small, medium, large (default: all)
  • Test Targets: any directory path (default: ydb/)
  • Sanitizers: ASAN, MSAN, TSAN
  • Coredumps: enable for debugging (default: off)
  • Additional args: custom ya make arguments

▶  Run tests

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

🟢 2025-12-08 18:30:18 UTC The validation of the Pull Request description is successful.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves overload error messages by introducing a new TOverloadStatus struct that wraps the existing EOverloadStatus enum with descriptive error reasons. This allows the system to provide more informative feedback to users when writes are rejected due to various overload conditions.

Key Changes:

  • Introduced TOverloadStatus struct containing both status enum and a descriptive reason string
  • Updated CheckOverloadedImmediate() and ResourcesStatusToOverloadStatus() to return the new struct with detailed error messages
  • Enhanced error messages for disk quota, in-flight writes, transaction limits, and reject probability scenarios

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
ydb/core/tx/columnshard/counters/columnshard.h Adds the TOverloadStatus struct definition with Status and Reason fields
ydb/core/tx/columnshard/columnshard_impl.h Adds type alias for TOverloadStatus and updates method signatures for ResourcesStatusToOverloadStatus() and CheckOverloadedImmediate()
ydb/core/tx/columnshard/columnshard__write.cpp Updates write handler to extract and use the status from the struct, and includes the reason in error responses
ydb/core/tx/columnshard/columnshard__overload.cpp Implements the new return type with descriptive error messages for each overload scenario

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

2025-12-08 18:31:35 UTC Pre-commit check linux-x86_64-release-asan for 410e266 has started.
2025-12-08 18:31:51 UTC Artifacts will be uploaded here
2025-12-08 18:33:57 UTC ya make is running...
🟡 2025-12-08 19:57:57 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
12012 11892 0 104 9 7

🟢 2025-12-08 19:58:06 UTC Build successful.
🟢 2025-12-08 19:58:37 UTC ydbd size 3.8 GiB changed* by +10.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 309cbae merge: 410e266 diff diff %
ydbd size 4 130 047 072 Bytes 4 130 057 752 Bytes +10.4 KiB +0.000%
ydbd stripped size 1 533 185 560 Bytes 1 533 192 536 Bytes +6.8 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Dec 8, 2025

2025-12-08 18:31:38 UTC Pre-commit check linux-x86_64-relwithdebinfo for 410e266 has started.
2025-12-08 18:31:55 UTC Artifacts will be uploaded here
2025-12-08 18:34:08 UTC ya make is running...
🟡 2025-12-08 20:17:59 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
39747 36790 0 6 2922 29

2025-12-08 20:18:15 UTC ya make is running... (failed tests rerun, try 2)
🟡 2025-12-08 20:31:42 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
86 (only retried tests) 69 0 2 0 15

2025-12-08 20:31:50 UTC ya make is running... (failed tests rerun, try 3)
🔴 2025-12-08 20:42:30 UTC Some tests failed, follow the links below.

Ya make output | Test bloat | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
32 (only retried tests) 17 0 2 0 13

🟢 2025-12-08 20:42:38 UTC Build successful.
🟢 2025-12-08 20:43:03 UTC ydbd size 2.3 GiB changed* by +3.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 309cbae merge: 410e266 diff diff %
ydbd size 2 466 169 536 Bytes 2 466 173 520 Bytes +3.9 KiB +0.000%
ydbd stripped size 524 847 552 Bytes 524 848 384 Bytes +832 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Dec 9, 2025

2025-12-09 13:55:44 UTC Pre-commit check linux-x86_64-relwithdebinfo for 58693d9 has started.
2025-12-09 13:56:21 UTC Artifacts will be uploaded here
2025-12-09 13:58:28 UTC ya make is running...
🟡 2025-12-09 15:34:27 UTC Some tests failed, follow the links below. Going to retry failed tests...

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
39808 36862 0 2 2919 25

2025-12-09 15:34:42 UTC ya make is running... (failed tests rerun, try 2)
🟢 2025-12-09 15:47:08 UTC Tests successful.

Ya make output | Test bloat | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
70 (only retried tests) 54 0 0 0 16

🟢 2025-12-09 15:47:16 UTC Build successful.
🟢 2025-12-09 15:47:39 UTC ydbd size 2.3 GiB changed* by +3.9 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 22e20a2 merge: 58693d9 diff diff %
ydbd size 2 467 362 888 Bytes 2 467 366 872 Bytes +3.9 KiB +0.000%
ydbd stripped size 525 091 904 Bytes 525 092 736 Bytes +832 Bytes +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@github-actions
Copy link

github-actions bot commented Dec 9, 2025

2025-12-09 13:56:09 UTC Pre-commit check linux-x86_64-release-asan for 58693d9 has started.
2025-12-09 13:56:27 UTC Artifacts will be uploaded here
2025-12-09 13:58:31 UTC ya make is running...
🟡 2025-12-09 15:04:50 UTC Some tests failed, follow the links below. This fail is not in blocking policy yet

Ya make output | Test bloat

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
12044 11981 0 45 7 11

🟢 2025-12-09 15:04:59 UTC Build successful.
🟢 2025-12-09 15:05:29 UTC ydbd size 3.8 GiB changed* by +10.4 KiB, which is < 100.0 KiB vs main: OK

ydbd size dash main: 22e20a2 merge: 58693d9 diff diff %
ydbd size 4 131 808 272 Bytes 4 131 818 928 Bytes +10.4 KiB +0.000%
ydbd stripped size 1 533 777 880 Bytes 1 533 784 856 Bytes +6.8 KiB +0.000%

*please be aware that the difference is based on comparing your commit and the last completed build from the post-commit, check comparation

@Vladilen Vladilen requested a review from Copilot December 9, 2025 14:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return TOverloadStatus{EOverloadStatus::ShardWritesInFly, "The limit on the number of in-flight write requests to a shard has been exceeded. Please add more resources or reduce the database load."};
case NOverload::EResourcesStatus::WritesSizeInFlyLimitReached:
return EOverloadStatus::ShardWritesSizeInFly;
return TOverloadStatus{EOverloadStatus::ShardWritesSizeInFly, "The limit on the total size of in-flight write requests to the shard has been exceeded. Please add more resources or reduce the database load."};
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switch statement is missing a default case. If an unexpected EResourcesStatus value is passed, the function will fall through without returning a value, leading to undefined behavior. Consider adding a default case that either asserts or returns an appropriate error status.

Suggested change
return TOverloadStatus{EOverloadStatus::ShardWritesSizeInFly, "The limit on the total size of in-flight write requests to the shard has been exceeded. Please add more resources or reduce the database load."};
return TOverloadStatus{EOverloadStatus::ShardWritesSizeInFly, "The limit on the total size of in-flight write requests to the shard has been exceeded. Please add more resources or reduce the database load."};
default:
Y_FAIL("Unexpected EResourcesStatus value: %d", static_cast<int>(status));
return TOverloadStatus{EOverloadStatus::None, "Unknown resource status."};

Copilot uses AI. Check for mistakes.
TColumnShard::TOverloadStatus TColumnShard::CheckOverloadedImmediate(const TInternalPathId /* pathId */) const {
if (IsAnyChannelYellowStop()) {
return EOverloadStatus::Disk;
return TOverloadStatus{EOverloadStatus::Disk, "Channels are overloaded (yellow), please rebalance groups or add new ones"};
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar issue: The comma splice creates a run-on sentence. Consider restructuring as: "Channels are overloaded (yellow). Please rebalance groups or add new ones." or "Channels are overloaded (yellow); please rebalance groups or add new ones."

Suggested change
return TOverloadStatus{EOverloadStatus::Disk, "Channels are overloaded (yellow), please rebalance groups or add new ones"};
return TOverloadStatus{EOverloadStatus::Disk, "Channels are overloaded (yellow). Please rebalance groups or add new ones"};

Copilot uses AI. Check for mistakes.
AFL_WARN(NKikimrServices::TX_COLUMNSHARD_WRITE)("event", "shard_overload")("reason", "tx_in_fly")("sum", Executor()->GetStats().TxInFly)(
"limit", txLimit);
return EOverloadStatus::ShardTxInFly;
return TOverloadStatus{EOverloadStatus::ShardTxInFly, TStringBuilder{} << "The local transaction limit has been exceeded " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load."};
Copy link

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatting of this error message is awkward. The transaction counts appear mid-sentence without proper punctuation. Consider reformatting as: "The local transaction limit has been exceeded: " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load." (adding a colon after "exceeded")

Suggested change
return TOverloadStatus{EOverloadStatus::ShardTxInFly, TStringBuilder{} << "The local transaction limit has been exceeded " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load."};
return TOverloadStatus{EOverloadStatus::ShardTxInFly, TStringBuilder{} << "The local transaction limit has been exceeded: " << Executor()->GetStats().TxInFly << " of " << txLimit << ". Please add more resources or reduce the database load."};

Copilot uses AI. Check for mistakes.
@dorooleg dorooleg merged commit 0d69297 into ydb-platform:main Dec 9, 2025
14 of 15 checks passed
@dorooleg dorooleg deleted the overload_error_improvement branch December 9, 2025 17:47
@ydbot
Copy link
Collaborator

ydbot commented Dec 9, 2025

Backport

To backport this PR, click the button next to the target branch and then click "Run workflow" in the Run Actions UI.

Branch Run
stable-25-2, stable-25-2-1, stable-25-3, stable-25-3-1 ▶  Backport
stable-25-3, stable-25-3-1 ▶  Backport
stable-25-3 ▶  Backport

▶  Backport manual

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants