Skip to content

Fix race in AsyncWaiter#3755

Open
mkornaukhov wants to merge 1 commit intoaws:mainfrom
serenedb:mkornaukhov/fix-race
Open

Fix race in AsyncWaiter#3755
mkornaukhov wants to merge 1 commit intoaws:mainfrom
serenedb:mkornaukhov/fix-race

Conversation

@mkornaukhov
Copy link

@mkornaukhov mkornaukhov commented Mar 19, 2026

Fix race. In MakeRequest() async waiter is passed to callback by reference in lambda. So we had concurrent Wakeup() and Wait() + destructor as variable is local. So after setting m_wakeupIntentional and unlocking, the wait method had finished and destructor of async writer and consequently destructor of condvar is called. The latest calls some pthread stuff that is concurrent with notify, so tsan detects it.

  • Did a review by yourself.
  • Tests: need to run with tsan
  • Checked if this PR is a breaking (APIs have been changed) change.
  • Checked if this PR will not introduce cross-platform inconsistent behavior.
  • Checked if this PR would require a ReadMe/Wiki update.

Check which platforms you have built SDK on to verify the correctness of this PR.

  • Linux
  • Windows
  • Android
  • MacOS
  • IOS
  • Other Platforms

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Mikhail Kornaukhov <mkornaukhov@serenedb.com>
@pulimsr
Copy link
Contributor

pulimsr commented Mar 25, 2026

Thanks for the fix, we tested both main and your change with TSAN enabled (-fsanitize=thread) on Amazon Linux 2023 using the HttpClientTest.* suite. On main, TSAN flags the race as expected:

SUMMARY: ThreadSanitizer: data race (...) in AsyncWaiter::Wakeup()
SUMMARY: ThreadSanitizer: data race (...) in pthread_cond_destroy

However, we're seeing the same warnings with your fix applied. The TSAN output includes Mutex M... is already destroyed — the callback thread is calling Wakeup() after the stacklocal AsyncWaiter has already been destroyed, which is a different race than notify_one() being outside the lock.

Could you share the setup / repro steps you used to verify the fix? Curious if you saw TSAN clean on your end.

@mkornaukhov
Copy link
Author

mkornaukhov commented Mar 25, 2026

@pulimsr Thanks for reviewing!

The steps to reproduce are following:

  • Clone https://github.com/serenedb/serenedb, do git submodule update --init
  • Built it with tsan (most dependencies are vendored, so almost no preinstalled stuff required and everything should be really reproducible). I do it using CMakeUserPresets.json, it looks like:
{
  "version": 8,
  "configurePresets": [
    {
      "name": "tsan",
      "binaryDir": "build_tsan",
      "inherits": [
        "lldb"
      ],
      "cacheVariables": {
        "CMAKE_BUILD_TYPE": "RelWithDebInfo",
        "SDB_SANITIZE": "Thread"
      }
    }
  ]
}

Just complete cmake --preset tsan. Then cd build_tsan && ninja serened.

  • Run database:
./build_tsan/bin/serened /tmp/data-dir \
	--server.endpoint pgsql+tcp://0.0.0.0:6162 \
	--config=/path/to/serenedb/etc/relative/serened.conf
  • Launch s3 test: ./tests/sqllogic/run.sh --single-port 6162 --test "tests/sqllogic/**/file_s3.test*" --protocol simple

Without this fix I see

SUMMARY: ThreadSanitizer: data race (...) in AsyncWaiter::Wakeup()
SUMMARY: ThreadSanitizer: data race (...) in pthread_cond_destroy

stuff. But with this fix I do not see such rows.

If you face any difficulties during reproducing this (building, launching tests or smth else) -- feel free to tag and contact me.

@pulimsr pulimsr mentioned this pull request Mar 25, 2026
11 tasks
@pulimsr
Copy link
Contributor

pulimsr commented Mar 25, 2026

Thanks for the details. I re-tested with Clang TSAN (instead of GCC) and confirmed the fix - PR to merge, Thank-you for your contribution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants