reduce_sum: Task Isolation Missing in TBB #3388

@andrepfeuffer

Deadlock or Wrong Results

Files Affected

  • stan/lib/stan_math/stan/math/prim/functor/reduce_sum.hpp
    • tbb::parallel_reduce call site

Root Cause Analysis

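When reduce_sum calls are nested (one reduce_sum inside another's ReduceFunction), the thread that starts the inner tbb::parallel_reduce has to wait for it to finish. While it waits, TBB's scheduler may give that thread an unrelated task from the outer reduction, so outer and inner work interleave on the same thread and partial sums can be corrupted: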

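// inner_fn is any other ReduceFunction functor, e.g. one that sums its slice.
struct outer_fn {
  double operator()(const std::vector<double>& slice, size_t start, size_t end,
                    std::ostream* msgs) const {
    // Inner reduce_sum inside outer reduce_sum!
    const int grainsize = 1;  // illustrative value; any positive grainsize applies
    return stan::math::reduce_sum<inner_fn>(slice, grainsize, msgs);
  }
};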

std::vector<double> outer_data(100, 1.0);
double result = stan::math::reduce_sum<outer_fn>(outer_data, 5, nullptr);

The Problem:

TBB's scheduler lets a thread that is waiting on one parallel region pick up ready tasks from any other region in the same arena:

Outer reduce_sum:
  Chunk A: running inner_reduce_sum
  Chunk B: ready to run
  Chunk C: ready to run

Inner reduce_sum (from Chunk A):
  Task 1: running
  Task 2: ready

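TBB's scheduler:
  "The thread running Chunk A is waiting on the inner reduce_sum → give it Chunk B in the meantime"

Result:
  Work from different reduction trees interleaves on one thread
  State the inner reduction relies on (e.g. thread-local data) can change underneath it, so partial sums can come out wrong
  Deadlock possible if the outer task that was picked up needs something the waiting inner call still holds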
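
The scenario above can be reproduced in a toy, single-threaded model. This is only a sketch to make the mechanism concrete: ToyScheduler, wait_for and state_seen_after_inner_wait are hypothetical names invented for illustration, not TBB or Stan Math APIs, and the plain int thread_state merely stands in for thread-local data. The model assumes a waiting thread takes the oldest ready task unless isolation restricts it to tasks from its own region.

```cpp
#include <algorithm>
#include <deque>
#include <functional>
#include <utility>

// Toy model of a task scheduler (NOT TBB). Every queued task carries the id
// of the parallel region that spawned it.
struct ToyScheduler {
  using Task = std::pair<int, std::function<void()>>;
  std::deque<Task> ready;
  bool isolate = false;

  void spawn(int region, std::function<void()> fn) {
    ready.emplace_back(region, std::move(fn));
  }

  // Models a thread blocked at the end of a nested parallel call: it keeps
  // running ready tasks until no task of `region` is left in the queue.
  void wait_for(int region) {
    auto in_region = [region](const Task& t) { return t.first == region; };
    for (;;) {
      auto mine = std::find_if(ready.begin(), ready.end(), in_region);
      if (mine == ready.end()) return;  // the inner region has finished
      // Without isolation the waiting thread takes the oldest ready task,
      // whatever region it belongs to; with isolation only its own region's.
      auto pick = isolate ? mine : ready.begin();
      std::function<void()> fn = std::move(pick->second);
      ready.erase(pick);
      fn();
    }
  }
};

// Returns the value "chunk A" sees in its thread state after its inner
// region completes. Chunk A set it to 1 before starting the inner region.
int state_seen_after_inner_wait(bool isolate) {
  ToyScheduler sched;
  sched.isolate = isolate;
  int thread_state = 0;  // stand-in for a thread_local variable

  // Outer chunk B (region 0) is already queued when chunk A starts.
  sched.spawn(0, [&] { thread_state = 2; });

  // Outer chunk A runs on "this thread" and starts an inner region (id 1).
  thread_state = 1;
  sched.spawn(1, [] { /* inner work */ });
  sched.wait_for(1);
  return thread_state;
}
```

In this model, state_seen_after_inner_wait(false) returns 2 because chunk B ran in the middle of chunk A's wait, while state_seen_after_inner_wait(true) returns 1 because chunk B stays queued until chunk A is done.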

The Fix

// BEFORE (a thread waiting here may run unrelated outer tasks):
tbb::parallel_reduce(range, worker);
return_type result = worker.sum_;

// AFTER (runs the reduction in an isolated region; needs <tbb/task_arena.h>):
return_type result(0);
tbb::this_task_arena::isolate([&] {
  tbb::parallel_reduce(range, worker);
  result = worker.sum_;
});
return result;

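tbb::this_task_arena::isolate() runs the lambda in an isolated region: while the calling thread waits inside it, that thread only processes tasks spawned within the region, so it cannot pick up an unrelated outer task. isolate() does not create a new task arena, and other worker threads can still help with the tasks inside the region.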

Test Coverage

TEST(StanMathPrim_reduce_sum, nested_reduce_sum_isolation) {
  struct inner_fn {
    double operator()(const std::vector<double>& slice, size_t, size_t,
                      std::ostream*) const {
      double s = 0;
      for (auto x : slice) s += x;
      return s;
    }
  };

  struct outer_fn {
    double operator()(const std::vector<double>& slice, size_t, size_t,
                      std::ostream* msgs) const {
      // Inner reduce_sum
      return stan::math::reduce_sum<inner_fn>(slice, 1, msgs);
    }
  };

  std::vector<double> outer_data(100, 1.0);
  // Without isolation, a thread waiting on the inner call may run other outer chunks
  // With isolation, it only runs tasks that belong to the inner reduce_sum
  double result = stan::math::reduce_sum<outer_fn>(outer_data, 5, nullptr);
  EXPECT_DOUBLE_EQ(result, 100.0);
}

Impact

Before Fix:

  • Nested reduce_sum calls can silently produce wrong results
  • Symptoms: unpredictable values, race conditions, possible deadlock
  • Only manifests under specific threading/load conditions
  • Very difficult to debug

After Fix:

  • Task boundaries respected
  • Nested reduce_sum works correctly
  • Results no longer depend on how the scheduler interleaves outer and inner tasks
