Skip to content

Conversation

@teskje
Copy link
Contributor

@teskje teskje commented Jul 17, 2023

This PR adds a shutdown token check to the mz_join_core linear join implementation. When the dataflow is shutting down, this makes the operator discard all its existing work and new input data, rather than processing it. As a result, differential join operators shut down faster and emit less data, which in turn speeds up shutdown of downstream operators.

Unfortunately, we can't make the same change for the DD join operator. See #18927 for details.

Performance tests below.

Motivation

  • This PR fixes a previously unreported bug.

Joins can consume resources and impact interactivity even after their dataflows have been dropped.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • N/A

@teskje teskje force-pushed the tokenize-mz_join_core branch from fbbdcf8 to 957b326 Compare October 20, 2023 13:04
@teskje
Copy link
Contributor Author

teskje commented Oct 20, 2023

Performance Measurements

We are interested in testing these performance aspects of tokenization:

  • Impact on normal processing: whether the additional token check in the join closure has non-negligible impact on the join processing performance when the dataflow is not shutting down.
  • Time to shutdown: how much time passes between cancellation of the dataflow until it is completely shut down.

As a test case I used a simple cross-join:

materialize=> create table t (a int);
CREATE TABLE
materialize=> explain select t1.a + t2.a from t t1, t t2;
              Optimized Plan
------------------------------------------
 Explained Query:                        +
   Return                                +
     Project (#2)                        +
       Map ((#0 + #1))                   +
         CrossJoin type=differential     +
           Get l0                        +
           Get l0                        +
   With                                  +
     cte l0 =                            +
       ArrangeBy keys=[[]]               +
         ReadStorage materialize.public.t+

Impact on Normal Processing

Run the join query to completion, measure the time that takes.
We add a LIMIT 1 to avoid an error due to a too-large result.

-- setup
create table t (a int);
insert into t select generate_series(1, 20000);

-- experiment
\timing
select t1.a + t2.a from t t1, t t2 limit 1;

-- result on main
Time: 245638.411 ms (04:05.638)

-- result on this branch
Time: 242369.678 ms (04:02.370)

There is no significant difference in run times, so we can assume the token check's impact on join processing to be negligible.

Time to Shutdown

Cancel the join query after 1s, use a subscribe to measure the time until the dataflow stops showing up in the introspection sources.

-- setup
create table t (a int);
insert into t select * from generate_series(1, 20000);

-- first SQL session
copy (subscribe mz_internal.mz_dataflows with (progress)) to stdout;

-- second SQL session
set statement_timeout = '1s';
insert into t select t1.a + t2.a from t t1, t t2;

-- result on main
1697819133000	f	1	215	Dataflow: oneshot-select-t5113
1697819134000	t	\N	\N	\N
1697819135000	t	\N	\N	\N
...
1697819378000	t	\N	\N	\N
1697819378000	f	-1	215	Dataflow: oneshot-select-t5113
-- => 245s

-- result on this branch
1697810777000	f	1	329	Dataflow: oneshot-select-t7929
1697810778000	t	\N	\N	\N
1697810778000	f	-1	329	Dataflow: oneshot-select-t7929
-- => 1s

The token check makes the dataflow shut down more or less instantly.

@teskje teskje marked this pull request as ready for review October 20, 2023 16:35
@teskje teskje requested review from a team, antiguru, umanwizard and vmarcos October 20, 2023 16:35
@antiguru
Copy link
Member

Looks great, but one ask: can you move the second commit to a different PR? It's a good change, but significantly larger than the actual change of functionality.

This commit adds a shutdown token check to the `mz_join_core` linear
join implementation. When the dataflow is shutting down, this makes the
operator discard all its existing work and new input data, rather than
processing it. As a result, differential join operators shut down faster
and emit less data, which in turn speeds up shutdown of downstream
operators.

Unfortunately, we can't make the same change for the DD join operator.
We could add a token check into the result closure we pass to that
operator, but the shutdown check would interfere with the fueling of the
DD join operator. Fuel is consumed based on the number of updates
emitted. When the token is dropped, the join closure stops producing
updates, which means the operator stops consuming fuel, so it does not
yield anymore until it has drained all its inputs. If there are many
inputs left, the replica may not accept commands for potentially quite a
long time.
@teskje teskje force-pushed the tokenize-mz_join_core branch from d7489ee to 4c7e240 Compare October 22, 2023 09:45
@teskje
Copy link
Contributor Author

teskje commented Oct 22, 2023

can you move the second commit to a different PR?

Done: #22563

Copy link
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@teskje
Copy link
Contributor Author

teskje commented Oct 23, 2023

TFTRs!

@teskje teskje merged commit 88078a1 into MaterializeInc:main Oct 23, 2023
@teskje teskje deleted the tokenize-mz_join_core branch October 23, 2023 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants