Conversation
Force-pushed dbf0901 to 7b13888
Check-perf-impact results: (e103131cc2f75d73207c09cd7ef74e9c)
Relative execution time per category: (mean of relative medians)
Pull Request Test Coverage Report for Build 15564521045
💛 - Coveralls
```cpp
#define CELERITY_DETAIL_UTILS_CAT_2(a, b) a##b
#define CELERITY_DETAIL_UTILS_CAT(a, b) CELERITY_DETAIL_UTILS_CAT_2(a, b)

#define CELERITY_DETAIL_UTILS_NON_COPYABLE(classname) \
```
fknorr left a comment:
Like the new visualization!
test/dag_benchmarks.cc (outdated)

```cpp
void initialize() {
    tdag_benchmark_context::initialize();
    create_all_commands();
```
This doesn't do anything since m_tasks is empty, right?
Also, what is it supposed to do, pre-allocate m_command_batches? Maybe also add a comment.
```cpp
void initialize() { tm.generate_epoch_task(celerity::detail::epoch_action::init); }

void prepare() { m_tasks.reserve(m_command_groups.size()); }

void execute() { create_all_tasks(); }

void finalize() {
```
Maybe have a comment about what the purpose of these four functions is (I would naively expect all setup to happen in the ctor, and the benchmarked code to be in a single member function).
```cpp
void finalize() {
    m_tasks.clear();
    tdag_benchmark_context::finalize();
    create_all_commands();
```
Why do commands need to be created in what appears to be a teardown function?
GagaLP left a comment:
Looks like a great way to unify benchmark execution, nicely done!
- Group benchmarks by category
- Left-align legend, one entry per line
- Don't create images for categories without changes ("No Data")
- Support newly added and removed benchmarks
This overhauls the DAG benchmarks to address some long-standing issues:
- Instead of measuring the combined time taken to generate a graph and all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only measure the generation of the graph being benchmarked.
- Don't measure benchmark context creation / destruction.
- Fix IDAG benchmarks creating host tasks instead of device tasks (i.e., number of devices and p2p copy support had no effect).
- Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG generation instead of all three graphs.
- Remove outdated "throttled submission" scheduler benchmarks.
Force-pushed 7b13888 to 7441bdb
This overhauls the DAG benchmarks to address some long-standing issues, as well as to future-proof them for loop templates (see the commit message above for the full list of changes).
Due to these changes, we should expect CDAG benchmark times to go down, because they no longer include TDAG generation. IDAG benchmarks no longer include TDAG / CDAG generation either, but due to the switch from `host_task` to `parallel_for`, the net effect is a tossup (benchmarks with lots of communication, such as the (strangely named) "chain" topology, now take a lot longer). For the scheduler benchmarks, I've decided to include the TDAG generation time in the main thread, which is overlapped with CDAG / IDAG generation in the scheduler thread, as this best reflects reality.
I've also made some improvements to our CI perf impact plotting script: