Conversation
Force-pushed dbf0901 to 7b13888
Check-perf-impact results: (e103131cc2f75d73207c09cd7ef74e9c)
Relative execution time per category: (mean of relative medians)
Pull Request Test Coverage Report for Build 15564521045
💛 - Coveralls
```cpp
#define CELERITY_DETAIL_UTILS_CAT_2(a, b) a##b
#define CELERITY_DETAIL_UTILS_CAT(a, b) CELERITY_DETAIL_UTILS_CAT_2(a, b)

#define CELERITY_DETAIL_UTILS_NON_COPYABLE(classname) \
```
fknorr left a comment:
Like the new visualization!
test/dag_benchmarks.cc (outdated)

```cpp
void initialize() {
    tdag_benchmark_context::initialize();
    create_all_commands();
```
This doesn't do anything since m_tasks is empty, right?
Also, what is it supposed to do, pre-allocate m_command_batches? Maybe also add a comment.
```cpp
void initialize() { tm.generate_epoch_task(celerity::detail::epoch_action::init); }

void prepare() { m_tasks.reserve(m_command_groups.size()); }

void execute() { create_all_tasks(); }

void finalize() {
```
Maybe have a comment about what the purpose of these four functions is (I would naively expect all setup to happen in the ctor, and the benchmarked code to be in a single member function).
```cpp
void finalize() {
    m_tasks.clear();
    tdag_benchmark_context::finalize();
    create_all_commands();
```
Why do commands need to be created in what appears to be a teardown function?
GagaLP left a comment:
Looks like a great way to unify benchmark execution, nicely done!
- Group benchmarks by category
- Left-align legend, one entry per line
- Don't create images for categories without changes ("No Data")
- Support newly added and removed benchmarks
This overhauls the DAG benchmarks to address some long-standing issues:
- Instead of measuring the combined time taken to generate a graph and all the graphs that come before it (i.e., TDAG -> CDAG -> IDAG), only measure the generation of the graph being benchmarked.
- Don't measure benchmark context creation / destruction.
- Fix IDAG benchmarks creating host tasks instead of device tasks (i.e., number of devices and p2p copy support had no effect).
- Fix scheduler benchmarks comparing against single-threaded TDAG + CDAG generation instead of all three graphs.
- Remove outdated "throttled submission" scheduler benchmarks.
Force-pushed 7b13888 to 7441bdb
This overhauls the DAG benchmarks to address some long-standing issues, as well as to future-proof them for loop templates (see the commit message above for the full list of changes).
Due to these changes, we should expect CDAG benchmark times to go down, because they no longer include TDAG generation. IDAG benchmarks no longer include TDAG / CDAG generation either, but due to the switch from `host_task` to `parallel_for`, the net effect is a tossup (benchmarks with lots of communication, such as the (strangely named) "chain" topology, now take a lot longer). For the scheduler benchmarks, I've decided to include the TDAG generation time in the main thread, which is overlapped with CDAG / IDAG generation in the scheduler thread, as this best reflects reality.
I've also made some improvements to our CI perf impact plotting script: