Skip to content

Conversation

@mwiebe
Copy link
Contributor

@mwiebe mwiebe commented Mar 3, 2025

What was the problem/requirement? (What/Why)

The TASK_CHUNKING RFC has been implemented and merged into openjd-model-for-python. #134 enabled minimal support, but does not support the adaptive chunk runtime option from the RFC.

In order to enable adaptive chunking, a larger refactor was necessary to move from pre-queuing all actions to instead determine them dynamically. I added more usability improvements to the scope of this refactor.

What was the solution? (How)

  • Add an --extensions option that can select which Open Job Description extensions to enable. By default all extensions that are implemented are enabled.
  • Validate that task parameters provided from the CLI are within the parameter space specified by the job template.
  • Modify LocalSession from a "deferred" style to "immediate." Previously it queued up all the session actions, then ran them. To be adaptive, it needs to run some session actions and use their timings to change the chunk size, and that's not compatible with pre-determining them all.
  • Add the ability to run multiple steps in the same session when they include step environments. It exits any step environments for a step before proceeding to the next one.
  • Change the default timestamps to be relative to session start, to make local debugging easier. Add an option to control the formatting between relative, local, and utc.
  • Modify __main__.py to be more idiomatic by removing the 'if __name__ == "__main__"' and moving the main() function out.
  • Create an openjd.cli.main function to call with the CLI arguments.
  • Modify a number of tests to use a more end-to-end CLI main function approach.

What is the impact of this change?

The CLI command supports chunking more fully. It also has better error messages, more informative log messages, has the ability to run a whole job by skipping the --step option, and can run multiple steps in the same session without the previous limit.

How was this change tested?

Added tests for the new functionality. Changed many tests to be more end-to-end instead of testing implementation details based on argparse.

Was this change documented?

Docstrings, CLI help output for new options.

Is this a breaking change?

BREAKING CHANGE: The logging output has changed to use relative
timestamps by default, and print more messages about the job and
steps that are running.

Does this change impact security?

  • Does the change need to be threat modeled? For example, does it create or modify files/directories that must only be readable by the process owner?
    • If so, then please label this pull request with the "security" label. We'll work with you to analyze the threats.

No, it doesn't change the threat modeling.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@mwiebe mwiebe force-pushed the adaptive-chunking-and-more branch 3 times, most recently from 1f0f6a3 to 4a2fbaa Compare March 3, 2025 23:53
@mwiebe mwiebe marked this pull request as ready for review March 3, 2025 23:56
@mwiebe mwiebe requested a review from a team as a code owner March 3, 2025 23:56
crowecawcaw
crowecawcaw previously approved these changes Mar 4, 2025
* Add an --extensions option that can select which Open Job Description
  extensions to enable. By default all extensions that are implemented
  are enabled.
* Validate that task parameters provided from the CLI are within the
  parameter space specified by the job template.
* Modify LocalSession from a "deferred" style to "immediate." Previously
  it queued up all the session actions, then ran them. To be adaptive,
  it needs to run some session actions and use their timings to change
  the chunk size, and that's not compatible with pre-determining them
  all.
* Add the ability to run multiple steps in the same session when they
  include step environments. It exits any step environments for a
  step before proceeding to the next one.
* Change the default timestamps to be relative to session start, to make
  local debugging easier. Add an option to control the formatting
  between relative, local, and utc.
* Modify __main__.py to be more idiomatic by removing the 'if __name__
  == "__main__"' and moving the main() function out.
* Create an openjd.cli.main function to call with the CLI arguments.
* Modify a number of tests to use a more end-to-end CLI main function
  approach.
* Bump openjd-model and openjd-sessions dependency versions to include
  the required changes.

BREAKING CHANGE: The logging output has changed to use relative
   timestamps by default, and print more messages about the job and
   steps that are running.

Signed-off-by: Mark Wiebe <399551+mwiebe@users.noreply.github.com>
@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 4, 2025

Copy link
Contributor

@miabatta miabatta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks so much.


def test_openjd_run_on_chunked_job_adaptive_chunking(capsys):
# Test that running chunked_job.yaml with adaptive chunking and the TargetRuntime cranked really high
# resutls in two chunks, the first being one task, and the second being the remainder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling :)

Approving anyway since this is a big change, I'm sure there's more stuff to fix/improve at the same time if we ever do a docs pass of this repo

@mwiebe mwiebe merged commit 664d008 into OpenJobDescription:mainline Mar 7, 2025
19 checks passed
@mwiebe mwiebe deleted the adaptive-chunking-and-more branch March 7, 2025 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants