Discussion - Managing task and data dependencies #304

Hello all! I am a Senior Production Analyst at the National Weather Service Central Operations in the United States. My team uses ecFlow to manage our operational suite for all our numerical models and directly related products. Thank you for maintaining this software so we can effectively and reliably deliver our products.

I have a design problem that we run into frequently. We need to deliver forecast products while large, long-running forecast models are still in progress. Our current solution is to run a "manager job" that waits for a target file to arrive, then calls `ecflow_client --event` to trigger a job that performs postprocessing. This works, but because of our supercomputer scheduler configuration, the manager job reserves an entire compute node for a 1-core task. We would like to avoid the manager job so that we can conserve compute resources.
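For concreteness, the manager-job pattern amounts to roughly the sketch below. It is a simplified illustration, not our operational code: the function names, the polling interval, and the timeout are hypothetical. It assumes the script runs inside an ecFlow job, so that the `ECF_*` environment variables that `ecflow_client` child commands rely on are already set.

```python
import subprocess
import time
from pathlib import Path


def wait_for_file(path: Path, timeout_s: float, poll_s: float = 30.0) -> bool:
    """Poll until `path` exists; return False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if path.exists():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))


def event_command(event_name: str) -> list[str]:
    """Build the child command that sets an event on the current task."""
    return ["ecflow_client", f"--event={event_name}"]


def run_manager(path: Path, event_name: str, timeout_s: float) -> bool:
    """Wait for the target file, then set the event that triggers postprocessing."""
    if not wait_for_file(path, timeout_s):
        return False
    # Assumption: ecflow_client is on PATH and the ECF_* variables are set by the job.
    subprocess.run(event_command(event_name), check=True)
    return True
```

The polling loop itself is cheap; the cost comes from the scheduler allocating a whole node to the process for as long as it waits.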

Similarly, some of our data arrives from external sources, and our jobs have to wait for those files to arrive. We use time triggers in this case, but file arrival times of course vary.

Does ECMWF have a standard approach for managing data and task dependencies that is more efficient than our current approaches? I can appreciate that parts of this discussion may not be directly related to ecFlow itself, so I would be happy to take the discussion elsewhere if more appropriate. Thank you for your time!
