CI-1235: Add workflow debugging feature #203

Open
sminot wants to merge 17 commits into main from debug-workflow
Contributor

sminot commented Apr 17, 2026

Adds the ability to inspect and debug failed Nextflow workflow executions directly from the Cirro SDK and CLI.


What's new for users

cirro debug — a new CLI command to inspect a failed dataset. Prints the last 50 lines of the execution log, identifies the primary failed task automatically, and shows its script, log, input files, and output files. Pass -i/--interactive to enter a menu-driven exploration mode where you can browse inputs and outputs, drill into source tasks, and read file contents directly in the terminal (as text, JSON, or CSV).
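The "last 50 lines" behaviour can be illustrated with a small standalone sketch. This is not the code from the PR; `tail` is a hypothetical helper name, and the log text here is made up for the example.

```python
from collections import deque
from typing import Iterable, List


def tail(lines: Iterable[str], n: int = 50) -> List[str]:
    """Return the last n lines, holding at most n lines in memory."""
    # A deque with maxlen discards the oldest entry as each new one arrives.
    return list(deque(lines, maxlen=n))


# Made-up execution log with 200 lines.
log_text = "\n".join(f"line {i}" for i in range(1, 201))
last_lines = tail(log_text.splitlines())
print(f"showing {len(last_lines)} lines, starting at: {last_lines[0]}")
```

A bounded deque keeps memory use constant even when the execution log is very large, which may matter for long-running workflows.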


CLI

| Command | Description |
| --- | --- |
| `cirro debug --project <name> --dataset <name>` | Non-interactive: print task debug summary, recurse through input chain |
| `cirro debug -i` | Interactive: menu-driven task and file exploration |

New SDK classes

DataPortalTask (cirro/sdk/task.py)

Represents a single task from a Nextflow workflow execution. Metadata is read from the WORKFLOW_TRACE artifact; logs and files are fetched on demand.

| Attribute | Description |
| --- | --- |
| `name`, `status`, `exit_code`, `hash`, `work_dir`, `task_id` | Trace-derived metadata |
| `logs` | Task stdout/stderr (via execution API, with `.command.log` fallback) |
| `script` | The shell script Nextflow ran (`.command.sh`, with log-artifact fallback) |
| `inputs` | `WorkDirFile` list parsed from `.command.run`, each linked to its `source_task` |
| `outputs` | Non-hidden files in the task's S3 work directory |
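To make the `inputs` row more concrete, here is a rough sketch of how staged S3 inputs might be pulled out of a `.command.run` file. It assumes staging lines of the form `nxf_s3_download s3://... <name>`, which may vary between Nextflow versions and executors. The function names `extract_s3_inputs` and `source_work_dir` are hypothetical and are not the PR's implementation.

```python
import re
from typing import List

# Assumption: each staged input appears on a line containing
# "nxf_s3_download" followed by an S3 URI and a local file name.
S3_URI = re.compile(r"s3://[^\s\"']+")


def extract_s3_inputs(command_run: str) -> List[str]:
    """Collect the unique S3 URIs staged by a .command.run script, in order."""
    uris: List[str] = []
    for line in command_run.splitlines():
        if "nxf_s3_download" not in line:
            continue
        match = S3_URI.search(line)
        if match and match.group(0) not in uris:
            uris.append(match.group(0))
    return uris


def source_work_dir(uri: str) -> str:
    """Return the parent prefix of a URI, i.e. the directory holding the file."""
    return uri.rsplit("/", 1)[0]


# Made-up excerpt of a .command.run staging function.
example = '''
nxf_stage() {
    downloads=(true)
    downloads+=("nxf_s3_download s3://bucket/work/ab/cdef01/reads.fastq.gz reads.fastq.gz")
    downloads+=("nxf_s3_download s3://bucket/data/ref.fa ref.fa")
}
'''
inputs = extract_s3_inputs(example)
print(inputs)
print(source_work_dir(inputs[0]))
```

Comparing the parent prefix of each input against the `work_dir` of the other tasks is one plausible way to link a file to its `source_task`. Inputs whose prefix matches no task would then be treated as staged dataset files.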

WorkDirFile (cirro/sdk/task.py)

Represents a file in a Nextflow S3 work directory or dataset staging area.

| Attribute / Method | Description |
| --- | --- |
| `name`, `size`, `source_task` | File metadata |
| `read()`, `readlines()` | Read as text (supports gzip) |
| `read_json()` | Parse as JSON |
| `read_csv()` | Parse as a Pandas DataFrame (auto-infers .gz/.bz2/.xz/.zst compression) |
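The gzip-aware text and JSON reads can be sketched with the standard library alone. This sketch detects gzip by its two magic bytes, which is an assumption; the PR may instead rely on the file extension. The helper names `read_text`, `read_lines`, and `read_json_bytes` are hypothetical.

```python
import gzip
import json
from typing import Any, List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes as text, transparently decompressing gzip content."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode(encoding)


def read_lines(raw: bytes) -> List[str]:
    """Return the decoded content split into lines, without line endings."""
    return read_text(raw).splitlines()


def read_json_bytes(raw: bytes) -> Any:
    """Parse decoded content as JSON."""
    return json.loads(read_text(raw))


# Made-up payload standing in for a downloaded work-directory file.
payload = gzip.compress(b'{"sample": "A", "reads": 42}\n')
record = read_json_bytes(payload)
print(record["reads"])
```

For CSV, `pandas.read_csv` defaults to `compression="infer"`, which picks the codec from the file extension, so a wrapper usually only needs to pass a path or name that keeps the original suffix.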

Additions to existing SDK classes

| Addition | Description |
| --- | --- |
| `DataPortalDataset.executor` | Executor type (NEXTFLOW, CROMWELL) for the dataset's process |
| `DataPortalDataset.logs` | Top-level execution log via Cirro API (CloudWatch) |
| `DataPortalDataset.tasks` | Full list of `DataPortalTask` objects from the trace artifact |
| `DataPortalDataset.primary_failed_task` | Auto-identifies the root-cause failed task by cross-referencing exit codes with the execution log; returns `None` gracefully for non-Nextflow executors, empty traces, or successful runs |
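One way the cross-referencing behind `primary_failed_task` could work is sketched below. The selection rule is an assumption, not the PR's logic: prefer failed tasks named in the execution log, then prefer a non-zero exit code. `TaskRow` and `pick_primary_failed_task` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRow:
    """Hypothetical stand-in for the trace-derived task metadata."""
    name: str
    status: str
    exit_code: Optional[int]


def pick_primary_failed_task(
    tasks: List[TaskRow], execution_log: str
) -> Optional[TaskRow]:
    """Pick the most likely root-cause failure, or None if nothing failed."""
    failed = [t for t in tasks if t.status == "FAILED"]
    if not failed:
        return None  # empty trace or successful run
    # Prefer failures that the top-level log mentions by name.
    named = [t for t in failed if t.name in execution_log]
    pool = named or failed
    # Within that pool, prefer a task that reported a non-zero exit code.
    for task in pool:
        if task.exit_code not in (None, 0):
            return task
    return pool[0]


tasks = [
    TaskRow("align (sample1)", "COMPLETED", 0),
    TaskRow("sort (sample1)", "FAILED", 137),
    TaskRow("merge", "FAILED", 1),
]
log = "Error executing process > 'sort (sample1)'"
primary = pick_primary_failed_task(tasks, log)
print(primary.name if primary else None)
```

Returning `None` rather than raising keeps callers simple: the CLI can print a "no failed task found" message for successful runs without wrapping the lookup in error handling.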

Internal changes

  • FileAccessContext.scratch_download() — new classmethod for accessing Nextflow scratch bucket files
  • FileService._get_scratch_read_credentials() — cached credential fetch for scratch bucket reads
  • Null-guard added in ExecutionService for resp.events when log responses are empty
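A cached credential fetch of the kind listed above can be sketched as follows. The expiry-based refresh and every name here (`Credentials`, `CachedCredentials`, `refresh_margin`) are assumptions for illustration; the PR only states that the fetch is cached.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    """Hypothetical temporary credentials with an expiry timestamp."""
    access_key: str
    expires_at: float  # seconds since the epoch


class CachedCredentials:
    """Call fetch() only when no usable credentials are cached."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[Credentials] = None

    def get(self) -> Credentials:
        now = self._clock()
        # Refresh when nothing is cached or expiry is within the margin.
        if self._cached is None or self._cached.expires_at - now <= self._margin:
            self._cached = self._fetch()
        return self._cached


# Demonstration with a fake clock and a fake fetch function.
calls = []
now = [1000.0]


def fake_fetch() -> Credentials:
    calls.append(1)
    return Credentials(access_key=f"key-{len(calls)}", expires_at=now[0] + 3600)


cache = CachedCredentials(fake_fetch, clock=lambda: now[0])
first = cache.get()
second = cache.get()   # served from the cache
now[0] += 3600         # advance the fake clock to the expiry time
third = cache.get()    # triggers a second fetch
print(len(calls), third.access_key)
```

Injecting the clock makes the refresh logic testable without sleeping, and the margin avoids handing out credentials that would expire mid-download.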

sminot changed the title from "Add workflow debugging feature" to "CI-1235: Add workflow debugging feature" Apr 23, 2026
nathanthorpe requested a review from a team May 14, 2026 00:08
sminot and others added 6 commits May 22, 2026 09:57
Add # NOSONAR to 15 broad except Exception patterns and 2 cognitive
complexity hotspots (run_debug, _file_menu) introduced in this PR.
All catches are intentional resilience patterns that return defaults
when S3/file access fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_load_tasks_from_api() calls get_tasks_for_execution and maps the API
Task model (name, status, native_job_id) onto DataPortalTask trace_row
dicts. _load_tasks_cromwell now uses this instead of raising
NotImplementedError. _load_tasks for Nextflow falls back to the API
when the WORKFLOW_TRACE artifact is unavailable (e.g. dataset still
running or artifact upload failed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
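
The fallback described in this commit message can be sketched roughly as below. `ApiTask` is a hypothetical stand-in for the API Task model. The row keys (`task_id`, `native_id`) follow Nextflow's trace column names, and numbering tasks by position is an assumption, not necessarily what the PR does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ApiTask:
    """Hypothetical stand-in for the API Task model."""
    name: str
    status: str
    native_job_id: Optional[str] = None


def api_task_to_trace_row(task: ApiTask, index: int) -> Dict[str, str]:
    """Map an API task onto a trace-style row dict."""
    return {
        "task_id": str(index),  # assumption: numbered by position
        "name": task.name,
        "status": task.status,
        "native_id": task.native_job_id or "",
    }


def load_task_rows(
    read_trace: Callable[[], Optional[List[Dict[str, str]]]],
    list_api_tasks: Callable[[], List[ApiTask]],
) -> List[Dict[str, str]]:
    """Use the trace artifact when it has rows, otherwise ask the API."""
    rows = read_trace()
    if rows:
        return rows
    return [
        api_task_to_trace_row(task, index)
        for index, task in enumerate(list_api_tasks(), start=1)
    ]


# Simulate a dataset whose trace artifact is not available yet.
rows = load_task_rows(
    read_trace=lambda: None,
    list_api_tasks=lambda: [
        ApiTask("align", "COMPLETED", "job-1"),
        ApiTask("sort", "FAILED"),
    ],
)
print(rows)
```

Rows built this way carry less metadata than real trace rows (no exit code, hash, or work directory), so downstream code would need to tolerate missing keys.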
sminot requested a review from nathanthorpe May 22, 2026 20:37
Comment thread cirro/cli/controller.py
print(f"{indent} {f.name} ({size_str})")


def _print_task_debug_recursive(
Member

Can you move all the debug-related things into their own file?

client=self._api_client
)

if resp is None or resp.events is None:
Member

I don't think this is ever the case, so we shouldn't have these checks. If we do need them, then the backend needs to be fixed.

from cirro.sdk.task import DataPortalTask


def parse_inputs_from_command_run(content: str) -> List[str]:
Member

This can be removed, right?

Comment thread tests/test_preprocess.py
df.sort_index(axis=1).to_csv(index=False)
)

@unittest.skipIf(os.environ.get('CI') == 'true', "Skipping S3 integration test in CI")
Member

These tests should be run by CI.
