Skip to content

Conversation

@mlin
Copy link
Contributor

@mlin mlin commented Jul 18, 2019

The hisat2 tasks stream output into samtools to avoid having to materialize a giant text SAM file on the scratch disk. This is a good idea but it's implemented in a slightly risky way, using an output command substitution like hisat2 ... -S >(samtools view -o output.bam ...). In this construct samtools is spawned as a background process, and bash does not wait for it before proceeding to the next command or exiting at the end of the script. Furthermore according to this Q&A it does not even provide a way to wait for it!

This creates a race condition where the next step is liable to start reading a partial BAM file, including the runtime system potentially outputting a truncated file (cf. chanzuckerberg/miniwdl#211).

Here we replace the output command substitutions with a less-elegant but hopefully reliable construct, which allows us to explicitly wait for samtools before proceeding.

@mlin mlin requested a review from mckinsel July 18, 2019 21:43
@pullapprove pullapprove bot requested a review from gbggrant July 18, 2019 22:22
Copy link

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Could you provide a test (or a set of example inputs) that can be run to verify that this works?

@barkasn
Copy link
Contributor

barkasn commented Jul 19, 2019

Given the wide impact to multiple pipelines of changes to this step it would be great if we had a test specifically for this step

@kbergin
Copy link
Contributor

kbergin commented Nov 7, 2019

Hi @mlin - Any chance you're planning to add a test to this PR? I suspect it's also a bit out of date with the codebase at this point. Let us know if you don't have the right access to add a test.

@mlin
Copy link
Contributor Author

mlin commented Nov 8, 2019

Hi @kbergin, all,

The new lines of code proposed here run unconditionally and thus, should be exercised in any existing tests which call on the HISAT2 tasks. They're a mechanical restructuring of the existing commands, eliminating the concurrency race condition described above. I think it would be disproportionately difficult to design a test that specifically provokes the race condition.

@kbergin
Copy link
Contributor

kbergin commented Nov 9, 2019

That seems reasonable to me, Mike, but I was mostly just bumping the PR to help get it resolved. @barkasn @gbggrant opinions here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants