Skip to content

Fix: classify in-frame complex variants (MNP + INDEL) as INFRAME_COMPLEX_VARIANT#1181

Open
vinitjain2005 wants to merge 1 commit intomalariagen:masterfrom
vinitjain2005:fix/inframe-complex-variant-classification
Open

Fix: classify in-frame complex variants (MNP + INDEL) as INFRAME_COMPLEX_VARIANT#1181
vinitjain2005 wants to merge 1 commit intomalariagen:masterfrom
vinitjain2005:fix/inframe-complex-variant-classification

Conversation

@vinitjain2005
Copy link
Copy Markdown

Summary

This PR replaces the placeholder effect for in-frame complex variants (MNP + INDEL) in _get_within_cds_effect with a proper classification.

Changes

  • Replaced:
    effect="TODO in-frame complex variation (MNP + INDEL)", impact="UNKNOWN"

  • With:
    effect="INFRAME_COMPLEX_VARIANT", impact="MODERATE"

Why this change?

  • Improves completeness of variant annotation
  • Provides meaningful and standard effect classification
  • Ensures consistency with other in-frame variant types

Impact

In-frame complex variants are now correctly labeled with a MODERATE impact instead of UNKNOWN, improving downstream analysis and interpretability.

Testing

  • Existing tests pass successfully
  • (Optional) Add test cases for in-frame complex variants in future

Closes #1180

Github: @vinitjain2005

@vinitjain2005
Copy link
Copy Markdown
Author

@tristanpwdennis @jonbrenas please check and merge

@jonbrenas
Copy link
Copy Markdown
Collaborator

Hi @vinitjain2005, can you explain your reasoning?

@vinitjain2005
Copy link
Copy Markdown
Author

Hi @jonbrenas, thanks for asking — happy to walk through it.

Looking at the surrounding code, the branch just above uses effect="CODON_CHANGE", impact="MODERATE" for in-frame MNP-only variants. An MNP + INDEL combination is a superset of that case — it changes codons and shifts amino acid length — so MODERATE impact felt like the natural parallel.

For the effect label, I went with INFRAME_COMPLEX_VARIANT as a descriptive term for this mixed case. That said, I want to flag that I'm open to a different label if the project has a preferred convention — for example, would CODON_CHANGE_PLUS_INDEL or simply reusing CODON_CHANGE be more consistent with how effects are named elsewhere in the codebase?

I also want to be transparent: the TODO branch may have been left intentionally if there's ambiguity in how the net frame status is guaranteed for this case. If that's a concern, I'm happy to dig into the logic further or add a test case to validate the behavior. Let me know what would be most helpful!

Github: @vinitjain2005

@jonbrenas
Copy link
Copy Markdown
Collaborator

Hi @vinitjain2005,

I think there are 2 different things to take into consideration here:

  • The part of the code that you have modified is never reached because the only code using veff.py is for SNPs, so len(ref) == 1 and len(alt) == 1 is always true. Future-proofing the code is a good idea, though.
  • The situation (as the name indicates) that you are trying to describe is complex so it is not going to correspond strictly to a single effect. The actual change would have to mimic the code above when either len(ref) == 1 or len(alt) == 1 (and the other > 1), except it gets harder to decide if there is a codon change if you have to look at more than one site.

All that to say that addressing this 'TODO' requires a bit of biological thinking and is not going to have any impact, so I wouldn't put a lot of effort (as I obviously haven't given that it is still 'TODO') in it.

@vinitjain2005
Copy link
Copy Markdown
Author

Thanks for the detailed explanation @jonbrenas — that's really helpful context.

I understand now that the branch isn't reachable with the current SNP-only usage, and that properly resolving the TODO would need genuine biological reasoning across multiple sites rather than just a label change. I'll leave it as-is.

I'll focus my attention on issues that affect active code paths instead. If you have any suggestions for areas of the codebase that would benefit from contribution — especially anything relevant to variant handling or data quality — I'd really appreciate the pointer!

kunal-10-cloud added a commit to kunal-10-cloud/malariagen-data-python that referenced this pull request Mar 23, 2026
Replace TODO with INFRAME_COMPLEX_VARIANT/MODERATE placeholder.
Code path unreachable in production (SNP-only), proper classification
requires frame-shift analysis per PR malariagen#1181 discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhance variant effect annotation: classify in-frame complex variants properly in veff.py

2 participants