
AMPlify ignores sequences containing stop codon indicator #17

@jasmezz

We noticed that AMPlify strictly sticks to the 20 standard amino acids in input sequences and ignores all others, as stated in its help message:

$ AMPlify -h
[...]
AMPlify v2.0.0
------------------------------------------------------
Predict whether a sequence is AMP or not.
Input sequences should be in fasta format. 
Sequences should be shorter than 201 amino acids long, 
and should not contain amino acids other than the 20 standard ones. 

So far, so clear. However, a sequence is also ignored when its only non-standard character is the commonly used stop codon indicator, the asterisk *. I believe this behaviour may not be desired, because several sequence annotation tools (e.g. Pyrodigal, Prodigal, Bakta, Prokka) append the * by default; for Prodigal, Prokka, and Bakta the * stop codon indicator cannot even be deactivated. Consequently, the output of these annotation tools cannot be used as input for AMPlify without first removing every *.

My feature request is therefore to have AMPlify accept sequences that carry the stop codon indicator and remove the asterisk internally where necessary.
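To illustrate the requested behaviour, here is a minimal sketch of how a terminal asterisk could be stripped from FASTA records before prediction. It is not AMPlify's actual code: the function names `read_fasta`, `strip_terminal_stop`, and `clean_fasta` are hypothetical, and the sketch assumes that only trailing asterisks should be removed, so an internal `*` is left in place and would still be rejected.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def strip_terminal_stop(seq: str) -> str:
    """Remove trailing stop codon indicators; internal '*' is untouched."""
    return seq.rstrip("*")


def clean_fasta(lines: Iterable[str]) -> str:
    """Return FASTA text with terminal '*' removed from every record."""
    return "".join(
        f">{header}\n{strip_terminal_stop(seq)}\n"
        for header, seq in read_fasta(lines)
    )
```

Until something along these lines exists inside AMPlify, the same functions could serve as a preprocessing step: write the result of `clean_fasta` to a new file and pass that file to `AMPlify -s`.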

Minimal reproducible example:

zcat amplify-failed-genes.faa.gz > amplify-failed-genes.faa
AMPlify -s amplify-failed-genes.faa

I'll link another issue where this behaviour was observed.


Labels: enhancement, good first issue, question
