AMPlify ignores sequences containing stop codon indicator

We noticed that AMPlify accepts only the 20 standard amino acids in input sequences and ignores any sequence containing other characters, as stated in its help message:

```
$ AMPlify -h
[...]
AMPlify v2.0.0
------------------------------------------------------
Predict whether a sequence is AMP or not.
Input sequences should be in fasta format. 
Sequences should be shorter than 201 amino acids long, 
and should not contain amino acids other than the 20 standard ones. 
```

So far, so clear. However, a sequence is also ignored when its only non-standard character is the asterisk `*` that is commonly used to indicate a stop codon. I believe this behaviour might not be desired, because several sequence annotation tools (e.g. Pyrodigal, Prodigal, Bakta, Prokka) append the `*` by default; for Prodigal, Prokka, and Bakta it is not even possible to deactivate the `*` as stop codon indicator. Thus, one cannot use the output from such annotation tools directly as input for AMPlify without first removing all `*`.

My feature request is therefore to have AMPlify accept sequences with a stop codon indicator and remove the asterisk internally if necessary.
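To illustrate the requested behaviour, here is a minimal Python sketch of how a sequence could be normalised before validation. It is not AMPlify's actual code: the names `strip_stop_indicator`, `is_valid`, `STANDARD_AA`, and `MAX_LEN` are hypothetical, and the rules are only inferred from the help message above (at most 200 residues, 20 standard amino acids).

```python
# Hypothetical sketch, not AMPlify's implementation.
# Assumption: validity rules are taken from the help message only.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LEN = 200  # "shorter than 201 amino acids long"


def strip_stop_indicator(seq: str) -> str:
    """Remove trailing '*' stop codon indicators.

    Asterisks inside the sequence are left untouched, so a sequence
    with an internal stop would still be rejected by is_valid().
    """
    return seq.strip().rstrip("*")


def is_valid(seq: str) -> bool:
    """Check length and alphabet after dropping the trailing '*'."""
    cleaned = strip_stop_indicator(seq).upper()
    return 0 < len(cleaned) <= MAX_LEN and set(cleaned) <= STANDARD_AA
```

Stripping only at the end of the sequence is a deliberate choice in this sketch: a terminal `*` is just an annotation artifact, whereas an internal `*` may point to a real problem in the gene call.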

**Minimal reproducible example:**
- Download this FASTA file: [amplify-failed-genes.faa.gz](https://github.com/user-attachments/files/18619296/amplify-failed-genes.faa.gz) (contains two sequences: one too long and one with `*`)

```
zcat amplify-failed-genes.faa.gz > amplify-failed-genes.faa
AMPlify -s amplify-failed-genes.faa
```
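Until AMPlify handles the asterisk itself, the input can be cleaned beforehand. The following Python sketch shows one possible workaround; the helper names `read_fasta` and `clean_fasta` and the output file name used below are my own choices, not part of AMPlify. It reads a plain or gzipped FASTA file and writes a copy without trailing `*`.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_fasta(src, dst):
    """Write src to dst without trailing '*'; return the number of changed records."""
    changed = 0
    with open(dst, "w") as out:
        for header, seq in read_fasta(src):
            cleaned = seq.rstrip("*")  # only the terminal stop indicator
            changed += cleaned != seq
            out.write(f"{header}\n{cleaned}\n")
    return changed
```

For the example file, `clean_fasta("amplify-failed-genes.faa.gz", "amplify-failed-genes.clean.faa")` would produce a file that can be passed to `AMPlify -s`. The sequence that is too long is still expected to be ignored, since only the asterisk is removed.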

I'll link another issue where this behaviour was observed.
