Skip to content

feat: Taxonomy service#6134

Merged
maverbiest merged 41 commits intomainfrom
taxonomy-service
Mar 26, 2026
Merged

feat: Taxonomy service#6134
maverbiest merged 41 commits intomainfrom
taxonomy-service

Conversation

@maverbiest
Copy link
Copy Markdown
Contributor

@maverbiest maverbiest commented Mar 13, 2026

Adding a taxonomy service that exposes NCBI taxonomy data via a REST API.
We will be able to use this for scientific host name validation in prepro and other taxonomy-related functionality.

API

Endpoint Description
GET / Health check
GET /taxa?scientific_name=<string> Endpoint used to validate user input. If the input is valid, the details of all associated taxa are returned. Currently only supports validation of scientific names (case-insensitive) through the scientific_name query parameter.
GET /taxa/{tax_id}?find_common_name=<boolean> Endpoint to use once a valid taxon ID is found. Looks up a taxon by NCBI taxon ID and returns it. If find_common_name=true, returns the nearest ancestor (including self) that has a common name.

Manual testing:

On a preview

I started a preview off this branch, which came up healthy. I could see the taxonomy-service come up on argoCD.
I could query the service from inside the cluster using kubectl, e.g., from a preprocessing pod:

➜  loculus git:(taxonomy-service) kubectl exec -n prev-taxonomy-service -it loculus-preprocessing-cchf-v1-0-654647bdc-snkbw -- /opt/conda/bin/python -c "import requests; print(requests.get('http://loculus-taxonomy-service:5000/taxa', params={'scientific_name': 'Aedes aegypti'}).json())"
{'tax_id': 7159, 'common_name': 'yellow fever mosquito', 'scientific_name': 'Aedes aegypti', 'parent_id': 53541, 'depth': 28}

The request was also logged in the argoCD pod of the taxonomy-service:

INFO:     10.42.1.116:34544 - "GET /taxa?name=Aedes+aegypti HTTP/1.1" 200 OK

Locally (thank you Claude)

  1. Set up the cluster (from kubernetes/):
../deploy.py cluster --dev
../deploy.py helm --dev --branch taxonomy-service
  1. Build the image for linux/amd64 (Docker Desktop on Apple Silicon requires --provenance=false to avoid a multi-platform manifest):
docker buildx build --platform linux/amd64 \
  --provenance=false \
  --output type=docker \
  -t ghcr.io/loculus-project/taxonomy-service:taxonomy-service \
  ./taxonomy/taxonomy_service
  1. Import the image into k3d:
k3d image import ghcr.io/loculus-project/taxonomy-service:taxonomy-service -c testCluster
  1. Set imagePullPolicy to Never so Kubernetes uses the local image instead of trying to pull from ghcr.io:
kubectl patch deployment loculus-taxonomy-service -p '{"spec":{"template":{"spec":{"containers":[{"name":"taxonomy-service","imagePullPolicy":"Never"}]}}}}'
kubectl delete pod -l app=loculus,component=loculus-taxonomy-service
  1. Forward the port:
kubectl port-forward service/loculus-taxonomy-service 5000:5000

After these steps, I can:

➜  loculus git:(taxonomy-service) curl http://localhost:5000/
{"message":"Taxonomy service is running"}
➜  loculus git:(taxonomy-service) curl "http://localhost:5000/taxa?scientific_name=Homo+sapiens"
{"tax_id":9606,"common_name":"human","scientific_name":"Homo sapiens","parent_id":9605,"depth":31}%

From within the cluster, the service is available at http://loculus-taxonomy-service:5000/taxa:

➜  loculus git:(taxonomy-service) kubectl exec -it loculus-taxonomy-service-5cd69867f7-rhtzt -- curl http://loculus-taxonomy-service:5000/taxa\?scientific_name\=Aedes+aegypti
{"tax_id":7159,"common_name":"yellow fever mosquito","scientific_name":"Aedes aegypti","parent_id":53541,"depth":28}%

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: https://taxonomy-service.loculus.org

@maverbiest maverbiest marked this pull request as ready for review March 20, 2026 13:02
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Mar 20, 2026

Claude finished @maverbiest's task in 3m 5s —— View job


Code Review

  • Gather context and understand request
  • Read taxonomy service source files
  • Read Kubernetes deployment files
  • Read CI/CD workflow files
  • Read test files
  • Analyze and write review

Good addition overall — clean FastAPI structure, well-documented, and the test setup with dependency injection is solid. A few things worth addressing before merge:

Bug

  • config.py:14config/defaults.yaml is opened as a relative path (relative to CWD), which works inside Docker (WORKDIR /opt/app) but will fail when running locally from any other directory. See inline comment for a fix. Fix this →

Code quality

  • api.py:27–37fetch_by_common_name is a stub that raises NotImplementedError and is never called. Remove it or mark it clearly as planned future work.
  • api.py:54 — The max(..., key=lambda x: x["depth"]) tiebreak for duplicate scientific names needs a comment explaining the intent.

Security

  • Dockerfile:11 — The taxonomy DB is downloaded from a public URL with no integrity verification (no checksum). A corrupted or replaced file would be consumed silently on every new build.

Performance

  • api.py:16 — A new SQLite connection is opened per request. For a read-only service, a single persistent connection at startup would be more efficient.
  • api.py:86fetch_common_name walks the tree with one SQL query per level (up to 30+ queries for deep taxa). A single recursive CTE would handle this in one round trip (noted as a follow-up, not a blocker).

Kubernetes

  • taxonomy-deployment.yaml — No liveness/readiness probes configured, despite the health-check endpoint already existing.
  • Dockerfile:24 — No CMD/ENTRYPOINT; the container will error if started without arguments. A default command would make it self-describing.

Testing

  • test_api.py:65dependency_overrides is set in setUp but never cleared in tearDown. This can leak into other test files.
  • test_api.py:97 — Missing test for the direct-hit case (taxon already has a common name, no traversal needed).

@maverbiest maverbiest added the preview Triggers a deployment to argocd label Mar 20, 2026
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/config.py
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py Outdated
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py Outdated
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py Outdated
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py
Comment thread taxonomy/taxonomy_service/Dockerfile Outdated
Comment thread taxonomy/taxonomy_service/Dockerfile
Comment thread taxonomy/taxonomy_service/test/test_api.py
Comment thread taxonomy/taxonomy_service/test/test_api.py Outdated
Comment thread taxonomy/taxonomy_service/test/test_api.py
Comment thread kubernetes/loculus/templates/taxonomy-deployment.yaml
@anna-parker anna-parker self-requested a review March 23, 2026 09:13
Comment thread taxonomy/taxonomy_service/README.md Outdated
Comment thread taxonomy/taxonomy_service/README.md Outdated
Comment thread taxonomy/taxonomy_service/README.md Outdated
Comment thread taxonomy/taxonomy_service/README.md
Comment thread taxonomy/taxonomy_service/ncbi_taxonomy_digest.sha256 Outdated
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py Outdated
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py Outdated
Comment thread taxonomy/taxonomy_service/src/taxonomy_service/api.py
Comment thread .github/workflows/taxonomy-service-image.yaml
Comment thread taxonomy/taxonomy_service/test/test_api.py Outdated
Copy link
Copy Markdown
Contributor

@anna-parker anna-parker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm - I would probably still remove the dbSha256 to reduce the config size but not blocking :-)

@maverbiest maverbiest merged commit f96bf5e into main Mar 26, 2026
46 of 47 checks passed
@maverbiest maverbiest deleted the taxonomy-service branch March 26, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preview Triggers a deployment to argocd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants