Merged
79 commits
44415ba
Add Matomo analytics integration with GDPR consent support (#341)
t0mdavid-m Feb 20, 2026
c65d503
Remove duplicate `address` key in `.streamlit/config.toml` (#346)
Copilot Mar 3, 2026
c81fee6
Fix integration test failures caused by sys.modules pollution and shu…
Copilot Mar 11, 2026
fb2fe67
Remove server address from bundled config.toml for Windows installer …
t0mdavid-m Mar 14, 2026
128da6d
reenable cross origin protection
t0mdavid-m Mar 14, 2026
42fc187
Add CLAUDE.md and Claude Code skills for MS webapp development (#357)
t0mdavid-m Mar 27, 2026
6a2bc03
Add Kubernetes manifests and CI/CD workflows for deployment (#347)
t0mdavid-m Apr 2, 2026
93170fa
Claude/kubernetes migration plan kq jw d (#358)
t0mdavid-m Apr 4, 2026
b185cf0
Claude/kubernetes migration plan kq jw d (#359)
t0mdavid-m Apr 4, 2026
8c03310
Claude/fix mzml files validation y zfla (#361)
t0mdavid-m Apr 20, 2026
4069e11
Fix contrib tag (#360)
t0mdavid-m Apr 20, 2026
eae014b
Add Kubernetes deployment docs and refactor Claude skills (#362)
t0mdavid-m Apr 20, 2026
12f8e0d
ci: add ghcr-cleanup workflow (scheduled disabled, dry-run default)
t0mdavid-m Apr 20, 2026
0633790
ci: scaffold build-and-test workflow with lint-manifests job
t0mdavid-m Apr 20, 2026
aa585c2
ci: add build job skeleton with matrix, buildx, ghcr login
t0mdavid-m Apr 20, 2026
7a2dc86
ci: add metadata extraction, build-push, and registry cache
t0mdavid-m Apr 20, 2026
e7f6c08
ci: add kind integration steps to build job
t0mdavid-m Apr 20, 2026
fca45ee
Merge pull request #363 from OpenMS/ci/unify-docker-workflows
t0mdavid-m Apr 20, 2026
6a2a7ae
ci: lowercase image name for OCI cache refs
t0mdavid-m Apr 20, 2026
d8d3d03
Merge pull request #364 from OpenMS/ci/fix-lowercase-cache-ref
t0mdavid-m Apr 20, 2026
ebef5df
ci: don't pass unprefixed local tag to buildx push
t0mdavid-m Apr 20, 2026
fa46191
Merge pull request #365 from OpenMS/ci/fix-local-tag-push-to-hub
t0mdavid-m Apr 20, 2026
dd32bd1
ci: delete old docker workflows now superseded by build-and-test
t0mdavid-m Apr 20, 2026
9ce9585
k8s: pin overlay image tag to main-full (new CI scheme)
t0mdavid-m Apr 20, 2026
93fb2a4
docs(skill): update k8s deploy skill for unified CI workflow
t0mdavid-m Apr 20, 2026
29d94b4
docs(k8s): update deployment doc for unified CI workflow
t0mdavid-m Apr 20, 2026
e1cc9d4
Merge pull request #366 from OpenMS/ci/cutover-old-workflows
t0mdavid-m Apr 20, 2026
1c183d0
ci: pin container-retention-policy to v3.0.1
t0mdavid-m Apr 20, 2026
bd9bf5e
Merge pull request #367 from OpenMS/ci/pin-retention-action-to-v3.0.1
t0mdavid-m Apr 20, 2026
859e481
fix(docker): stop cache-busting on GITHUB_TOKEN
t0mdavid-m Apr 21, 2026
8f25b59
Merge pull request #368 from OpenMS/ci/fix-dockerfile-token-cache-bust
t0mdavid-m Apr 21, 2026
f3c21a5
docs: fix typo (Gihub -> GitHub) in Dockerfile comments
t0mdavid-m Apr 21, 2026
cf35e62
Merge pull request #369 from OpenMS/ci/cache-smoke-test
t0mdavid-m Apr 21, 2026
bbc43f4
ci: enable scheduled GHCR cleanup (weekly Sun 03:00 UTC)
t0mdavid-m Apr 21, 2026
25ed32c
Merge pull request #370 from OpenMS/ci/enable-ghcr-cleanup-cron
t0mdavid-m Apr 21, 2026
072161a
k8s: serve template app on both .de and .org TLDs
t0mdavid-m Apr 21, 2026
d0a1fde
ci: integration-test both .de and .org hosts on nginx and traefik
t0mdavid-m Apr 21, 2026
b970ce7
ci: enable kind to bind workspace PVC and clean up port-forwards
t0mdavid-m Apr 21, 2026
9933465
skill(configure-k8s-deployment): document dual-host overlay edit
t0mdavid-m Apr 21, 2026
0e9ceca
skill(configure-k8s-deployment): fix markdown rendering and clarify n…
t0mdavid-m Apr 21, 2026
69e655f
docs(kubernetes-deployment): document dual-host serving
t0mdavid-m Apr 21, 2026
b28b1f1
docs(kubernetes-deployment): fix stale job count and missing kind pat…
t0mdavid-m Apr 21, 2026
e840358
ci: use nginx Ingress hostnames for nginx-job curl assertions
t0mdavid-m Apr 21, 2026
636da9d
Merge pull request #372 from OpenMS/feature/dual-host-k8s
t0mdavid-m Apr 21, 2026
b69a2e5
k8s: mount admin password from streamlit-secrets Secret
claude Apr 22, 2026
594a83a
fix errors
t0mdavid-m Apr 24, 2026
e5d8044
ci: bump pyopenms to 3.5.0 and pin python 3.10 to match Dockerfile
t0mdavid-m Apr 24, 2026
64dfed8
fix(view): use pyopenms 3.5 get_df API instead of unreleased to_df
t0mdavid-m Apr 24, 2026
6f9c692
Merge pull request #374 from OpenMS/fix_pyopenms_errors
t0mdavid-m Apr 24, 2026
d78d8b8
Merge remote-tracking branch 'origin/main' into claude/hide-demo-pass…
t0mdavid-m Apr 24, 2026
8c6f869
fix(k8s): mount streamlit-secrets as directory so optional: true works
t0mdavid-m Apr 24, 2026
2cb4813
docs(k8s): add streamlit-secrets example to template-app overlay
t0mdavid-m Apr 24, 2026
971cfdd
Merge pull request #373 from OpenMS/claude/hide-demo-password-uB77g
t0mdavid-m Apr 24, 2026
6c61365
k8s: two-tier scheduling via Kustomize components + LimitRange
claude Apr 24, 2026
11ff5cd
Merge branch 'main' into claude/parallel-webapp-memory-optimization-R…
t0mdavid-m Apr 24, 2026
5870c34
ci(k8s): label kind node to match the overlay's memory tier
claude Apr 24, 2026
8abb90a
Merge branch 'claude/parallel-webapp-memory-optimization-RoNnJ' of ht…
claude Apr 24, 2026
0bd2ccf
k8s: move streamlit-secrets.yaml.example into overlays/prod/
claude Apr 24, 2026
2f28ed9
ci(k8s): two-node kind cluster with both tier labels
claude Apr 24, 2026
43c300b
ci(k8s): clear control-plane NoSchedule taint in two-node kind config
claude Apr 24, 2026
64f43e2
Merge pull request #375 from OpenMS/claude/parallel-webapp-memory-opt…
t0mdavid-m Apr 24, 2026
4ab1288
k8s: store demo workspaces on the workspaces PVC
claude Apr 24, 2026
5bd5898
Merge pull request #376 from OpenMS/claude/demo-workspace-storage-k8s…
t0mdavid-m Apr 24, 2026
391ed16
k8s: ship streamlit-secrets by default, hide admin UI when empty
claude Apr 24, 2026
3387b9c
Merge pull request #377 from OpenMS/claude/fix-streamlit-secrets-61u1H
t0mdavid-m Apr 24, 2026
0cce837
ci: reuse built docker images across ingress tests
claude Apr 24, 2026
eeb8e3f
ci: run test-traefik against both image variants
claude Apr 25, 2026
c627256
ci: harden ingress-test wait/curl flow for slow simple deployments
claude Apr 26, 2026
5901c90
Rework configure-k8s-deployment skill as an interview
t0mdavid-m Apr 26, 2026
03eff39
Merge pull request #378 from OpenMS/claude/reuse-docker-images-ci-DNvEO
t0mdavid-m Apr 26, 2026
8e096ac
Merge pull request #379 from OpenMS/claude/review-k8s-skill-UtYEm
t0mdavid-m Apr 26, 2026
8d76292
k8s: drop cross-fork pod-affinity, rely on RWO PVC for co-location
claude Apr 27, 2026
92f1d2c
ci: derive slug + Traefik hosts from overlay so forks stay green
claude Apr 27, 2026
9f819a8
Merge pull request #380 from OpenMS/claude/fix-pod-affinity-labels-YvnlN
t0mdavid-m Apr 27, 2026
99b6663
Merge branch 'main' into claude/fix-k8s-deployment-ci-XI82S
t0mdavid-m Apr 27, 2026
36d2af7
Merge pull request #381 from OpenMS/claude/fix-k8s-deployment-ci-XI82S
t0mdavid-m Apr 27, 2026
08086fc
Merge remote-tracking branch 'template/main' into fix_volume_bind
t0mdavid-m Apr 27, 2026
b2f2740
refix ci
t0mdavid-m Apr 27, 2026
e83f62e
refix admin panel
t0mdavid-m Apr 27, 2026
7 changes: 4 additions & 3 deletions .claude/skills/configure-k8s-deployment.md
@@ -29,7 +29,7 @@ Before asking the user anything, read a small known set of files directly (do no
3. `k8s/base/kustomization.yaml`, `k8s/base/streamlit-deployment.yaml`, `k8s/base/rq-worker-deployment.yaml`, `k8s/base/workspace-pvc.yaml` — confirm the layout still matches the template:
- PVC `metadata.name` is `workspaces-pvc`.
- Deployments reference `image: openms-streamlit` (the placeholder Kustomize swaps).
- `streamlit-deployment.yaml` has `claimName: workspaces-pvc` and `volume-group: workspaces` (both as a pod label and as the pod-affinity `matchExpressions` value).
- `streamlit-deployment.yaml` has `claimName: workspaces-pvc`. (Co-location of the workspace-using pods is enforced by the shared RWO PVC mount, not by a pod-affinity rule.)
4. `.github/workflows/build-and-test.yml` — confirm which tags CI publishes (the OpenMS template publishes `<branch>-full`, `<branch>-simple`, `<tag>-full`, `<tag>-simple`, plus `latest` on `main`-full pushes).

If any of those files are missing, renamed, or significantly restructured, stop and ask the user how to proceed. Do not pattern-match the standard answers onto an unknown layout.
@@ -155,7 +155,7 @@ spec:
storage: <size> # Q6: e.g. 100Gi, 1Ti, 3Ti
```

Do **not** rename the PVC, the base `kustomization.yaml` resource list, the `claimName` in `streamlit-deployment.yaml`, or the `volume-group` pod-affinity label. Kustomize's `namePrefix` already gives the in-cluster PVC a unique per-fork name; renaming the base creates a 3-file cascade for no benefit.
Do **not** rename the PVC, the base `kustomization.yaml` resource list, or the `claimName` in `streamlit-deployment.yaml`. Kustomize's `namePrefix` already gives the in-cluster PVC a unique per-fork name; renaming the base creates a 3-file cascade for no benefit.

Operator caveat (mention in handoff, not your job to verify): in-place expansion of an *already-deployed* PVC requires the StorageClass to have `allowVolumeExpansion: true`. If the operator's `cinder-csi` class does not allow expansion, growing a live PVC requires recreation, not a manifest edit. Resizing on first deploy is unaffected.

@@ -164,7 +164,7 @@ Operator caveat (mention in handoff, not your job to verify): in-place expansion
After committing the edits, tell the user the next steps belong to a human operator (or CI) and are out of scope for you:

1. Open a PR with the overlay edits and have it reviewed.
2. Merge to `main`. CI (`build-and-test.yml`) rebuilds and pushes the image to GHCR with the tag from Q3.
2. Merge to `main`. CI (`build-and-test.yml`) rebuilds and pushes the image to GHCR with the tag from Q3. The kind integration jobs (`test-nginx`, `test-traefik`) auto-discover slug and Traefik hostnames from the overlay output, so no workflow edits are needed for fork-specific values.
3. Cluster operator runs `kubectl apply -k k8s/overlays/prod/` against the OpenMS cluster.
4. Operator verifies with `kubectl -n openms rollout status deployment/<slug>-streamlit` and a browser check on `https://<sub>.webapps.openms.de`.

@@ -185,4 +185,5 @@ After committing the edits, tell the user the next steps belong to a human opera
- [ ] Redis URL written in both Deployment patches (`streamlit` and `rq-worker`)
- [ ] Memory-tier component selected
- [ ] Storage size in `k8s/base/workspace-pvc.yaml` updated only if the user picked a non-default size; PVC name and `claimName` untouched
- [ ] `.github/workflows/build-and-test.yml` uses dynamic overlay discovery (no `template-app` / `template.webapps.openms.*` literals); patched in if the fork's workflow was on the old hardcoded shape
- [ ] Changes committed on a feature branch (no PR opened unless the user asked for one)
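Two of the checklist items above (PVC `claimName` left untouched, and no `template-app` / `template.webapps.openms.*` literals left in the workflow) can be spot-checked from a fork's checkout with a short script. The following is a minimal sketch: `check_fork_layout` is a hypothetical helper, not something the template ships, and it only greps for the literal strings named in the skill. It is a smoke test, not a full validation of the layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not shipped by the template): smoke-tests two
# checklist items against a fork checkout rooted at $1.
check_fork_layout() {
  local root="$1"
  local deploy="$root/k8s/base/streamlit-deployment.yaml"
  local workflow="$root/.github/workflows/build-and-test.yml"

  # The base Deployment must still mount the un-renamed PVC.
  if ! grep -q 'claimName: workspaces-pvc' "$deploy"; then
    echo "missing 'claimName: workspaces-pvc' in $deploy" >&2
    return 1
  fi
  # The workflow must not hardcode the template's slug or hostnames.
  if grep -nE 'template-app|template\.webapps\.openms\.' "$workflow" >&2; then
    echo "hardcoded template literals in $workflow" >&2
    return 1
  fi
  return 0
}

# Demo on a throwaway tree that mimics the fork layout.
tmp=$(mktemp -d)
mkdir -p "$tmp/k8s/base" "$tmp/.github/workflows"
printf '      claimName: workspaces-pvc\n' > "$tmp/k8s/base/streamlit-deployment.yaml"
printf 'kubectl get pods -n openms -l app=${SLUG}\n' > "$tmp/.github/workflows/build-and-test.yml"
if check_fork_layout "$tmp"; then GOOD=pass; else GOOD=fail; fi

# Same tree, but the workflow still carries the old hardcoded slug.
printf 'kubectl get pods -n openms -l app=template-app\n' > "$tmp/.github/workflows/build-and-test.yml"
if check_fork_layout "$tmp" 2>/dev/null; then BAD=pass; else BAD=fail; fi

rm -rf "$tmp"
echo "$GOOD $BAD"
```

In a real fork, the helper would be called as `check_fork_layout .` from the repository root. Offending workflow lines are printed to stderr so they show up in CI logs.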
37 changes: 26 additions & 11 deletions .github/workflows/build-and-test.yml
@@ -158,19 +158,24 @@ jobs:
sleep "${i}0"
done

- name: Discover overlay identity
run: |
SLUG=$(yq '.commonLabels.app' k8s/overlays/prod/kustomization.yaml)
echo "SLUG=$SLUG" >> "$GITHUB_ENV"

- name: Wait for Redis to be ready
run: |
kubectl wait -n openms --for=condition=ready pod -l app=quantms-ddalfq,component=redis --timeout=60s
kubectl wait -n openms --for=condition=ready pod -l app=${SLUG},component=redis --timeout=60s

- name: Verify Redis Service is reachable
run: |
kubectl run redis-test -n openms --image=redis:7-alpine --rm -i --restart=Never -- redis-cli -h quantms-ddalfq-redis.openms.svc.cluster.local ping
kubectl run redis-test -n openms --image=redis:7-alpine --rm -i --restart=Never -- redis-cli -h ${SLUG}-redis.openms.svc.cluster.local ping

- name: Verify all deployments are available
run: |
kubectl wait -n openms --for=condition=available deployment -l app=quantms-ddalfq --timeout=180s || true
kubectl get pods -n openms -l app=quantms-ddalfq
kubectl get services -n openms -l app=quantms-ddalfq
kubectl wait -n openms --for=condition=available deployment -l app=${SLUG} --timeout=180s || true
kubectl get pods -n openms -l app=${SLUG}
kubectl get services -n openms -l app=${SLUG}

- name: Curl both hostnames via nginx ingress
run: |
@@ -245,29 +250,39 @@ jobs:
sleep "${i}0"
done

- name: Discover overlay identity
run: |
SLUG=$(yq '.commonLabels.app' k8s/overlays/prod/kustomization.yaml)
TRAEFIK_HOSTS=$(kubectl kustomize k8s/overlays/prod/ \
| yq 'select(.kind == "IngressRoute") | .spec.routes[0].match' \
| grep -oP "Host\(\`\K[^\`]+" | tr '\n' ' ')
echo "SLUG=$SLUG" >> "$GITHUB_ENV"
echo "TRAEFIK_HOSTS=$TRAEFIK_HOSTS" >> "$GITHUB_ENV"

- name: Wait for Redis to be ready
run: |
kubectl wait -n openms --for=condition=ready pod -l app=quantms-ddalfq,component=redis --timeout=60s
kubectl wait -n openms --for=condition=ready pod -l app=${SLUG},component=redis --timeout=60s

- name: Verify all deployments are available
run: |
kubectl wait -n openms --for=condition=available deployment -l app=quantms-ddalfq --timeout=180s || true
kubectl get pods -n openms -l app=quantms-ddalfq
kubectl get services -n openms -l app=quantms-ddalfq
kubectl wait -n openms --for=condition=available deployment -l app=${SLUG} --timeout=180s || true
kubectl get pods -n openms -l app=${SLUG}
kubectl get services -n openms -l app=${SLUG}

- name: Curl both hostnames via Traefik
run: |
kubectl -n traefik port-forward svc/traefik 8080:80 &
PF_PID=$!
trap 'kill "$PF_PID" 2>/dev/null || true' EXIT
FIRST_HOST=$(echo ${TRAEFIK_HOSTS} | awk '{print $1}')
for i in $(seq 1 30); do
sleep 2
if curl -fsSo /dev/null --max-time 2 http://127.0.0.1:8080/_stcore/health -H "Host: opendda.webapps.openms.de"; then
if curl -fsSo /dev/null --max-time 2 http://127.0.0.1:8080/_stcore/health -H "Host: ${FIRST_HOST}"; then
break
fi
echo "port-forward / app not ready yet, retry $i"
done
for host in opendda.webapps.openms.de opendda.webapps.openms.org; do
for host in ${TRAEFIK_HOSTS}; do
curl -fsS --resolve "$host:8080:127.0.0.1" "http://$host:8080/_stcore/health"
echo ""
echo "$host -> 200 OK"
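The Traefik job's `Discover overlay identity` step extracts hostnames from the rendered IngressRoute `match` rule with `grep -oP` and `\K`. `-P` is a GNU grep option and is not available in every grep. Below is a minimal sketch of the same extraction using only ERE and `sed`. It assumes a dual-host rule of the form ``Host(`a`) || Host(`b`)``, and the sample hostnames are hypothetical. In CI, the input would come from `kubectl kustomize k8s/overlays/prod/ | yq ...` as in the step above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical IngressRoute match rule shaped like a dual-host overlay;
# in CI this string comes from the kustomize + yq pipeline instead.
MATCH='Host(`template.webapps.openms.de`) || Host(`template.webapps.openms.org`)'

# Portable variant of grep -oP "Host\(\`\K[^\`]+": grab each Host(`... token
# with ERE, strip the Host(` prefix with sed, join lines with spaces.
TRAEFIK_HOSTS=$(printf '%s\n' "$MATCH" \
  | grep -oE 'Host\(`[^`]+' \
  | sed 's/^Host(`//' \
  | tr '\n' ' ')

# Bash-native alternative to: echo ${TRAEFIK_HOSTS} | awk '{print $1}'
read -r FIRST_HOST _ <<< "$TRAEFIK_HOSTS"

echo "hosts: ${TRAEFIK_HOSTS}"
echo "first: ${FIRST_HOST}"
```

`TRAEFIK_HOSTS` keeps a trailing space from `tr`. This does no harm, because the later `for host in ${TRAEFIK_HOSTS}` loop relies on unquoted word splitting anyway.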
14 changes: 8 additions & 6 deletions docs/kubernetes-deployment.md
@@ -35,8 +35,8 @@ Every production OpenMS webapp (quantms-web, umetaflow, FLASHApp) deploys via th
│ Streamlit Deployment │
│ (N replicas, default 2) │
│ │
│ [pod affinity: co-locate with │
rq-worker + cleanup-cronjob pods]
│ [co-located with rq-worker +
│ cleanup pods via shared RWO PVC]
└────────┬────────────────────────┬───────┘
│ REDIS_URL │
│ │ /workspaces-...
@@ -76,9 +76,11 @@ Every production OpenMS webapp (quantms-web, umetaflow, FLASHApp) deploys via th
| Traefik IngressRoute | External HTTP entrypoint with sticky sessions | — | — |
| nginx Ingress | Alternative HTTP entrypoint used by the CI kind cluster | — | — |

### Pod affinity
### Pod co-location via the RWO PVC

All workspace-using pods (Streamlit, RQ worker, Cleanup) carry a `volume-group: workspaces` label and a `requiredDuringSchedulingIgnoredDuringExecution` pod-affinity rule keyed on `kubernetes.io/hostname`. This forces every workspace-using pod onto the same node, so they can share the `ReadWriteOnce` PVC.
All workspace-using pods (Streamlit, RQ worker, Cleanup) of a given fork mount the same `<slug>-workspaces-pvc` (`ReadWriteOnce`, `cinder-csi`). Once the first pod schedules, the volume is attached to that node and the kube-scheduler's `VolumeBinding` plugin pins every subsequent pod that mounts the same PVC to the same node. NodeSelector (`openms.de/memory-tier`) picks which set of nodes the fork is eligible for; the RWO mount picks the specific node within that set.

There is no pod-affinity rule. Forks are isolated from each other — co-location applies within a fork (because they share a PVC), not across forks (each fork has its own PVC).

Co-location is a placement constraint, not a replica cap. The Streamlit deployment can scale to N replicas — they all land on the same node alongside the worker.

@@ -130,14 +132,14 @@ Main Streamlit Deployment. Key fields:
- Mounts the workspace PVC at `/workspaces-streamlit-template`
- Mounts `settings-overrides.json` from the ConfigMap as a `subPath`
- Readiness and liveness probes hit `/_stcore/health`
- Pod affinity: `volume-group: workspaces`
- Co-located with the RQ worker (and any cleanup Job) on the node the RWO `workspaces-pvc` is attached to
- `seed-demos` initContainer merges image-shipped demos into `.demos/` on the PVC (see [Demo workspaces](#demo-workspaces))

### `streamlit-service.yaml`
ClusterIP Service exposing Streamlit on port 8501.

### `rq-worker-deployment.yaml`
RQ worker Deployment (1 replica). Runs `rq worker openms-workflows --url $REDIS_URL`. Shares the workspace PVC via the same `volume-group: workspaces` affinity rule.
RQ worker Deployment (1 replica). Runs `rq worker openms-workflows --url $REDIS_URL`. Shares the workspace PVC, so it co-locates onto the same node as the Streamlit pods via the RWO mount.

### `cleanup-cronjob.yaml`
CronJob that runs `python clean-up-workspaces.py` nightly at 03:00 UTC. Uses `concurrencyPolicy: Forbid`, retains 3 successful and 3 failed jobs. Shares the workspace PVC.
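The co-location described in the docs can be confirmed on a live cluster by checking that every workspace-using pod of a fork reports the same `spec.nodeName`. Below is a minimal sketch. It assumes the `app=<slug>` and `component=...` labels that the CI steps already select on. `assert_colocated` is a hypothetical helper. The Redis pod is excluded because it does not mount the workspace PVC. The demo feeds in made-up node names instead of live `kubectl` output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper. Reads one node name per line on stdin, e.g. from:
#   kubectl get pods -n openms -l "app=$SLUG,component in (streamlit,rq-worker)" \
#     -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}'
# Succeeds only if every scheduled pod reports the same node.
assert_colocated() {
  local distinct
  # Drop blank lines (Pending pods have no nodeName yet), then count
  # distinct node names.
  distinct=$(awk 'NF' | sort -u | awk 'END { print NR }')
  if [ "$distinct" -eq 1 ]; then
    echo "co-located on one node"
    return 0
  elif [ "$distinct" -eq 0 ]; then
    echo "no scheduled pods" >&2
  else
    echo "pods spread across $distinct nodes" >&2
  fi
  return 1
}

# Demo with made-up node names.
if printf 'node-a\nnode-a\n\nnode-a\n' | assert_colocated; then SAME=ok; else SAME=bad; fi
if printf 'node-a\nnode-b\n' | assert_colocated 2>/dev/null; then SPLIT=ok; else SPLIT=bad; fi
echo "$SAME $SPLIT"
```

The check only inspects where pods ended up. It makes no assumption about which scheduler or attach mechanism put them there.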
11 changes: 0 additions & 11 deletions k8s/base/cleanup-cronjob.yaml
@@ -15,19 +15,8 @@ spec:
metadata:
labels:
component: cleanup
volume-group: workspaces
spec:
restartPolicy: OnFailure
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: volume-group
operator: In
values:
- workspaces
topologyKey: kubernetes.io/hostname
containers:
- name: cleanup
image: openms-streamlit
11 changes: 0 additions & 11 deletions k8s/base/rq-worker-deployment.yaml
@@ -13,18 +13,7 @@ spec:
metadata:
labels:
component: rq-worker
volume-group: workspaces
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: volume-group
operator: In
values:
- workspaces
topologyKey: kubernetes.io/hostname
containers:
- name: rq-worker
image: openms-streamlit
11 changes: 0 additions & 11 deletions k8s/base/streamlit-deployment.yaml
@@ -13,18 +13,7 @@ spec:
metadata:
labels:
component: streamlit
volume-group: workspaces
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: volume-group
operator: In
values:
- workspaces
topologyKey: kubernetes.io/hostname
initContainers:
- name: seed-demos
image: openms-streamlit
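The three manifest diffs remove the shared `volume-group: workspaces` label and the `podAffinity` block. According to commit `8d76292`, that wiring co-located pods across forks. Co-location now rests only on the shared PVC mount, and a leftover label or affinity block in any one manifest would bring back the wiring this change removes. Below is a minimal grep guard, sketched on the assumption that it would be pointed at `k8s/base/`. `check_no_affinity` is hypothetical, and the demo runs against a temporary directory rather than the real manifests.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical guard: fail if any manifest under the given directory still
# carries the removed affinity wiring (a `volume-group` label or a
# `podAffinity` block). Matching lines are reported on stderr.
check_no_affinity() {
  local dir="$1"
  if grep -rnE 'volume-group|podAffinity' "$dir" >&2; then
    return 1
  fi
  return 0
}

# Demo against a throwaway directory instead of the real k8s/base/.
tmp=$(mktemp -d)
cat > "$tmp/rq-worker-deployment.yaml" <<'EOF'
  template:
    metadata:
      labels:
        component: rq-worker
    spec:
      containers:
        - name: rq-worker
          image: openms-streamlit
EOF
if check_no_affinity "$tmp"; then CLEAN=ok; else CLEAN=bad; fi

# A stale manifest that still has the old label.
cat > "$tmp/cleanup-cronjob.yaml" <<'EOF'
      labels:
        component: cleanup
        volume-group: workspaces
EOF
if check_no_affinity "$tmp" 2>/dev/null; then STALE=ok; else STALE=bad; fi

rm -rf "$tmp"
echo "$CLEAN $STALE"
```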