
feat: declarative Cloud Run configs for backend services #5873

Open
beastoin wants to merge 1 commit into main from feat/declarative-cloudrun-configs

Conversation

@beastoin
Collaborator

Summary

  • Add per-service declarative YAML configs for all 3 Cloud Run backend services (backend, backend-sync, backend-integration), exported from live prod and dev environments
  • Follow the same naming convention as backend/charts/ for GKE: backend/cloudrun/{service}/{dev|prod}_{service}.yaml
  • Update deploy workflow to use gcloud run services replace instead of deploy-cloudrun@v2 action — env vars and secrets are now git-tracked and PR-reviewable
  • All secrets use secretKeyRef (GCP Secret Manager) — no plain-text secrets committed
  • Includes the FAIR_USE env vars needed by #5863 (Sync: fair-use tracking with lock-on-exhaustion and soft cap gates)
  • Replaced hardcoded dev IP with vad.omiapi.com DNS name
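
The "no plain-text secrets" claim above could be spot-checked mechanically. The following is a minimal sketch, not part of this PR: it assumes the env entries follow the usual Cloud Run layout (`- name:` followed by either `value:` or `valueFrom.secretKeyRef`), and the sample config, the `EXAMPLE_TOKEN` entry, and the name pattern are all hypothetical.

```shell
set -eu

# Hypothetical excerpt of a Cloud Run config; only the layout matters here.
config="$(mktemp)"
cat > "$config" <<'EOF'
        env:
        - name: LANGSMITH_API_KEY
          valueFrom:
            secretKeyRef:
              key: latest
              name: LANGSMITH_API_KEY
        - name: EXAMPLE_TOKEN
          value: not-a-real-token
        - name: GOOGLE_CLOUD_PROJECT
          value: based-hardware
EOF

# Remember the most recent "- name:" entry, then report it if a literal
# "value:" follows and the name looks like a credential.
flagged="$(awk '
  /^[[:space:]]*-[[:space:]]*name:/ { name = $NF }
  /^[[:space:]]*value:/ && name ~ /(KEY|SECRET|TOKEN|PASSWORD)/ { print name }
' "$config")"

echo "plain-text candidates: ${flagged:-none}"
```

In this sample only `EXAMPLE_TOKEN` is reported, because `LANGSMITH_API_KEY` uses `valueFrom` and `GOOGLE_CLOUD_PROJECT` does not match the name pattern. The pattern is a heuristic and would need tuning for real configs.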

Structure

backend/cloudrun/
  backend/
    prod_backend.yaml
    dev_backend.yaml
  backend-sync/
    prod_backend-sync.yaml
    dev_backend-sync.yaml
  backend-integration/
    prod_backend-integration.yaml
    dev_backend-integration.yaml

How it works

  1. Edit env vars / secrets / scaling in the YAML files (commit + merge)
  2. Run "Deploy Backend to Cloud RUN" workflow — it stamps the latest image and runs gcloud run services replace
  3. The workflow auto-selects dev_ or prod_ prefix based on the environment input
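
The prefix selection in step 3 can be sketched roughly as follows. This is an illustration only: `config_path` is a hypothetical helper, and it assumes the environment input is literally `dev` or `prod`, which the workflow may not guarantee.

```shell
set -eu

# Hypothetical helper: build the config path for an environment and service,
# following backend/cloudrun/{service}/{dev|prod}_{service}.yaml.
config_path() {
  cp_environment="$1"
  cp_service="$2"
  case "$cp_environment" in
    dev|prod) ;;
    *) echo "unknown environment: $cp_environment" >&2; return 1 ;;
  esac
  printf 'backend/cloudrun/%s/%s_%s.yaml\n' "$cp_service" "$cp_environment" "$cp_service"
}

# List the prod config for each of the three services named in this PR.
for service in backend backend-sync backend-integration; do
  config_path prod "$service"
done
```

Rejecting unknown environments up front keeps a typo in the input from resolving to a path that does not exist.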

cc @thainguyensunya

🤖 Generated with Claude Code

Add per-service YAML configs exported from prod and dev Cloud Run,
mirroring the backend/charts/ convention for GKE. Update deploy
workflow to use `gcloud run services replace` instead of the
deploy-cloudrun@v2 action, making env vars and secrets git-tracked.

Structure:
  backend/cloudrun/{service}/{dev|prod}_{service}.yaml

- All secrets use secretKeyRef (Secret Manager), no plain-text secrets
- Replaced hardcoded IP with vad.omiapi.com DNS name
- Added FAIR_USE env vars for PR #5863
- Converted LANGSMITH_API_KEY from plain text to secretKeyRef

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR migrates all three Cloud Run backend services (backend, backend-sync, backend-integration) from imperative deploy-cloudrun@v2 workflow actions to declarative Knative YAML configs managed via gcloud run services replace, following the same pattern as the existing GKE Helm charts. It also adds FAIR_USE env vars (#5863) across all services and replaces a hardcoded dev IP with the vad.omiapi.com DNS name.

Key changes and findings:

  • Workflow reliability risk: The sed -i command used to stamp the image tag into each YAML file exits 0 even when no substitution occurs (e.g., due to YAML formatting drift). No post-substitution verification step is present, so a silent failure would cause gcloud run services replace to deploy the stale image reference from the YAML file without failing the job.
  • GCP project numbers exposed in public repo: All six YAML files commit serviceAccountName fields that embed raw GCP project numbers (1031333818730 for dev, 208440318997 for prod) in a public repository, which is an information disclosure concern.
  • :latest tag usage: The workflow builds, pushes, and deploys only a :latest tag, making deployments non-deterministic and preventing Git-based rollbacks without a rebuild.
  • GOOGLE_CLOUD_PROJECT inconsistency: Dev configs (dev_backend.yaml, dev_backend-sync.yaml) set GOOGLE_CLOUD_PROJECT: based-hardware (the prod project) while all other dev-specific settings (image registry, VPC, service account) target based-hardware-dev — likely intentional but worth confirming.
  • All secrets are properly referenced via secretKeyRef (GCP Secret Manager) — no plain-text secrets are committed, which is the correct approach.
  • Prod services (backend-sync, backend-integration) correctly use service-specific secret names (e.g., OPENAI_API_KEY_BACKEND_SYNC) for API key isolation.

Confidence Score: 3/5

  • Safe to deploy to dev; the silent sed failure and exposed project numbers should be addressed before treating prod as fully hardened.
  • The overall design is sound — declarative configs are a clear improvement over imperative flags, and secrets are all properly handled via Secret Manager. However, two P1 concerns are present: (1) the sed image-stamp step has no failure guard and could silently deploy a stale or untagged image with no workflow failure, and (2) numeric GCP project IDs are being committed to a public repo for the first time via the service account name fields. Neither is a hard blocker for the dev environment, but together they warrant a fix before this pattern is considered production-ready.
  • .github/workflows/gcp_backend.yml (sed validation), backend/cloudrun/backend/dev_backend.yaml and all five other YAML configs (service account name exposure).

Important Files Changed

Filename | Overview
--- | ---
.github/workflows/gcp_backend.yml | Replaces the deploy-cloudrun@v2 action with declarative gcloud run services replace; a silent sed failure (no post-substitution check) could cause the wrong image to be deployed without failing the workflow.
backend/cloudrun/backend/dev_backend.yaml | New declarative Cloud Run config for the dev backend service; exposes the GCP project number via the service account email, and GOOGLE_CLOUD_PROJECT is set to the prod project (based-hardware) rather than based-hardware-dev.
backend/cloudrun/backend/prod_backend.yaml | New declarative Cloud Run config for prod backend; all secrets use secretKeyRef, scaling and resource limits look appropriate for production, but the service account email exposes the production GCP project number.
backend/cloudrun/backend-sync/dev_backend-sync.yaml | New declarative Cloud Run config for dev backend-sync; GOOGLE_CLOUD_PROJECT is unexpectedly set to the prod project based-hardware; GCP project number exposed in the service account name.
backend/cloudrun/backend-sync/prod_backend-sync.yaml | New declarative Cloud Run config for prod backend-sync; uses separate secret names (OPENAI_API_KEY_BACKEND_SYNC, OPENROUTER_API_KEY_BACKEND_SYNC) for rate-limit isolation, which is good practice; the service account email exposes the prod project number.
backend/cloudrun/backend-integration/dev_backend-integration.yaml | New declarative Cloud Run config for dev backend-integration; secrets and env vars look consistent with other dev configs; GCP project number exposed in the service account name.
backend/cloudrun/backend-integration/prod_backend-integration.yaml | New declarative Cloud Run config for prod backend-integration; uses service-specific secret names (OPENAI_API_KEY_BACKEND_INTEGRATION, OPENROUTER_API_KEY_BACKEND_INTEGRATION); the service account email exposes the prod project number.

Sequence Diagram

sequenceDiagram
    actor Dev as Developer
    participant GHA as GitHub Actions
    participant GCR as Google Container Registry
    participant FS as Filesystem (checkout)
    participant CR as Cloud Run

    Dev->>GHA: Trigger workflow (environment=dev|prod, branch)
    GHA->>GHA: Validate environment input
    GHA->>GHA: Set env prefix (dev/prod)
    GHA->>GHA: Build Docker image
    GHA->>GCR: Push image as :latest
    loop For each service (backend, backend-sync, backend-integration)
        GHA->>FS: sed -i replace image line in {prefix}_{service}.yaml
        Note over GHA,FS: ⚠️ No verification that sed succeeded
        GHA->>CR: gcloud run services replace {config}.yaml
        CR-->>GHA: Deploy complete
    end
    GHA->>GHA: kubectl rollout restart backend-listen (GKE)

Last reviewed commit: "feat: add declarativ..."

# Update image in declarative config
IMAGE="gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}:latest"
CONFIG="backend/cloudrun/${{ env.SERVICE }}/${{ steps.env-prefix.outputs.prefix }}_${{ env.SERVICE }}.yaml"
sed -i "s|image:.*|image: ${IMAGE}|" "$CONFIG"

P1 Silent sed failure with no verification

The sed -i "s|image:.*|image: ${IMAGE}|" command exits 0 even if no substitution was made (e.g., if the image: line is indented differently, commented out, or the file path is wrong). In that case, gcloud run services replace would be called with the stale image reference already present in the YAML file (gcr.io/based-hardware-dev/backend or gcr.io/based-hardware/backend without a tag), potentially deploying a different image than intended with no workflow failure.

Add a verification step after the sed:

sed -i "s|image:.*|image: ${IMAGE}|" "$CONFIG"
# -F matches the image reference literally, so the dots are not treated as regex wildcards
grep -qF -- "image: ${IMAGE}" "$CONFIG" || { echo "ERROR: image line not updated in $CONFIG" >&2; exit 1; }
gcloud run services replace "$CONFIG" \
  --region="${{ env.REGION }}" --project="${{ vars.GCP_PROJECT_ID }}"

This same issue affects the backend-sync deploy at line 95 and backend-integration deploy at line 114.
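
To illustrate the failure mode outside the workflow, here is a small self-contained sketch. It assumes a stand-in config with no image: line and a placeholder image reference; neither comes from this PR. The substitution is written to a copy rather than applied with -i, purely to keep the sketch portable across sed implementations.

```shell
set -eu

# Placeholder image reference; the project name is made up.
IMAGE="gcr.io/example-project/backend:latest"

# Stand-in config with no image: line at all, e.g. the wrong file was targeted.
config="$(mktemp)"
printf 'spec:\n  template:\n    spec:\n      containers: []\n' > "$config"

# Same substitution expression as the workflow, written to a copy.
sed_status=0
sed "s|image:.*|image: ${IMAGE}|" "$config" > "$config.new" || sed_status=$?
echo "sed exit status: $sed_status"

# The guard: only treat the config as stamped if the reference is present.
if grep -qF -- "image: ${IMAGE}" "$config.new"; then
  guard_result="stamped"
else
  guard_result="missing"
fi
echo "guard result: $guard_result"
```

sed reports exit status 0 even though nothing was replaced, and only the grep guard notices that the image reference is missing.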

          tcpSocket:
            port: 8080
          timeoutSeconds: 240
      serviceAccountName: 1031333818730-compute@developer.gserviceaccount.com

P1 GCP project numbers exposed in public repository

The serviceAccountName field embeds the raw GCP project number in the service account email. These project numbers are committed in this public repository:

  • 1031333818730-compute@developer.gserviceaccount.com (dev) — appears in dev_backend.yaml:305, dev_backend-sync.yaml:193, and dev_backend-integration.yaml:222
  • 208440318997-compute@developer.gserviceaccount.com (prod) — appears in prod_backend.yaml:438, prod_backend-sync.yaml:223, and prod_backend-integration.yaml:333

GCP project numbers in a public repo allow adversaries to enumerate project resources, craft targeted phishing/social-engineering attacks, or abuse project-number-based APIs. Consider replacing these with a named service account alias (e.g., backend-cloudrun@based-hardware.iam.gserviceaccount.com) that does not leak the project number.
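
A check along these lines could flag default compute service accounts before they are committed. This is a sketch under assumptions: the directory layout is recreated in a temp dir, and the project number and the named service account in the sample files are made up.

```shell
set -eu

# Recreate a tiny, hypothetical slice of backend/cloudrun/ in a temp dir.
workdir="$(mktemp -d)"
mkdir -p "$workdir/backend/cloudrun/backend"
printf 'serviceAccountName: 123456789012-compute@developer.gserviceaccount.com\n' \
  > "$workdir/backend/cloudrun/backend/dev_backend.yaml"
printf 'serviceAccountName: backend-cloudrun@example-project.iam.gserviceaccount.com\n' \
  > "$workdir/backend/cloudrun/backend/prod_backend.yaml"

# Default compute service accounts have the form <project-number>-compute@...
pattern='[0-9]+-compute@developer\.gserviceaccount\.com'
flagged_files="$(grep -rlE "$pattern" "$workdir/backend/cloudrun" || true)"
flagged_count="$(printf '%s' "$flagged_files" | grep -c . || true)"

echo "files embedding a project number: $flagged_count"
printf '%s\n' "$flagged_files"
```

Only the file using the default compute service account is listed; the named service account does not match the pattern.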

image: gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}
run: |
  # Update image in declarative config
  IMAGE="gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}:latest"

P2 :latest tag makes deployments non-deterministic

Both the Docker build step (line 69) and all three deploy steps push/deploy the :latest tag. This means:

  1. There is no stable artifact tied to a specific commit — any concurrent build can overwrite :latest mid-deploy.
  2. Rollback from Git alone is not possible; you would need to rebuild the image.

Consider tagging the image with the Git SHA and using that tag in the deploy:

tags: |
  gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}:latest
  gcr.io/${{ vars.GCP_PROJECT_ID }}/${{ env.SERVICE }}:${{ github.sha }}

Then use ${{ github.sha }} for the IMAGE variable in the deploy steps.
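
As a rough sketch of what the deploy step could look like with a SHA-pinned tag: GITHUB_SHA is the commit SHA that GitHub Actions exposes to run steps, while the fallback values and the one-line sample config below are placeholders for running the sketch locally, not values from this PR.

```shell
set -eu

# Fallbacks are placeholders so the sketch also runs outside GitHub Actions.
GCP_PROJECT_ID="${GCP_PROJECT_ID:-example-project}"
SERVICE="${SERVICE:-backend}"
GITHUB_SHA="${GITHUB_SHA:-0123456789abcdef0123456789abcdef01234567}"

# Pin the deploy to the image built from this exact commit.
IMAGE="gcr.io/${GCP_PROJECT_ID}/${SERVICE}:${GITHUB_SHA}"

# Hypothetical one-line config standing in for the real YAML file.
config="$(mktemp)"
printf '      - image: gcr.io/%s/%s\n' "$GCP_PROJECT_ID" "$SERVICE" > "$config"

# Stamp the pinned reference using the same sed expression as the workflow.
stamped="$(sed "s|image:.*|image: ${IMAGE}|" "$config")"
printf '%s\n' "$stamped"
```

With a SHA tag, rolling back means redeploying an earlier commit's tag, provided that image is still in the registry.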

Comment on lines +56 to +57
- name: GOOGLE_CLOUD_PROJECT
  value: based-hardware

P2 GOOGLE_CLOUD_PROJECT points to prod project in dev config

Both dev_backend-sync.yaml:56 and dev_backend.yaml:79 set GOOGLE_CLOUD_PROJECT: based-hardware, which is the production GCP project, while the rest of the dev config (image registry gcr.io/based-hardware-dev/..., service account 1031333818730-compute@developer.gserviceaccount.com, VPC network omi-dev-vpc-1) all target the dev project based-hardware-dev.

If this is intentional (e.g., both environments share the same Firebase/Firestore project), a comment explaining the reason would help avoid future confusion. If unintentional, this should be changed to based-hardware-dev to match the rest of the dev environment configuration.
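
If the mismatch turns out to be unintentional, a lightweight consistency check could catch it in future configs. The sketch below assumes the env entry and the image line look like the excerpt shown (the excerpt is reconstructed for illustration, not copied from the actual file), and it only warns, since the difference may be deliberate.

```shell
set -eu

# Illustrative excerpt using the values quoted in this comment.
config="$(mktemp)"
cat > "$config" <<'EOF'
        env:
        - name: GOOGLE_CLOUD_PROJECT
          value: based-hardware
        image: gcr.io/based-hardware-dev/backend
EOF

# Project named in the env var: the value on the line after the name.
env_project="$(awk '/name: GOOGLE_CLOUD_PROJECT$/ { getline; print $2; exit }' "$config")"

# Project named in the image registry path gcr.io/<project>/<service>.
image_project="$(sed -n 's|.*image: gcr\.io/\([^/]*\)/.*|\1|p' "$config")"

if [ "$env_project" != "$image_project" ]; then
  mismatch="yes"
  echo "WARNING: GOOGLE_CLOUD_PROJECT=$env_project but image project is $image_project" >&2
else
  mismatch="no"
fi
```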
