Skip to content

fix(gcp): surface GCP Secret Manager errors and validate DB credentials#793

Open
lilyz-ai wants to merge 3 commits intomainfrom
lilyzhu/fix-gcp-db-secret-error-handling
Open

fix(gcp): surface GCP Secret Manager errors and validate DB credentials#793
lilyz-ai wants to merge 3 commits intomainfrom
lilyzhu/fix-gcp-db-secret-error-handling

Conversation

@lilyz-ai
Copy link
Copy Markdown
Collaborator

@lilyz-ai lilyz-ai commented Mar 31, 2026

Summary

  • get_gcp_key_file was silently returning {} on any Secret Manager error, causing str(None)'None' to be used as the DB port
  • This produced a cryptic invalid literal for int() with base 10: 'None' from SQLAlchemy (chained from ModuleNotFoundError: No module named 'plugins')
  • Fixed by removing the silent try/except in get_key_file so errors surface directly
  • Added validation of required credential fields with a clear error message before attempting to connect

Test plan

  • Deploy to GCP env with valid DB_SECRET_NAME — should connect as before
  • Deploy with invalid/missing DB_SECRET_NAME — should now get a clear GCP error instead of int('None')
  • Deploy with a secret missing required fields — should get ValueError: GCP DB secret '...' is missing required fields: [...]

🤖 Generated with Claude Code

Greptile Summary

This PR fixes two related issues that caused a cryptic invalid literal for int() with base 10: 'None' error during GCP database connection:

  • Root cause fix: get_key_file in secrets.py silently caught all exceptions from GCP Secret Manager and returned {}, causing downstream code to use str(None) as the DB port. The silent try/except is removed so errors propagate directly.
  • Credential validation: base.py now validates that all required fields (username, password, clusterHost, port, dbname) are present in the GCP secret before constructing the connection URL, raising a clear ValueError if any are missing.
  • Narrower exception handling: Four except ModuleNotFoundError blocks in dependencies.py are refined to only swallow the error when the plugins module itself is missing (checked via e.name.startswith("plugins")), re-raising unrelated import errors that were previously hidden. Warning logs are also added.

Note: The AWS secrets module (core/aws/secrets.py) and the AWS path in base.py (lines 125–135) still have the same silent-error and missing-validation patterns that this PR fixes for GCP. Consider applying the same fix there as a follow-up.

Confidence Score: 4/5

  • This PR is safe to merge — it improves error handling and surfaces previously hidden failures.
  • The changes are well-targeted: removing a silent catch-all exception handler, adding credential validation, and narrowing ModuleNotFoundError handling. The only minor concern is the ambiguous operator precedence in the host assignment ternary, which works correctly but could confuse future readers. No logical bugs found.
  • model-engine/model_engine_server/db/base.py — the host assignment ternary at line 90 relies on implicit operator precedence and would benefit from explicit parentheses for clarity.

Important Files Changed

Filename Overview
model-engine/model_engine_server/core/gcp/secrets.py Removed silent try/except that swallowed GCP Secret Manager errors and returned an empty dict. Errors now propagate directly, which is the correct fix for the root cause of the int('None') bug.
model-engine/model_engine_server/db/base.py Added credential validation for GCP secrets with a clear error message and switched from .get() to direct key access. The host assignment ternary is correct but has ambiguous formatting due to operator precedence.
model-engine/model_engine_server/api/dependencies.py Refined ModuleNotFoundError handling across 4 catch sites to only swallow errors from the plugins module itself, re-raising unrelated import errors. Also adds warning logs and removes unnecessary finally: pass blocks.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_engine_url called] --> B{cloud_provider == gcp?}
    B -->|No| Z[Other providers]
    B -->|Yes| C{DB_SECRET_NAME set?}
    C -->|No| D[Use env vars: DB_USER, DB_HOST, etc.]
    C -->|Yes| E[get_gcp_key_file from Secret Manager]
    E -->|Exception| F[❌ Error propagates directly\n— was silently returning empty dict]
    E -->|Success| G[Validate required fields:\nusername, password, clusterHost, port, dbname]
    G -->|Missing fields| H[❌ ValueError with clear message\n— was producing int of None]
    G -->|All present| I{read_only?}
    I -->|Yes| J[Use clusterHostRo or fallback to clusterHost]
    I -->|No| K[Use clusterHost]
    J --> L[Build postgresql:// URL]
    K --> L
    D --> L
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: model-engine/model_engine_server/db/base.py
Line: 90-94

Comment:
**Ambiguous operator precedence in ternary**

The expression `creds.get("clusterHostRo") or creds["clusterHost"] if read_only else creds["clusterHost"]` relies on Python's precedence rules where `or` binds tighter than the ternary `if/else`. It happens to work correctly — when `read_only` is `True`, it tries `clusterHostRo` and falls back to `clusterHost`; when `False`, it uses `clusterHost`. However, the formatting across multiple lines makes this easy to misread. Adding explicit parentheses would make the intent clear:

```suggestion
                host = (
                    (creds.get("clusterHostRo") or creds["clusterHost"])
                    if read_only
                    else creds["clusterHost"]
                )
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(plugins): log warning when falling b..." | Re-trigger Greptile

Previously get_key_file silently returned {} on any error, causing
str(None) → 'None' to be used as the port, leading to a cryptic
'invalid literal for int() with base 10: None' error from SQLAlchemy.

- Remove the try/except in get_key_file so GCP errors surface directly
- Validate required fields after fetching creds and raise with a clear message
- Use dict key access instead of .get() for required fields
… missing

The bare 'except ModuleNotFoundError' was catching any ModuleNotFoundError
raised inside the plugins module (e.g. a missing dependency), silently
falling back to defaults and masking the real error.

Check e.name to only fall back when 'plugins' itself is not installed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant