Bug Description
We are experiencing intermittent authentication failures when connecting to a Cloud SQL PostgreSQL Enterprise Plus instance using the Cloud SQL Proxy as a GKE sidecar with --auto-iam-authn.
The issue specifically appears when Managed Connection Pooling (MCP) is enabled. After a period of stability, multiple backend services simultaneously fail with the following error:
FATAL: Cloud SQL IAM service account authentication failed for user "..." (SQLSTATE 08P01)
Environment
- Proxy Version: 2.21
- Database: PostgreSQL Enterprise Plus (v14)
- Environment: GKE with Proxy Sidecar
- Authentication: IAM with --auto-iam-authn
- Feature: Managed Connection Pooling (MCP) Enabled
Observations
- Token Expiration: Google Support suggests the root cause is IAM token expiration (approx. 1 hour). When MCP is active, pooled connections appear to hold onto expired tokens, leading to failures on subsequent queries.
- Frequency: The issue recurs every 1–2 days.
- Temporary Resolution: Disabling MCP resolves the SQLSTATE 08P01 error but leads to max_connections exhaustion due to the lack of pooling.
- Mitigation Attempts: We were advised to upgrade to v2.21+ and aggressively recycle connections (every 1–10 minutes), which suggests the Proxy or MCP is not handling token refreshes for long-lived pooled connections as expected.
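To make the recycling advice concrete, here is a minimal sketch of a client-side pool that discards any connection older than a maximum lifetime. It is illustrative only: `RecyclingPool`, `PooledConn`, and the `connect` factory are hypothetical names, not part of the Proxy or any driver, and the 600-second default is just the upper end of the 1–10 minute range we were given. Whether closing the client-side connection also forces MCP to re-authenticate its server-side connection is exactly what is unclear to us.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PooledConn:
    raw: object        # underlying driver connection (opaque in this sketch)
    created_at: float  # clock reading taken when the connection was opened


@dataclass
class RecyclingPool:
    # Hypothetical factory that opens a new connection through the Proxy.
    connect: Callable[[], object]
    # Assumed recycle interval: 10 minutes, well below the ~1 hour token lifetime.
    max_lifetime_s: float = 600.0
    # Injectable clock so the policy can be exercised without waiting.
    clock: Callable[[], float] = time.monotonic
    _idle: List[PooledConn] = field(default_factory=list)

    def acquire(self) -> PooledConn:
        """Return an idle connection younger than max_lifetime_s, else open a new one."""
        now = self.clock()
        while self._idle:
            conn = self._idle.pop()
            if now - conn.created_at < self.max_lifetime_s:
                return conn
            self._close(conn)  # too old: drop it rather than reuse it
        return PooledConn(raw=self.connect(), created_at=now)

    def release(self, conn: PooledConn) -> None:
        """Return a connection to the pool, or close it if it has aged out."""
        if self.clock() - conn.created_at < self.max_lifetime_s:
            self._idle.append(conn)
        else:
            self._close(conn)

    @staticmethod
    def _close(conn: PooledConn) -> None:
        # Close the underlying connection if the driver object exposes close().
        close = getattr(conn.raw, "close", None)
        if callable(close):
            close()
```

Most client libraries expose an equivalent setting (a maximum connection lifetime), so in practice this would be a configuration value rather than custom code. The sketch only shows the policy being applied.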
Expected Behavior
The Cloud SQL Proxy (or the MCP integration) should transparently refresh IAM credentials/tokens so that long-lived connections in a managed pool do not fail with authentication errors.
Questions for Maintainers
- Is there a known incompatibility between MCP and the Proxy's IAM refresh logic?
- Should --auto-iam-authn handle token rotation automatically for connections managed by the server-side MCP, or must the user configure client-side connection recycling (e.g., max_connection_lifetime)?