-
Notifications
You must be signed in to change notification settings - Fork 139
Add central keycloak OIDC #1859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @jezekra1, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the authentication and authorization mechanisms by integrating Keycloak as the central identity provider. The changes involve deploying Keycloak as part of the application's infrastructure, updating both the backend server and frontend UI to interact with Keycloak for user authentication and role management, and migrating user role data out of the application's database. This move aims to centralize identity management, enhance security, and provide a more robust and scalable authentication solution. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
08f54fc to
a26dca7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces Keycloak integration for authentication, significantly refactoring the OIDC and basic authentication mechanisms across the agentstack-server and agentstack-ui components. It includes new Helm charts for deploying Keycloak, provisioning scripts, and updates to various Python and TypeScript files to support the new authentication flow. The changes also involve removing user role management from the database, shifting it to Keycloak, and updating e2e tests to reflect these authentication changes. Overall, the changes are substantial and aim to centralize user management and authentication through Keycloak.
I am having trouble creating individual review comments. Click here to see my feedback.
apps/agentstack-cli/src/agentstack_cli/commands/platform/base_driver.py (119-126)
Hardcoding secrets like encryptionKey, uiClientSecret, serverClientSecret, and adminPassword directly in the values dictionary is a security risk. These should be managed through Helm secrets or environment variables, especially adminPassword which is highly sensitive. Consider using valueFrom with secretKeyRef for these values, similar to how they are handled in the Helm templates.
apps/agentstack-ui/.env.example (27-32)
Hardcoding OIDC_ENABLED=true, OIDC_PROVIDER_NAME=Keycloak, OIDC_PROVIDER_ID=keycloak, OIDC_PROVIDER_CLIENT_ID=agentstack-ui, OIDC_PROVIDER_CLIENT_SECRET=agentstack-ui-secret, and OIDC_PROVIDER_ISSUER=http://localhost:8336/realms/agentstack in .env.example as default values is problematic. The .env.example file should ideally show examples of configuration, not necessarily active defaults, especially for sensitive values like client secrets. It's better to leave these blank or provide placeholders with clear instructions for users to fill them in, emphasizing that OIDC_ENABLED should be false by default unless explicitly configured.
helm/values.yaml (131-134)
The keycloakIssuerUrl is introduced, with a warning about insecure transport for internal Keycloak. This is a critical configuration. Ensure that the documentation clearly explains the implications of using http://keycloak internally versus a public https URL for production, and the necessary steps to secure it (e.g., mTLS, public exposure with TLS).
apps/agentstack-server/config.yaml (1-5)
This config.yaml file seems to contain hardcoded authentication details (adminPassword: "neco"). If this is a temporary or development-specific configuration, it should be clearly documented or moved to a more appropriate location (e.g., a .env.development file) to prevent accidental deployment with sensitive information. For production, such values should always be managed via environment variables or secrets.
apps/agentstack-server/src/agentstack_server/api/auth/auth.py (336-369)
The validate_basic_auth function uses the Resource Owner Password Credentials Grant (ROPC). While convenient for certain scenarios, ROPC is generally discouraged in modern OAuth 2.0/OIDC flows due to security concerns, as it requires the client to handle the user's credentials directly. It's often preferable to use more secure flows like Authorization Code Grant with PKCE. If ROPC is strictly necessary, ensure that its use is limited to highly trusted first-party clients and that the risks are understood and mitigated.
helm/templates/ui/secret.yaml (14-21)
The logic for generating agentstackUiClientSecret ensures that a secret is generated if not explicitly provided or found in an existing secret. Similar to the server client secret, for production environments, it's crucial to ensure this secret is explicitly set and managed securely rather than relying on auto-generation.
apps/agentstack-server/src/agentstack_server/configuration.py (94-98)
Hardcoding name, issuer, external_issuer, client_id, and client_secret directly in the OidcConfiguration class as default values can lead to inflexibility and potential security issues if these values are sensitive or need to change per environment. It's generally better to load these from environment variables or a configuration management system, allowing for dynamic configuration without code changes. While pydantic-settings handles environment variables, providing sensitive defaults directly in code is not ideal.
helm/templates/keycloak/secret.yaml (22-38)
The db-password for Keycloak is conditionally set based on useDedicatedDatabase. When useDedicatedDatabase is false, it uses include "agentstack.databasePassword" .. This implies Keycloak shares the database with the main application. Ensure that the shared database password management is robust and that Keycloak's access to this shared database is appropriately permissioned to prevent unauthorized access or privilege escalation.
helm/templates/keycloak/secret.yaml (10-21)
The logic for generating admin-password for Keycloak ensures that a password is set either from .Values.keycloak.auth.adminPassword or auto-generated if not found in an existing secret. While auto-generation is convenient, for production deployments, it's highly recommended to explicitly provide and securely manage the Keycloak admin password rather than relying on auto-generation, to prevent potential security risks.
apps/agentstack-server/src/agentstack_server/infrastructure/persistence/migrations/alembic/versions/109a6afdf170_.py (31)
The downgrade function raises a NotImplementedError. While understandable given the shift in user role management to Keycloak, this means there's no clear path to revert this migration. In a production environment, this could be problematic if a rollback is ever required. Consider if there's any way to provide a graceful downgrade path, even if it involves data loss warnings or manual steps.
helm/templates/config/secret.yaml (62-69)
The logic for generating agentstackServerClientSecret ensures that a secret is generated if not explicitly provided or found in an existing secret. This is good for initial deployments. However, for production environments, it's crucial to ensure that this secret is explicitly set and managed securely (e.g., via a secrets management system) rather than relying on auto-generation, to maintain consistent and secure client credentials.
helm/templates/_helpers.tpl (76-122)
The agentstack.keycloak.dbEnvVars helper is well-structured for managing Keycloak database environment variables. However, the KC_DB_PASSWORD is retrieved from keycloak-secret. Ensure that this secret is properly created and populated with the correct database password, especially when useDedicatedDatabase is false and it relies on the shared database password, to prevent connection failures.
apps/agentstack-server/src/agentstack_server/configuration.py (118-119)
The condition self.issuer.scheme != "http" or self.issuer.host != "keycloak" implies that insecure transport is only allowed for a specific internal Keycloak instance. This is a good security measure. However, it might be more robust to check for self.issuer.is_secure (if AnyUrl supports it or similar logic) and explicitly list allowed insecure hosts rather than relying on a single hardcoded host, especially if the internal Keycloak hostname could change.
apps/agentstack-ui/src/app/(auth)/auth.ts (30-44)
The createOIDCProvider function now includes specific Keycloak endpoint URLs (/protocol/openid-connect/auth, /protocol/openid-connect/token, /protocol/openid-connect/userinfo, /protocol/openid-connect/certs) when external_issuer is present. While this works for Keycloak, it makes the createOIDCProvider function less generic. If other OIDC providers are to be supported in the future, this logic might need to be more dynamic or configurable, possibly by relying on the .well-known/openid-configuration endpoint for discovery.
apps/agentstack-ui/src/app/(auth)/auth.ts (161-163)
The signOut event handler directly accesses process.env.OIDC_PROVIDER_ISSUER, process.env.OIDC_PROVIDER_CLIENT_ID, and process.env.OIDC_PROVIDER_CLIENT_SECRET. While these are available server-side, it's generally safer to pass configuration values through the runtimeConfig or providers object rather than directly accessing process.env within callbacks, especially if these values could vary or be sensitive.
apps/agentstack-ui/src/app/(auth)/types.ts (16)
Adding external_issuer: z.string().optional() to baseProviderConfigSchema is a good step for Keycloak integration. However, ensure that the UI components and logic correctly handle the presence or absence of external_issuer when constructing URLs or displaying information, to avoid unexpected behavior or broken links.
apps/agentstack-cli/src/agentstack_cli/commands/platform/init.py (88)
This print statement appears to be for debugging purposes. Please remove it before merging to avoid unnecessary output in production environments.
apps/agentstack-server/tasks.toml (173-180)
The agentstack-server:dev:reload task is a good addition for development. However, it forces helm:build and agentstack-cli:build:copy-helm-chart every time. While helm:build might be necessary, agentstack-cli:build:copy-helm-chart might not always be needed if the chart hasn't changed. Consider if these can be made conditional or if there's a faster way to reload without rebuilding everything if not necessary.
helm/templates/deployment.yaml (272-273)
The AUTH__OIDC__EXTERNAL_ISSUER value is conditionally set. If keycloakIssuerUrl starts with http://keycloak, it defaults to http://localhost:8336/realms/agentstack. This localhost URL might not be accessible from all contexts (e.g., if the UI is not running on the same host as the server or if network policies restrict localhost access). Ensure this localhost URL is appropriate for the intended deployment scenario, or consider making it configurable.
helm/templates/keycloak/provision-job.yaml (56-62)
The Keycloak provisioning job creates or updates the agentstack-server client. It uses kcadm.sh get clients to check for existence and then either create or update. This is a robust approach. However, ensure that the SERVER_CLIENT_SECRET is correctly passed and handled securely, as it's a critical credential for the server to interact with Keycloak.
helm/templates/keycloak/provision-job.yaml (65-98)
The provisioning logic for the agentstack-ui client, including client scope and audience mapper creation, is quite extensive. This ensures proper OIDC configuration for the UI. However, the UI_URL is used in the audience mapper configuration. Ensure that UI_URL is correctly set and reflects the actual external URL of the UI, as misconfiguration could lead to token validation issues or security vulnerabilities.
apps/agentstack-server/src/agentstack_server/configuration.py (130)
Changing the default for BasicAuthConfiguration.enabled to True means basic authentication will be active by default. Ensure this is the desired behavior for all deployment scenarios, as it could have security implications if not properly managed (e.g., default credentials being active).
apps/agentstack-server/template.env (34-36)
The commented-out lines for AUTH__BASIC__ADMIN_PASSWORD and AUTH__JWT_SECRET_KEY are removed. While removing commented-out code is generally good, ensure that these environment variables are now correctly configured elsewhere or are no longer needed, especially AUTH__JWT_SECRET_KEY which is critical for internal JWTs. If they are still relevant for local development, they should be present and correctly configured.
helm/templates/ui/deployment.yaml (83)
The NEXTAUTH_SECRET is now retrieved from agentstack-ui-secret. This is a good practice for managing secrets. Ensure that this secret is properly generated and stored securely, as it's critical for NextAuth's session management.
helm/templates/ui/deployment.yaml (93-96)
The OIDC_PROVIDER_CLIENT_SECRET for the UI is now retrieved from agentstack-ui-secret. This is a good practice. However, if the UI is a public client (as suggested by publicClient=false in keycloak/provision-job.yaml but clientAuthenticatorType=client-secret implies confidential), the client secret might not be strictly necessary or should be handled with extreme care to prevent exposure.
helm/templates/ui/deployment.yaml (97-100)
The OIDC_PROVIDER_ISSUER and OIDC_PROVIDER_EXTERNAL_ISSUER are configured for the UI. Similar to the server deployment, the EXTERNAL_ISSUER might default to localhost if keycloakIssuerUrl is internal. Ensure that the UI can correctly resolve and access this EXTERNAL_ISSUER URL from its deployment context, as misconfiguration can lead to authentication failures.
helm/templates/ui/providers.yaml (12)
The providers.json content is now directly {{ .Values.auth.oidc.providers | toJson }}. Given the refactoring in apps/agentstack-ui/src/app/(auth)/auth.ts to read individual environment variables for a single OIDC provider, this providers.json secret might no longer be used by the UI application. Verify if this secret is still necessary or if it can be removed to simplify the deployment.
apps/agentstack-server/src/agentstack_server/api/dependencies.py (89)
Logging e directly at critical level without sanitization might expose sensitive information (e.g., stack traces, token details) in logs. Consider logging a more generic error message or redacting sensitive parts of the exception to prevent information leakage.
apps/agentstack-cli/src/agentstack_cli/commands/platform/base_driver.py (183-184)
The istio.install call is commented out. If Istio integration is no longer needed or is being deferred, please remove the commented-out code and the corresponding import statement (import agentstack_cli.commands.platform.istio) from the file to keep the codebase clean and avoid confusion.
helm/values.yaml (136-139)
The nextauth_trust_host and nextauth_url are important for NextAuth configuration. The nextauth_url is now http://localhost:8334. Ensure this localhost URL is correctly handled in deployment environments where the UI might not be accessed directly via localhost, or if it's exposed through an ingress. It might need to be dynamically set based on the ingress URL.
helm/values.yaml (140)
The providers list under auth.oidc has been removed. This aligns with the new single-provider configuration in the server and UI. Ensure that any existing deployments or configurations that relied on a list of OIDC providers are updated to reflect this change.
policy.yaml (1-54)
The new policy.yaml defines a NetworkPolicy for restricted egress access. This is a good security practice to limit outbound connections from pods. Ensure that the podSelector (app: test-client) and the egress rules (DNS, agentstack-server, external internet with exceptions) are precisely configured to meet the application's communication requirements while minimizing the attack surface. Any misconfiguration could lead to connectivity issues.
tasks.toml (185)
The __OLD_ADMIN_PASSWORD export has been removed. This aligns with the shift in authentication mechanisms. Ensure that any scripts or processes that previously relied on this environment variable for admin password access are updated or no longer require it, to prevent unexpected authentication failures.
a26dca7 to
d086859
Compare
9341001 to
53d6529
Compare
53d6529 to
38e820c
Compare
Signed-off-by: Radek Ježek <radek.jezek@ibm.com>
38e820c to
dc7331b
Compare
|
/gemini review |
|
Recreated as a new PR because gemini refuses to re-review: #1879 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a comprehensive integration with Keycloak for OIDC authentication, centralizing user and role management. This is a significant architectural shift that improves security and scalability by offloading authentication concerns to a dedicated identity provider. The changes span across the server, CLI, UI, and Helm charts, reflecting a consistent approach to this new authentication mechanism. The removal of internal user role management and the introduction of Keycloak provisioning via Helm hooks are well-executed. Several hardcoded credentials have been replaced with more dynamic or configurable options, which is a positive security improvement. The addition of new e2e tests specifically for the authentication flow is also highly valuable.
| host=os.getenv("HOST", "127.0.0.1"), | ||
| port=int(os.getenv("PORT", 8000)), | ||
| configure_telemetry=True, | ||
| self_registration_client_factory=lambda: PlatformClient(auth=("admin", "admin")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding credentials like ("admin", "admin") for PlatformClient is a security risk. These credentials should ideally be fetched from secure configuration, environment variables, or a secrets management system, especially in production environments. Consider making these configurable.
| self_registration_client_factory=lambda: PlatformClient(auth=("admin", "admin")), | |
| self_registration_client_factory=lambda: PlatformClient(auth=(os.getenv("PLATFORM_ADMIN_USERNAME", "admin"), os.getenv("PLATFORM_ADMIN_PASSWORD", "admin"))), |
| "auth": {"adminPassword": "admin"}, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding adminPassword: "admin" for Keycloak in the Helm values is a security concern. This default password should be dynamically generated or retrieved from a secure source, especially since it's used for an admin account. The Helm chart already has logic to generate random secrets for Keycloak admin passwords, which should be leveraged here.
| "auth": {"adminPassword": "admin"}, | |
| }, | |
| "auth": {"adminPassword": "{{ .Values.keycloak.auth.adminPassword | default (randAlphaNum 32) }}"}, |
| if self.insecure_transport: | ||
| if self.issuer.scheme != "http" or self.issuer.host != "keycloak": | ||
| raise ValueError("Insecure transport is only allowed for internal keycloak!") | ||
|
|
||
| os.environ["AUTHLIB_INSECURE_TRANSPORT"] = "1" | ||
| logger.warning( | ||
| "Oauth Authentication is enabled with insecure transport! " | ||
| + "Using mTLS (using istio service mesh) or exposing keycloak publicly is highly recommended!" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting os.environ["AUTHLIB_INSECURE_TRANSPORT"] = "1" when insecure_transport is enabled is a significant security risk. While a warning is logged, it's crucial to emphasize that this should only be used for internal, trusted Keycloak instances (as indicated by the issuer.host == "keycloak" check) and never in a production environment exposed to the public internet without proper TLS termination. Misuse could lead to token interception.
| # TODO: Implement keycloak integration | ||
| # app.add_typer( | ||
| # agentstack_cli.commands.user.app, | ||
| # name="user", | ||
| # no_args_is_help=True, | ||
| # help="Manage users. [Admin only]", | ||
| # ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user management functionality has been commented out with a TODO for Keycloak integration. While the shift to Keycloak is understood, leaving core features commented out with a generic TODO can lead to them being forgotten or deprioritized. It would be beneficial to create a dedicated issue to track the re-implementation of user management via Keycloak, including a clear plan and timeline, or to remove the commented-out code if it's not intended to be re-enabled in the near future.
| async def validate_basic_auth(username: str, password: str, configuration: Configuration) -> str: | ||
| """ | ||
| Validate basic auth credentials against the OIDC provider using the Resource Owner Password Credentials Grant. | ||
| """ | ||
| provider = configuration.auth.oidc.provider | ||
| issuer = await discover_issuer(provider) | ||
| token_endpoint = issuer.get("token_endpoint") | ||
| if not isinstance(token_endpoint, str): | ||
| raise IssuerDiscoveryError(f"Provider {provider.issuer} does not support token endpoint discovery") | ||
|
|
||
| async with httpx.AsyncClient() as client: | ||
| try: | ||
| resp = await client.post( | ||
| token_endpoint, | ||
| data={ | ||
| "grant_type": "password", | ||
| "username": username, | ||
| "password": password, | ||
| "client_id": provider.client_id, | ||
| "client_secret": provider.client_secret.get_secret_value(), | ||
| "scope": "openid email profile", # Request standard scopes | ||
| }, | ||
| ) | ||
| access_token: str = cast(str, resp.raise_for_status().json()["access_token"]) | ||
| return access_token | ||
| except httpx.HTTPStatusError as e: | ||
| if e.response.status_code == 401: | ||
| # Invalid credentials | ||
| raise ValueError("Invalid credentials") from e | ||
| logger.warning(f"Basic auth validation failed for user {username}: {e}") | ||
| raise ValueError(f"Basic auth validation failed: {e}") from e | ||
| except Exception as e: | ||
| logger.warning(f"Basic auth validation unexpected error for user {username}: {e}") | ||
| raise ValueError(f"Basic auth validation failed: {e}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validate_basic_auth function implements the Resource Owner Password Credentials Grant. While this grant type can be useful for certain trusted clients or legacy integrations, it is generally discouraged for new applications due to security concerns, as it involves sending user credentials directly to the authorization server. Ensure that the use of this grant type is justified by specific requirements and that appropriate security measures are in place to mitigate the associated risks.
| if configuration.auth.basic.enabled and basic_auth: | ||
| try: # get access_token and continue with bearer_auth | ||
| access_token = await validate_basic_auth(basic_auth.username, basic_auth.password, configuration) | ||
| bearer_auth = HTTPAuthorizationCredentials(scheme="Bearer", credentials=access_token) | ||
| except Exception as e: | ||
| raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid credentials") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The authorized_user function now converts basic authentication into a bearer token and then proceeds with the bearer token authentication flow. While this approach consolidates the authentication logic, the overall structure of the authorized_user function has become quite complex with nested if statements and try-except blocks. Consider refactoring this function to improve readability and maintainability, perhaps by extracting parts of the logic into smaller, more focused helper functions.
Summary
Linked Issues
Documentation
If this PR adds new feature or changes existing. Make sure documentation is adjusted accordingly. If the docs is not needed, please explain why.