Skip to content

Conversation

@jezekra1
Copy link
Collaborator

Summary

Linked Issues

Documentation

  • No Docs Needed:

If this PR adds new feature or changes existing. Make sure documentation is adjusted accordingly. If the docs is not needed, please explain why.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @jezekra1, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the authentication and authorization mechanisms by integrating Keycloak as the central identity provider. The changes involve deploying Keycloak as part of the application's infrastructure, updating both the backend server and frontend UI to interact with Keycloak for user authentication and role management, and migrating user role data out of the application's database. This move aims to centralize identity management, enhance security, and provide a more robust and scalable authentication solution.

Highlights

  • Keycloak Integration: Introduced a Proof of Concept (PoC) for Keycloak integration, deploying Keycloak via Helm with custom login themes and automated provisioning of the 'agentstack' realm, roles (admin, developer), and clients (server, UI).
  • Centralized User Management: Migrated user role management from the local database to Keycloak. This involved removing 'role' and 'role_updated_at' fields from user models and database schemas, and updating server logic to derive user roles from Keycloak realm roles.
  • Refactored Authentication Flow: The 'agentstack-server' now validates OIDC tokens and basic authentication credentials directly against Keycloak. The 'authorized_user' dependency was updated to prioritize basic auth via Keycloak and streamline bearer token handling.
  • Streamlined UI Authentication: The 'agentstack-ui' has been updated to use a flattened OIDC configuration from environment variables, simplifying setup. It also includes federated logout functionality to revoke Keycloak sessions upon user sign-out.
  • Enhanced E2E Testing: New and updated end-to-end tests leverage Keycloak for user creation and authentication scenarios, ensuring the new authentication flow works as expected across various components.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Keycloak integration for authentication, significantly refactoring the OIDC and basic authentication mechanisms across the agentstack-server and agentstack-ui components. It includes new Helm charts for deploying Keycloak, provisioning scripts, and updates to various Python and TypeScript files to support the new authentication flow. The changes also involve removing user role management from the database, shifting it to Keycloak, and updating e2e tests to reflect these authentication changes. Overall, the changes are substantial and aim to centralize user management and authentication through Keycloak.

I am having trouble creating individual review comments. Click here to see my feedback.

apps/agentstack-cli/src/agentstack_cli/commands/platform/base_driver.py (119-126)

critical

Hardcoding secrets like encryptionKey, uiClientSecret, serverClientSecret, and adminPassword directly in the values dictionary is a security risk. These should be managed through Helm secrets or environment variables, especially adminPassword which is highly sensitive. Consider using valueFrom with secretKeyRef for these values, similar to how they are handled in the Helm templates.

apps/agentstack-ui/.env.example (27-32)

high

Hardcoding OIDC_ENABLED=true, OIDC_PROVIDER_NAME=Keycloak, OIDC_PROVIDER_ID=keycloak, OIDC_PROVIDER_CLIENT_ID=agentstack-ui, OIDC_PROVIDER_CLIENT_SECRET=agentstack-ui-secret, and OIDC_PROVIDER_ISSUER=http://localhost:8336/realms/agentstack in .env.example as default values is problematic. The .env.example file should ideally show examples of configuration, not necessarily active defaults, especially for sensitive values like client secrets. It's better to leave these blank or provide placeholders with clear instructions for users to fill them in, emphasizing that OIDC_ENABLED should be false by default unless explicitly configured.

helm/values.yaml (131-134)

high

The keycloakIssuerUrl is introduced, with a warning about insecure transport for internal Keycloak. This is a critical configuration. Ensure that the documentation clearly explains the implications of using http://keycloak internally versus a public https URL for production, and the necessary steps to secure it (e.g., mTLS, public exposure with TLS).

apps/agentstack-server/config.yaml (1-5)

high

This config.yaml file seems to contain hardcoded authentication details (adminPassword: "neco"). If this is a temporary or development-specific configuration, it should be clearly documented or moved to a more appropriate location (e.g., a .env.development file) to prevent accidental deployment with sensitive information. For production, such values should always be managed via environment variables or secrets.

apps/agentstack-server/src/agentstack_server/api/auth/auth.py (336-369)

high

The validate_basic_auth function uses the Resource Owner Password Credentials Grant (ROPC). While convenient for certain scenarios, ROPC is generally discouraged in modern OAuth 2.0/OIDC flows due to security concerns, as it requires the client to handle the user's credentials directly. It's often preferable to use more secure flows like Authorization Code Grant with PKCE. If ROPC is strictly necessary, ensure that its use is limited to highly trusted first-party clients and that the risks are understood and mitigated.

helm/templates/ui/secret.yaml (14-21)

high

The logic for generating agentstackUiClientSecret ensures that a secret is generated if not explicitly provided or found in an existing secret. Similar to the server client secret, for production environments, it's crucial to ensure this secret is explicitly set and managed securely rather than relying on auto-generation.

apps/agentstack-server/src/agentstack_server/configuration.py (94-98)

high

Hardcoding name, issuer, external_issuer, client_id, and client_secret directly in the OidcConfiguration class as default values can lead to inflexibility and potential security issues if these values are sensitive or need to change per environment. It's generally better to load these from environment variables or a configuration management system, allowing for dynamic configuration without code changes. While pydantic-settings handles environment variables, providing sensitive defaults directly in code is not ideal.

helm/templates/keycloak/secret.yaml (22-38)

high

The db-password for Keycloak is conditionally set based on useDedicatedDatabase. When useDedicatedDatabase is false, it uses include "agentstack.databasePassword" .. This implies Keycloak shares the database with the main application. Ensure that the shared database password management is robust and that Keycloak's access to this shared database is appropriately permissioned to prevent unauthorized access or privilege escalation.

helm/templates/keycloak/secret.yaml (10-21)

high

The logic for generating admin-password for Keycloak ensures that a password is set either from .Values.keycloak.auth.adminPassword or auto-generated if not found in an existing secret. While auto-generation is convenient, for production deployments, it's highly recommended to explicitly provide and securely manage the Keycloak admin password rather than relying on auto-generation, to prevent potential security risks.

apps/agentstack-server/src/agentstack_server/infrastructure/persistence/migrations/alembic/versions/109a6afdf170_.py (31)

high

The downgrade function raises a NotImplementedError. While understandable given the shift in user role management to Keycloak, this means there's no clear path to revert this migration. In a production environment, this could be problematic if a rollback is ever required. Consider if there's any way to provide a graceful downgrade path, even if it involves data loss warnings or manual steps.

helm/templates/config/secret.yaml (62-69)

high

The logic for generating agentstackServerClientSecret ensures that a secret is generated if not explicitly provided or found in an existing secret. This is good for initial deployments. However, for production environments, it's crucial to ensure that this secret is explicitly set and managed securely (e.g., via a secrets management system) rather than relying on auto-generation, to maintain consistent and secure client credentials.

helm/templates/_helpers.tpl (76-122)

medium

The agentstack.keycloak.dbEnvVars helper is well-structured for managing Keycloak database environment variables. However, the KC_DB_PASSWORD is retrieved from keycloak-secret. Ensure that this secret is properly created and populated with the correct database password, especially when useDedicatedDatabase is false and it relies on the shared database password, to prevent connection failures.

apps/agentstack-server/src/agentstack_server/configuration.py (118-119)

medium

The condition self.issuer.scheme != "http" or self.issuer.host != "keycloak" implies that insecure transport is only allowed for a specific internal Keycloak instance. This is a good security measure. However, it might be more robust to check for self.issuer.is_secure (if AnyUrl supports it or similar logic) and explicitly list allowed insecure hosts rather than relying on a single hardcoded host, especially if the internal Keycloak hostname could change.

apps/agentstack-ui/src/app/(auth)/auth.ts (30-44)

medium

The createOIDCProvider function now includes specific Keycloak endpoint URLs (/protocol/openid-connect/auth, /protocol/openid-connect/token, /protocol/openid-connect/userinfo, /protocol/openid-connect/certs) when external_issuer is present. While this works for Keycloak, it makes the createOIDCProvider function less generic. If other OIDC providers are to be supported in the future, this logic might need to be more dynamic or configurable, possibly by relying on the .well-known/openid-configuration endpoint for discovery.

apps/agentstack-ui/src/app/(auth)/auth.ts (161-163)

medium

The signOut event handler directly accesses process.env.OIDC_PROVIDER_ISSUER, process.env.OIDC_PROVIDER_CLIENT_ID, and process.env.OIDC_PROVIDER_CLIENT_SECRET. While these are available server-side, it's generally safer to pass configuration values through the runtimeConfig or providers object rather than directly accessing process.env within callbacks, especially if these values could vary or be sensitive.

apps/agentstack-ui/src/app/(auth)/types.ts (16)

medium

Adding external_issuer: z.string().optional() to baseProviderConfigSchema is a good step for Keycloak integration. However, ensure that the UI components and logic correctly handle the presence or absence of external_issuer when constructing URLs or displaying information, to avoid unexpected behavior or broken links.

apps/agentstack-cli/src/agentstack_cli/commands/platform/init.py (88)

medium

This print statement appears to be for debugging purposes. Please remove it before merging to avoid unnecessary output in production environments.

apps/agentstack-server/tasks.toml (173-180)

medium

The agentstack-server:dev:reload task is a good addition for development. However, it forces helm:build and agentstack-cli:build:copy-helm-chart every time. While helm:build might be necessary, agentstack-cli:build:copy-helm-chart might not always be needed if the chart hasn't changed. Consider if these can be made conditional or if there's a faster way to reload without rebuilding everything if not necessary.

helm/templates/deployment.yaml (272-273)

medium

The AUTH__OIDC__EXTERNAL_ISSUER value is conditionally set. If keycloakIssuerUrl starts with http://keycloak, it defaults to http://localhost:8336/realms/agentstack. This localhost URL might not be accessible from all contexts (e.g., if the UI is not running on the same host as the server or if network policies restrict localhost access). Ensure this localhost URL is appropriate for the intended deployment scenario, or consider making it configurable.

helm/templates/keycloak/provision-job.yaml (56-62)

medium

The Keycloak provisioning job creates or updates the agentstack-server client. It uses kcadm.sh get clients to check for existence and then either create or update. This is a robust approach. However, ensure that the SERVER_CLIENT_SECRET is correctly passed and handled securely, as it's a critical credential for the server to interact with Keycloak.

helm/templates/keycloak/provision-job.yaml (65-98)

medium

The provisioning logic for the agentstack-ui client, including client scope and audience mapper creation, is quite extensive. This ensures proper OIDC configuration for the UI. However, the UI_URL is used in the audience mapper configuration. Ensure that UI_URL is correctly set and reflects the actual external URL of the UI, as misconfiguration could lead to token validation issues or security vulnerabilities.

apps/agentstack-server/src/agentstack_server/configuration.py (130)

medium

Changing the default for BasicAuthConfiguration.enabled to True means basic authentication will be active by default. Ensure this is the desired behavior for all deployment scenarios, as it could have security implications if not properly managed (e.g., default credentials being active).

apps/agentstack-server/template.env (34-36)

medium

The commented-out lines for AUTH__BASIC__ADMIN_PASSWORD and AUTH__JWT_SECRET_KEY are removed. While removing commented-out code is generally good, ensure that these environment variables are now correctly configured elsewhere or are no longer needed, especially AUTH__JWT_SECRET_KEY which is critical for internal JWTs. If they are still relevant for local development, they should be present and correctly configured.

helm/templates/ui/deployment.yaml (83)

medium

The NEXTAUTH_SECRET is now retrieved from agentstack-ui-secret. This is a good practice for managing secrets. Ensure that this secret is properly generated and stored securely, as it's critical for NextAuth's session management.

helm/templates/ui/deployment.yaml (93-96)

medium

The OIDC_PROVIDER_CLIENT_SECRET for the UI is now retrieved from agentstack-ui-secret. This is a good practice. However, if the UI is a public client (as suggested by publicClient=false in keycloak/provision-job.yaml but clientAuthenticatorType=client-secret implies confidential), the client secret might not be strictly necessary or should be handled with extreme care to prevent exposure.

helm/templates/ui/deployment.yaml (97-100)

medium

The OIDC_PROVIDER_ISSUER and OIDC_PROVIDER_EXTERNAL_ISSUER are configured for the UI. Similar to the server deployment, the EXTERNAL_ISSUER might default to localhost if keycloakIssuerUrl is internal. Ensure that the UI can correctly resolve and access this EXTERNAL_ISSUER URL from its deployment context, as misconfiguration can lead to authentication failures.

helm/templates/ui/providers.yaml (12)

medium

The providers.json content is now directly {{ .Values.auth.oidc.providers | toJson }}. Given the refactoring in apps/agentstack-ui/src/app/(auth)/auth.ts to read individual environment variables for a single OIDC provider, this providers.json secret might no longer be used by the UI application. Verify if this secret is still necessary or if it can be removed to simplify the deployment.

apps/agentstack-server/src/agentstack_server/api/dependencies.py (89)

medium

Logging e directly at critical level without sanitization might expose sensitive information (e.g., stack traces, token details) in logs. Consider logging a more generic error message or redacting sensitive parts of the exception to prevent information leakage.

apps/agentstack-cli/src/agentstack_cli/commands/platform/base_driver.py (183-184)

medium

The istio.install call is commented out. If Istio integration is no longer needed or is being deferred, please remove the commented-out code and the corresponding import statement (import agentstack_cli.commands.platform.istio) from the file to keep the codebase clean and avoid confusion.

helm/values.yaml (136-139)

medium

The nextauth_trust_host and nextauth_url are important for NextAuth configuration. The nextauth_url is now http://localhost:8334. Ensure this localhost URL is correctly handled in deployment environments where the UI might not be accessed directly via localhost, or if it's exposed through an ingress. It might need to be dynamically set based on the ingress URL.

helm/values.yaml (140)

medium

The providers list under auth.oidc has been removed. This aligns with the new single-provider configuration in the server and UI. Ensure that any existing deployments or configurations that relied on a list of OIDC providers are updated to reflect this change.

policy.yaml (1-54)

medium

The new policy.yaml defines a NetworkPolicy for restricted egress access. This is a good security practice to limit outbound connections from pods. Ensure that the podSelector (app: test-client) and the egress rules (DNS, agentstack-server, external internet with exceptions) are precisely configured to meet the application's communication requirements while minimizing the attack surface. Any misconfiguration could lead to connectivity issues.

tasks.toml (185)

medium

The __OLD_ADMIN_PASSWORD export has been removed. This aligns with the shift in authentication mechanisms. Ensure that any scripts or processes that previously relied on this environment variable for admin password access are updated or no longer require it, to prevent unexpected authentication failures.

@jezekra1 jezekra1 force-pushed the add-keycloak-poc branch 7 times, most recently from 9341001 to 53d6529 Compare January 22, 2026 14:50
@jezekra1 jezekra1 changed the title Add keycloak poc Add central keycloak OIDC Jan 22, 2026
@jezekra1 jezekra1 marked this pull request as ready for review January 22, 2026 15:25
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jan 22, 2026
Signed-off-by: Radek Ježek <radek.jezek@ibm.com>
@jezekra1
Copy link
Collaborator Author

/gemini review

@jezekra1
Copy link
Collaborator Author

Recreated as a new PR because gemini refuses to re-review: #1879

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive integration with Keycloak for OIDC authentication, centralizing user and role management. This is a significant architectural shift that improves security and scalability by offloading authentication concerns to a dedicated identity provider. The changes span across the server, CLI, UI, and Helm charts, reflecting a consistent approach to this new authentication mechanism. The removal of internal user role management and the introduction of Keycloak provisioning via Helm hooks are well-executed. Several hardcoded credentials have been replaced with more dynamic or configurable options, which is a positive security improvement. The addition of new e2e tests specifically for the authentication flow is also highly valuable.

host=os.getenv("HOST", "127.0.0.1"),
port=int(os.getenv("PORT", 8000)),
configure_telemetry=True,
self_registration_client_factory=lambda: PlatformClient(auth=("admin", "admin")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding credentials like ("admin", "admin") for PlatformClient is a security risk. These credentials should ideally be fetched from secure configuration, environment variables, or a secrets management system, especially in production environments. Consider making these configurable.

Suggested change
self_registration_client_factory=lambda: PlatformClient(auth=("admin", "admin")),
self_registration_client_factory=lambda: PlatformClient(auth=(os.getenv("PLATFORM_ADMIN_USERNAME", "admin"), os.getenv("PLATFORM_ADMIN_PASSWORD", "admin"))),

Comment on lines +126 to +127
"auth": {"adminPassword": "admin"},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Hardcoding adminPassword: "admin" for Keycloak in the Helm values is a security concern. This default password should be dynamically generated or retrieved from a secure source, especially since it's used for an admin account. The Helm chart already has logic to generate random secrets for Keycloak admin passwords, which should be leveraged here.

Suggested change
"auth": {"adminPassword": "admin"},
},
"auth": {"adminPassword": "{{ .Values.keycloak.auth.adminPassword | default (randAlphaNum 32) }}"},

Comment on lines +116 to +124
if self.insecure_transport:
if self.issuer.scheme != "http" or self.issuer.host != "keycloak":
raise ValueError("Insecure transport is only allowed for internal keycloak!")

os.environ["AUTHLIB_INSECURE_TRANSPORT"] = "1"
logger.warning(
"Oauth Authentication is enabled with insecure transport! "
+ "Using mTLS (using istio service mesh) or exposing keycloak publicly is highly recommended!"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Setting os.environ["AUTHLIB_INSECURE_TRANSPORT"] = "1" when insecure_transport is enabled is a significant security risk. While a warning is logged, it's crucial to emphasize that this should only be used for internal, trusted Keycloak instances (as indicated by the issuer.host == "keycloak" check) and never in a production environment exposed to the public internet without proper TLS termination. Misuse could lead to token interception.

Comment on lines +114 to +120
# TODO: Implement keycloak integration
# app.add_typer(
# agentstack_cli.commands.user.app,
# name="user",
# no_args_is_help=True,
# help="Manage users. [Admin only]",
# )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The user management functionality has been commented out with a TODO for Keycloak integration. While the shift to Keycloak is understood, leaving core features commented out with a generic TODO can lead to them being forgotten or deprioritized. It would be beneficial to create a dedicated issue to track the re-implementation of user management via Keycloak, including a clear plan and timeline, or to remove the commented-out code if it's not intended to be re-enabled in the near future.

Comment on lines +330 to +363
async def validate_basic_auth(username: str, password: str, configuration: Configuration) -> str:
"""
Validate basic auth credentials against the OIDC provider using the Resource Owner Password Credentials Grant.
"""
provider = configuration.auth.oidc.provider
issuer = await discover_issuer(provider)
token_endpoint = issuer.get("token_endpoint")
if not isinstance(token_endpoint, str):
raise IssuerDiscoveryError(f"Provider {provider.issuer} does not support token endpoint discovery")

async with httpx.AsyncClient() as client:
try:
resp = await client.post(
token_endpoint,
data={
"grant_type": "password",
"username": username,
"password": password,
"client_id": provider.client_id,
"client_secret": provider.client_secret.get_secret_value(),
"scope": "openid email profile", # Request standard scopes
},
)
access_token: str = cast(str, resp.raise_for_status().json()["access_token"])
return access_token
except httpx.HTTPStatusError as e:
if e.response.status_code == 401:
# Invalid credentials
raise ValueError("Invalid credentials") from e
logger.warning(f"Basic auth validation failed for user {username}: {e}")
raise ValueError(f"Basic auth validation failed: {e}") from e
except Exception as e:
logger.warning(f"Basic auth validation unexpected error for user {username}: {e}")
raise ValueError(f"Basic auth validation failed: {e}") from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The validate_basic_auth function implements the Resource Owner Password Credentials Grant. While this grant type can be useful for certain trusted clients or legacy integrations, it is generally discouraged for new applications due to security concerns, as it involves sending user credentials directly to the authorization server. Ensure that the use of this grant type is justified by specific requirements and that appropriate security measures are in place to mitigate the associated risks.

Comment on lines +136 to +141
if configuration.auth.basic.enabled and basic_auth:
try: # get access_token and continue with bearer_auth
access_token = await validate_basic_auth(basic_auth.username, basic_auth.password, configuration)
bearer_auth = HTTPAuthorizationCredentials(scheme="Bearer", credentials=access_token)
except Exception as e:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid credentials") from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The authorized_user function now converts basic authentication into a bearer token and then proceeds with the bearer token authentication flow. While this approach consolidates the authentication logic, the overall structure of the authorized_user function has become quite complex with nested if statements and try-except blocks. Consider refactoring this function to improve readability and maintainability, perhaps by extracting parts of the logic into smaller, more focused helper functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants