# Architecture and Authentication

## Component map

| Component | Provided by | Role |
| --- | --- | --- |
| Bedrock Agent (Claude / Nova) | AWS | Conversational LLM, calls `query_genie` as a tool |
| AgentCore Gateway | AWS | Hosts the Genie MCP target, brokers tool calls |
| AgentCore Identity | AWS | OAuth credential provider — mints Databricks tokens for the Gateway |
| CloudWatch | AWS | Audit and tracing |
| Managed MCP Server | Databricks | `{DATABRICKS_HOST}/api/2.0/mcp/genie/{space_id}` |
| Genie Space + Trusted Assets | Databricks | NL → SQL with metric definitions |
| Unity Catalog | Databricks | Governance, lineage, audit attribution |
| Delta Lake | Databricks | Governed tables backing the Genie space |

## End-to-end flow (OBO mode)

1. End user sends an NL question to the Bedrock agent
2. The agent decides to call `query_genie` (an MCP tool registered via AgentCore Gateway)
3. AgentCore Gateway looks up the credential provider configured on the target
4. AgentCore Identity exchanges the user's session for a Databricks bearer token via the OAuth `authorization_code` flow against the Databricks workspace OAuth app
5. The Gateway calls `POST {DATABRICKS_HOST}/api/2.0/mcp/genie/{space_id}/tools/query_genie` with that bearer token (sketched after this list)
6. Databricks validates the token, executes the Genie conversation, runs the resulting SQL through Unity Catalog with the end-user's grants enforced
7. Result (SQL, narrative, rows) flows back to the agent
8. UC audit log attributes the SQL to the human end user
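
For concreteness, step 5 reduces to an authenticated HTTPS call. A minimal sketch of what the Gateway sends, assuming a JSON body with a single `question` field (the exact tool-call payload is defined by the OpenAPI schema registered on the Gateway target, not by this doc):

```bash
# Hypothetical illustration of step 5. The real call is issued by the Gateway,
# and the payload shape comes from the registered OpenAPI schema; the JSON
# body here is an assumption for illustration only. In OBO mode,
# DATABRICKS_TOKEN is the user-scoped token minted by AgentCore Identity.
curl -sS -X POST \
  "${DATABRICKS_HOST}/api/2.0/mcp/genie/${GENIE_SPACE_ID}/tools/query_genie" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"question": "What was Q3 revenue by region?"}'
```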

## Two auth modes

### `obo` (recommended for production)

End-user identity propagates via OAuth U2M.

- `grant_type=authorization_code`
- Scopes: `all-apis offline_access`
- Requires: a Databricks workspace OAuth app (custom integration) with a redirect URI matching what AgentCore Identity exposes
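
You never run this exchange by hand (AgentCore Identity drives it), but it helps to see the moving parts. A rough sketch of the token leg, using the endpoint and scopes from the credential-provider config shown below; `AUTH_CODE` and `REDIRECT_URI` are placeholders:

```bash
# Illustrative only: AgentCore Identity performs this exchange on the user's
# behalf after the browser-based authorize step. AUTH_CODE and REDIRECT_URI
# are placeholders for values that flow produces.
curl -sS -X POST "${DATABRICKS_HOST}/oidc/v1/token" \
  -d "grant_type=authorization_code" \
  -d "code=${AUTH_CODE}" \
  -d "redirect_uri=${REDIRECT_URI}" \
  -d "client_id=${DATABRICKS_OAUTH_CLIENT_ID}" \
  -d "client_secret=${DATABRICKS_OAUTH_CLIENT_SECRET}" \
  --data-urlencode "scope=all-apis offline_access"
```

The `offline_access` scope is what yields a refresh token, letting repeat tool calls proceed without re-prompting the user.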

What you can claim:
- "Genie answers reflect each user's UC permissions" — true
- "UC audit logs name the human end user" — true
- "No data movement" — true

Caveats to disclose:
- The redirect URI is a two-pass setup (placeholder first, real URI after AgentCore provisions the credential provider). Use `scripts/sync_oauth_redirect.py` to close the loop.
- Refresh-token reuse and per-user token TTLs are roadmap items (v0.2 in the reference repo)

### `m2m` (booth-demo quick start)

Every Genie call uses a Databricks service principal. **Do not claim user-level governance in this mode.**

- `grant_type=client_credentials`
- Scopes: `all-apis`
- Requires: a Databricks SP with workspace + UC read on the Genie space's underlying tables
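
Under the hood this is a plain `client_credentials` exchange with the SP's secret, against the same token endpoint OBO uses. A minimal sketch:

```bash
# Illustrative only: the Gateway's credential provider fetches this token
# itself. Every caller shares the resulting SP-scoped identity.
curl -sS -X POST "${DATABRICKS_HOST}/oidc/v1/token" \
  -u "${DATABRICKS_CLIENT_ID}:${DATABRICKS_CLIENT_SECRET}" \
  -d "grant_type=client_credentials" \
  -d "scope=all-apis"
```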

What you can claim:
- "Bedrock agent answers governed numerical questions" — true (the SP has UC grants)
- "Metric definitions are consistent" — true (Trusted Assets remain authoritative)

Caveats to disclose:
- All callers see the same rows the SP can see; no per-user differentiation
- UC audit attributes SQL to the SP, not to the end user
- Acceptable for single-tenant testing; never ship as the production posture

## OAuth credential provider configuration

The AgentCore Identity OAuth credential provider is the linchpin. In Terraform (`awscc_bedrockagentcore_oauth_credential_provider`) or CloudFormation (`AWS::BedrockAgentCore::OAuthCredentialProvider`):

```hcl
resource "awscc_bedrockagentcore_oauth_credential_provider" "databricks" {
name = "${local.name}-databricks-oauth"
credential_provider_vendor = "CustomOauth2"
oauth2_provider_config_input = {
custom_oauth2_provider_config = {
client_id = var.auth_mode == "obo" ? var.databricks_oauth_client_id : var.databricks_client_id
client_secret = var.auth_mode == "obo" ? var.databricks_oauth_client_secret : var.databricks_client_secret
authorization_endpoint = "${var.databricks_host}/oidc/v1/authorize"
token_endpoint = "${var.databricks_host}/oidc/v1/token"
scopes = var.auth_mode == "obo" ? ["all-apis", "offline_access"] : ["all-apis"]
grant_type = var.auth_mode == "obo" ? "authorization_code" : "client_credentials"
}
}
}
```

The Databricks OAuth credentials live in Secrets Manager so the Gateway can fetch them at tool-invocation time. The Gateway target IAM role needs `secretsmanager:GetSecretValue` on the secret ARNs and `s3:GetObject` on the schema bucket.
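
A minimal sketch of that target-role policy, assuming hypothetical role and resource names (align them with what your IaC actually provisions):

```bash
# Hypothetical role/resource names; the two actions are what the Gateway needs.
aws iam put-role-policy \
  --role-name dbx-genie-mcp-gateway-target \
  --policy-name genie-mcp-target-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": "arn:aws:secretsmanager:*:*:secret:dbx-genie-mcp-*"
      },
      {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::dbx-genie-mcp-*/*"
      }
    ]
  }'
```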

## Identity flow gotchas

- **Redirect URI mismatch (OBO):** the most common cause of `403` on UC tables. After Terraform/CFN provisions the credential provider, run `aws bedrock-agentcore-control get-oauth-credential-provider` to read the redirect URI, then update the Databricks OAuth app to match (the read-back step is sketched after this list). The reference repo's `scripts/sync_oauth_redirect.py` automates this via the Databricks Account API.
- **`DATABRICKS_OAUTH_PROVIDER_ARN` confusion:** this env var is the AgentCore Identity OAuth credential provider ARN, **not** a Secrets Manager ARN. Treating these as the same thing is a common bug — `register_gateway.py` requires the credential provider ARN.
- **AgentCore schema naming:** `AWS::BedrockAgentCore::OAuthCredentialProvider` and the corresponding `awscc_*` attribute names evolved during preview. If `terraform init` or `cfn deploy` errors on a renamed property, check the current CFN registry.
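
For the redirect-URI read-back in the first gotcha, something like the following; the `--name` flag and output shape are assumptions, so check `aws bedrock-agentcore-control help`, since this CLI surface evolved during preview:

```bash
# Read the provisioned credential provider to find the redirect URI AgentCore
# exposes. Flag and field names are assumptions; verify against the CLI help.
aws bedrock-agentcore-control get-oauth-credential-provider \
  --name "dbx-genie-mcp-databricks-oauth" \
  --output json
# Then update the Databricks OAuth app's redirect URI to match, or let
# scripts/sync_oauth_redirect.py do both halves in one pass.
```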

## Governance validation

After deploying in OBO mode, validate that user-level governance actually works before claiming it in a customer demo:

1. Run the same question as **two distinct end users** with different UC grants on the underlying tables
2. Confirm they get **different rows** (or `403` for the user without grants)
3. Inspect UC audit logs in `system.access.audit` — confirm the SQL is attributed to the **human end user**, not to the OAuth app's client ID

If any of those checks fail, the OBO wiring is wrong even if smoke tests pass with one user.
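
Check 3 is scriptable. A sketch against the Databricks SQL Statement Execution API; the warehouse ID and the filter are assumptions to adapt to your workspace:

```bash
# List recent audit events and eyeball who the SQL is attributed to.
# WAREHOUSE_ID and the WHERE clause are assumptions for illustration.
curl -sS -X POST "${DATABRICKS_HOST}/api/2.0/sql/statements" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "warehouse_id": "'"${WAREHOUSE_ID}"'",
    "statement": "SELECT event_time, user_identity.email, service_name, action_name FROM system.access.audit WHERE event_date = current_date() ORDER BY event_time DESC LIMIT 50"
  }'
# Expect user_identity.email to be the human end user, not the OAuth app.
```
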
# Deployment: Terraform vs CloudFormation

Two IaC paths, picked based on which AWS account you're deploying into.

## Decision matrix

| Account type | Path | Why |
| --- | --- | --- |
| External customer AWS account | Terraform | Caller has direct IAM/Bedrock-AgentCore privileges; `awscc` provider gives Terraform-native AgentCore primitives |
| FE Sandbox AWS account | CloudFormation | Caller's SSO role lacks `iam:CreateRole` and `bedrock-agentcore:*` — must deploy via a pre-blessed exec role |
| Other restricted AWS account | CloudFormation | Same reason as FE Sandbox; the exec-role pattern generalizes |
| Customer is mandated CloudFormation-only | CloudFormation | Some enterprises prohibit Terraform — meet them where they are |

## What both paths provision

| Resource | Purpose |
| --- | --- |
| S3 bucket | Holds the Genie MCP OpenAPI schema |
| Secrets Manager (M2M) | Databricks SP credentials |
| Secrets Manager (OBO) | Databricks workspace OAuth app credentials |
| IAM role for Gateway target | `secretsmanager:GetSecretValue`, `s3:GetObject` |
| AgentCore OAuth credential provider | Mints Databricks tokens (M2M or OBO depending on `AUTH_MODE`) |
| AgentCore Gateway | Hosts the Genie MCP target |
| IAM role for Bedrock agent | `bedrock:InvokeModel`, `bedrock-agentcore:InvokeGateway` |
| Bedrock agent + alias | The conversational runtime |

## Terraform path (external AWS)

Uses the `awscc` provider for AgentCore primitives because the AWS provider doesn't ship native `aws_bedrockagentcore_*` resources at the time of writing.

```hcl
terraform {
  required_providers {
    aws   = { source = "hashicorp/aws", version = "~> 5.60" }
    awscc = { source = "hashicorp/awscc", version = "~> 1.0" }
  }
}
```

Deploy:

```bash
cd terraform
terraform init
terraform apply \
  -var "databricks_host=$DATABRICKS_HOST" \
  -var "databricks_client_id=$DATABRICKS_CLIENT_ID" \
  -var "databricks_client_secret=$DATABRICKS_CLIENT_SECRET" \
  -var "databricks_oauth_client_id=$DATABRICKS_OAUTH_CLIENT_ID" \
  -var "databricks_oauth_client_secret=$DATABRICKS_OAUTH_CLIENT_SECRET" \
  -var "genie_space_id=$GENIE_SPACE_ID" \
  -var "auth_mode=$AUTH_MODE"
```

Outputs to capture into `.env`: `gateway_id`, `oauth_provider_arn`, `schema_s3_bucket`, `bedrock_agent_id`, `bedrock_agent_alias_id`.
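
A sketch of that capture using `terraform output`; the `.env` key names are assumptions, so align them with what the repo's scripts (e.g. `register_gateway.py`) actually read:

```bash
# Append Terraform outputs to .env. Env-var names are assumptions.
{
  echo "GATEWAY_ID=$(terraform output -raw gateway_id)"
  echo "DATABRICKS_OAUTH_PROVIDER_ARN=$(terraform output -raw oauth_provider_arn)"
  echo "SCHEMA_S3_BUCKET=$(terraform output -raw schema_s3_bucket)"
  echo "BEDROCK_AGENT_ID=$(terraform output -raw bedrock_agent_id)"
  echo "BEDROCK_AGENT_ALIAS_ID=$(terraform output -raw bedrock_agent_alias_id)"
} >> ../.env
```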

## CloudFormation path (FE Sandbox / restricted accounts)

### The pre-blessed exec-role pattern

The problem: on FE Sandbox the SSO role typically has

- ✅ `cloudformation:*`, `s3:*`
- ❌ `iam:CreateRole`, `iam:PutRolePolicy`
- ❌ `bedrock-agentcore:CreateGateway`, `bedrock-agentcore:CreateOauthCredentialProvider`
- ❌ `bedrock:CreateAgent`

That blocks both running Terraform as yourself AND `aws cloudformation deploy --capabilities CAPABILITY_NAMED_IAM` from the SSO role. The workaround (Ioannis Papadopoulos's pattern, originated for the Agent Bricks ↔ Bedrock/AgentCore demo):

1. **One-time, by an account admin:** create an IAM role `cfn-bedrock-agentcore-deployer` with all the privileges CFN needs. Its trust policy lets `cloudformation.amazonaws.com` assume it, since CloudFormation itself executes as this role (see the sketch after this list).
2. **Each deploy:** `aws cloudformation deploy --role-arn $CFN_DEPLOYER_ROLE_ARN ...` — CFN executes as the pre-blessed role.
3. **You only need:** `cloudformation:*` and `iam:PassRole` on the deployer role. Both are usually allowed on FE Sandbox.
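
A sketch of step 1, the one-time admin action; `deployer-policy.json` is a hypothetical local copy of the policy in the next subsection:

```bash
# One-time, run by an account admin. CloudFormation itself assumes this role
# when deploy is called with --role-arn, so the trust policy names the CFN
# service principal.
aws iam create-role \
  --role-name cfn-bedrock-agentcore-deployer \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "cloudformation.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'
# Attach the permissions policy from the next subsection.
aws iam put-role-policy \
  --role-name cfn-bedrock-agentcore-deployer \
  --policy-name deployer-permissions \
  --policy-document file://deployer-policy.json
```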

### IAM policy for the deployer role

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3SchemaBucket",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket", "s3:DeleteBucket", "s3:GetBucketLocation",
        "s3:GetBucketPolicy", "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketVersioning", "s3:ListBucket", "s3:PutBucketPolicy",
        "s3:PutBucketPublicAccessBlock", "s3:PutBucketVersioning",
        "s3:DeleteObject", "s3:GetObject", "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::dbx-genie-mcp-*",
        "arn:aws:s3:::dbx-genie-mcp-*/*"
      ]
    },
    {
      "Sid": "SecretsManager",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:CreateSecret", "secretsmanager:DeleteSecret",
        "secretsmanager:DescribeSecret", "secretsmanager:GetSecretValue",
        "secretsmanager:PutSecretValue", "secretsmanager:TagResource",
        "secretsmanager:UntagResource", "secretsmanager:UpdateSecret"
      ],
      "Resource": "arn:aws:secretsmanager:*:*:secret:dbx-genie-mcp-*"
    },
    {
      "Sid": "IAM",
      "Effect": "Allow",
      "Action": [
        "iam:AttachRolePolicy", "iam:CreateRole", "iam:DeleteRole",
        "iam:DeleteRolePolicy", "iam:DetachRolePolicy", "iam:GetRole",
        "iam:GetRolePolicy", "iam:ListAttachedRolePolicies",
        "iam:ListRolePolicies", "iam:PassRole", "iam:PutRolePolicy",
        "iam:TagRole", "iam:UntagRole", "iam:UpdateAssumeRolePolicy"
      ],
      "Resource": ["arn:aws:iam::*:role/dbx-genie-mcp-*"]
    },
    {
      "Sid": "BedrockAgentCore",
      "Effect": "Allow",
      "Action": [
        "bedrock-agentcore:CreateGateway", "bedrock-agentcore:DeleteGateway",
        "bedrock-agentcore:GetGateway", "bedrock-agentcore:ListGateways",
        "bedrock-agentcore:UpdateGateway",
        "bedrock-agentcore:CreateGatewayTarget",
        "bedrock-agentcore:DeleteGatewayTarget",
        "bedrock-agentcore:GetGatewayTarget",
        "bedrock-agentcore:ListGatewayTargets",
        "bedrock-agentcore:UpdateGatewayTarget",
        "bedrock-agentcore:CreateOauthCredentialProvider",
        "bedrock-agentcore:DeleteOauthCredentialProvider",
        "bedrock-agentcore:GetOauthCredentialProvider",
        "bedrock-agentcore:ListOauthCredentialProviders",
        "bedrock-agentcore:UpdateOauthCredentialProvider"
      ],
      "Resource": "*"
    },
    {
      "Sid": "BedrockAgent",
      "Effect": "Allow",
      "Action": [
        "bedrock:CreateAgent", "bedrock:CreateAgentAlias",
        "bedrock:DeleteAgent", "bedrock:DeleteAgentAlias",
        "bedrock:GetAgent", "bedrock:GetAgentAlias",
        "bedrock:ListAgents", "bedrock:ListAgentAliases",
        "bedrock:PrepareAgent", "bedrock:UpdateAgent",
        "bedrock:UpdateAgentAlias",
        "bedrock:AssociateAgentKnowledgeBase",
        "bedrock:DisassociateAgentKnowledgeBase",
        "bedrock:TagResource", "bedrock:UntagResource"
      ],
      "Resource": "*"
    }
  ]
}
```

The policy is resource-scoped to `dbx-genie-mcp-*` so the deployer role can't be reused against unrelated infra. A copy-paste Slack template for the admin ask is at `cloudformation/REQUEST_DEPLOYER_ROLE.md` in the reference repo.

### Deploy

```bash
export CFN_DEPLOYER_ROLE_ARN=arn:aws:iam::<account>:role/cfn-bedrock-agentcore-deployer
cd cloudformation
./deploy.sh
```

`deploy.sh` reads `.env`, then runs:

```bash
aws cloudformation deploy \
  --region "$AWS_REGION" \
  --stack-name "$STACK_NAME" \
  --template-file stack.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --role-arn "$CFN_DEPLOYER_ROLE_ARN" \
  --parameter-overrides ...
```

Outputs are surfaced via `aws cloudformation describe-stacks --query 'Stacks[0].Outputs' --output table` — capture into `.env`.
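
A sketch of scripting that capture; the output key names (`GatewayId`, etc.) are assumptions, so match them to what `stack.yaml` actually declares:

```bash
# Pull one stack output by key and append to .env. Output key names are
# assumptions; run describe-stacks without --query to see the real ones.
get_output() {
  aws cloudformation describe-stacks \
    --region "$AWS_REGION" \
    --stack-name "$STACK_NAME" \
    --query "Stacks[0].Outputs[?OutputKey=='$1'].OutputValue" \
    --output text
}
{
  echo "GATEWAY_ID=$(get_output GatewayId)"
  echo "DATABRICKS_OAUTH_PROVIDER_ARN=$(get_output OauthProviderArn)"
  echo "BEDROCK_AGENT_ID=$(get_output BedrockAgentId)"
  echo "BEDROCK_AGENT_ALIAS_ID=$(get_output BedrockAgentAliasId)"
} >> .env
```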

## Picking between the two

- If you might run this on FE Sandbox at any point, **maintain the CloudFormation path** as the source of truth and treat Terraform as the convenience layer. Sandbox is the harder constraint.
- If you're only deploying to external customer AWS accounts, **stick with Terraform** — it's the more expressive of the two, and customers running Terraform shops will prefer it.
- The reference repo (`databricks-aws-integrations/genie_with_bedrock_agentcore`) keeps both paths to cover both audiences. Drift between them is the primary maintenance cost — when changing one, change the other.

## Schema-name caveat

`AWS::BedrockAgentCore::*` resource names and properties evolved through preview. The IaC referenced here was authored against the 2026-05 CFN registry. If `terraform init` errors on `awscc_bedrockagentcore_*`, or `aws cloudformation validate-template` errors on a `BedrockAgentCore::*` property, check the current registry schema and patch.
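
A quick way to run that check from the CLI; the type name matches this doc, and the `list-types` fallback helps if the name itself was renamed:

```bash
# Inspect the live registry schema for the resource type. If the first call
# reports the type doesn't exist, list current BedrockAgentCore type names.
aws cloudformation describe-type \
  --type RESOURCE \
  --type-name AWS::BedrockAgentCore::OAuthCredentialProvider \
  --query 'Schema' --output text
aws cloudformation list-types \
  --visibility PUBLIC \
  --filters TypeNamePrefix=AWS::BedrockAgentCore \
  --output table
```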