Skip to content

Migrate agent tools from runtime to individual Lambda functions behind AgentCore Gateway#139

Open
msumari wants to merge 7 commits intomainfrom
msumari/tools-out
Open

Migrate agent tools from runtime to individual Lambda functions behind AgentCore Gateway#139
msumari wants to merge 7 commits intomainfrom
msumari/tools-out

Conversation

@msumari
Copy link
Copy Markdown
Contributor

@msumari msumari commented Mar 1, 2026

Issue number: closes #

Summary

Moves all agent tools (storage, journal, and the generic use_aws) out of the agent runtime into standalone Lambda functions behind the Bedrock AgentCore Gateway. Each Lambda has its own scoped IAM role, replacing the previous "overlord permissions" pattern where the runtime held broad access to S3, DynamoDB, Lambda, CloudWatch, and Pricing APIs.

Changes

New Lambda tools (7 handlers): storage_tool, journal_tool, interceptor, lambda_discovery_tool, cloudwatch_metrics_tool, cloudwatch_logs_tool, pricing_tool — each with tool-name routing, structured JSON responses, and scoped IAM.

CDK construct:
gateway.ts — Cognito OAuth2 (client credentials), Lambda targets with tool schemas, interceptor escape hatch, cdk-nag suppressions.

MCP client: src/mcp_client/ — OAuth2 token caching, MCPClient factory for the agent runtime to call tools via the gateway.

Agent runtime (main.py): Replaced storage, journal, and use_aws imports with a single mcp_client.

Infrastructure (infra-stack.ts, agent.ts): Removed direct S3 bucket policy, S3_BUCKET_NAME env var, and monitoringPolicy from the runtime.

Prompt updates: analysis_prompt.md references new tool names. report_prompt.md adds ASCII-only output constraint to prevent UTF-8 mojibake.

Deleted:
journal.py, storage.py, old tests, agent-iam-policies.spec.ts.

Security
Path traversal validation on storage filenames (allowlist regex)
Interceptor blocks prompt injection and redacts PII

Tests
210 Python tests (99% coverage), 41 TypeScript tests (100% statement coverage).

Please provide a summary of what's being changed

User experience

Please share what the user experience looks like before and after this change


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@msumari msumari requested a review from a team as a code owner March 1, 2026 18:19
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 1, 2026

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/agents
  main.py
  src/mcp_client
  __init__.py
  client.py
  src/shared
  config.py
  src/tools
  __init__.py
Project Total  

This report was generated by python-coverage-comment-action

Copy link
Copy Markdown
Member

@matteofigus matteofigus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I've gone through the diff and have some concerns about the overall approach.

The core premise doesn't hold up

The PR is framed as moving tools from the AgentCore runtime to Lambda for better permission scoping, but AgentCore runtime is Lambda under the hood. So this effectively goes from 1 Lambda with in-process tools to 8 Lambdas + Cognito + AgentCore Gateway + OAuth2 — without a meaningful security improvement to justify that complexity.

The permission improvements are marginal and achievable without this change

Comparing old vs new IAM policies side by side:

Tool Old (runtime role) New (per-Lambda role)
Lambda discovery lambda:ListFunctions, GetFunction, etc → * Same actions → *
CloudWatch metrics cloudwatch:GetMetricStatistics, ListMetrics* Same actions → *
CloudWatch logs logs:StartQuery/aws/lambda/*, StopQuery/GetQueryResults* Same
Pricing pricing:GetProducts* Same
S3 storage GetObject/PutObject on bucket grantReadWrite (broader — adds Abort*, DeleteObject*, GetBucket*, List*)
DynamoDB journal Full table access on runtime PutItem only ✅

The only real win is scoping journal to PutItem. The S3 change is actually a regression. The monitoring permissions are identical because those APIs don't support resource-level scoping. Both improvements could be achieved by tightening the existing runtime role — no new infrastructure needed.

New costs and tradeoffs

  • Latency: Every tool call now goes runtime → HTTP → Gateway → OAuth2 check → Lambda invoke → response, vs a direct in-process function call
  • Reliability: More failure modes — gateway outages, OAuth token endpoint issues, cold starts on tool Lambdas. If the gateway is down, the agent has zero tools
  • Cost: 7 additional Lambdas, Cognito user pool, AgentCore Gateway for what was previously free (in-process calls)
  • Complexity: 752 lines of CDK for gateway.ts alone, plus MCP client, OAuth token caching, etc.
  • Debugging: Tool call failures now span multiple Lambda invocations and log groups

Other issues

  • gateway.clientSecret.unsafeUnwrap() passes the Cognito client secret as a plaintext environment variable. Should use Secrets Manager or SSM SecureString.
  • The interceptor's regex-based prompt injection blocking is trivially bypassable and gives a false sense of security.
  • use_aws (generic AWS API tool) was removed without replacement — potential functional regression if the agent used it for anything beyond the 5 specific APIs now covered.
  • Token cache (_token_cache module-level dict) has no thread safety.

Suggestion

If the goal is tighter permissions, I'd suggest just scoping the existing runtime role more tightly (e.g., DynamoDB PutItem only, remove any excess S3 actions). That achieves the same or better security posture without the architectural overhead.


This review was done with the help of Claude Opus 4.6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants