@leocavalcante

Summary

Adds multi-provider support, allowing the proxy to route requests to either the GitHub Copilot API or AWS Bedrock based on the model name.

Routing Logic

  • Opus models → GitHub Copilot API
  • Sonnet/Haiku models → AWS Bedrock (if configured)

If AWS Bedrock credentials are not configured, all models automatically fall back to the GitHub Copilot API.
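
A minimal sketch of this decision, assuming hypothetical names (selectProvider and AwsCredentials are illustrative, not the actual exports of src/services/provider-router.ts):

type Provider = "copilot" | "bedrock"

interface AwsCredentials {
  region: string
  accessKeyId: string
  secretAccessKey: string
}

// Opus stays on Copilot; Sonnet/Haiku go to Bedrock only when
// credentials are present, otherwise everything falls back to Copilot.
function selectProvider(model: string, aws?: AwsCredentials): Provider {
  if (/opus/i.test(model)) return "copilot"
  const isBedrockModel = /sonnet|haiku/i.test(model)
  return isBedrockModel && aws ? "bedrock" : "copilot"
}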

Configuration

AWS Bedrock is configured via environment variables:

export AWS_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your_key_id
export AWS_SECRET_ACCESS_KEY=your_secret_key
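
As a sketch, startup might read these into state roughly like the following (names other than the environment variables are assumptions, not the actual code in src/start.ts):

// Bedrock is treated as configured only when all three variables are set.
const { AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY } = process.env

const awsCredentials =
  AWS_REGION && AWS_ACCESS_KEY_ID && AWS_SECRET_ACCESS_KEY
    ? {
        region: AWS_REGION,
        accessKeyId: AWS_ACCESS_KEY_ID,
        secretAccessKey: AWS_SECRET_ACCESS_KEY,
      }
    : undefined // undefined means every model falls back to Copilot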

Implementation Details

  • ✅ AWS Bedrock Converse API integration
  • ✅ Model-based provider routing (transparent to clients)
  • ✅ Support for both streaming and non-streaming responses
  • ✅ Automatic translation between OpenAI/Anthropic and Bedrock formats (see the sketch below)
  • ✅ Graceful fallback to Copilot if Bedrock is not configured
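
A rough sketch of the translation step (the types and the toConverseInput name are illustrative; the real code presumably lives in src/services/bedrock/create-chat-completions.ts):

// OpenAI-style messages translated into the shape expected by the
// Bedrock Converse API, where system prompts are a separate field.
interface OpenAIMessage {
  role: "system" | "user" | "assistant"
  content: string
}

function toConverseInput(messages: OpenAIMessage[], maxTokens = 4096) {
  return {
    system: messages
      .filter((m) => m.role === "system")
      .map((m) => ({ text: m.content })),
    messages: messages
      .filter((m) => m.role !== "system")
      .map((m) => ({
        role: m.role as "user" | "assistant",
        content: [{ text: m.content }],
      })),
    inferenceConfig: { maxTokens },
  }
}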

Files Changed

  • src/services/bedrock/create-chat-completions.ts - New Bedrock service
  • src/services/provider-router.ts - Provider routing logic
  • src/lib/state.ts - Added AWS credentials to state
  • src/start.ts - Load AWS credentials from environment
  • src/routes/*/handler.ts - Updated to use provider router

Test Plan

  • Test with Opus model → should route to GitHub Copilot
  • Test with Sonnet model → should route to AWS Bedrock (if configured)
  • Test with Haiku model → should route to AWS Bedrock (if configured)
  • Test fallback when AWS not configured
  • Test both streaming and non-streaming modes (a request sketch follows this list)
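
A manual probe could look like the following (the port, path, and model names are assumptions for illustration):

// Hypothetical manual check: same endpoint, different model names,
// should be routed to different providers transparently.
async function probe(model: string) {
  const res = await fetch("http://localhost:4141/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: "ping" }],
      stream: false,
    }),
  })
  console.log(model, res.status)
}

await probe("claude-opus-4")     // expected: GitHub Copilot
await probe("claude-sonnet-4-5") // expected: AWS Bedrock, if configured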

Implements multi-provider support allowing the proxy to route requests
to either GitHub Copilot API or AWS Bedrock based on the model name.

Routing logic:
- Opus models → GitHub Copilot API
- Sonnet/Haiku models → AWS Bedrock (if configured)

AWS Bedrock is configured via environment variables:
- AWS_REGION
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY

If AWS credentials are not configured, all models fall back to GitHub
Copilot API.

Features:
- AWS Bedrock Converse API integration
- Model-based provider routing
- Support for both streaming and non-streaming responses
- Automatic translation between OpenAI and Bedrock formats

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

- Implement round-robin load balancing for Sonnet/Haiku models (sketched after this commit message)
- Keep Opus models exclusively on Copilot API
- Add provider logging to show routing decisions
- Fix empty content block handling in Bedrock translation
- Add session token support for AWS temporary credentials

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>
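
A minimal sketch of the round-robin selection described in this commit (the module-level state and names are assumptions):

// Alternate Sonnet/Haiku traffic between the two providers;
// Opus requests never reach this function.
const providers = ["copilot", "bedrock"] as const
let cursor = 0

function nextProvider(): (typeof providers)[number] {
  const provider = providers[cursor]
  cursor = (cursor + 1) % providers.length
  return provider
}
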
- Fix model IDs from invalid us.anthropic IDs to the correct anthropic.claude-*-4-0-v2:0 format
- Add default maxTokens of 4096 to prevent undefined values
- Add comprehensive debug logging for troubleshooting
- Improve stream error handling with try-catch to gracefully handle interruptions
- Always send the [DONE] marker even if the messageStop event is not received (see the stream sketch after this commit message)
- Refactor to extract helper functions for chunk creation
- Ensure streams complete properly for Claude Code clients

Resolves an issue where Bedrock was generating very short responses (10-22 tokens) due to using global inference profiles instead of standard model IDs.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>
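
Sketched roughly, the stream-completion guarantee might look like this (toSse is a hypothetical name, not the actual helper):

// Wrap the upstream Bedrock stream so interruptions are caught and the
// [DONE] marker is always emitted, even if messageStop never arrives.
async function* toSse(events: AsyncIterable<unknown>): AsyncGenerator<string> {
  try {
    for await (const event of events) {
      yield `data: ${JSON.stringify(event)}\n\n`
    }
  } catch (error) {
    console.error("bedrock stream interrupted:", error)
  } finally {
    yield "data: [DONE]\n\n" // Claude Code clients wait for this marker
  }
}
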
Claude 4 models are not yet available on AWS Bedrock, so map to
Claude 3.5 Sonnet and Haiku models instead:
- anthropic.claude-3-5-sonnet-20241022-v2:0
- anthropic.claude-3-5-haiku-20241022-v1:0

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Use the full ARN format with correct Claude 4.5 model IDs:
- arn:aws:bedrock:*::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0
- arn:aws:bedrock:*::foundation-model/anthropic.claude-haiku-4-5-20251001-v1:0

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>

Use the correct global inference profile IDs for Claude 4.5 (mapped in the sketch after this commit message):
- global.anthropic.claude-sonnet-4-5-20250929-v1:0
- global.anthropic.claude-haiku-4-5-20251001-v1:0

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>
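
As a sketch, the mapping from this commit might be expressed like so (the function and map names are illustrative):

// Map proxy model names onto the global inference profile IDs above.
const bedrockModelMap: Record<string, string> = {
  sonnet: "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
  haiku: "global.anthropic.claude-haiku-4-5-20251001-v1:0",
}

function toBedrockModelId(model: string): string | undefined {
  if (/sonnet/i.test(model)) return bedrockModelMap.sonnet
  if (/haiku/i.test(model)) return bedrockModelMap.haiku
  return undefined // Opus and unknown models stay on Copilot
}
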
The bug was in translateToBedrockFormat() - system messages with array
content (Anthropic format) were being converted to empty strings and
filtered out, losing the entire conversation context.

This caused Bedrock to generate very short responses (~32 tokens) because
it wasn't receiving the full context from continuation sessions.

Now properly extracts text from array-format content blocks and joins them (a sketch follows this commit message).

Fixes the issue where Bedrock worked fine directly but failed through the proxy.

Signed-off-by: leocavalcante <leonardo.cavalcante@picpay.com>
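
A sketch of the fix described here (extractText is a hypothetical name; the actual change is inside translateToBedrockFormat()):

// Anthropic-style content can be a plain string or an array of blocks;
// join the text blocks instead of converting the array to an empty string.
type ContentBlock = { type: string; text?: string }

function extractText(content: string | ContentBlock[]): string {
  if (typeof content === "string") return content
  return content
    .filter((block) => block.type === "text" && block.text)
    .map((block) => block.text)
    .join("\n")
}
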
@leocavalcante closed this by deleting the head repository on Jan 8, 2026