49 changes: 46 additions & 3 deletions .env.example
@@ -23,8 +23,9 @@
# Without a provider key, agentmemory runs in noop mode: observations are
# indexed via zero-LLM synthetic compression, hybrid search still works,
# but LLM-backed summarisation / reflection / consolidation are disabled.
-# The detection order is OPENAI_API_KEY → MINIMAX_API_KEY → ANTHROPIC_API_KEY
-# → GEMINI_API_KEY → OPENROUTER_API_KEY → noop.
+# The detection order is AWS_BEDROCK → OPENAI_API_KEY → MINIMAX_API_KEY →
+# ANTHROPIC_API_KEY → GEMINI_API_KEY → OPENROUTER_API_KEY → noop. Bedrock is
+# first but only fires on the explicit AWS_BEDROCK=true opt-in flag.

# OPENAI_API_KEY=sk-... # Used for OpenAI-compatible embeddings today. PR #307 will extend this to chat completions (DeepSeek, SiliconFlow, vLLM, LM Studio, Ollama via `/v1`).
# OPENAI_BASE_URL=https://api.openai.com # Override for OpenAI-compatible providers
@@ -43,6 +44,32 @@
# MINIMAX_API_KEY=...
# MINIMAX_MODEL=MiniMax-M2.7

# AWS Bedrock (Anthropic models on Bedrock). Opt in with AWS_BEDROCK=true; takes
# precedence over the keys above when set. Credentials come from the standard AWS
# provider chain — environment creds, IAM roles, or an SSO profile cached under
# ~/.aws/sso/cache/ (select with AWS_PROFILE). NOTE: agentmemory reads the cached
# SSO token but cannot perform the login — run `aws sso login --profile <name>`
# first, and re-run it when the session expires.
# AWS_BEDROCK=true
# AWS_REGION=us-east-1 # Required for Bedrock
# AWS_PROFILE=my-sso-profile # Optional; consumed by the AWS SDK directly
# AWS_BEDROCK_MODEL=anthropic.claude-haiku-4-5-20251001-v1:0 # Default: Claude Haiku 4.5 (bare on-demand ID)
# The bare ID above only works in Regions that offer the model on-demand AND
# where model access is enabled in the Bedrock console. In other Regions, use
# the geo-prefixed cross-region inference profile, e.g.:
# AWS_BEDROCK_MODEL=us.anthropic.claude-haiku-4-5-20251001-v1:0 (or eu.…)
# AWS_ACCESS_KEY_ID=... # Optional explicit static creds (CI escape hatch); both must be set
# AWS_SECRET_ACCESS_KEY=... # to take effect, else the provider chain is used
# Optional auth-refresh hook: when a Bedrock call fails with an expired-token
# error, agentmemory runs this command (no shell — argv split on whitespace,
# quotes honored) and retries once. Use it to re-establish an expired SSO
# session unattended. SECURITY: only the literal string below is ever executed;
# no model/memory data is interpolated. Note `aws sso login` is interactive
# (opens a browser) — in a headless daemon there is no approver, so the command
# is bounded by AWS_AUTH_REFRESH_TIMEOUT_MS.
# AWS_AUTH_REFRESH=aws sso login --profile my-sso-profile
# AWS_AUTH_REFRESH_TIMEOUT_MS=120000 # Default: 120 000 ms (2 min)

# MAX_TOKENS=4096 # Cap LLM completion tokens for compression / summarise calls

# Outbound LLM / embedding timeout — shared across every raw-fetch provider
@@ -67,7 +94,7 @@
# OPENAI_API_KEY → VOYAGE_API_KEY → COHERE_API_KEY → OPENROUTER_API_KEY →
# local (Xenova/all-MiniLM-L6-v2, 384-dim).

-# EMBEDDING_PROVIDER=local # local | openai | voyage | cohere | gemini | openrouter
+# EMBEDDING_PROVIDER=local # local | openai | voyage | cohere | gemini | openrouter | bedrock

# VOYAGE_API_KEY=pa-... # Optimised for code embeddings

@@ -79,6 +106,22 @@

# OPENROUTER_EMBEDDING_MODEL=openai/text-embedding-3-small # When EMBEDDING_PROVIDER=openrouter

# AWS Bedrock embeddings (Cohere / Amazon Titan). Set EMBEDDING_PROVIDER=bedrock
# to use it — NOT auto-selected by AWS_BEDROCK=true (that opts into the Bedrock
# LLM only; embeddings stay on their current provider). Credentials come from the
# AWS provider chain (env / IAM role / SSO cache via AWS_PROFILE) — no key var —
# and the region is the shared AWS_REGION.
# AWS_BEDROCK_EMBEDDING_MODEL=cohere.embed-v4:0 # Default. Also: amazon.titan-embed-text-v2:0, cohere.embed-*-v3
# Some models are INFERENCE_PROFILE-only in a given region (e.g. cohere.embed-v4:0
# is not on-demand in us-east-2) and must use the geo-prefixed profile ID, e.g.:
# AWS_BEDROCK_EMBEDDING_MODEL=us.cohere.embed-v4:0 (or global.…). Titan v2 is
# on-demand and works with the bare ID. The us./eu./apac./global. prefix is
# stripped for model-family + known-dimensions detection.
# AWS_BEDROCK_EMBEDDING_DIMENSIONS=1024 # Default 1024. Cohere v4: 256/512/1024/1536; Titan v2: 256/512/1024.
# NOTE: the dimension is baked into the vector index — changing it later
# requires re-embedding all stored memories. Required for models not in the
# built-in known-dimensions table.
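
The geo-prefix handling described in the Bedrock embeddings comments above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `stripGeoPrefix`, `KNOWN_DIMENSIONS`, and `isKnownDimension` are hypothetical names, and the table holds only the two models and dimension lists quoted in those comments.

```typescript
// Geo prefixes named in the .env.example comments: us. / eu. / apac. / global.
const GEO_PREFIX = /^(us|eu|apac|global)\./;

// Hypothetical helper: drop the cross-region prefix so the bare model ID
// can be used for family and dimension lookups.
function stripGeoPrefix(modelId: string): string {
  return modelId.replace(GEO_PREFIX, "");
}

// Hypothetical table; values copied from the comments above, nothing more.
const KNOWN_DIMENSIONS: Record<string, number[]> = {
  "cohere.embed-v4:0": [256, 512, 1024, 1536],
  "amazon.titan-embed-text-v2:0": [256, 512, 1024],
};

// Returns true/false for models in the table, undefined for unknown models
// (the case where AWS_BEDROCK_EMBEDDING_DIMENSIONS would have to be set).
function isKnownDimension(modelId: string, dim: number): boolean | undefined {
  const dims = KNOWN_DIMENSIONS[stripGeoPrefix(modelId)];
  if (!dims) return undefined;
  return dims.indexOf(dim) !== -1;
}

console.log(stripGeoPrefix("us.cohere.embed-v4:0"));
console.log(isKnownDimension("global.cohere.embed-v4:0", 1536));
```

Looking up the bare ID means a profile ID such as `us.cohere.embed-v4:0` and the on-demand ID `cohere.embed-v4:0` resolve to the same table entry.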

# -----------------------------------------------------------------------------
# 3. Auth & security
# -----------------------------------------------------------------------------
24 changes: 24 additions & 0 deletions README.md
@@ -882,6 +882,7 @@ npm install @xenova/transformers
| Voyage AI | `voyage-code-3` | Paid | Optimized for code |
| Cohere | `embed-english-v3.0` | Free trial | General purpose |
| OpenRouter | Any model | Varies | Multi-model proxy |
| AWS Bedrock | `cohere.embed-v4:0` (default), `amazon.titan-embed-text-v2:0` | Paid (AWS) | Set `EMBEDDING_PROVIDER=bedrock`; creds via AWS chain / SSO; default 1024-dim. See [AWS Bedrock](#aws-bedrock). |

---

@@ -1153,13 +1154,36 @@ agentmemory auto-detects from your environment. By default, no LLM calls are mad
|----------|--------|-------|
| **No-op (default)** | No config needed | LLM-backed compress/summarize is DISABLED. Synthetic BM25 compression + recall still work. See `AGENTMEMORY_ALLOW_AGENT_SDK` below if you used to rely on the Claude-subscription fallback. |
| Anthropic API | `ANTHROPIC_API_KEY` | Per-token billing |
| AWS Bedrock | `AWS_BEDROCK=true` + `AWS_REGION` | Anthropic models on Bedrock. Opt-in flag, takes precedence when set. Creds from the AWS provider chain — env / IAM role / SSO cache (`AWS_PROFILE`). Default model Claude Haiku 4.5; see [AWS Bedrock](#aws-bedrock) below. |
| MiniMax | `MINIMAX_API_KEY` | Anthropic-compatible |
| Gemini | `GEMINI_API_KEY` | Also enables embeddings |
| OpenRouter | `OPENROUTER_API_KEY` | Any model |
| OpenAI API | `OPENAI_API_KEY` | Default `gpt-4o-mini`, override with `OPENAI_MODEL` |
| **Local (Ollama / LM Studio / vLLM / llama.cpp)** | `OPENAI_API_KEY=local` + `OPENAI_BASE_URL=http://localhost:11434/v1` (Ollama) or `http://localhost:1234/v1` (LM Studio) + `OPENAI_MODEL=<your model>` | Anything OpenAI-API-compatible. Zero cost, runs on your hardware. See [Local models](#local-models-ollama-lm-studio-vllm) below. |
| Claude subscription fallback | `AGENTMEMORY_ALLOW_AGENT_SDK=true` | Opt-in only. Spawns `@anthropic-ai/claude-agent-sdk` sessions — used to cause unbounded Stop-hook recursion (#149 follow-up) so it is no longer the default. |

### AWS Bedrock

Run Anthropic models hosted on AWS Bedrock as the LLM provider. Opt in with `AWS_BEDROCK=true`; when set it takes precedence over the other provider keys.

```bash
AWS_BEDROCK=true
AWS_REGION=us-east-1
AWS_PROFILE=my-sso-profile # optional
AWS_BEDROCK_MODEL=anthropic.claude-haiku-4-5-20251001-v1:0 # optional; this is the default
```

- **Credentials** come from the standard AWS credential provider chain — environment credentials, IAM roles, or an SSO profile cached under `~/.aws/sso/cache/` (select the profile with `AWS_PROFILE`). No static keys are required. To force static keys (e.g. in CI), set **both** `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`.
- **SSO** works out of the box, but agentmemory only *reads* the cached token — it cannot perform the login. Run `aws sso login --profile <name>` first. When the session expires, either re-run it manually or configure the auth-refresh hook (below) to automate re-authentication.
- **Auth-refresh hook** (optional): when a Bedrock call fails with an expired-token error, agentmemory can run a command of your choosing and retry once:
```bash
AWS_AUTH_REFRESH=aws sso login --profile my-sso-profile
AWS_AUTH_REFRESH_TIMEOUT_MS=120000 # optional, default 2 min
```
The command is single-flighted (concurrent calls trigger it once), rate-limited by a short cooldown, and bounded by the timeout. **Security:** only the literal configured string is executed — via `execFile`, no shell, and no model or memory data is ever interpolated into it. Note that `aws sso login` is interactive (opens a browser), so this is best suited to setups where someone can approve the login or where the configured command refreshes credentials non-interactively.
- **Model ID** defaults to Claude Haiku 4.5 (`anthropic.claude-haiku-4-5-20251001-v1:0`) — fast and cost-efficient for background compression. The bare on-demand ID only works in Regions that offer the model on-demand and where model access is enabled in the Bedrock console. In other Regions, set `AWS_BEDROCK_MODEL` to the geo-prefixed cross-region inference profile, e.g. `us.anthropic.claude-haiku-4-5-20251001-v1:0` (or `eu.…`).
- **Embeddings on Bedrock** (separate from the LLM): set `EMBEDDING_PROVIDER=bedrock` to use Cohere / Titan embeddings via the same AWS credentials. It is *not* auto-enabled by `AWS_BEDROCK=true` — so you can run the Bedrock LLM with local (or any other) embeddings. Defaults to `cohere.embed-v4:0` at 1024 dims; override with `AWS_BEDROCK_EMBEDDING_MODEL` / `AWS_BEDROCK_EMBEDDING_DIMENSIONS`. As with the LLM, some embedding models aren't available on-demand in every Region — `cohere.embed-v4:0` is inference-profile-only in several Regions, so set the geo-prefixed ID there, e.g. `AWS_BEDROCK_EMBEDDING_MODEL=us.cohere.embed-v4:0` (Titan v2 works on-demand with the bare ID). The dimension is baked into the vector index, so changing it later means re-embedding stored memories.
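
The "no shell" behaviour of the auth-refresh hook described above can be sketched as follows. This is an illustration under assumptions, not the project's code: `splitArgv` and `runRefresh` are hypothetical names, and the splitting rules (whitespace separates arguments, single or double quotes group them, no escapes or expansion) are one plausible reading of "argv split on whitespace, quotes honored".

```typescript
import { execFile } from "node:child_process";

// Hypothetical helper: split a command string into argv without a shell.
// Whitespace separates arguments; quoted runs stay together.
function splitArgv(command: string): string[] {
  const args: string[] = [];
  let current = "";
  let quote: string | null = null;
  let inToken = false;
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote ends the quoted run
      else current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true; // an empty pair of quotes still yields an argument
    } else if (/\s/.test(ch)) {
      if (inToken) {
        args.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new Error("unterminated quote in command");
  if (inToken) args.push(current);
  return args;
}

// Hypothetical runner: execFile never invokes a shell, so metacharacters in
// the configured string are passed through as literal arguments.
function runRefresh(command: string, timeoutMs: number): Promise<void> {
  const [file, ...args] = splitArgv(command);
  if (!file) return Promise.reject(new Error("empty refresh command"));
  return new Promise<void>((resolve, reject) => {
    execFile(file, args, { timeout: timeoutMs }, (err) => (err ? reject(err) : resolve()));
  });
}

console.log(splitArgv('aws sso login --profile "my sso profile"'));
```

Because the string is split once and handed to `execFile`, something like `; rm -rf ~` in the configured value would reach the program as ordinary arguments rather than being interpreted.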

### Local models (Ollama / LM Studio / vLLM)

agentmemory talks to any OpenAI-API-compatible server, so anything that exposes `/v1/chat/completions` works without code changes. No paid keys, no cloud, no rate limits — runs entirely on your hardware.
3 changes: 3 additions & 0 deletions package.json
@@ -18,6 +18,7 @@
},
"scripts": {
"build": "tsdown && (cp iii-config.yaml dist/ 2>/dev/null || true) && (cp iii-config.docker.yaml dist/ 2>/dev/null || true) && (cp docker-compose.yml dist/ 2>/dev/null || true) && (cp .env.example dist/ 2>/dev/null || true) && mkdir -p dist/viewer && cp src/viewer/index.html dist/viewer/ && cp src/viewer/favicon.svg dist/viewer/",
"prepare": "npm run build",
"dev": "tsx src/index.ts",
"start": "node dist/cli.mjs",
"migrate": "node dist/functions/migrate.js",
@@ -58,8 +59,10 @@
"url": "https://github.com/rohitg00/agentmemory"
},
"dependencies": {
"@anthropic-ai/bedrock-sdk": "^0.29.2",
"@anthropic-ai/claude-agent-sdk": "^0.3.142",
"@anthropic-ai/sdk": "^0.93.0",
"@aws-sdk/client-bedrock-runtime": "^3.1057.0",
"@clack/prompts": "^1.2.0",
"dotenv": "^17.4.2",
"iii-sdk": "0.11.2",
41 changes: 40 additions & 1 deletion src/config.ts
@@ -49,9 +49,46 @@ function hasRealValue(v: string | undefined): v is string {
return typeof v === "string" && v.trim().length > 0;
}

-function detectProvider(env: Record<string, string>): ProviderConfig {
+/** Prevents AWS_BEDROCK=True / TRUE from silently disabling Bedrock. */
+function isEnvTrue(v: string | undefined): boolean {
+  return typeof v === "string" && v.trim().toLowerCase() === "true";
+}
+
+/** Shared so detectProvider and isBedrockUsable gate on the same opt-in values. */
+function isBedrockOptIn(env: Record<string, string>): boolean {
+  return isEnvTrue(env["AWS_BEDROCK"]);
+}
+
+/** A region is required to construct the client, so capability detection never reports an unbuildable config. */
+function isBedrockUsable(env: Record<string, string>): boolean {
+  return isBedrockOptIn(env) && hasRealValue(env["AWS_REGION"]);
+}
+
+export function detectProvider(env: Record<string, string>): ProviderConfig {
   const maxTokens = parseInt(env["MAX_TOKENS"] || "4096", 10);

+  // AWS Bedrock: explicit opt-in via AWS_BEDROCK=true. Placed first so a machine
+  // with both Ollama and Bedrock configured prefers Bedrock when opted in; the
+  // strict flag gate means it never fires for existing OpenAI/Ollama users.
+  // Credentials come from the AWS provider chain (env / IAM role / SSO cache),
+  // so we do NOT key detection on credential env vars — only the flag + region.
+  // Region is mandatory: without it Bedrock cannot be constructed, so we reject
+  // here and fall through rather than returning an unusable bedrock config.
+  if (isBedrockOptIn(env)) {
+    if (isBedrockUsable(env)) {
+      return {
+        provider: "bedrock",
+        model: env["AWS_BEDROCK_MODEL"] || "anthropic.claude-haiku-4-5-20251001-v1:0",
+        maxTokens,
+      };
+    }
+    process.stderr.write(
+      "[agentmemory] AWS_BEDROCK=true but AWS_REGION is unset — ignoring Bedrock " +
+      "and falling through to the next provider. Set AWS_REGION in " +
+      "~/.agentmemory/.env to enable Bedrock.\n",
+    );
+  }

// OpenAI-compatible: supports OpenAI, DeepSeek, SiliconFlow, Azure, vLLM, LM Studio
if (hasRealValue(env["OPENAI_API_KEY"]) && env["OPENAI_API_KEY_FOR_LLM"] !== "false") {
return {
@@ -191,6 +228,7 @@ export function isDropStaleIndexEnabled(): boolean {
export function detectLlmProviderKind(): "llm" | "noop" {
const env = getMergedEnv();
if (
isBedrockUsable(env) ||
hasRealValue(env["ANTHROPIC_API_KEY"]) ||
hasRealValue(env["GEMINI_API_KEY"]) ||
hasRealValue(env["GOOGLE_API_KEY"]) ||
@@ -389,6 +427,7 @@ export function getStandalonePersistPath(): string {

const VALID_PROVIDERS = new Set([
"anthropic",
"bedrock",
"gemini",
"openrouter",
"agent-sdk",
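
To see how the Bedrock opt-in gate in `src/config.ts` behaves on different inputs, the helpers from this diff can be re-declared and run on their own. The sketch below copies `hasRealValue`, `isEnvTrue`, and the `isBedrockUsable` condition from the diff; the `Env` alias and the sample environments are made up for illustration.

```typescript
type Env = Record<string, string>;

// Re-declared from the diff so the sketch is self-contained.
function hasRealValue(v: string | undefined): v is string {
  return typeof v === "string" && v.trim().length > 0;
}

function isEnvTrue(v: string | undefined): boolean {
  return typeof v === "string" && v.trim().toLowerCase() === "true";
}

function isBedrockUsable(env: Env): boolean {
  return isEnvTrue(env["AWS_BEDROCK"]) && hasRealValue(env["AWS_REGION"]);
}

// Hypothetical sample environments, not taken from the project.
const cases: Array<[string, Env]> = [
  ["flag and region set", { AWS_BEDROCK: "true", AWS_REGION: "us-east-1" }],
  ["upper-case flag", { AWS_BEDROCK: "TRUE", AWS_REGION: "us-east-1" }],
  ["flag without region", { AWS_BEDROCK: "true" }],
  ["blank region", { AWS_BEDROCK: "true", AWS_REGION: "   " }],
  ["non-boolean flag", { AWS_BEDROCK: "1", AWS_REGION: "us-east-1" }],
];

for (const [label, env] of cases) {
  console.log(`${label}: ${isBedrockUsable(env)}`);
}
```

Only the first two cases pass the gate. A value such as `1` or `yes` does not count as opting in, and a missing or blank region leaves detection to fall through to the next provider.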
23 changes: 19 additions & 4 deletions src/functions/compress-file.ts
@@ -5,6 +5,7 @@ import type { ISdk } from "iii-sdk";
import type { MemoryProvider } from "../types.js";
import type { StateKV } from "../state/kv.js";
import { recordAudit } from "./audit.js";
import { logger } from "../logger.js";

const SENSITIVE_PATH_TERMS = [
"secret",
@@ -133,10 +134,24 @@ export function registerCompressFileFunction(
return { success: true, skipped: true, reason: "file is empty" };
}

-const response = await provider.summarize(
-  COMPRESS_FILE_SYSTEM_PROMPT,
-  `Compress this markdown file while preserving structure and code blocks:\n\n${original}`,
-);
+let response: string;
+try {
+  response = await provider.summarize(
+    COMPRESS_FILE_SYSTEM_PROMPT,
+    `Compress this markdown file while preserving structure and code blocks:\n\n${original}`,
+  );
+} catch (err) {
+  // Surface the provider's message as a structured error. Without this the
+  // throw escapes the function and the engine serializes it as the opaque
+  // "[object Object]", hiding actionable hints (e.g. the Bedrock provider's
+  // model-access / inference-profile guidance).
+  const msg = err instanceof Error ? err.message : String(err);
+  logger.error("compress-file provider call failed", {
+    filePath: absolutePath,
+    error: msg,
+  });
+  return { success: false, error: msg };
+}
const compressed = stripMarkdownFence(response);
const validationErrors = validateCompression(original, compressed);
if (validationErrors.length > 0) {
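
The `catch` block in `compress-file.ts` turns a thrown value into a message with `err instanceof Error ? err.message : String(err)`. As a side note, `String()` on a plain object still produces `"[object Object]"`, so a provider that throws a non-`Error` object would stay opaque. The sketch below shows one possible way to cover that case; `toErrorMessage` is a hypothetical helper, not part of this PR.

```typescript
// Hypothetical helper: extract a readable message from any thrown value.
function toErrorMessage(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  if (typeof err === "object" && err !== null && "message" in err) {
    const m = (err as { message: unknown }).message;
    if (typeof m === "string") return m;
  }
  try {
    // JSON.stringify returns undefined for values such as undefined or functions.
    const json = JSON.stringify(err);
    if (typeof json === "string") return json;
  } catch (_e) {
    // Circular structures cannot be stringified; fall through to String().
  }
  return String(err);
}

console.log(String({ code: "AccessDenied" }));
console.log(toErrorMessage({ code: "AccessDenied" }));
console.log(toErrorMessage(new Error("model access not enabled")));
```

For real `Error` instances the result matches what the diff already does; the extra branches only matter for thrown strings and plain objects.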