microsoft · liqul · Jan 23, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,292 @@
+# AGENTS.md - TaskWeaver Development Guide
+
+This document provides guidance for AI coding agents working on the TaskWeaver codebase.
+
+## Project Overview
+
+TaskWeaver is a **code-first agent framework** for data analytics tasks. It uses Python 3.10+ and follows a modular architecture with dependency injection (using `injector`).
+
+## Build & Development Commands
+
+### Installation
+```bash
+# Use the existing conda environment
+conda activate taskweaver
+
+# Or create a new one
+conda create -n taskweaver python=3.10
+conda activate taskweaver
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Install in editable mode
+pip install -e .
+```
+
+**Note**: The project uses a conda environment named `taskweaver`.
+
+### Running Tests
+```bash
+# Run all unit tests
+pytest tests/unit_tests -v
+
+# Run a single test file
+pytest tests/unit_tests/test_plugin.py -v
+
+# Run a specific test function
+pytest tests/unit_tests/test_plugin.py::test_load_plugin_yaml -v
+
+# Run tests with coverage
+pytest tests/unit_tests -v --cov=taskweaver --cov-report=html
+
+# Collect tests without running (useful for verification)
+pytest tests/unit_tests --collect-only
+```
+
+### Linting & Formatting
+```bash
+# Run pre-commit hooks (autoflake, isort, black, flake8)
+pre-commit run --all-files
+
+# Run individual tools
+black --config=.linters/pyproject.toml .
+isort --settings-path=.linters/pyproject.toml .
+flake8 --config=.linters/tox.ini taskweaver/
+```
+
+### Running the Application
+```bash
+# CLI mode
+python -m taskweaver -p ./project/
+
+# As a module
+python -m taskweaver
+```
+
+## Code Style Guidelines
+
+### Formatting Configuration
+- **Line length**: 120 characters (configured in `.linters/pyproject.toml`)
+- **Formatter**: Black with `--config=.linters/pyproject.toml`
+- **Import sorting**: isort with `profile = "black"`
+
+### Import Organization
+```python
+# Standard library imports first
+import os
+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional
+
+# Third-party imports
+from injector import inject
+
+# Local imports (known_first_party = ["taskweaver"])
+from taskweaver.config.config_mgt import AppConfigSource
+from taskweaver.logging import TelemetryLogger
+```
+
+### Type Annotations
+- **Required**: All function parameters and return types must have type hints
+- **Use `Optional[T]`** for nullable types
+- **Use `List`, `Dict`, `Tuple`** from `typing` module
+- **Dataclasses** are preferred for structured data
+
+```python
+from dataclasses import dataclass
+from typing import Any, Dict, List, Optional
+
+@dataclass
+class Post:
+    id: str
+    send_from: str
+    send_to: str
+    message: str
+    attachment_list: List[Attachment]
+
+    @staticmethod
+    def create(
+        message: Optional[str],
+        send_from: str,
+        send_to: str = "Unknown",
+    ) -> "Post":  # quoted: Post is not yet bound while its class body runs
+        ...
+```
+
+### Naming Conventions
+- **Classes**: PascalCase (`CodeGenerator`, `PluginRegistry`)
+- **Functions/methods**: snake_case (`compose_prompt`, `get_attachment`)
+- **Variables**: snake_case (`plugin_pool`, `chat_history`)
+- **Constants**: UPPER_SNAKE_CASE (`MAX_RETRY_COUNT`)
+- **Private members**: prefix with underscore (`_configure`, `_get_config_value`)
+- **Config classes**: suffix with `Config` (`PlannerConfig`, `RoleConfig`)
+
+### Dependency Injection Pattern
+TaskWeaver uses the `injector` library for DI. Follow this pattern:
+
+```python
+from injector import inject, Module, provider
+
+class MyConfig(ModuleConfig):
+    def _configure(self) -> None:
+        self._set_name("my_module")
+        self.some_setting = self._get_str("setting_name", "default_value")
+
+class MyService:
+    @inject
+    def __init__(
+        self,
+        config: MyConfig,
+        logger: TelemetryLogger,
+        other_dependency: OtherService,
+    ):
+        self.config = config
+        self.logger = logger
+```
+
+### Error Handling
+- Use specific exception types when possible
+- Log errors with context before re-raising
+- Use assertions for internal invariants
+
+```python
+try:
+    result = self.llm_api.chat_completion_stream(...)
+except (JSONDecodeError, AssertionError) as e:
+    self.logger.error(f"Failed to parse LLM output due to {str(e)}")
+    self.tracing.set_span_status("ERROR", str(e))
+    raise
+```
+
+### Docstrings
+Use triple-quoted docstrings for classes and public methods:
+
+```python
+def get_embeddings(self, strings: List[str]) -> List[List[float]]:
+    """
+    Embedding API
+
+    :param strings: list of strings to be embedded
+    :return: list of embeddings
+    """
+```
+
+### Trailing Commas
+Always use trailing commas in multi-line structures (enforced by `add-trailing-comma`):
+
+```python
+app_injector = Injector(
+    [LoggingModule, PluginModule],  # trailing comma
+)
+
+config = {
+    "key1": "value1",
+    "key2": "value2",  # trailing comma
+}
+```
+
+## Project Structure
+
+```
+taskweaver/
+├── app/              # Application entry points and session management
+├── ces/              # Code execution service
+├── chat/             # Chat interfaces (console, web)
+├── cli/              # CLI implementation
+├── code_interpreter/ # Code generation and interpretation
+├── config/           # Configuration management
+├── ext_role/         # Extended roles (web_search, image_reader, etc.)
+├── llm/              # LLM integrations (OpenAI, Anthropic, etc.)
+├── logging/          # Logging and telemetry
+├── memory/           # Conversation memory and attachments
+├── misc/             # Utilities and component registry
+├── module/           # Core modules (tracing, events)
+├── planner/          # Planning logic
+├── plugin/           # Plugin system
+├── role/             # Role base classes
+├── session/          # Session management
+├── utils/            # Helper utilities
+└── workspace/        # Workspace management
+
+tests/
+└── unit_tests/       # Unit tests (pytest)
+    ├── data/         # Test fixtures (plugins, prompts, examples)
+    └── ces/          # Code execution tests
+```
+
+### Module and Role Overview (what lives where)
+
+- **app/**: Bootstraps dependency injection; wires TaskWeaverApp, SessionManager, config binding.
+- **session/**: Orchestrates Planner + worker roles, memory, workspace management, event emitter, tracing.
+- **planner/**: Planner role; LLM-powered task decomposition and planning logic.
+- **code_interpreter/**: Code generation and execution (full, CLI-only, plugin-only); code verification/AST checks.
+- **memory/**: Conversation history, rounds, posts, attachments, experiences; RoundCompressor utilities.
+- **llm/**: LLM API facades; providers include OpenAI/Azure, Anthropic, Ollama, Google GenAI, Qwen, ZhipuAI, Groq, Azure ML, mock; embeddings via OpenAI/Azure, Ollama, Google GenAI, sentence_transformers, Qwen, ZhipuAI.
+- **plugin/**: Plugin base classes and registry/context for function-style plugins.
+- **role/**: Core role abstractions, RoleRegistry, PostTranslator.
+- **ext_role/**: Extended roles (web_search, web_explorer, image_reader, document_retriever, recepta, echo).
+- **module/**: Core modules like tracing and event_emitter wiring.
+- **logging/**: TelemetryLogger and logging setup.
+- **workspace/**: Session-scoped working directories and execution cwd helpers.
+
+## Testing Patterns
+
+### Using Fixtures
+```python
+import pytest
+from injector import Injector
+
+@pytest.fixture()
+def app_injector(request: pytest.FixtureRequest):
+    from taskweaver.config.config_mgt import AppConfigSource
+    config = {"llm.api_key": "test_key"}
+    app_injector = Injector([LoggingModule, PluginModule])
+    app_config = AppConfigSource(config=config)
+    app_injector.binder.bind(AppConfigSource, to=app_config)
+    return app_injector
+```
+
+### Test Markers
+```python
+@pytest.mark.app_config({"custom.setting": "value"})
+def test_with_custom_config(app_injector):
+    ...
+```
+
+## Flake8 Ignores
+The following are intentionally ignored (see `.linters/tox.ini`):
+- `E402`: Module level import not at top of file
+- `W503`: Line break before binary operator
+- `W504`: Line break after binary operator
+- `E203`: Whitespace before ':'
+- `F401`: Import not used (only in `__init__.py`)
+
+## Key Patterns
+
+### Creating Unique IDs
+```python
+from taskweaver.utils import create_id
+post_id = "post-" + create_id()  # Format: post-YYYYMMDD-HHMMSS-<random>
+```
+
+### Reading/Writing YAML
+```python
+from taskweaver.utils import read_yaml, write_yaml
+data = read_yaml("path/to/file.yaml")
+write_yaml("path/to/file.yaml", data)
+```
+
+### Configuration Access
+```python
+class MyConfig(ModuleConfig):
+    def _configure(self) -> None:
+        self._set_name("my_module")
+        self.enabled = self._get_bool("enabled", False)
+        self.path = self._get_path("base_path", "/default/path")
+        self.model = self._get_str("model", None, required=False)
+```
+
+## CI/CD
+- Tests run on Python 3.11 via GitHub Actions
+- Pre-commit hooks include: autoflake, isort, black, flake8, gitleaks, detect-secrets
+- All PRs to `main` trigger the pytest workflow
diff --git a/docs/design/code-interpreter-vars.md b/docs/design/code-interpreter-vars.md
@@ -0,0 +1,62 @@
+# Code Interpreter Visible Variable Surfacing
+
+## Problem
+The code interpreter generates Python in a persistent kernel, but the prompt does not explicitly remind the model which variables already exist in that kernel. This can lead to redundant redefinitions or missed reuse of prior results. We want to surface only the newly defined (non-library) variables to the model in subsequent turns.
+
+## Goals
+- Capture the current user/kernel-visible variables after each execution (excluding standard libs and plugins).
+- Propagate these variables to the code interpreter’s prompt so it can reuse them.
+- Keep noise low: skip modules/functions and internal/builtin names; truncate large reprs.
+- Maintain backward compatibility; do not break existing attachments or execution flow.
+
+## Non-Goals
+- Full introspection of module internals or large data snapshots.
+- Persisting variables across sessions beyond current conversation.
+
+## Design Overview
+1) **Collect kernel variables after execution**
+   - In the IPython magics layer (`_taskweaver_exec_post_check`) call a context helper to extract visible variables from `local_ns`.
+   - Filtering rules:
+     - Skip names starting with `_`.
+     - Skip builtins and common libs: `__builtins__`, `In`, `Out`, `get_ipython`, `exit`, `quit`, `pd`, `np`, `plt`.
+     - Skip modules and any defined functions (only keep data-bearing variables).
+     - For other values, store `(name, repr(value))` with the repr truncated to 500 chars; fall back to `<unrepresentable>` if `repr` raises.
+   - Store the snapshot on `ExecutorPluginContext.latest_variables`.
+
+2) **Return variables with execution result**
+   - `Executor.get_post_execution_state` now includes `variables` (list of `(name, repr)` tuples).
+   - `Environment._parse_exec_result` copies these into `ExecutionResult.variables` (added to dataclass).
+
+3) **Surface variables to user and prompt**
+   - `CodeExecutor.format_code_output` renders available variables when there is no explicit result/output, using `pretty_repr` to keep lines concise.
+   - `CodeInterpreter.reply` attaches a new `session_variables` attachment (JSON list of tuples) when variables are present.
+   - `CodeGenerator.compose_conversation` ignores this attachment in assistant-message rendering but includes it in feedback via `format_code_feedback`, adding an “Available Variables” section for the model’s context.
+
+4) **Attachment type**
+   - Added `AttachmentType.session_variables` to carry the variable snapshot per execution.
+
+## Open Items / Next Steps
+- Wire the variables directly into the final user turn’s prompt text (e.g., under a “Currently available variables” block) to make reuse even clearer.
+- Revisit filtering to ensure we skip large data/DF previews (could add size/type caps).
+- Validate end-to-end with unit tests for: variable capture, attachment propagation, prompt inclusion, and formatting.
+
+## Files Touched
+- `taskweaver/ces/runtime/context.py` — collect and store visible variables.
+- `taskweaver/ces/runtime/executor.py` — expose variables in post-execution state.
+- `taskweaver/ces/environment.py` — carry variables into `ExecutionResult`.
+- `taskweaver/ces/common.py` — add `variables` to `ExecutionResult` dataclass.
+- `taskweaver/memory/attachment.py` — add `session_variables` attachment type.
+- `taskweaver/code_interpreter/code_interpreter/code_interpreter.py` — attach captured vars to posts.
+- `taskweaver/code_interpreter/code_interpreter/code_generator.py` — ignore var attachments in assistant text; include in feedback.
+- `taskweaver/code_interpreter/code_executor.py` — display available variables when no explicit output.
+- `taskweaver/utils/__init__.py` — add `pretty_repr` helper for safe truncation.
+
+## Rationale
+- Keeps the model aware of live state without inflating prompts with full outputs.
+- Avoids re-importing/recomputing when variables already exist.
+- Uses attachments so downstream consumers (UI/logs) can also show the state.
+
+## Risks / Mitigations
+- **Large values**: truncated repr and filtered types keep prompt size bounded; consider type-based caps later.
+- **Noise from libs**: explicit ignore list for common imports; can expand as needed.
+- **Compatibility**: new attachment type is additive; existing flows remain unchanged.
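
The collection step described in the design doc above (Design Overview, step 1) can be sketched roughly as follows. This is an illustration only: the body of `extract_visible_variables` in `taskweaver/ces/runtime/context.py` is not shown in this diff, so the standalone function signature, the `safe_repr` helper, and the constant names below are assumptions. Whether the real code appends a truncation marker is also unknown; this sketch simply cuts the string.

```python
import types
from typing import Any, Dict, List, Tuple

# Names the design doc lists as always skipped (builtins and common libs).
IGNORED_NAMES = {"__builtins__", "In", "Out", "get_ipython", "exit", "quit", "pd", "np", "plt"}
# Maximum repr length stated in the design doc.
MAX_REPR_LEN = 500


def safe_repr(value: Any, limit: int = MAX_REPR_LEN) -> str:
    """Hypothetical helper: repr() that never raises and never exceeds `limit` chars."""
    try:
        text = repr(value)
    except Exception:
        return "<unrepresentable>"
    return text[:limit]


def extract_visible_variables(local_ns: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (name, repr) pairs for data-bearing variables in a kernel namespace."""
    visible: List[Tuple[str, str]] = []
    for name, value in local_ns.items():
        # Skip private/internal names and the explicit ignore list.
        if name.startswith("_") or name in IGNORED_NAMES:
            continue
        # Skip modules and functions; only data-bearing values are kept.
        if isinstance(value, (types.ModuleType, types.FunctionType, types.BuiltinFunctionType)):
            continue
        visible.append((name, safe_repr(value)))
    return visible
```

For example, `extract_visible_variables({"_": 1, "np": None, "x": 42})` keeps only `x`, because `_` is private and `np` is on the ignore list. Filtering by name before type means an ignored alias such as `pd` is dropped even if the user rebinds it to plain data, which is one trade-off of a fixed ignore list.
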
diff --git a/taskweaver/ces/common.py b/taskweaver/ces/common.py
@@ -68,6 +68,7 @@ class ExecutionResult:
 
     log: List[Tuple[str, str, str]] = dataclasses.field(default_factory=list)
     artifact: List[ExecutionArtifact] = dataclasses.field(default_factory=list)
+    variables: List[Tuple[str, str]] = dataclasses.field(default_factory=list)
 
 
 class Client(ABC):

diff --git a/taskweaver/ces/environment.py b/taskweaver/ces/environment.py
@@ -697,6 +697,8 @@ def _parse_exec_result(
                             preview=artifact_dict["preview"],
                         )
                         result.artifact.append(artifact_item)
+                elif key == "variables":
+                    result.variables = value
                 else:
                     pass
 

diff --git a/taskweaver/ces/kernel/ctx_magic.py b/taskweaver/ces/kernel/ctx_magic.py
@@ -58,6 +58,7 @@ def _taskweaver_exec_pre_check(self, line: str):
     def _taskweaver_exec_post_check(self, line: str, local_ns: Dict[str, Any]):
         if "_" in local_ns:
             self.executor.ctx.set_output(local_ns["_"])
+        self.executor.ctx.extract_visible_variables(local_ns)
         return fmt_response(True, "", self.executor.get_post_execution_state())
 
     @cell_magic