292 changes: 292 additions & 0 deletions AGENTS.md
@@ -0,0 +1,292 @@
# AGENTS.md - TaskWeaver Development Guide

This document provides guidance for AI coding agents working on the TaskWeaver codebase.

## Project Overview

TaskWeaver is a **code-first agent framework** for data analytics tasks. It uses Python 3.10+ and follows a modular architecture with dependency injection (using `injector`).

## Build & Development Commands

### Installation
```bash
# Use the existing conda environment
conda activate taskweaver

# Or create a new one
conda create -n taskweaver python=3.10
conda activate taskweaver

# Install dependencies
pip install -r requirements.txt

# Install in editable mode
pip install -e .
```

**Note**: The project uses a conda environment named `taskweaver`.

### Running Tests
```bash
# Run all unit tests
pytest tests/unit_tests -v

# Run a single test file
pytest tests/unit_tests/test_plugin.py -v

# Run a specific test function
pytest tests/unit_tests/test_plugin.py::test_load_plugin_yaml -v

# Run tests with coverage
pytest tests/unit_tests -v --cov=taskweaver --cov-report=html

# Collect tests without running (useful for verification)
pytest tests/unit_tests --collect-only
```

### Linting & Formatting
```bash
# Run pre-commit hooks (autoflake, isort, black, flake8)
pre-commit run --all-files

# Run individual tools
black --config=.linters/pyproject.toml .
isort --settings-path=.linters/pyproject.toml .
flake8 --config=.linters/tox.ini taskweaver/
```

### Running the Application
```bash
# CLI mode
python -m taskweaver -p ./project/

# As a module
python -m taskweaver
```

## Code Style Guidelines

### Formatting Configuration
- **Line length**: 120 characters (configured in `.linters/pyproject.toml`)
- **Formatter**: Black with `--config=.linters/pyproject.toml`
- **Import sorting**: isort with `profile = "black"`

### Import Organization
```python
# Standard library imports first
import os
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Third-party imports
from injector import inject

# Local imports (known_first_party = ["taskweaver"])
from taskweaver.config.config_mgt import AppConfigSource
from taskweaver.logging import TelemetryLogger
```

### Type Annotations
- **Required**: All function parameters and return types must have type hints
- **Use `Optional[T]`** for nullable types
- **Use `List`, `Dict`, `Tuple`** from `typing` module
- **Dataclasses** are preferred for structured data

```python
from __future__ import annotations  # lets `-> Post` and `Attachment` be resolved lazily

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Post:
    id: str
    send_from: str
    send_to: str
    message: str
    attachment_list: List[Attachment]  # Attachment is defined elsewhere in the codebase

    @staticmethod
    def create(
        message: Optional[str],
        send_from: str,
        send_to: str = "Unknown",
    ) -> Post:
        ...
```

### Naming Conventions
- **Classes**: PascalCase (`CodeGenerator`, `PluginRegistry`)
- **Functions/methods**: snake_case (`compose_prompt`, `get_attachment`)
- **Variables**: snake_case (`plugin_pool`, `chat_history`)
- **Constants**: UPPER_SNAKE_CASE (`MAX_RETRY_COUNT`)
- **Private members**: prefix with underscore (`_configure`, `_get_config_value`)
- **Config classes**: suffix with `Config` (`PlannerConfig`, `RoleConfig`)

### Dependency Injection Pattern
TaskWeaver uses the `injector` library for DI. Follow this pattern:

```python
from injector import inject, Module, provider

from taskweaver.config.module_config import ModuleConfig
from taskweaver.logging import TelemetryLogger


class MyConfig(ModuleConfig):
    def _configure(self) -> None:
        self._set_name("my_module")
        self.some_setting = self._get_str("setting_name", "default_value")


class MyService:
    @inject
    def __init__(
        self,
        config: MyConfig,
        logger: TelemetryLogger,
        other_dependency: OtherService,  # placeholder for any other injectable service
    ) -> None:
        self.config = config
        self.logger = logger
```

### Error Handling
- Use specific exception types when possible
- Log errors with context before re-raising
- Use assertions for internal invariants

```python
try:
    result = self.llm_api.chat_completion_stream(...)
except (JSONDecodeError, AssertionError) as e:
    self.logger.error(f"Failed to parse LLM output due to {str(e)}")
    self.tracing.set_span_status("ERROR", str(e))
    raise
```

### Docstrings
Use triple-quoted docstrings for classes and public methods:

```python
def get_embeddings(self, strings: List[str]) -> List[List[float]]:
    """
    Embedding API

    :param strings: list of strings to be embedded
    :return: list of embeddings
    """
```

### Trailing Commas
Always use trailing commas in multi-line structures (enforced by `add-trailing-comma`):

```python
app_injector = Injector(
    [LoggingModule, PluginModule],  # trailing comma
)

config = {
    "key1": "value1",
    "key2": "value2",  # trailing comma
}
```

## Project Structure

```
taskweaver/
├── app/ # Application entry points and session management
├── ces/ # Code execution service
├── chat/ # Chat interfaces (console, web)
├── cli/ # CLI implementation
├── code_interpreter/ # Code generation and interpretation
├── config/ # Configuration management
├── ext_role/ # Extended roles (web_search, image_reader, etc.)
├── llm/ # LLM integrations (OpenAI, Anthropic, etc.)
├── logging/ # Logging and telemetry
├── memory/ # Conversation memory and attachments
├── misc/ # Utilities and component registry
├── module/ # Core modules (tracing, events)
├── planner/ # Planning logic
├── plugin/ # Plugin system
├── role/ # Role base classes
├── session/ # Session management
├── utils/ # Helper utilities
└── workspace/ # Workspace management

tests/
└── unit_tests/           # Unit tests (pytest)
    ├── data/             # Test fixtures (plugins, prompts, examples)
    └── ces/              # Code execution tests
```

### Module and Role Overview (what lives where)

- **app/**: Bootstraps dependency injection; wires TaskWeaverApp, SessionManager, config binding.
- **session/**: Orchestrates Planner + worker roles, memory, workspace management, event emitter, tracing.
- **planner/**: Planner role; LLM-powered task decomposition and planning logic.
- **code_interpreter/**: Code generation and execution (full, CLI-only, plugin-only); code verification/AST checks.
- **memory/**: Conversation history, rounds, posts, attachments, experiences; RoundCompressor utilities.
- **llm/**: LLM API facades; providers include OpenAI/Azure, Anthropic, Ollama, Google GenAI, Qwen, ZhipuAI, Groq, Azure ML, mock; embeddings via OpenAI/Azure, Ollama, Google GenAI, sentence_transformers, Qwen, ZhipuAI.
- **plugin/**: Plugin base classes and registry/context for function-style plugins.
- **role/**: Core role abstractions, RoleRegistry, PostTranslator.
- **ext_role/**: Extended roles (web_search, web_explorer, image_reader, document_retriever, recepta, echo).
- **module/**: Core modules like tracing and event_emitter wiring.
- **logging/**: TelemetryLogger and logging setup.
- **workspace/**: Session-scoped working directories and execution cwd helpers.

## Testing Patterns

### Using Fixtures
```python
import pytest
from injector import Injector


@pytest.fixture()
def app_injector(request: pytest.FixtureRequest):
    from taskweaver.config.config_mgt import AppConfigSource

    # LoggingModule and PluginModule come from the taskweaver package (imports omitted here)
    config = {"llm.api_key": "test_key"}
    app_injector = Injector([LoggingModule, PluginModule])
    app_config = AppConfigSource(config=config)
    app_injector.binder.bind(AppConfigSource, to=app_config)
    return app_injector
```

### Test Markers
```python
@pytest.mark.app_config({"custom.setting": "value"})
def test_with_custom_config(app_injector):
    ...
```

## Flake8 Ignores
The following are intentionally ignored (see `.linters/tox.ini`):
- `E402`: Module level import not at top of file
- `W503`: Line break before binary operator
- `W504`: Line break after binary operator
- `E203`: Whitespace before ':'
- `F401`: Import not used (only in `__init__.py`)

## Key Patterns

### Creating Unique IDs
```python
from taskweaver.utils import create_id
post_id = "post-" + create_id() # Format: post-YYYYMMDD-HHMMSS-<random>
```

### Reading/Writing YAML
```python
from taskweaver.utils import read_yaml, write_yaml
data = read_yaml("path/to/file.yaml")
write_yaml("path/to/file.yaml", data)
```

### Configuration Access
```python
class MyConfig(ModuleConfig):
    def _configure(self) -> None:
        self._set_name("my_module")
        self.enabled = self._get_bool("enabled", False)
        self.path = self._get_path("base_path", "/default/path")
        self.model = self._get_str("model", None, required=False)
```

## CI/CD
- Tests run on Python 3.11 via GitHub Actions
- Pre-commit hooks include: autoflake, isort, black, flake8, gitleaks, detect-secrets
- All PRs to `main` trigger the pytest workflow
62 changes: 62 additions & 0 deletions docs/design/code-interpreter-vars.md
@@ -0,0 +1,62 @@
# Code Interpreter Visible Variable Surfacing

## Problem
The code interpreter generates Python in a persistent kernel, but the prompt does not explicitly remind the model which variables already exist in that kernel. This can lead to redundant redefinitions or missed reuse of prior results. We want to surface only the user-defined (non-library) variables to the model in subsequent turns.

## Goals
- Capture the current user/kernel-visible variables after each execution (excluding standard libs and plugins).
- Propagate these variables to the code interpreter’s prompt so it can reuse them.
- Keep noise low: skip modules/functions and internal/builtin names; truncate large reprs.
- Maintain backward compatibility; do not break existing attachments or execution flow.

## Non-Goals
- Full introspection of module internals or large data snapshots.
- Persisting variables across sessions beyond current conversation.

## Design Overview
1) **Collect kernel variables after execution**
   - In the IPython magics layer (`_taskweaver_exec_post_check`), call a context helper to extract visible variables from `local_ns`.
   - Filtering rules:
     - Skip names starting with `_`.
     - Skip builtins and common libs: `__builtins__`, `In`, `Out`, `get_ipython`, `exit`, `quit`, `pd`, `np`, `plt`.
     - Skip modules and any defined functions (only keep data-bearing variables).
     - For other values, store `(name, repr(value))`, with the repr truncated to 500 chars; fall back to `<unrepresentable>` on repr errors.
   - Store the snapshot on `ExecutorPluginContext.latest_variables`.

2) **Return variables with execution result**
   - `Executor.get_post_execution_state` now includes `variables` (list of `(name, repr)` tuples).
   - `Environment._parse_exec_result` copies these into `ExecutionResult.variables` (added to dataclass).

3) **Surface variables to user and prompt**
   - `CodeExecutor.format_code_output` renders available variables when there is no explicit result/output, using `pretty_repr` to keep lines concise.
   - `CodeInterpreter.reply` attaches a new `session_variables` attachment (JSON list of tuples) when variables are present.
   - `CodeGenerator.compose_conversation` ignores this attachment in assistant-message rendering but includes it in feedback via `format_code_feedback`, adding an “Available Variables” section for the model’s context.

4) **Attachment type**
   - Added `AttachmentType.session_variables` to carry the variable snapshot per execution.
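
The filtering rules in step 1 can be sketched roughly as follows. This is a standalone illustration and not the code in `taskweaver/ces/runtime/context.py`: the function is written as a free function rather than a context method, and the constant names `IGNORED_NAMES` and `MAX_REPR_LEN` are assumptions made for the example.

```python
import types
from typing import Any, Dict, List, Tuple

# Names listed in the filtering rules above; the constant name is hypothetical.
IGNORED_NAMES = {"__builtins__", "In", "Out", "get_ipython", "exit", "quit", "pd", "np", "plt"}
MAX_REPR_LEN = 500  # truncation limit stated in the design


def extract_visible_variables(local_ns: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (name, truncated repr) pairs for data-bearing variables."""
    visible: List[Tuple[str, str]] = []
    for name, value in local_ns.items():
        # Skip private names and the explicit ignore list.
        if name.startswith("_") or name in IGNORED_NAMES:
            continue
        # Skip modules and functions so only data-bearing values remain.
        if isinstance(value, (types.ModuleType, types.FunctionType, types.BuiltinFunctionType)):
            continue
        try:
            text = repr(value)
        except Exception:
            text = "<unrepresentable>"
        visible.append((name, text[:MAX_REPR_LEN]))
    return visible
```

Wrapping only the `repr` call in the `try` block means an object with a broken `__repr__` still shows up in the snapshot under its name, so the model learns the variable exists even when its value cannot be rendered.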

## Open Items / Next Steps
- Wire the variables directly into the final user turn’s prompt text (e.g., under a “Currently available variables” block) to make reuse even clearer.
- Revisit filtering to ensure we skip large data/DF previews (could add size/type caps).
- Validate end-to-end with unit tests for: variable capture, attachment propagation, prompt inclusion, and formatting.
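
As a rough illustration of the first open item, the sketch below turns the JSON content of a `session_variables` attachment into a "Currently available variables" text block. The function name `render_variable_block`, the `max_items` cap, and the exact line format are all assumptions for this example; the design does not specify them.

```python
import json


def render_variable_block(attachment_content: str, max_items: int = 20) -> str:
    """Render a JSON list of (name, repr) pairs as a prompt-ready text block."""
    # JSON has no tuple type, so each pair arrives as a two-element list.
    pairs = json.loads(attachment_content)
    if not pairs:
        return ""
    lines = ["Currently available variables:"]
    for name, value_repr in pairs[:max_items]:
        lines.append(f"- {name}: {value_repr}")
    if len(pairs) > max_items:
        lines.append(f"- ... ({len(pairs) - max_items} more)")
    return "\n".join(lines)


content = json.dumps([("row_count", "120"), ("title", "'Q3 report'")])
print(render_variable_block(content))
```

Returning an empty string for an empty snapshot lets the caller skip the block entirely instead of emitting a heading with nothing under it.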

## Files Touched
- `taskweaver/ces/runtime/context.py` — collect and store visible variables.
- `taskweaver/ces/runtime/executor.py` — expose variables in post-execution state.
- `taskweaver/ces/environment.py` — carry variables into `ExecutionResult`.
- `taskweaver/ces/common.py` — add `variables` to `ExecutionResult` dataclass.
- `taskweaver/memory/attachment.py` — add `session_variables` attachment type.
- `taskweaver/code_interpreter/code_interpreter/code_interpreter.py` — attach captured vars to posts.
- `taskweaver/code_interpreter/code_interpreter/code_generator.py` — ignore var attachments in assistant text; include in feedback.
- `taskweaver/code_interpreter/code_executor.py` — display available variables when no explicit output.
- `taskweaver/utils/__init__.py` — add `pretty_repr` helper for safe truncation.

## Rationale
- Keeps the model aware of live state without inflating prompts with full outputs.
- Avoids re-importing/recomputing when variables already exist.
- Uses attachments so downstream consumers (UI/logs) can also show the state.

## Risks / Mitigations
- **Large values**: truncated repr and filtered types keep prompt size bounded; consider type-based caps later.
- **Noise from libs**: explicit ignore list for common imports; can expand as needed.
- **Compatibility**: new attachment type is additive; existing flows remain unchanged.
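
One possible shape for the type-based caps mentioned under **Large values** is sketched below. It is not part of this change: `summarize_value` and its `max_len` default are hypothetical, and the set of container types it checks is only an example.

```python
from typing import Any


def summarize_value(value: Any, max_len: int = 80) -> str:
    """Describe oversized containers by type and size; otherwise use a capped repr."""
    sized_types = (list, tuple, set, frozenset, dict, str, bytes)
    # Large built-in containers are summarized without calling repr at all.
    if isinstance(value, sized_types) and len(value) > max_len:
        return f"<{type(value).__name__} with {len(value)} items>"
    try:
        text = repr(value)
    except Exception:
        return "<unrepresentable>"
    # Smaller values can still have a long repr, so cap the text as well.
    return text if len(text) <= max_len else text[: max_len - 3] + "..."
```

Checking the container size before calling `repr` avoids building a very long string only to throw most of it away.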
1 change: 1 addition & 0 deletions taskweaver/ces/common.py
@@ -68,6 +68,7 @@ class ExecutionResult:

    log: List[Tuple[str, str, str]] = dataclasses.field(default_factory=list)
    artifact: List[ExecutionArtifact] = dataclasses.field(default_factory=list)
    variables: List[Tuple[str, str]] = dataclasses.field(default_factory=list)


class Client(ABC):
2 changes: 2 additions & 0 deletions taskweaver/ces/environment.py
@@ -697,6 +697,8 @@ def _parse_exec_result(
                        preview=artifact_dict["preview"],
                    )
                    result.artifact.append(artifact_item)
            elif key == "variables":
                result.variables = value
            else:
                pass

1 change: 1 addition & 0 deletions taskweaver/ces/kernel/ctx_magic.py
@@ -58,6 +58,7 @@ def _taskweaver_exec_pre_check(self, line: str):
    def _taskweaver_exec_post_check(self, line: str, local_ns: Dict[str, Any]):
        if "_" in local_ns:
            self.executor.ctx.set_output(local_ns["_"])
        self.executor.ctx.extract_visible_variables(local_ns)
        return fmt_response(True, "", self.executor.get_post_execution_state())

    @cell_magic