77 changes: 77 additions & 0 deletions AGENTS.md
<!--
AGENTS.md — Instructions for AI coding assistants (Claude, Cursor, Copilot, Codex, Roo, etc.)
-->

# Agent Guidelines for Mellea Contributors

> **Which guide?** Modifying `mellea/`, `cli/`, or `test/` → this file. Writing code that imports Mellea → [`docs/AGENTS_TEMPLATE.md`](docs/AGENTS_TEMPLATE.md).

## 1. Quick Reference
```bash
pre-commit install # Required: install git hooks
uv sync --all-extras --all-groups # Install all deps (required for tests)
ollama serve # Start Ollama (required for most tests)
uv run pytest -m "not qualitative" # Skips LLM quality tests (~2 min)
uv run pytest # Full suite (includes LLM quality tests)
uv run ruff format . && uv run ruff check . # Lint & format
```
**Branches**: `feat/topic`, `fix/issue-id`, `docs/topic`

## 2. Directory Structure
| Path | Contents |
|------|----------|
| `mellea/stdlib` | Core: Sessions, Genslots, Requirements, Sampling, Context |
| `mellea/backends` | Providers: HF, OpenAI, Ollama, Watsonx, LiteLLM |
| `mellea/helpers` | Utilities, logging, model ID tables |
| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
| `test/` | All tests (run from repo root) |
| `scratchpad/` | Experiments (git-ignored) |

## 3. Test Markers
- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI via `CICD=1`)
- **Unmarked** — Unit tests (may still require Ollama running locally)

⚠️ Don't add `qualitative` to trivial tests—keep the fast loop fast.
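A minimal illustration of the split (the test bodies are placeholders, not real Mellea tests):

```python
import pytest

# Qualitative: judges LLM output quality; skipped in CI when CICD=1.
@pytest.mark.qualitative
def test_summary_mentions_revenue():
    summary = "The report covers quarterly revenue."  # stand-in for LLM output
    assert "revenue" in summary

# Unmarked: plain unit test, stays in the fast loop.
def test_word_count():
    assert len("one two three".split()) == 3
```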

## 4. Coding Standards
- **Types required** on all core functions
- **Docstrings are prompts** — be specific; the LLM reads them
- **Google-style docstrings**
- **Ruff** for linting/formatting
- Use `...` in `@generative` function bodies
- Prefer primitives over classes
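Taken together, a conforming helper might look like this (a hypothetical utility, not actual Mellea code):

```python
def truncate_history(messages: list[str], max_items: int = 10) -> list[str]:
    """Keep only the most recent conversation turns.

    Args:
        messages: Conversation turns, oldest first.
        max_items: Maximum number of turns to retain.

    Returns:
        The trailing ``max_items`` entries of ``messages``.
    """
    if max_items <= 0:
        return []
    return messages[-max_items:]
```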

## 5. Commits & Hooks
[Angular format](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit): `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `release:`

Pre-commit runs: ruff, mypy, uv-lock, codespell

## 6. Timing
> **Don't cancel**: `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state.

## 7. Common Issues
| Problem | Fix |
|---------|-----|
| `ComponentParseError` | Add examples to docstring |
| `uv.lock` out of sync | Run `uv sync` |
| Ollama refused | Run `ollama serve` |

## 8. Self-Review (before notifying user)
1. `uv run pytest -m "not qualitative"` passes?
2. `ruff format` and `ruff check` clean?
3. New functions typed with concise docstrings?
4. Unit tests added for new functionality?
5. Avoided over-engineering?

## 9. Writing Tests
- Place tests in `test/` mirroring source structure
- Name files `test_*.py` (required for pytest discovery)
- Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
- Mark tests checking LLM output quality with `@pytest.mark.qualitative`
- If a test fails, fix the **code**, not the test (unless the test was wrong)

## 10. Feedback Loop
Found a bug, workaround, or pattern? Update the docs:
- **Issue/workaround?** → Add to Section 7 (Common Issues) in this file
- **Usage pattern?** → Add to [`docs/AGENTS_TEMPLATE.md`](docs/AGENTS_TEMPLATE.md)
- **New pitfall?** → Add warning near relevant section
183 changes: 183 additions & 0 deletions docs/AGENTS_TEMPLATE.md
<!--
AGENTS_TEMPLATE.md — Copy into your project's AGENTS.md to teach AI assistants Mellea patterns.
-->

# Mellea Usage Guidelines

> **This file**: For code that *imports* Mellea. For Mellea internals, see [`../AGENTS.md`](../AGENTS.md).

Copy the section below into your project's `AGENTS.md` or system prompt.

---

### Library: Mellea
Use `mellea` for LLM interactions. No direct OpenAI/Anthropic calls or LangChain OutputParsers.

**Prerequisites**: `pip install mellea` · [Docs](https://mellea.ai) · [Repo](https://github.com/generative-computing/mellea)

#### 1. The `@generative` Pattern
**Don't** write prompt templates or regex parsers:
```python
# BAD - don't do this
response = openai.chat.completions.create(...)
age = int(re.search(r"\d+", response).group())
```
**Do** use typed function signatures:
```python
from mellea import generative, start_session

@generative
def extract_age(text: str) -> int:
    """Extract the user's age from text."""
    ...

m = start_session()
age = extract_age(m, text="Alice is 30") # Returns int(30)
```

#### 2. Complex Types
```python
from pydantic import BaseModel
from mellea import generative

class UserProfile(BaseModel):
    name: str
    age: int
    interests: list[str]

@generative
def parse_profile(bio: str) -> UserProfile: ...
```

#### 3. Chain-of-Thought
Add a `reasoning` field to force the LLM to "think" before answering:
```python
from typing import Literal
from pydantic import BaseModel, Field

class AnalysisResult(BaseModel):
    reasoning: str  # LLM fills first
    conclusion: Literal["approve", "reject"]
    confidence: float = Field(ge=0.0, le=1.0)

@generative
def analyze_document(doc: str) -> AnalysisResult: ...
```

#### 4. Control Flow
Use Python `if/for/while`. No graph frameworks needed:
```python
if analyze_sentiment(m, email) == "negative":
    draft = draft_apology(m, email)
else:
    draft = draft_response(m, email)
```

#### 5. Instruct-Validate-Repair
For strict requirements, use `m.instruct()`:
```python
from mellea.stdlib.requirements import req, simple_validate
from mellea.stdlib.sampling import RejectionSamplingStrategy

email = m.instruct(
    "Write an invite for {{name}}",
    requirements=[
        req("Must be formal"),
        req("Lowercase only", validation_fn=simple_validate(lambda x: x.islower())),
    ],
    strategy=RejectionSamplingStrategy(loop_budget=3),
    user_variables={"name": "Alice"},
)
```

#### 6. Small Model Fix
Small models (1B-8B) can't reliably do arithmetic. Extract the parameters with the LLM, then compute in Python:
```python
from pydantic import BaseModel

class PhysicsParams(BaseModel):
    speed_a: float
    speed_b: float
    delay_hours: float

@generative
def extract_params(text: str) -> PhysicsParams:
    """EXTRACT numbers only. Do not calculate."""
    ...

def calculate_gap(p: PhysicsParams) -> float:
    return p.speed_a * p.delay_hours
```
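The deterministic half of this split can be unit-tested with no model in the loop. A standalone sketch of the same idea, using a plain dataclass as a stand-in for the LLM-extracted pydantic model:

```python
from dataclasses import dataclass

@dataclass
class Params:  # stand-in for the extracted PhysicsParams
    speed_a: float
    delay_hours: float

def calculate_gap(p: Params) -> float:
    # Arithmetic lives in Python, never in the prompt.
    return p.speed_a * p.delay_hours

# 60 mph with a 2-hour head start leaves a 120-mile gap.
gap = calculate_gap(Params(speed_a=60.0, delay_hours=2.0))
```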

#### 7. One-Shot Examples
If model struggles, add examples to docstring:
```python
@generative
def identify_fruit(text: str) -> str | None:
    """
    Extract fruit from text, or None if none mentioned.
    Ex: "I ate an apple" -> "apple"
    Ex: "The sky is blue" -> None
    """
    ...
```

#### 8. Backend Config
```python
from mellea import start_session
from mellea.backends.model_options import ModelOption

m = start_session(
    model_id="granite3.3:8b",
    model_options={ModelOption.TEMPERATURE: 0.0, ModelOption.MAX_NEW_TOKENS: 500},
)
```
Options: `TEMPERATURE`, `MAX_NEW_TOKENS`, `SYSTEM_PROMPT`, `SEED`, `TOOLS`, `CONTEXT_WINDOW`, `THINKING`, `STREAM`

#### 9. Async
```python
@generative
async def extract_age(text: str) -> int:
    """Extract age."""
    ...

result = await extract_age(m, text="Alice is 30")
```
Session methods: `ainstruct`, `achat`, `aact`, `avalidate`, `aquery`, `atransform`

#### 10. Auth
- **Ollama**: `start_session()` (no setup)
- **OpenAI**: `export OPENAI_API_KEY="..."`
- **Watsonx**: `export WATSONX_API_KEY="..."`, `WATSONX_URL`, `WATSONX_PROJECT_ID`

**Never hardcode API keys.**

#### 11. Anti-Patterns
- **Don't** retry `@generative` calls — Mellea handles retries internally
- **Don't** use `json.loads()` — use typed returns
- **Don't** wrap single functions in classes
- **Do** use `try/except` at app boundaries for network errors
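A sketch of the boundary rule: absorb transport errors once, at the app edge, and keep the typed calls clean (`fetch_draft` is a hypothetical callable wrapping a Mellea call):

```python
def handle_request(fetch_draft) -> str:
    """App boundary: the one place that catches network failures."""
    try:
        return fetch_draft()
    except ConnectionError:
        # e.g. Ollama not running; degrade gracefully instead of crashing.
        return "Service temporarily unavailable."
```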

#### 12. Debugging
```python
from mellea.core import FancyLogger
FancyLogger.get_logger().setLevel("DEBUG")
```
- `m.last_prompt()` — see exact prompt sent

#### 13. Common Errors
| Error | Fix |
|-------|-----|
| `ComponentParseError` | LLM output didn't match type—add docstring examples |
| `TypeError: missing positional argument` | First arg must be session `m` |
| `ConnectionRefusedError` | Run `ollama serve` |
| Output wrong/None | Model too small—try larger or add `reasoning` field |

#### 14. Testing
```bash
uv run pytest -m "not qualitative" # Fast loop
uv run pytest # Full (verify prompts work)
```

#### 15. Feedback
Found a workaround or pattern? Add it to Section 13 (Common Errors) above, or update this file with new guidance.