54 changes: 31 additions & 23 deletions CLAUDE.md
@@ -29,24 +29,26 @@ quantmind/
 Key principle: QuantMind does NOT rebuild Agent runtime, lifecycle hooks, tracing,
 multi-agent handoff, or tool framework. Those come from `openai-agents`.

-## Current Repository State (transitional, after PR #70 / #72)
-
-Surviving modules — these still work but will be replaced or migrated in PR3-PR6:
-
-| Module | Status | Replacement |
-|--------|--------|-------------|
-| `quantmind/flow/` | active | `flows/` in PR5 |
-| `quantmind/parsers/` | active | `preprocess/format/` in PR4 |
-| `quantmind/sources/` | active | `preprocess/fetch/` in PR4 |
-| `quantmind/config/` | active | `configs/` in PR3 |
-| `quantmind/llm/` | active | deleted in PR5 (use SDK + `openai` directly) |
-| `quantmind/models/{content,paper,analysis}.py` | active | move to `knowledge/` in PR3 |
-| `quantmind/utils/logger.py` | active | permanent |
-| `quantmind/utils/tmp.py` | active | deleted in PR3-4 alongside the templating refactor |
-
-These transitional modules are excluded from `basedpyright` (see `pyproject.toml`)
-to keep the harness green during migration; new modules (`knowledge/`, `configs/`,
-`preprocess/`, `flows/`, `mind/`, `magic.py`) are auto-included at standard mode.
+## Current Repository State (transitional, after PR #70 / #73 / PR3)
+
+| Module | Status | Notes |
+|--------|--------|-------|
+| `quantmind/knowledge/` | landed (PR3) | data standard with three shapes: `FlattenKnowledge` (`News` / `Earnings` / `PaperKnowledgeCard`), `TreeKnowledge` (`Paper`), `GraphKnowledge` (placeholder); shared base = `BaseKnowledge` with typed `SourceRef` / `ExtractionRef` provenance + `embedding_text()` contract |
+| `quantmind/configs/` | landed (PR3) | `BaseFlowCfg` / `BaseInput` + per-flow cfg + discriminated-union input types |
+| `quantmind/utils/logger.py` | permanent | only general-purpose utility |
+| `quantmind/flow/` | transitional | replaced by `flows/` in PR5 |
+| `quantmind/parsers/` | transitional | replaced by `preprocess/format/` in PR4 |
+| `quantmind/sources/` | transitional | replaced by `preprocess/fetch/` in PR4 |
+| `quantmind/config/` | transitional | superseded by `quantmind/configs/` (PR3); deletion when consumers (`flow/`, `llm/`) migrate in PR5 |
+| `quantmind/llm/` | transitional | deleted in PR5 (use SDK + `openai` directly) |
+| `quantmind/models/{content,paper,analysis}.py` | transitional | superseded by `quantmind/knowledge/` (PR3); deletion when consumers (`parsers/`, `sources/`, `flow/`) migrate in PR4-PR5 |
+| `quantmind/utils/tmp.py` | transitional | deleted in PR3-4 alongside the templating refactor |
+
+Transitional modules are excluded from `basedpyright` (see `pyproject.toml`)
+to keep the harness green during migration. New modules (`knowledge/`,
+`configs/`, `preprocess/`, `flows/`, `mind/`, `magic.py`) are auto-included
+at standard mode and gated by additional `import-linter` contracts so they
+cannot accidentally pull in a transitional module.

 ## Development Commands

@@ -107,8 +109,14 @@ issue instead.

 ## Conventions When Editing

-- **Schemas**: Pydantic, `extra="forbid"`, `frozen=True`. All `KnowledgeItem`
+- **Schemas**: Pydantic, `extra="forbid"`, `frozen=True`. All `BaseKnowledge`
   subclasses must require `as_of: datetime` (financial time-sensitivity is mandatory)
+  and provide a typed `source: SourceRef` (no bare strings). Subclasses MUST
+  override `embedding_text()` so the store layer knows what to embed.
+- **Knowledge shapes**: pick one of `FlattenKnowledge` (atomic card),
+  `TreeKnowledge` (hierarchical artifact), or wait for `GraphKnowledge`
+  (placeholder). Whole-document objects are `TreeKnowledge` even when a
+  flatten card exists alongside (e.g. `Paper` vs `PaperKnowledgeCard`).
 - **Configs**: Extend `BaseFlowCfg` (lands in PR2); never use `Dict[str, Any]` in
   init signatures
 - **Tools**: SDK's `@function_tool` decorator; do NOT subclass anything
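The schema conventions in this hunk can be sketched as follows. This is a minimal illustration, not the actual `quantmind.knowledge` code: the `SourceRef` fields (`kind`, `uri`) and the `News.headline` field are assumptions, since the diff names these classes without showing their definitions.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, ConfigDict, ValidationError


class SourceRef(BaseModel):
    """Hypothetical provenance record; the real field set is not shown in the diff."""

    model_config = ConfigDict(extra="forbid", frozen=True)

    kind: str  # assumed field, e.g. "rss" or "arxiv"
    uri: str  # assumed field


class BaseKnowledge(BaseModel):
    """Stand-in base class that enforces the stated conventions."""

    model_config = ConfigDict(extra="forbid", frozen=True)

    as_of: datetime  # required: no default, so every item carries a timestamp
    source: SourceRef  # typed provenance, never a bare string

    def embedding_text(self) -> str:
        raise NotImplementedError("subclasses must override embedding_text()")


class News(BaseKnowledge):
    """Illustrative flat card with one assumed payload field."""

    headline: str

    def embedding_text(self) -> str:
        return self.headline


item = News(
    as_of=datetime(2026, 1, 15, tzinfo=timezone.utc),
    source=SourceRef(kind="rss", uri="https://example.com/feed"),
    headline="Example Corp raises guidance",
)
print(item.embedding_text())

try:
    # A bare string where a SourceRef is expected fails validation.
    News(as_of=item.as_of, source="https://example.com/feed", headline="x")
except ValidationError:
    print("bare-string source rejected")
```

Because the models are frozen, assigning to `item.headline` after construction also raises a `ValidationError`, which keeps stored knowledge items immutable.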
@@ -156,9 +164,9 @@ issue instead.
 | PR | Focus |
 |----|-------|
 | #70 (merged) | Clean removal of self-built agent runtime |
-| #72 (this PR) | Golden Harness — `scripts/verify.sh` with ruff + basedpyright + import-linter + pytest --cov, plus matching CI |
-| PR3 | `knowledge/` + `configs/` skeleton |
-| PR4 | `preprocess/` (fetch + format two layers) |
-| PR5 | `flows/` + `paper_flow` + `batch_run` + `magic.py`; drop old `flow/` `llm/` |
+| #73 (merged) | Golden Harness — `scripts/verify.sh` with ruff + basedpyright + import-linter + pytest --cov, plus matching CI |
+| PR3 (this PR) | `knowledge/` data standard (Flatten / Tree / Graph shapes) + `configs/` skeleton; `openai-agents>=0.14` introduced for `BaseFlowCfg.model_settings`. Storage layer (`mind/store/` + SQLite + `sqlite-vec`) lands separately. |
+| PR4 | `preprocess/` (fetch + format two layers); migrate `parsers/` + `sources/`; delete `quantmind/models/{content,paper,analysis}.py` |
+| PR5 | `flows/` + `paper_flow` + `batch_run` + `magic.py`; delete `quantmind/flow/`, `quantmind/llm/`, `quantmind/config/` |
 | PR6 | `mind/memory/filesystem` MVP + trajectory archive |
 | PR7+ | Second flow (news/earnings) / observability cookbook / longer-term modules |
37 changes: 37 additions & 0 deletions pyproject.toml
@@ -13,6 +13,7 @@ dependencies = [
     "pyyaml",
     "numpy>=2.2.4",
     "openai>=1.68.2",
+    "openai-agents>=0.14",
     "pillow>=10.1.0,<11.0.0",
     "llama-cloud-services>=0.6.12",
     "ipykernel>=6.29.5",
@@ -121,6 +122,12 @@ pythonVersion = "3.10"
 typeCheckingMode = "standard"
 reportMissingImports = "error"
 reportMissingTypeStubs = "none"
+# Pydantic v2 discriminated unions tighten a `str` field to `Literal["..."]`
+# in subclasses (e.g. `Paper.item_type: Literal["paper"]`). Python's variance
+# rules say mutable attributes must be invariant, so basedpyright flags this
+# even though Pydantic supports it. We use this idiom throughout knowledge/
+# and configs/, so disable the diagnostic globally.
+reportIncompatibleVariableOverride = "none"

 # ----------------------------------------------------------------------------
 # import-linter: architectural boundary contracts
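To make the `reportIncompatibleVariableOverride` comment in this hunk concrete, here is a minimal sketch of the narrowing idiom it describes. The two classes are stand-ins that reuse names from the comment; the real `quantmind.knowledge` definitions are not shown in this diff, and the base-class annotation `item_type: str` is inferred from the comment's wording.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class BaseKnowledge(BaseModel):
    # Broad declaration on the base class.
    item_type: str


class Paper(BaseKnowledge):
    # Narrowing override. A static checker treats the mutable attribute as
    # invariant and reports this line, but Pydantic validates it at runtime.
    item_type: Literal["paper"] = "paper"


paper = Paper()
print(paper.item_type)

try:
    Paper(item_type="news")
except ValidationError:
    print("only the literal tag is accepted")
```

Disabling the diagnostic globally trades some checker coverage for not annotating every subclass with an ignore comment; a per-line suppression would be the narrower alternative.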
@@ -137,6 +144,36 @@ root_packages = ["quantmind"]
 name = "utils is a leaf (no inbound deps from quantmind packages)"
 type = "forbidden"
 source_modules = ["quantmind.utils"]
+forbidden_modules = [
+    "quantmind.config",
+    "quantmind.configs",
+    "quantmind.flow",
+    "quantmind.knowledge",
+    "quantmind.llm",
+    "quantmind.models",
+    "quantmind.parsers",
+    "quantmind.sources",
+]
+
+[[tool.importlinter.contracts]]
+name = "knowledge is a leaf (no inbound deps from quantmind packages)"
+type = "forbidden"
+source_modules = ["quantmind.knowledge"]
+forbidden_modules = [
+    "quantmind.config",
+    "quantmind.configs",
+    "quantmind.flow",
+    "quantmind.llm",
+    "quantmind.models",
+    "quantmind.parsers",
+    "quantmind.sources",
+    "quantmind.utils",
+]
+
+[[tool.importlinter.contracts]]
+name = "configs only depends on knowledge (transitional modules forbidden)"
+type = "forbidden"
+source_modules = ["quantmind.configs"]
 forbidden_modules = [
     "quantmind.config",
     "quantmind.flow",
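As a rough illustration of what a `forbidden` contract guards against, the sketch below scans one module's source for imports that fall under a forbidden package. It is a simplified stand-in, not how `import-linter` works internally: the real tool builds an import graph for the whole package, while this hypothetical `forbidden_imports` helper only inspects absolute imports in a single source string.

```python
import ast


def forbidden_imports(source: str, forbidden: list[str]) -> list[str]:
    """Return imported module names in `source` that fall under a forbidden package."""
    hits: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Match the package itself or any submodule beneath it.
            if any(name == pkg or name.startswith(pkg + ".") for pkg in forbidden):
                hits.append(name)
    return hits


# Hypothetical source for a module under quantmind.knowledge.
sample = """
from pydantic import BaseModel
from quantmind.configs.base import BaseFlowCfg
import quantmind.utils.logger
"""

violations = forbidden_imports(sample, ["quantmind.configs", "quantmind.utils"])
print(violations)
```

Note the direction of the rule: `source_modules` lists the importer and `forbidden_modules` lists what it must not import, so the `knowledge` contract above constrains imports made by `quantmind.knowledge`.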
12 changes: 10 additions & 2 deletions quantmind/config/registry.py
@@ -108,8 +108,16 @@ def auto_discover_flows(self, search_paths: list[Path]) -> None:

     def _discover_flows_in_path(self, path: Path) -> None:
         """Discover flows in a specific path."""
-        # Look for flow.py files in subdirectories
-        for flow_file in path.rglob("flow.py"):
+        # Look for flow.py files in subdirectories.
+        # `rglob` on macOS tmpdirs can hit AppTranslocation paths that raise
+        # OSError mid-iteration; swallow scan errors so registry probing never
+        # crashes the caller.
+        try:
+            flow_files = list(path.rglob("flow.py"))
+        except OSError as e:
+            logger.warning(f"Failed to scan {path} for flows: {e}")
+            return
+        for flow_file in flow_files:
             try:
                 self._load_flow_from_file(flow_file)
             except Exception as e:
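The guard in this hunk can be tried in isolation with the sketch below. `find_flow_files` and `_BrokenRoot` are hypothetical names introduced for the example; the point is that `rglob` is lazy, so the generator has to be materialized inside the `try` for the `OSError` to be caught.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def find_flow_files(root) -> list:
    """Collect flow.py files under `root`, returning [] if the scan fails."""
    try:
        # sorted() consumes the generator here, inside the try block.
        return sorted(root.rglob("flow.py"))
    except OSError as exc:
        logger.warning("Failed to scan %s for flows: %s", root, exc)
        return []


class _BrokenRoot:
    """Test double whose scan always fails, standing in for an unreadable tree."""

    def rglob(self, pattern: str):
        raise OSError("simulated scan failure")


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "alpha").mkdir()
    (base / "alpha" / "flow.py").write_text("")
    (base / "alpha" / "other.py").write_text("")
    found_names = [p.name for p in find_flow_files(base)]

broken_result = find_flow_files(_BrokenRoot())
print(found_names, broken_result)
```

Wrapping only the `for` statement would not be equivalent, because an exception raised mid-iteration would then abort the loop after some files had already been loaded.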
25 changes: 25 additions & 0 deletions quantmind/configs/__init__.py
@@ -0,0 +1,25 @@
+"""quantmind.configs — flow configuration + input types.
+
+Each flow has a `<Name>FlowCfg` (extends `BaseFlowCfg`) and a `<Name>Input`
+discriminated-union type. All cfg / input classes live here so that:
+
+- YAML / CLI users see a single import surface,
+- JSON schemas can be exported uniformly (for IDE autocomplete),
+- the magic-input resolver (PR5) has one introspection target.
+"""
+
+from quantmind.configs.base import BaseFlowCfg, BaseInput
+from quantmind.configs.earnings import EarningsFlowCfg, EarningsInput
+from quantmind.configs.news import NewsFlowCfg, NewsInput
+from quantmind.configs.paper import PaperFlowCfg, PaperInput
+
+__all__ = [
+    "BaseFlowCfg",
+    "BaseInput",
+    "EarningsFlowCfg",
+    "EarningsInput",
+    "NewsFlowCfg",
+    "NewsInput",
+    "PaperFlowCfg",
+    "PaperInput",
+]
49 changes: 49 additions & 0 deletions quantmind/configs/base.py
@@ -0,0 +1,49 @@
+"""Base flow-cfg + input types shared across all flows.
+
+`BaseFlowCfg` is the data contract for everything a flow exposes to YAML / CLI
+users. Each `<Name>FlowCfg` subclasses it and adds domain knobs; nothing here
+encodes flow behaviour. `BaseInput` is the parent of every flow's input
+discriminated-union member; subclasses set a `Literal` discriminator field.
+"""
+
+from agents import ModelSettings
+from pydantic import BaseModel, ConfigDict
+
+
+class BaseFlowCfg(BaseModel):
+    """Base configuration shared by all flows."""
+
+    model_config = ConfigDict(extra="forbid")
+
+    # Model & execution
+    model: str = "gpt-4o"
+    model_settings: ModelSettings | None = None
+    max_turns: int = 10
+    timeout_seconds: float = 300.0
+
+    # Output persistence
+    output_dir: str | None = None
+    overwrite: bool = False
+
+    # Mind / memory (filesystem-backed when set)
+    memory_dir: str | None = None
+
+    # Observability (consumed by flows/_runner in PR5)
+    workflow_name: str | None = None
+    trace_metadata: dict[str, str] | None = None
+    trace_include_sensitive_data: bool = True
+    tracing_disabled: bool = False
+    archive_trajectory: bool = True
+
+    # Cost / budget guardrails (enforced in PR5+)
+    max_total_input_tokens: int | None = None
+    max_total_cost_usd: float | None = None
+
+    # Default guardrails (populated in PR8+)
+    enable_default_guardrails: bool = True
+
+
+class BaseInput(BaseModel):
+    """Parent of every flow's discriminated-union input member."""
+
+    model_config = ConfigDict(extra="forbid")
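A short sketch of how `extra="forbid"` behaves when a cfg is built from a parsed YAML or CLI mapping. The classes are trimmed stand-ins that keep only a few fields from the file above (and drop the `agents` dependency) so the example runs on its own; the mapping keys are made up for illustration.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class BaseFlowCfg(BaseModel):
    """Trimmed stand-in: only three of the fields from the diff."""

    model_config = ConfigDict(extra="forbid")

    model: str = "gpt-4o"
    max_turns: int = 10
    timeout_seconds: float = 300.0


class PaperFlowCfg(BaseFlowCfg):
    """Adds one domain knob on top of the shared base."""

    extract_methodology: bool = True


# Unspecified fields fall back to their defaults.
cfg = PaperFlowCfg.model_validate({"max_turns": 5, "extract_methodology": False})
print(cfg.model, cfg.max_turns, cfg.extract_methodology)

try:
    # A misspelled key is reported instead of being silently ignored.
    PaperFlowCfg.model_validate({"max_turn": 5})
except ValidationError as exc:
    error_types = [err["type"] for err in exc.errors()]
    print(error_types)
```

Rejecting unknown keys is what makes a typo in a YAML file fail loudly at load time rather than leaving the default value in effect.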
42 changes: 42 additions & 0 deletions quantmind/configs/earnings.py
@@ -0,0 +1,42 @@
+"""Earnings-flow configuration + input discriminated union."""
+
+from typing import Annotated, Literal, Union
+
+from pydantic import Field
+
+from quantmind.configs.base import BaseFlowCfg, BaseInput
+
+
+class TickerPeriod(BaseInput):
+    """Ticker + reporting period (e.g. ``AAPL`` / ``2026Q1``)."""
+
+    type: Literal["ticker_period"] = "ticker_period"
+    ticker: str
+    period: str  # e.g. "2026Q1"
+
+
+class TranscriptText(BaseInput):
+    """Raw earnings-call transcript pasted inline."""
+
+    type: Literal["transcript"] = "transcript"
+    text: str
+
+
+class HttpUrl(BaseInput):
+    """URL to an earnings release / IR filing."""
+
+    type: Literal["http"] = "http"
+    url: str
+
+
+EarningsInput = Annotated[
+    Union[TickerPeriod, TranscriptText, HttpUrl],
+    Field(discriminator="type"),
+]
+
+
+class EarningsFlowCfg(BaseFlowCfg):
+    """Knobs specific to earnings_flow."""
+
+    detect_surprises: bool = True
+    include_guidance: bool = True
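Since `EarningsInput` is an `Annotated` union rather than a model class, one way to validate raw data against it is Pydantic's `TypeAdapter`. The sketch below re-declares two of the members locally so it is self-contained; how the flows in PR5 will actually consume the type is not shown in this diff.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, ConfigDict, Field, TypeAdapter, ValidationError


class BaseInput(BaseModel):
    model_config = ConfigDict(extra="forbid")


class TickerPeriod(BaseInput):
    type: Literal["ticker_period"] = "ticker_period"
    ticker: str
    period: str


class TranscriptText(BaseInput):
    type: Literal["transcript"] = "transcript"
    text: str


EarningsInput = Annotated[
    Union[TickerPeriod, TranscriptText],
    Field(discriminator="type"),
]

adapter = TypeAdapter(EarningsInput)

# The "type" tag selects the member class directly; other members are not tried.
parsed = adapter.validate_python(
    {"type": "ticker_period", "ticker": "AAPL", "period": "2026Q1"}
)
print(type(parsed).__name__)

try:
    adapter.validate_python({"type": "pdf", "path": "report.pdf"})
except ValidationError as exc:
    bad_tag = exc.errors()[0]["type"]
    print(bad_tag)
```

The default on `type` lets callers write `TickerPeriod(ticker=..., period=...)` in Python, but a dict validated through the union still needs an explicit `"type"` key so the discriminator can route it.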
41 changes: 41 additions & 0 deletions quantmind/configs/news.py
@@ -0,0 +1,41 @@
+"""News-flow configuration + input discriminated union."""
+
+from typing import Annotated, Literal, Union
+
+from pydantic import Field
+
+from quantmind.configs.base import BaseFlowCfg, BaseInput
+
+
+class RssFeed(BaseInput):
+    """RSS/Atom feed URL to be polled for items."""
+
+    type: Literal["rss"] = "rss"
+    url: str
+
+
+class HttpUrl(BaseInput):
+    """Single news article URL."""
+
+    type: Literal["http"] = "http"
+    url: str
+
+
+class Headline(BaseInput):
+    """Inline headline text (no body fetching)."""
+
+    type: Literal["headline"] = "headline"
+    text: str
+
+
+NewsInput = Annotated[
+    Union[RssFeed, HttpUrl, Headline],
+    Field(discriminator="type"),
+]
+
+
+class NewsFlowCfg(BaseFlowCfg):
+    """Knobs specific to news_flow."""
+
+    materiality_threshold: Literal["low", "medium", "high"] = "medium"
+    entities_hint: list[str] = Field(default_factory=list)
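The two `NewsFlowCfg` knobs use a `Literal` to restrict allowed values and `default_factory` to give each instance its own list. A small self-contained sketch of both behaviours follows; it inherits from `BaseModel` directly instead of `BaseFlowCfg` to stay runnable on its own, and the entity strings are arbitrary examples.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class NewsFlowCfg(BaseModel):
    """Stand-in with only the news-specific knobs."""

    model_config = ConfigDict(extra="forbid")

    materiality_threshold: Literal["low", "medium", "high"] = "medium"
    entities_hint: list[str] = Field(default_factory=list)


first = NewsFlowCfg()
first.entities_hint.append("MSFT")

# A later instance starts from a fresh empty list, not the mutated one.
second = NewsFlowCfg()
print(first.entities_hint, second.entities_hint)

try:
    NewsFlowCfg(materiality_threshold="critical")
except ValidationError as exc:
    threshold_error = exc.errors()[0]["type"]
    print(threshold_error)
```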
65 changes: 65 additions & 0 deletions quantmind/configs/paper.py
@@ -0,0 +1,65 @@
+"""Paper-flow configuration + input discriminated union.
+
+`PaperInput` is one of:
+- `ArxivIdentifier`: arxiv id or full URL parsed by preprocess.fetch.arxiv
+- `HttpUrl`: any web URL (PDF or HTML; routed by content-type)
+- `LocalFilePath`: filesystem path to a PDF / HTML / Markdown file
+- `RawText`: an inline string (for tests or LLM-pre-cleaned inputs)
+- `DoiIdentifier`: a DOI to be resolved via preprocess.fetch.doi
+"""
+
+from pathlib import Path
+from typing import Annotated, Literal, Union
+
+from pydantic import Field
+
+from quantmind.configs.base import BaseFlowCfg, BaseInput
+
+
+class ArxivIdentifier(BaseInput):
+    """Arxiv id (e.g. ``2604.12345``) or full arxiv URL."""
+
+    type: Literal["arxiv"] = "arxiv"
+    id: str
+
+
+class HttpUrl(BaseInput):
+    """Any web URL; PDF vs HTML is decided by content-type."""
+
+    type: Literal["http"] = "http"
+    url: str
+
+
+class LocalFilePath(BaseInput):
+    """Filesystem path to a PDF / HTML / Markdown file."""
+
+    type: Literal["local"] = "local"
+    path: Path
+
+
+class RawText(BaseInput):
+    """Inline text input (tests / pre-cleaned content)."""
+
+    type: Literal["text"] = "text"
+    text: str
+
+
+class DoiIdentifier(BaseInput):
+    """A DOI to be resolved by ``preprocess.fetch.doi``."""
+
+    type: Literal["doi"] = "doi"
+    doi: str
+
+
+PaperInput = Annotated[
+    Union[ArxivIdentifier, HttpUrl, LocalFilePath, RawText, DoiIdentifier],
+    Field(discriminator="type"),
+]
+
+
+class PaperFlowCfg(BaseFlowCfg):
+    """Knobs specific to paper_flow."""
+
+    extract_methodology: bool = True
+    extract_limitations: bool = True
+    asset_class_hint: str | None = None
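The package docstring mentions exporting JSON schemas for IDE autocomplete. One possible way to do that for a union type such as `PaperInput` is `TypeAdapter.json_schema()`, sketched below with two locally re-declared members; the export mechanism the project will actually use is not part of this diff, and the example path is made up.

```python
import json
from pathlib import Path
from typing import Annotated, Literal, Union

from pydantic import BaseModel, ConfigDict, Field, TypeAdapter


class BaseInput(BaseModel):
    model_config = ConfigDict(extra="forbid")


class ArxivIdentifier(BaseInput):
    type: Literal["arxiv"] = "arxiv"
    id: str


class LocalFilePath(BaseInput):
    type: Literal["local"] = "local"
    path: Path


PaperInput = Annotated[
    Union[ArxivIdentifier, LocalFilePath],
    Field(discriminator="type"),
]

adapter = TypeAdapter(PaperInput)

# The schema records the discriminator and a tag-to-definition mapping.
schema = adapter.json_schema()
print(json.dumps(schema["discriminator"], indent=2, sort_keys=True))

# String paths in incoming data are coerced to pathlib.Path.
local = adapter.validate_python({"type": "local", "path": "papers/example.pdf"})
print(type(local.path).__name__)
```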