50 changes: 26 additions & 24 deletions CLAUDE.md
@@ -29,29 +29,31 @@ quantmind/
Key principle: QuantMind does NOT rebuild Agent runtime, lifecycle hooks, tracing,
multi-agent handoff, or tool framework. Those come from `openai-agents`.

## Current Repository State (transitional, after PR #70 / #73 / #74 / PR4)
## Current Repository State (after PR #70 / #73 / #74 / #75 / PR5)

| Module | Status | Notes |
|--------|--------|-------|
| `quantmind/knowledge/` | landed (PR3) | data standard with three shapes: `FlattenKnowledge` (`News` / `Earnings` / `PaperKnowledgeCard`), `TreeKnowledge` (`Paper`), `GraphKnowledge` (placeholder); shared base = `BaseKnowledge` with typed `SourceRef` / `ExtractionRef` provenance + `embedding_text()` contract |
| `quantmind/configs/` | landed (PR3) | `BaseFlowCfg` / `BaseInput` + per-flow cfg + discriminated-union input types |
| `quantmind/preprocess/` | landed (PR4) | `fetch/` (`fetch_arxiv` / `fetch_url` / `resolve_doi` / `read_local_file` returning `Fetched` / `RawPaper` / `CrossrefMetadata` frozen dataclasses) + `format/` (`pdf_to_markdown` via PyMuPDF, `html_to_markdown` via trafilatura) + `clean.py` (`normalize_unicode` / `collapse_whitespace` / `dedupe_lines`) + `time.py` (`to_utc` / `parse_filing_date` / `business_days_between`); leaf module — only depends on `quantmind.utils` |
| `quantmind/preprocess/` | landed (PR4) | `fetch/` (`fetch_arxiv` / `fetch_url` / `resolve_doi` / `read_local_file` returning `Fetched` / `RawPaper` / `CrossrefMetadata` frozen dataclasses) + `format/` (`pdf_to_markdown` via PyMuPDF, `html_to_markdown` via trafilatura) + `clean.py` + `time.py`; leaf module — only depends on `quantmind.utils` |
| `quantmind/flows/` | landed (PR5) | apex layer: `paper_flow` (`PaperInput` → `Paper` via SDK Agent), `batch_run` + `BatchResult` (bounded-concurrency fan-out, `memory=` rejected by design), `_runner.run_with_observability` + `_compose_hooks` + `_archive_run_artifacts` (PR6 stub); only depends on configs/knowledge/preprocess/utils + `agents` SDK |
| `quantmind/magic.py` | landed (PR5) | `resolve_magic_input(natural_language, *, target_flow, ...) -> (input, cfg)` plus `preview_resolve` debug helper; introspects flow signatures and runs a lightweight resolver Agent with `output_type=ResolvedFlowConfig[InputT, CfgT]` |
| `quantmind/utils/logger.py` | permanent | only general-purpose utility |
| `quantmind/flow/` | transitional | replaced by `flows/` in PR5 |
| `quantmind/config/` | transitional | superseded by `quantmind/configs/` (PR3); deletion when consumers (`flow/`, `llm/`) migrate in PR5 |
| `quantmind/llm/` | transitional | deleted in PR5 (use SDK + `openai` directly) |
| `quantmind/models/{content,paper,analysis}.py` | transitional | superseded by `quantmind/knowledge/` (PR3); deletion when consumers (`flow/`) migrate in PR5 |
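
The `preprocess/` row names three `clean.py` helpers (`normalize_unicode` / `collapse_whitespace` / `dedupe_lines`) without describing what they do. Below is a minimal stdlib sketch of what helpers with those names might do. It assumes NFKC normalization, per-line whitespace collapsing, and order-preserving removal of repeated non-empty lines. The `*_sketch` functions are hypothetical stand-ins, not QuantMind's `clean.py`:

```python
import re
import unicodedata

# Hypothetical stand-ins for preprocess/clean.py; behavior is assumed, not copied.


def normalize_unicode_sketch(text: str) -> str:
    """Fold compatibility forms (ligatures, full-width digits) with NFKC."""
    return unicodedata.normalize("NFKC", text)


def collapse_whitespace_sketch(text: str) -> str:
    """Squeeze runs of spaces/tabs to one space and trim each line."""
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())


def dedupe_lines_sketch(text: str) -> str:
    """Drop repeated non-empty lines, keeping the first occurrence in order."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.splitlines():
        if line and line in seen:
            continue  # exact repeat of a line we already kept
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


raw = "\ufb01nancial   factors\nfinancial factors\n\nMomentum\t\tsignal\n"
cleaned = dedupe_lines_sketch(collapse_whitespace_sketch(normalize_unicode_sketch(raw)))
print(repr(cleaned))  # 'financial factors\n\nMomentum signal'
```

The order matters here. If you skip normalizing and collapsing first, the ligature spelling `ﬁnancial   factors` and the plain `financial factors` survive dedupe as two distinct lines.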

`quantmind/parsers/`, `quantmind/sources/`, and `quantmind/utils/tmp.py`
were removed in PR4 (replaced by `preprocess/format/`, `preprocess/fetch/`,
and deletion respectively).

Transitional modules are excluded from `basedpyright` AND from
`coverage.run` (see `pyproject.toml`) to keep the harness green during
migration. New modules (`knowledge/`, `configs/`, `preprocess/`, `flows/`,
`mind/`, `magic.py`) are auto-included at standard mode and gated by
`import-linter` contracts (4 contracts as of PR4) so they cannot
accidentally pull in a transitional module.

PR5 removed the transitional packages (`quantmind/{flow,llm,config,models}/`
and their tests under `tests/{config,models}/`); PR4 had already removed
`quantmind/parsers/`, `quantmind/sources/`, and `quantmind/utils/tmp.py`.
The codebase has now converged to the five permanent module roots
(`flows/`, `configs/`, `knowledge/`, `preprocess/`, `mind/`) plus
`magic.py` and `utils/`.

`basedpyright` runs in standard mode across the whole `quantmind/`
package — there are no per-module exclusions left. Five `import-linter`
contracts pin the dependency graph: `utils` and `knowledge` are leaves,
`configs` only depends on `knowledge`, `preprocess` only depends on
`utils`, and `flows + magic` is the apex (cannot import the deleted
transitional packages, which are listed in the contract as a tripwire
against accidental re-introduction).
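
A `forbidden` contract fails when a source package imports a forbidden package. import-linter also follows indirect imports unless a contract sets `allow_indirect_imports`. Below is a simplified model of that check over a hand-written import graph. It is a sketch only: the module names are hypothetical, the contracts are abbreviated copies of the ones in `pyproject.toml`, and the real tool builds its graph from source rather than from a dict:

```python
from collections import deque

# Hypothetical import graph: module -> modules it imports directly.
IMPORTS: dict[str, set[str]] = {
    "quantmind.utils.logger": set(),
    "quantmind.knowledge.paper": set(),
    "quantmind.configs.paper": {"quantmind.knowledge.paper"},
    "quantmind.preprocess.fetch": {"quantmind.utils.logger"},
    "quantmind.flows.paper_flow": {"quantmind.configs.paper", "quantmind.preprocess.fetch"},
    "quantmind.magic": {"quantmind.flows.paper_flow"},
}

# name -> (source packages, forbidden packages); abbreviated from pyproject.toml.
CONTRACTS: dict[str, tuple[list[str], list[str]]] = {
    "configs only depends on knowledge": (
        ["quantmind.configs"],
        ["quantmind.flows", "quantmind.magic", "quantmind.preprocess"],
    ),
    "preprocess only depends on utils": (
        ["quantmind.preprocess"],
        ["quantmind.configs", "quantmind.flows", "quantmind.knowledge", "quantmind.magic"],
    ),
    "flows + magic is apex": (
        ["quantmind.flows", "quantmind.magic"],
        ["quantmind.config", "quantmind.flow", "quantmind.llm", "quantmind.models"],
    ),
}


def in_package(module: str, package: str) -> bool:
    """True for the package itself or any descendant module."""
    return module == package or module.startswith(package + ".")


def reachable(start: str, imports: dict[str, set[str]]) -> set[str]:
    """Every module `start` imports, directly or transitively (BFS)."""
    seen: set[str] = set()
    queue = deque(imports.get(start, ()))
    while queue:
        mod = queue.popleft()
        if mod not in seen:
            seen.add(mod)
            queue.extend(imports.get(mod, ()))
    return seen


def violations(
    imports: dict[str, set[str]],
    contracts: dict[str, tuple[list[str], list[str]]],
) -> list[tuple[str, str, str]]:
    """(contract, importer, forbidden import) for every broken contract."""
    found: list[tuple[str, str, str]] = []
    for name, (sources, forbidden) in contracts.items():
        for mod in imports:
            if any(in_package(mod, src) for src in sources):
                for dep in sorted(reachable(mod, imports)):
                    if any(in_package(dep, bad) for bad in forbidden):
                        found.append((name, mod, dep))
    return found


print(violations(IMPORTS, CONTRACTS))  # []

# Re-adding a deleted package under its old name trips the apex contract,
# both for flows/ directly and for magic.py through flows/.
broken = {**IMPORTS, "quantmind.flows.paper_flow": {"quantmind.llm.client"}}
print(violations(broken, CONTRACTS))
```

The `package + "."` check in `in_package` prevents the `quantmind.flow` tripwire from also matching `quantmind.flows`.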

## Development Commands

@@ -79,9 +81,8 @@ It runs five steps in fixed order, fast-failing on the first error:
2. `ruff check` — lint (D, E, F, I, W, B, W505) must pass
3. `basedpyright` — standard-mode type check on permanent + new modules
4. `lint-imports` — architectural boundary contracts must hold
5. `pytest --cov` — tests pass with ≥ 65% branch coverage (will ratchet up
to 75%+ in PR5 once `flow/` / `llm/` / `config/` and the transitional
`models/*.py` are removed)
5. `pytest --cov` — tests pass with ≥ 75% branch coverage (raised from 65
in PR5 after the transitional packages were deleted)
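
The fixed-order, fail-fast behavior can be modeled in Python with `subprocess.run`. This is an illustrative sketch, not `scripts/verify.sh`. Step 1 falls outside this hunk and is left out. The bare commands assume each tool is on `PATH` and picks up its settings from `pyproject.toml`:

```python
import subprocess
from collections.abc import Sequence

# Steps 2-5 as listed above; step 1 is outside this hunk and omitted.
STEPS: list[tuple[str, list[str]]] = [
    ("ruff check", ["ruff", "check"]),
    ("basedpyright", ["basedpyright"]),
    ("lint-imports", ["lint-imports"]),
    ("pytest --cov", ["pytest", "--cov"]),
]


def run_fail_fast(steps: Sequence[tuple[str, list[str]]]) -> int:
    """Run steps in order; return the first non-zero exit code, else 0."""
    for label, cmd in steps:
        print(f"==> {label}", flush=True)
        code = subprocess.run(cmd, check=False).returncode
        if code != 0:
            print(f"FAILED at {label!r} (exit {code}); later steps skipped")
            return code
    print("all steps passed")
    return 0
```

Calling `sys.exit(run_fail_fast(STEPS))` turns the first failing step's exit code into the script's exit code. CI then stops at the first broken gate instead of reporting a cascade of failures.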

Pre-commit hooks (`.pre-commit-config.yaml`):
- pre-commit stage: trailing whitespace / EOF / ruff / ruff-format (fast)
@@ -148,7 +149,8 @@ issue instead.
- ❌ Add a CLI (`argparse`/`typer`/`click`); users run Python runbook scripts
- ❌ Introduce class-based `BaseFlow` / plugin registry / hook discovery
- ❌ Wrap `from agents import ...` in a QuantMind-side facade — use the SDK directly
- ❌ Mix `batch_run` and `memory` (they will be mutually exclusive in MVP; see PR5)
- ❌ Mix `batch_run` and `memory` (mutually exclusive in MVP; `batch_run` rejects
`memory=` at the signature layer — design doc §4.3.5)
- ❌ Use `Dict[str, Any]` in init functions; use Pydantic models
- ❌ Add hard deps on observability platforms (Langfuse / Logfire / etc.); document
integration via `add_trace_processor()` in user-facing cookbook only
@@ -170,8 +172,8 @@ issue instead.
| #70 (merged) | Clean removal of self-built agent runtime |
| #73 (merged) | Golden Harness — `scripts/verify.sh` with ruff + basedpyright + import-linter + pytest --cov, plus matching CI |
| #74 (merged) | `knowledge/` data standard (Flatten / Tree / Graph shapes) + `configs/` skeleton; `openai-agents>=0.14` introduced for `BaseFlowCfg.model_settings` |
| PR4 (this PR) | `preprocess/` (fetch + format two layers); deletes `parsers/` + `sources/` + `utils/tmp.py`; coverage floor 60→65; 4th import-linter contract (`preprocess` is a leaf) |
| PR5 | `flows/` + `paper_flow` + `batch_run` + `magic.py`; delete `quantmind/flow/`, `quantmind/llm/`, `quantmind/config/`, `quantmind/models/{content,paper,analysis}.py` |
| PR6 | `mind/memory/filesystem` MVP + trajectory archive |
| #75 (merged) | `preprocess/` (fetch + format two layers); deletes `parsers/` + `sources/` + `utils/tmp.py`; coverage floor 60→65; 4th import-linter contract |
| PR5 (this PR) | `flows/` (`paper_flow` + `batch_run` + `BatchResult` + `_runner`) + `magic.py`; deletes `quantmind/{flow,llm,config,models}/`; coverage floor 65→75; 5th import-linter contract pins `flows + magic` as apex |
| PR6 | `mind/memory/filesystem` MVP + trajectory archive (fills `_archive_run_artifacts` stub) |
| PR7 | `mind/store/` + SQLite + `sqlite-vec` MVP; introduces `preprocess/chunk.py` with `tiktoken` |
| PR8+ | Second flow (news/earnings) / observability cookbook / longer-term modules |
79 changes: 65 additions & 14 deletions README.md
@@ -154,29 +154,80 @@ We use [uv](https://github.com/astral-sh/uv) for fast and reliable Python packag

### 📚 Usage Examples

#### Fetch and format an arXiv paper
#### Run a single paper through `paper_flow`

```python
import asyncio

from quantmind.preprocess import fetch_arxiv, pdf_to_markdown
from quantmind.configs import PaperFlowCfg
from quantmind.configs.paper import ArxivIdentifier
from quantmind.flows import paper_flow


async def main() -> None:
    raw = await fetch_arxiv("arXiv:2401.12345")
    markdown = await pdf_to_markdown(raw.bytes)
    print(f"Title: {raw.title}")
    print(f"Authors: {', '.join(raw.authors)}")
    print(markdown[:500])
    paper = await paper_flow(
        ArxivIdentifier(id="2401.12345"),
        cfg=PaperFlowCfg(model="gpt-4o-mini"),
    )
    print(paper.model_dump_json(indent=2))


asyncio.run(main())
```

#### Fan out a batch with `batch_run`

```python
import asyncio

from quantmind.configs import PaperFlowCfg
from quantmind.configs.paper import ArxivIdentifier
from quantmind.flows import batch_run, paper_flow


async def main() -> None:
    inputs = [ArxivIdentifier(id=aid) for aid in (
        "2401.12345", "2401.12346", "2401.12347",
    )]
    result = await batch_run(
        paper_flow,
        inputs,
        cfg=PaperFlowCfg(model="gpt-4o-mini"),
        concurrency=3,
        on_error="skip",
        on_progress=lambda done, total: print(f"{done}/{total}"),
    )
    print(f"ok={result.success_count} failed={result.failure_count}")


asyncio.run(main())
```
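
`batch_run` is described as a bounded-concurrency fan-out that rejects `memory=` by design. Below is a sketch of that pattern using `asyncio.Semaphore`. It assumes `on_error="skip"` means "record the exception and keep going" and that progress is reported after each input finishes. `toy_batch_run` and `ToyBatchResult` are hypothetical; they only mirror the parameter names used in the call above:

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Sequence
from dataclasses import dataclass, field
from typing import Any, Generic, Literal, TypeVar

OutT = TypeVar("OutT")


@dataclass
class ToyBatchResult(Generic[OutT]):
    """Hypothetical stand-in for BatchResult."""

    outputs: list[OutT] = field(default_factory=list)  # completion order, not input order
    errors: list[Exception] = field(default_factory=list)

    @property
    def success_count(self) -> int:
        return len(self.outputs)

    @property
    def failure_count(self) -> int:
        return len(self.errors)


async def toy_batch_run(
    flow: Callable[..., Awaitable[OutT]],
    inputs: Sequence[Any],
    *,
    cfg: Any = None,
    concurrency: int = 4,
    on_error: Literal["skip", "raise"] = "raise",
    on_progress: Callable[[int, int], None] | None = None,
) -> ToyBatchResult[OutT]:
    """Run `flow(inp, cfg=cfg)` for every input, at most `concurrency` at a time.

    There is deliberately no `memory` parameter and no `**kwargs`.
    """
    sem = asyncio.Semaphore(concurrency)
    result: ToyBatchResult[OutT] = ToyBatchResult()
    done = 0

    async def run_one(inp: Any) -> None:
        nonlocal done
        async with sem:  # waits while `concurrency` flows are already running
            try:
                result.outputs.append(await flow(inp, cfg=cfg))
            except Exception as exc:
                if on_error == "raise":
                    raise
                result.errors.append(exc)  # "skip": record and move on
            finally:
                done += 1
                if on_progress is not None:
                    on_progress(done, len(inputs))

    await asyncio.gather(*(run_one(inp) for inp in inputs))
    return result


async def fake_flow(arxiv_id: str, *, cfg: Any = None) -> str:
    await asyncio.sleep(0)  # yield to the loop, as a network-bound flow would
    if arxiv_id.endswith("6"):
        raise ValueError(f"could not parse {arxiv_id}")
    return arxiv_id


async def demo() -> None:
    ids = ["2401.12345", "2401.12346", "2401.12347"]
    res = await toy_batch_run(fake_flow, ids, concurrency=2, on_error="skip")
    print(f"ok={res.success_count} failed={res.failure_count}")  # ok=2 failed=1


asyncio.run(demo())

try:
    toy_batch_run(fake_flow, [], memory=object())  # unexpected keyword argument
except TypeError as exc:
    print(f"rejected: {exc}")
```

`toy_batch_run` declares keyword-only parameters and no `**kwargs`. As a result, a stray `memory=` fails with `TypeError` before any flow runs. That is one way to reject an argument at the signature layer.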

#### Resolve free-form intent with `magic`

```python
import asyncio

from quantmind.flows import paper_flow
from quantmind.magic import resolve_magic_input


async def main() -> None:
    inp, cfg = await resolve_magic_input(
        "Pull arXiv 2401.12345 about cross-sectional momentum; use gpt-4o-mini.",
        target_flow=paper_flow,
    )
    paper = await paper_flow(inp, cfg=cfg)
    print(paper.model_dump_json(indent=2))


asyncio.run(main())
```
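
`resolve_magic_input` is described as introspecting flow signatures and then running a lightweight resolver Agent with `output_type=ResolvedFlowConfig[InputT, CfgT]`. The introspection half can be sketched with `inspect` and `typing.get_type_hints`. In this sketch, plain dataclasses stand in for the Pydantic input/cfg models and the flow is a toy. The `resolved` dict is hard-coded where the real code would receive structured output from the Agent:

```python
import asyncio
import dataclasses
import inspect
import typing
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class ToyArxivInput:  # hypothetical stand-in for ArxivIdentifier
    id: str


@dataclass
class ToyPaperCfg:  # hypothetical stand-in for PaperFlowCfg
    model: str = "gpt-4o-mini"


async def toy_paper_flow(inp: ToyArxivInput, *, cfg: ToyPaperCfg) -> dict[str, str]:
    """Toy flow with the same call shape as `paper_flow(inp, cfg=...)`."""
    return {"id": inp.id, "model": cfg.model}


def flow_io_types(flow: Callable[..., object]) -> tuple[type, type]:
    """Read (input type, cfg type) from a flow's annotations."""
    hints = typing.get_type_hints(flow)
    first = next(iter(inspect.signature(flow).parameters))  # assumed: input comes first
    if first not in hints or "cfg" not in hints:
        raise TypeError(f"{flow.__name__} needs annotated input and `cfg` parameters")
    return hints[first], hints["cfg"]


def describe_fields(tp: type) -> dict[str, str]:
    """Field name -> type name: the slots a resolver would be asked to fill."""
    return {f.name: getattr(f.type, "__name__", str(f.type)) for f in dataclasses.fields(tp)}


inp_t, cfg_t = flow_io_types(toy_paper_flow)
print(describe_fields(inp_t), describe_fields(cfg_t))  # {'id': 'str'} {'model': 'str'}

# Hard-coded stand-in for the resolver Agent's structured output.
resolved = {"input": {"id": "2401.12345"}, "cfg": {"model": "gpt-4o-mini"}}
inp, cfg = inp_t(**resolved["input"]), cfg_t(**resolved["cfg"])
print(asyncio.run(toy_paper_flow(inp, cfg=cfg)))
```

Because the input and cfg types come from the flow itself, the same resolver can target any flow that follows the `flow(input, cfg=...)` shape without a per-flow registry.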

> **Note**: QuantMind is mid-migration to OpenAI Agents SDK
> (see [#71](https://github.com/LLMQuant/quant-mind/issues/71)). The high-level
> flows/storage APIs land in upcoming PRs; for now the `preprocess/` and
> `knowledge/` layers are stable.
> (see [#71](https://github.com/LLMQuant/quant-mind/issues/71)). PR5 lands the
> apex layer (`flows/` + `magic.py`); the remaining work is the `mind/`
> memory + store layer scheduled for PR6 and PR7.

---

@@ -201,13 +252,13 @@ QuantMind is designed with a larger vision: to become a comprehensive intelligen
The foundation we're building today—starting with papers—will expand to encompass the entire financial information ecosystem.

> [!NOTE]
> **Future Conceptual Example:**
> **Future Conceptual Example (PR6 brings `FilesystemMemory`):**
>
> ```python
> # The future we are building towards
> from quantmind.flows import paper_flow, batch_run
> from quantmind.configs.paper import ArxivIdentifier
> from quantmind.flows import paper_flow
> from quantmind.knowledge import Paper
> from quantmind.mind.memory import FilesystemMemory
> from quantmind.mind.memory import FilesystemMemory # PR6
>
> memory = FilesystemMemory("./mem/factor-research/")
> for arxiv_id in arxiv_ids:
94 changes: 36 additions & 58 deletions pyproject.toml
@@ -92,29 +92,17 @@ convention = "google"
# ----------------------------------------------------------------------------
# basedpyright: type checker
# ----------------------------------------------------------------------------
# Strategy: explicit include-list of files that survive PR3-PR5 plus the
# target-architecture module dirs (knowledge/, configs/, preprocess/, flows/,
# mind/). Transitional modules still scheduled for deletion in PR5
# (config/, flow/, llm/, models/{analysis,content,paper}.py) are NOT
# included; they get deleted soon and are not worth modernizing for type
# checks. parsers/, sources/, utils/tmp.py were removed in PR4.
# Every module in `quantmind/` is type-checked at "standard" mode. PR5
# deleted the transitional packages (`config/`, `flow/`, `llm/`,
# `models/`) so the exclude list is now down to the directories pyright
# always wants ignored.

[tool.basedpyright]
include = ["quantmind"]
# Transitional modules excluded from type checking — they get deleted in
# PR5 (per CLAUDE.md "Current Repository State" table). New modules
# (knowledge/, configs/, preprocess/, flows/, mind/, magic.py) automatically
# get type-checked at "standard" mode as they land — no further config needed.
exclude = [
"**/__pycache__",
"**/.venv",
"**/build",
"quantmind/config",
"quantmind/flow",
"quantmind/llm",
"quantmind/models/analysis.py",
"quantmind/models/content.py",
"quantmind/models/paper.py",
]
pythonVersion = "3.10"
typeCheckingMode = "standard"
@@ -131,8 +119,10 @@ reportIncompatibleVariableOverride = "none"
# import-linter: architectural boundary contracts
# ----------------------------------------------------------------------------
# Encodes the target architecture: utils, knowledge, and preprocess are
# leaves; configs depends only on knowledge. flows (PR5) and mind (PR6+)
# get their own contracts when they land.
# leaves; configs depends only on knowledge; flows + magic is the apex
# layer (PR5). The transitional packages (config/, flow/, llm/, models/)
# were deleted in PR5; they remain in the forbidden lists as a tripwire
# against accidental re-introduction during future refactors.

[tool.importlinter]
root_packages = ["quantmind"]
@@ -142,12 +132,10 @@ name = "utils is a leaf (no inbound deps from quantmind packages)"
type = "forbidden"
source_modules = ["quantmind.utils"]
forbidden_modules = [
"quantmind.config",
"quantmind.configs",
"quantmind.flow",
"quantmind.flows",
"quantmind.knowledge",
"quantmind.llm",
"quantmind.models",
"quantmind.magic",
"quantmind.preprocess",
]

@@ -156,36 +144,46 @@ name = "knowledge is a leaf (no inbound deps from quantmind packages)"
type = "forbidden"
source_modules = ["quantmind.knowledge"]
forbidden_modules = [
"quantmind.config",
"quantmind.configs",
"quantmind.flow",
"quantmind.llm",
"quantmind.models",
"quantmind.flows",
"quantmind.magic",
"quantmind.preprocess",
"quantmind.utils",
]

[[tool.importlinter.contracts]]
name = "configs only depends on knowledge (transitional modules forbidden)"
name = "configs only depends on knowledge"
type = "forbidden"
source_modules = ["quantmind.configs"]
forbidden_modules = [
"quantmind.config",
"quantmind.flow",
"quantmind.llm",
"quantmind.models",
"quantmind.flows",
"quantmind.magic",
"quantmind.preprocess",
]

[[tool.importlinter.contracts]]
name = "preprocess only depends on utils (no inbound deps on configs/knowledge/transitional)"
name = "preprocess only depends on utils (no inbound deps on configs/knowledge/flows)"
type = "forbidden"
source_modules = ["quantmind.preprocess"]
forbidden_modules = [
"quantmind.config",
"quantmind.configs",
"quantmind.flow",
"quantmind.flows",
"quantmind.knowledge",
"quantmind.magic",
]

[[tool.importlinter.contracts]]
name = "flows + magic is apex (no transitional package imports)"
type = "forbidden"
source_modules = [
"quantmind.flows",
"quantmind.magic",
]
# These packages were deleted in PR5; the contract guards against any
# future code re-introducing them under the same names.
forbidden_modules = [
"quantmind.config",
"quantmind.flow",
"quantmind.llm",
"quantmind.models",
]
@@ -196,39 +194,19 @@ forbidden_modules = [

[tool.pytest.ini_options]
testpaths = ["tests"]
# Coverage floor 65% after PR4 deleted parsers/sources (the low-coverage
# drag) and added preprocess/ at >85% line. Will ratchet to 75% in PR5
# once flow/, llm/, config/, models/{content,paper,analysis}.py are gone.
# Coverage floor 75% — PR5 deleted the transitional packages so every
# remaining module is in the target architecture and is well-tested.
addopts = [
"--cov=quantmind",
"--cov-report=term-missing",
"--cov-fail-under=65",
"--cov-fail-under=75",
"-ra",
]
asyncio_mode = "auto"
filterwarnings = [
# Pydantic v1 → v2 transition warnings on transitional model code that
# gets removed in PR5. Suppress until those modules are gone.
"ignore::DeprecationWarning:pydantic.*",
"ignore::pydantic.PydanticDeprecatedSince20",
]

[tool.coverage.run]
source = ["quantmind"]
branch = true
# Transitional modules slated for PR5 deletion. They lost their tests in
# PR4 (parsers/sources tests were dragging the floor down anyway and were
# the only callers of flow/, llm/, config/parsers, config/sources). Once
# PR5 deletes them, this omit list goes empty and the floor ratchets to 75.
omit = [
"quantmind/flow/*",
"quantmind/llm/*",
"quantmind/config/parsers.py",
"quantmind/config/sources.py",
"quantmind/config/flows.py",
"quantmind/config/registry.py",
"quantmind/models/analysis.py",
]

[tool.coverage.report]
exclude_lines = [
@@ -240,4 +218,4 @@ exclude_lines = [
# Branch coverage. Floor moves with the migration:
# PR3: 60% (parsers/sources still drag the average down)
# PR4: 65% (parsers/sources gone; preprocess >85%)
# PR5: 75% target (flow/, llm/, config/, transitional models/ gone)
# PR5: 75% (flow/, llm/, config/, transitional models/ gone)
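
`[tool.coverage.run]` sets `branch = true`, so the percentage that `--cov-fail-under=75` checks combines statements and branches. coverage.py computes the total as executed statements plus executed branches, divided by all measured statements plus all possible branches, with lines matched by `exclude_lines` left out of the statement count. The counts below are made up for illustration:

$$
\text{total} = 100 \times \frac{S_{\text{run}} + B_{\text{run}}}{S + B}
= 100 \times \frac{1560 + 340}{2000 + 500} = 76\% \ge 75\%
$$

As a result, a module whose every line runs can still pull the total down if some of its `if`/`else` arms are never taken.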