- Package skeleton:
pyproject.toml,layoutscribe/withapi.py,cli.py,config.py,types.py - Minimal runnable CLI:
layoutscribe parse <file>with stub outputs - Formal JSON Schema validation wired into reviewer step
- Examples:
examples/inputs and expected outputs - Provider setup docs completeness and
.envexample validation
- Reviewer re-ask policy implementation (thresholds, retries, budget checks)
- Overlay generation for sampled pages
- Basic tests: schema validation, golden markdown, fake provider
- Bench harness: tiny dataset run with MLflow logging
- Security: ephemeral temp dirs,
--save-intermediategate
- CI: lint (ruff/black), tests, wheel build
- Docs site (MkDocs/Docusaurus) and GH Pages publish
- Provider matrix expansion (Azure, Anthropic, Google)
- Caching keyed by page image hash + prompt version
- Issue/PR templates, CODEOWNERS
- Rich table reconstruction options
- Hosted service wrapper (optional)
- Visualization tool for layout JSON
- Use GitHub Projects or Issues to track these tasks
- Link commits/PRs to tasks for traceability