feat(rules): agent rules layer with CLI-first directive #341
Open
deanq wants to merge 25 commits into main from deanq/ae-3155-flash-agent-rules-layer
+382 −5
ce13a8b feat(rules): add static flash-rules.md and rules package (deanq)
681f617 feat(rules): add rules engine with packaging and agent file generation (deanq)
b4f6ecf feat(rules): add flash rules CLI command (deanq)
271f931 feat(rules): integrate agent file generation into flash init (deanq)
24fd85b feat(rules): add dynamic context renderer from manifest data (deanq)
f4d9a85 feat(rules): wire dynamic context generation into flash rules command (deanq)
7711260 feat(rules): add --no-rules flag to flash init and .gitignore entry (deanq)
5d85bee feat(rules): regenerate dynamic context on flash run and flash build (deanq)
a4d3bd3 fix(rules): wrap dynamic context in try/except, remove --disable flag… (deanq)
5131fe7 chore: ignore entire .flash directory (deanq)
eb7cc4b fix(rules): restore main run.py shape, use CliRunner in init tests (deanq)
08dd26d feat(rules): add CLI-first directive and surface generated files in R… (deanq)
62718c0 feat(rules): trim flash-rules.md to AGENTS.md, CLI-first at top (deanq)
bc39e08 feat(rules): add minimal install_agent_files (deanq)
a3998ba fix(rules): warn on broken CLAUDE.md symlink, clear error on missing … (deanq)
8d0b1b3 chore(rules): replace heavy engine with minimal install_agent_files (deanq)
0c3ec22 docs(rules): rewrite agent integration section for minimal design (deanq)
eec27bd chore(rules): drop stale .flash/context.md skeleton gitignore entry (deanq)
e67edfc fix(rules): use 'flash dev' not 'flash run' in AGENTS.md (deanq)
b457380 feat(rules): add Pattern D for pre-built container images (BYOI) (deanq)
f69eb4e docs(rules): use runpod/worker-v1-vllm:v2.18.1 in Pattern D (deanq)
3562c1e chore(skeleton): drop legacy .runpod/ and duplicate dist/ from .gitig… (deanq)
1492bf0 docs(rules): restore runpod/skills bundle pointer in README (deanq)
cb196b6 fix(rules): address Copilot PR review feedback (code) (deanq)
635f7f1 docs(rules): heading rename and tighter opt-out wording (deanq)

@@ -37,9 +37,8 @@ wheels/

# Flash
.flash/
.runpod/
dist/

# OS
.DS_Store
Thumbs.db

@@ -0,0 +1,131 @@
# Flash Rules for AI Coding Agents

## Use the Flash CLI — Do Not Call Runpod REST or GraphQL Directly

For anything Flash supports, use the `flash` CLI. **Do not** generate `curl`, `httpx`, `requests`, or `gql` calls against `api.runpod.io`, `api.runpod.ai`, or `*.runpod.net` to build, deploy, list, scale, log, or invoke endpoints. The CLI handles auth, hashing, drift detection, manifest generation, and image selection. Direct API calls bypass all of that and will silently desync from Flash state.

| Intent | Command | Do NOT |
|--------|---------|--------|
| Scaffold a project | `flash init <name>` | Hand-write `pyproject.toml` + manifest |
| Local dev server | `flash dev` | Run `uvicorn` against generated server manually |
| Package artifact | `flash build` | Tar `src/` and POST it |
| Deploy to Runpod | `flash deploy` | Call `saveEndpoint` / REST `POST /v1/endpoints` |
| Preview locally | `flash deploy --preview` | Hand-write `docker-compose.yml` |
| Tear down | `flash undeploy` | Call `deleteEndpoint` mutation |
| List apps/envs | `flash app list` / `flash env list` | Query GraphQL `myself.endpoints` |

If a Flash command does not exist for what the user is asking, surface that gap (`flash <area> --help` first), then ask before reaching for raw API calls. Raw Runpod SDK use (`runpod.Endpoint(...)`) is acceptable only for invoking already-deployed endpoints from non-Flash code — never for lifecycle operations.

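The directive above names three host patterns that generated code should not target, which makes it easy to check mechanically. The following is a minimal sketch of such a check and is not part of this PR: the function name `find_direct_api_calls` and the regular expression are hypothetical, and only the host patterns themselves come from the rule text.

```python
from __future__ import annotations

import re

# Host patterns taken from the directive: api.runpod.io, api.runpod.ai, *.runpod.net.
# The function name and the regex are illustrative assumptions, not Flash APIs.
_RUNPOD_API_HOST = re.compile(
    r"\b(?:api\.runpod\.(?:io|ai)|(?:[a-z0-9-]+\.)+runpod\.net)\b",
    re.IGNORECASE,
)


def find_direct_api_calls(source: str) -> list[tuple[int, str]]:
    """Return (line_number, host) pairs for each Runpod API host found in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _RUNPOD_API_HOST.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A check like this only flags literal host names in text; it cannot tell a lifecycle call from a legitimate invocation of an already-deployed endpoint, so a hit is a prompt for review rather than proof of a violation.
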
## Identity

Flash is a Python SDK for deploying AI workloads to Runpod GPUs. You write decorated Python functions, Flash handles infrastructure, scaling, and deployment.

## Endpoint Patterns

### Pattern A: Queue-based function endpoint

```python
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="my-gpu-worker",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=(0, 3),
    dependencies=["torch"],
)
async def process(input_data: dict) -> dict:
    import torch
    return {"gpu": torch.cuda.get_device_name(0)}
```

### Pattern B: Load-balanced routes

```python
from runpod_flash import Endpoint

api = Endpoint(name="my-api", cpu="cpu3c-1-2", workers=(1, 3))

@api.get("/health")
async def health():
    return {"status": "ok"}

@api.post("/compute")
async def compute(numbers: list[float]) -> dict:
    return {"sum": sum(numbers)}
```

### Pattern C: Class-based worker (stateful)

```python
from runpod_flash import Endpoint, GpuType

@Endpoint(
    name="my-model",
    gpu=GpuType.NVIDIA_GEFORCE_RTX_4090,
    workers=(1, 3),
    dependencies=["torch", "transformers"],
)
class MyModel:
    def __init__(self):
        import torch
        from transformers import pipeline
        self.pipe = pipeline("text-generation", device="cuda")

    async def generate(self, prompt: str) -> dict:
        return {"text": self.pipe(prompt)[0]["generated_text"]}
```

### Pattern D: Pre-built container image (no decorated function)

For workloads that already serve HTTP — vLLM, TGI, ComfyUI, Ollama, custom images — provision the endpoint with an `image=` argument and call it as a client. No Python handler to write. Flash deploys the image and gives you HTTP + queue access to it.

```python
from runpod_flash import Endpoint, GpuGroup

vllm = Endpoint(
    name="vllm",
    image="runpod/worker-v1-vllm:v2.18.1",
    gpu=GpuGroup.ADA_24,
    workers=(0, 3),
    env={"MODEL_NAME": "meta-llama/Llama-3.1-8B-Instruct"},
)

# QB-style — the Runpod vLLM worker speaks the queue protocol
result = await vllm.runsync({"input": {"prompt": "hello", "max_tokens": 64}})

# Or LB-style HTTP if you've routed through a load-balanced front
models = await vllm.get("/v1/models")
```

When to use this pattern: the upstream project already publishes a serving image and you don't need to add any Python logic on top. If you need pre/post-processing, wrap the call inside a Pattern A or B `@Endpoint` instead.

To attach to an already-deployed endpoint (no provisioning), pass `id=` instead of `image=`:

```python
ep = Endpoint(id="abc123")
result = await ep.runsync({"prompt": "hello"})
```

## Rules That Break If Violated

- `import torch` and heavy libraries INSIDE the function body, never at module level
- Declare runtime dependencies in `@Endpoint(dependencies=[...])`, not in `pyproject.toml`
- Endpoint functions can be sync (`def`) or async (`async def`). Use async when awaiting other endpoints or async I/O
- `workers=N` for fixed count, `workers=(min, max)` for auto-scaling range
- Class workers: model loading in `__init__`, request handling in instance methods
- Cross-worker calls use `await` — call `@Endpoint`-decorated functions as if local; Flash handles remote dispatch
- System-level packages (ffmpeg, libgl1) go in `system_dependencies`, not `dependencies`
- `@Endpoint` is the canonical decorator. `@remote` is the legacy alias

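The first rule in the list above (heavy imports belong inside the function body) can be checked statically. The sketch below is an illustration only and is not part of this PR: `module_level_heavy_imports` and the `HEAVY_MODULES` set are hypothetical names, and the set contains just the two libraries that appear in the patterns above.

```python
from __future__ import annotations

import ast

# Hypothetical list of "heavy" libraries: torch is named in the rule,
# transformers appears in Pattern C. Extend as needed.
HEAVY_MODULES = {"torch", "transformers"}


def module_level_heavy_imports(source: str) -> list[str]:
    """Return heavy modules imported by top-level statements of the given source."""
    found = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue  # function and class bodies are not inspected
        for name in names:
            root = name.split(".")[0]
            if root in HEAVY_MODULES:
                found.append(root)
    return found
```

Because only direct top-level statements are inspected, an import nested in a top-level `if` or `try` block would not be reported; a stricter check would need to walk those blocks as well.
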
## Common Agent Mistakes

| Mistake | Fix |
|---------|-----|
| Writing raw FastAPI instead of `@Endpoint` | Use `@Endpoint` decorator, Flash generates FastAPI |
| `import torch` at top of file | Move inside function body |
| Adding deps to `pyproject.toml` only | Add to `@Endpoint(dependencies=[...])` |
| Forcing `async def` on all endpoints | Both sync and async are valid; use async only when awaiting |
| Creating `main.py` or `app.py` | Not needed — Flash auto-discovers decorated functions |
| Using `docker-compose` manually | Use `flash deploy --preview` for local container testing |
| Wrapping vLLM/TGI/Comfy in a custom handler for no reason | Use `Endpoint(name=..., image=...)` and call via `.post()`/`.run()` — Pattern D |
| Calling Runpod REST/GraphQL directly | Use `flash` CLI — see top of this file |

@@ -0,0 +1,67 @@
"""Flash agent rules — install AGENTS.md and (best-effort) CLAUDE.md symlink."""

from __future__ import annotations

import logging
import os
from importlib import resources
from pathlib import Path

logger = logging.getLogger(__name__)

__all__ = ["install_agent_files"]


def _read_packaged_agents_md() -> str:
    try:
        return (resources.files("runpod_flash.rules") / "AGENTS.md").read_text(
            encoding="utf-8"
        )
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            "AGENTS.md not found in runpod_flash.rules package data. "
            "The installed wheel may be incomplete."
        ) from exc


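For readers unfamiliar with `importlib.resources.files`, the sketch below shows the same read-from-package-data pattern against a throwaway package built in a temporary directory. It is an illustration, not code from this PR: `demo_rules` is a hypothetical stand-in for `runpod_flash.rules`, and the behaviour shown assumes a package installed as ordinary files on disk.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Build a throwaway package on disk so the example is self-contained.
# "demo_rules" is a hypothetical stand-in for runpod_flash.rules.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_rules"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "AGENTS.md").write_text("# Demo rules\n", encoding="utf-8")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        data_dir = resources.files("demo_rules")  # Traversable for the package
        text = (data_dir / "AGENTS.md").read_text(encoding="utf-8")
        try:
            (data_dir / "MISSING.md").read_text(encoding="utf-8")
            missing_raises = False
        except FileNotFoundError:
            missing_raises = True  # the case the function above re-raises with context
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_rules", None)
```

Here a missing data file surfaces as `FileNotFoundError`, which is the exception `_read_packaged_agents_md` catches and re-raises with a clearer message.
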
def install_agent_files(target_dir: Path) -> list[Path]:
    """Write AGENTS.md and a CLAUDE.md symlink into target_dir if absent.

    Returns the list of paths actually created. Idempotent: if both files
    exist (or CLAUDE.md already exists in any form), they are left alone.

    Symlink failure (e.g. Windows without developer mode) is non-fatal —
    AGENTS.md is still written and the failure is logged.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    created: list[Path] = []

    agents = target_dir / "AGENTS.md"
    if agents.is_symlink() and not agents.exists():
        logger.warning(
            "AGENTS.md is a broken symlink at %s. Repair manually or remove it.",
            agents,
        )
    elif not agents.exists():
        agents.write_text(_read_packaged_agents_md(), encoding="utf-8")
        created.append(agents)

    claude = target_dir / "CLAUDE.md"
    if claude.is_symlink() and not claude.exists():
        logger.warning(
            "CLAUDE.md is a broken symlink at %s. Repair manually or remove it.",
            claude,
        )
    elif not claude.exists():
        try:
            os.symlink("AGENTS.md", claude)
            created.append(claude)
        except OSError as exc:
            logger.warning(
                "Could not create CLAUDE.md symlink (%s). "
                "Claude Code users can run: ln -s AGENTS.md CLAUDE.md",
                exc,
            )

    return created
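
The docstring describes `install_agent_files` as idempotent. Since the real function depends on `runpod_flash` package data, the sketch below demonstrates that property with a simplified stand-in: `install_files_sketch` and `PLACEHOLDER_RULES` are hypothetical, and the stand-in omits the logging and broken-symlink warnings of the real function.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

# Stand-in for the packaged AGENTS.md text; the real function reads it
# from package data instead.
PLACEHOLDER_RULES = "# Flash Rules for AI Coding Agents\n"


def install_files_sketch(target_dir: Path) -> list[Path]:
    """Simplified mirror of install_agent_files: write if absent, symlink best-effort."""
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    agents = target_dir / "AGENTS.md"
    if not agents.exists() and not agents.is_symlink():
        agents.write_text(PLACEHOLDER_RULES, encoding="utf-8")
        created.append(agents)
    claude = target_dir / "CLAUDE.md"
    if not claude.exists() and not claude.is_symlink():
        try:
            os.symlink("AGENTS.md", claude)  # relative link, resolved inside target_dir
            created.append(claude)
        except OSError:
            pass  # e.g. no symlink privilege; AGENTS.md has still been written
    return created


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my-project"
    first = install_files_sketch(project)   # creates the files
    second = install_files_sketch(project)  # finds them and creates nothing
    first_names = [p.name for p in first]
    agents_text = (project / "AGENTS.md").read_text(encoding="utf-8")
```

The second call returns an empty list because both paths already exist, which is the behaviour a caller such as `flash init` can rely on when it runs in a directory that was already initialised.
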
Empty file.