
feat: migrate to Praxis filter-based proxy architecture #27

Open
franciscojavierarceo wants to merge 1 commit into main from feat/praxis-integration


franciscojavierarceo (Collaborator) commented May 15, 2026

Summary

Replaces the hand-rolled Axum proxy with Praxis, a composable filter-based reverse proxy framework built on Pingora. Each gateway concern — proxying, auth, state hydration, tool dispatch, agentic looping — is an independent filter wired together via YAML configuration.

Why Praxis

  • Each filter is self-contained — implements HttpFilter with hooks for request/response. Filters don't know about each other.
  • YAML-configured pipeline — adding, removing, or reordering filters requires no code changes.
  • Native SSE streaming — Praxis/Pingora proxies the upstream response stream directly to the client. No buffering, no reqwest intermediary.
  • Hot reload — filter pipelines can be reloaded without restarting the server.
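To make the claim that filters are self-contained and don't know about each other concrete, here is a minimal Rust sketch. The `Filter` trait, `Ctx`, `Chain`, and `Tag` types are hypothetical stand-ins, not Praxis's actual `HttpFilter` API. Their only shared surface is a per-request context. Running response hooks in reverse order is an assumption borrowed from common middleware designs, and Praxis's actual ordering may differ.

```rust
/// Hypothetical per-request context: the only state filters share.
#[derive(Debug, Default)]
pub struct Ctx {
    pub trace: Vec<String>,
}

/// Simplified stand-in for a filter with request and response hooks.
/// Default no-op bodies let a filter implement only the hooks it needs.
pub trait Filter {
    fn on_request(&self, _ctx: &mut Ctx) {}
    fn on_response(&self, _ctx: &mut Ctx) {}
}

/// An ordered chain, as it might be built from a YAML filter list.
pub struct Chain {
    filters: Vec<Box<dyn Filter>>,
}

impl Chain {
    pub fn new(filters: Vec<Box<dyn Filter>>) -> Self {
        Self { filters }
    }

    /// Runs request hooks in declared order, then response hooks in reverse.
    pub fn run(&self, ctx: &mut Ctx) {
        for f in &self.filters {
            f.on_request(ctx);
        }
        // In a real proxy the request would be forwarded upstream here.
        for f in self.filters.iter().rev() {
            f.on_response(ctx);
        }
    }
}

/// Example filter that records when each of its hooks fires.
pub struct Tag(pub &'static str);

impl Filter for Tag {
    fn on_request(&self, ctx: &mut Ctx) {
        ctx.trace.push(format!("{}:req", self.0));
    }
    fn on_response(&self, ctx: &mut Ctx) {
        ctx.trace.push(format!("{}:resp", self.0));
    }
}
```

In this sketch, reordering the pipeline means reordering the `Vec`, which is the code-level analogue of reordering entries in the YAML config.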

What changed

Filters introduced:

  • responses_proxy: sets ctx.upstream to vLLM's /v1/responses endpoint and injects auth credentials. Praxis/Pingora handles the actual proxying and streaming natively.
  • state_hydration: stub filter for conversation-state hydration. Inspects the request body for previous_response_id and will call the state store to hydrate the conversation history.
  • agentic_loop: stub filter for agentic re-inference. Inspects the response body for function_call output items and will re-enter the inference loop.
  • tool_dispatch: stub filter for tool execution. Inspects the response body for tool calls and will dispatch them.
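The responses_proxy behavior described above (set the upstream, inject credentials) can be sketched as a single request hook. `RequestCtx`, `ResponsesProxy`, `upstream_url`, and `api_key` are hypothetical names, and the `Bearer` scheme is an assumption. The precedence rule mirrors the test_auth_injection and test_client_auth_precedence cases in the test plan.

```rust
use std::collections::HashMap;

/// Hypothetical request context; header names are stored as received.
#[derive(Debug, Default)]
pub struct RequestCtx {
    pub headers: HashMap<String, String>,
    pub upstream: Option<String>,
}

/// Hypothetical configuration for a responses_proxy-style filter.
pub struct ResponsesProxy {
    /// Full upstream URL, e.g. "http://localhost:8000/v1/responses".
    pub upstream_url: String,
    /// Optional API key, injected only when the client sends none.
    pub api_key: Option<String>,
}

impl ResponsesProxy {
    pub fn on_request(&self, ctx: &mut RequestCtx) {
        ctx.upstream = Some(self.upstream_url.clone());

        // HTTP header names are case-insensitive, so check every casing.
        let client_has_auth = ctx
            .headers
            .keys()
            .any(|k| k.eq_ignore_ascii_case("authorization"));

        // A client-supplied Authorization header takes precedence.
        if !client_has_auth {
            if let Some(key) = &self.api_key {
                ctx.headers
                    .insert("authorization".to_string(), format!("Bearer {key}"));
            }
        }
    }
}
```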

Removed:

  • src/app.rs, src/proxy.rs, src/server.rs — replaced by filters + Praxis server runtime
  • benches/proxy_bench.rs — benchmark harness for the old Axum proxy (will be re-added)

Dependencies:

  • Praxis crates (praxis, praxis-proxy-core, praxis-proxy-filter, praxis-test-utils) via git at rev 2f7ea31
  • Base URLs ending with /v1 are normalized to avoid /v1/v1/responses double-prefix
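The /v1 normalization can be expressed as a small pure function. `responses_url` is a hypothetical name. The PR only states that base URLs ending in /v1 are normalized, so the trailing-slash handling here is an added assumption.

```rust
/// Builds the upstream Responses endpoint from a configured base URL so that
/// "http://host:8000" and "http://host:8000/v1" both map to
/// "http://host:8000/v1/responses" (no "/v1/v1/responses" double prefix).
pub fn responses_url(base: &str) -> String {
    // Drop trailing slashes first so ".../v1/" is treated like ".../v1".
    let trimmed = base.trim_end_matches('/');
    // Strip a single trailing "/v1" segment if present.
    let root = trimmed.strip_suffix("/v1").unwrap_or(trimmed);
    format!("{root}/v1/responses")
}
```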

Docs:

  • Updated README.md with architecture diagram, filter table, and run instructions
  • Updated docs/index.md with architecture overview and Praxis context
  • Added docs/architecture/index.md with Mermaid diagram, filter pipeline reference, streaming details, and component descriptions
  • Added Architecture page to mkdocs nav

Health endpoint:

  • Provided by Praxis's built-in admin endpoint (admin: { address: "127.0.0.1:9901" } in config)

Filter pipeline

```yaml
filter_chains:
  - name: agentic
    filters:
      - filter: state_hydration
        store_base_url: "http://localhost:8080"
      - filter: agentic_loop
        max_iterations: 10
      - filter: tool_dispatch
      - filter: responses_proxy
        vllm_base_url: "http://localhost:8000"
```
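In this chain, agentic_loop is still a stub that will re-enter the inference loop. The function below is a hedged sketch of what a loop bounded by max_iterations could look like, with inference and tool dispatch passed in as closures. `OutputItem`, `LoopError`, and `run_agentic_loop` are hypothetical, and real Responses API items carry more fields than shown.

```rust
/// Hypothetical, simplified Responses API output items.
#[derive(Debug, Clone, PartialEq)]
pub enum OutputItem {
    Message(String),
    FunctionCall { call_id: String, name: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
}

#[derive(Debug, PartialEq)]
pub enum LoopError {
    MaxIterationsExceeded(u32),
}

/// Re-runs inference while the model keeps emitting function calls,
/// dispatching each call and appending its output, up to `max_iterations`.
pub fn run_agentic_loop<I, T>(
    mut history: Vec<OutputItem>,
    max_iterations: u32,
    mut infer: I,
    mut dispatch_tool: T,
) -> Result<Vec<OutputItem>, LoopError>
where
    I: FnMut(&[OutputItem]) -> Vec<OutputItem>,
    T: FnMut(&str, &str) -> String,
{
    for _ in 0..max_iterations {
        let output = infer(history.as_slice());

        // Collect this round's function calls (agentic_loop's check).
        let calls: Vec<(String, String, String)> = output
            .iter()
            .filter_map(|item| match item {
                OutputItem::FunctionCall { call_id, name, arguments } => {
                    Some((call_id.clone(), name.clone(), arguments.clone()))
                }
                _ => None,
            })
            .collect();
        history.extend(output);

        // No function calls means the model produced a final answer.
        if calls.is_empty() {
            return Ok(history);
        }

        // Execute each call (tool_dispatch's job) and feed results back.
        for (call_id, name, arguments) in calls {
            let result = dispatch_tool(name.as_str(), arguments.as_str());
            history.push(OutputItem::FunctionCallOutput { call_id, output: result });
        }
    }
    Err(LoopError::MaxIterationsExceeded(max_iterations))
}
```

Passing inference and tool execution as closures keeps the loop testable without a running vLLM. Returning an error when the cap is hit, rather than silently truncating, is one possible policy among several.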

Test plan

  • cargo build succeeds
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt -- --check clean
  • pre-commit run --all-files clean
  • All 9 tests pass (2 unit + 7 integration):
    • test_non_stream_passthrough — JSON request/response round-trip
    • test_stream_passthrough — SSE streaming passthrough
    • test_auth_injection — API key injected from config
    • test_client_auth_precedence — client-supplied auth preserved
    • test_vllm_http_error_passthrough — upstream 429 forwarded
    • test_mid_stream_failure_closes_cleanly — partial stream handled
    • test_connect_error_maps_to_502 — unreachable vLLM returns 502
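The two error-path tests (test_vllm_http_error_passthrough and test_connect_error_maps_to_502) encode a simple policy: upstream HTTP statuses are forwarded as-is, and a failure to connect becomes 502 Bad Gateway. A minimal sketch, with `UpstreamOutcome` and `client_status` as hypothetical names:

```rust
/// Hypothetical classification of what happened when contacting vLLM.
#[derive(Debug)]
pub enum UpstreamOutcome {
    /// vLLM answered with this HTTP status (e.g. 200, 429).
    Status(u16),
    /// The connection to vLLM could not be established.
    ConnectError,
}

/// Status returned to the client: upstream statuses pass through unchanged,
/// while an unreachable upstream becomes 502 Bad Gateway.
pub fn client_status(outcome: &UpstreamOutcome) -> u16 {
    match outcome {
        UpstreamOutcome::Status(code) => *code,
        UpstreamOutcome::ConnectError => 502,
    }
}
```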

Replace the Axum HTTP server with Praxis as the core proxy runtime.
All request handling logic is now implemented as composable Praxis
filters (responses_proxy, ogx_state, agentic_loop, tool_dispatch),
wired together via YAML configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Comment thread config/agentic-api.yaml
@@ -0,0 +1,22 @@
admin:
Collaborator


Need documentation on what each field in this YAML file means and what it is used for.
Some fields are hard to interpret; for example, store_base_url sounds storage-related, but it's not a database URL?!

Overall, the Praxis library is relatively new, and I'm not sure it is suitable to rely on; the maintenance cost is high.

It is difficult to review and judge this PR, since doing so requires agentic-api maintainers to be familiar with Praxis.

Meanwhile, writing our own HTTP request gateway natively would give us flexibility, especially for SSE streaming and tool calls.
In terms of testing agentic-api's functionality as a whole system, it now relies entirely on Praxis being maintained and tested. If we encounter bugs in Praxis, we would need to wait for fixes upstream.

I thought, based on the last community meeting, that we would use OGX for CRUD features since it is a well-maintained project?!

@noobHappylife (Collaborator)

Thanks for putting this together.

I like the direction of making gateway concerns more composable, and Praxis does look like a strong proxy framework. The filter model is useful for auth, rate limiting, tenant routing, quotas, policy, request validation, header injection, deployment guardrails, etc.

But I don't think Praxis should be the core boundary for agentic-api. My main concern is that this moves too much of the actual agentic runtime into proxy filters, things like:

  • previous_response_id rehydration
  • response/message store
  • stable response item ids
  • SSE event ordering/timing
  • tool call detection/execution/result injection
  • multi-turn loop continuation
  • cancellation/timeout/partial failure semantics
  • trajectory/trace capture for RL rollout
  • vLLM-native metadata, cache/session lineage, token/logprob metadata (which are important for the RL use case)

These, I feel, are not really generic proxy concerns; they are the main state machine of agentic-api. Splitting this into filters like state_hydration, agentic_loop, tool_dispatch, and responses_proxy may look composable, but in practice they are tightly coupled by shared state, response semantics, and stream ordering. I worry we end up encoding the main transaction as middleware, which is harder to reason about and test.

Praxis can still be useful in an architecture like:

client / codex cli / agent harness
  -> praxis gateway (optional)
       auth / rate limit / tenant routing / policy / coarse guardrails
  -> agentic-api
       responses state machine, message store, SSE semantics,
       tool loop, tool execution, trajectory capture
  -> vLLM responses or internal vLLM-native backend

So: Praxis as an outer gateway in front of agentic-api, not as the place where we decompose the agentic loop itself.

As for OGX, I think it can be a backend/service provider for built-in tools like file_search, vector stores, files, or other stateful services, but agentic-api should still decide when and how those tools participate in the responses loop.

So my recommendation is: don't make Praxis the core architecture boundary for agentic-api. If we support Praxis, I'd rather make it an optional outer gateway integration, or a very thin adapter that delegates into an explicit agentic-api orchestration core.
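One way to read the "explicit orchestration core" suggestion is as a plain state machine whose transitions are a pure function, with any gateway adapter reduced to feeding it events. The sketch below is hypothetical (`State`, `Event`, and `step` are illustrative names) and covers only hydration, inference, and tool dispatch, not cancellation, SSE ordering, or trace capture.

```rust
/// Hypothetical states of an explicit orchestration core.
#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Hydrate,
    Infer { iteration: u32 },
    DispatchTools { iteration: u32 },
    Done,
    Failed(String),
}

/// Hypothetical events produced by I/O code (state store, vLLM, tools).
#[derive(Debug)]
pub enum Event {
    Hydrated,
    ModelOutput { tool_calls: usize },
    ToolsFinished,
}

/// Pure transition function: all loop policy lives here, not in filters.
pub fn step(state: State, event: Event, max_iterations: u32) -> State {
    match (state, event) {
        (State::Hydrate, Event::Hydrated) => State::Infer { iteration: 1 },
        // No tool calls: the response is final.
        (State::Infer { .. }, Event::ModelOutput { tool_calls: 0 }) => State::Done,
        (State::Infer { iteration }, Event::ModelOutput { .. }) => {
            State::DispatchTools { iteration }
        }
        // Tool results are in; run another inference round if allowed.
        (State::DispatchTools { iteration }, Event::ToolsFinished) if iteration < max_iterations => {
            State::Infer { iteration: iteration + 1 }
        }
        (State::DispatchTools { .. }, Event::ToolsFinished) => {
            State::Failed("max iterations reached".to_string())
        }
        // Any other combination is a protocol error in the core.
        (s, e) => State::Failed(format!("unexpected {e:?} in {s:?}")),
    }
}
```

Because `step` performs no I/O, each transition can be unit-tested directly. A thin Praxis adapter would then only translate store, vLLM, and tool responses into `Event`s.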

leseb added a commit to leseb/agentic-api that referenced this pull request May 19, 2026
Captures the three-layer crate design (core library, axum server,
thin gateway adapters) and key architectural decisions:

- Agentic loop as explicit state machine, not proxy filters
- No Python (OGX) in core request paths
- Praxis routes to agentic-api as a backend service
- Standalone mode is first-class
- Gateway adapters are thin (one filter per gateway)

Shapes PR vllm-project#24 as the foundation for agentic-core/agentic-server.
Supersedes PR vllm-project#27's filter-based decomposition approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
leseb added a commit to leseb/agentic-api that referenced this pull request May 19, 2026
- Add note about superseding ADR-01 language decision (D3)
- Remove axum from Layer 3 diagram (it belongs in Layer 2)
- Soften PR vllm-project#27 language to "if accepted"
- Clarify PR vllm-project#24 relationship as forward-looking

Signed-off-by: Sébastien Han <seb@redhat.com>
leseb (Collaborator) commented May 19, 2026


All great points! I'm proposing a design that should align all parties, so let's discuss it in ADR #28 :)

leseb added a commit to leseb/agentic-api that referenced this pull request May 20, 2026
- Use correct Praxis terms: HttpFilter, filter chain, branch chains
  (not "pipeline nodes", "DAG", or "re-entrance")
- Each agentic-core function is wrapped in an HttpFilter, composed
  into a filter chain with branch support for tool-call looping
- Standalone mode uses execute() with plain Rust control flow
- PR vllm-project#27 aligns in direction but should delegate to agentic-core
  functions rather than implementing logic directly in filters

Signed-off-by: Sébastien Han <seb@redhat.com>
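The split described in this commit (agentic-core functions, a standalone `execute()` with plain Rust control flow, and filters that wrap those functions) can be sketched as one core function with two callers. `Item`, `pending_tool_calls`, `Branch`, and `ToolLoopFilter` are hypothetical. The `execute()` below is a toy that only shares its name with the one mentioned above, and tool execution is elided.

```rust
/// Hypothetical, simplified model output item.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    Text(String),
    ToolCall(String),
}

/// Core logic (the kind of function agentic-core would own): no HTTP types.
pub fn pending_tool_calls(output: &[Item]) -> Vec<&str> {
    output
        .iter()
        .filter_map(|item| match item {
            Item::ToolCall(name) => Some(name.as_str()),
            Item::Text(_) => None,
        })
        .collect()
}

/// Standalone mode: plain Rust control flow around the core function.
/// `infer` stands in for a model call; tool execution is elided.
pub fn execute(mut infer: impl FnMut(usize) -> Vec<Item>, max_iterations: usize) -> Vec<Item> {
    let mut transcript = Vec::new();
    for round in 0..max_iterations {
        let output = infer(round);
        let done = pending_tool_calls(&output).is_empty();
        transcript.extend(output);
        if done {
            break;
        }
    }
    transcript
}

/// Gateway mode: the filter only maps the same core decision onto a branch.
#[derive(Debug, PartialEq)]
pub enum Branch {
    Respond,
    ToolLoop,
}

pub struct ToolLoopFilter;

impl ToolLoopFilter {
    pub fn on_response(&self, output: &[Item]) -> Branch {
        if pending_tool_calls(output).is_empty() {
            Branch::Respond
        } else {
            Branch::ToolLoop
        }
    }
}
```

Keeping the decision in `pending_tool_calls` means both modes stay consistent, and the filter holds no loop logic of its own.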
leseb (Collaborator) left a comment

This is how I see the flow: #24 ships the proxy logic, #29 splits it into the layered crates, and #27 rebases on top to add the Praxis filter chain that wraps agentic-core functions.


4 participants