Add TCP support for Python Executor and local debugging guide#1875
Add TCP support for Python Executor and local debugging guide#1875sehan-dissanayake wants to merge 6 commits intowso2:mainfrom
Conversation
Introduce a configurable Python executor (UDS or TCP) and wire it into the policy engine. Key changes: - Add [python_executor] section to config-template.toml (server mode/host/port and timeout). - Add PythonExecutorConfig types, defaults and validation in policy engine config. - Initialize pythonbridge from configuration in policy-engine main and use pythonbridge.IsAvailable when starting admin server. - Refactor pythonbridge StreamManager to use address + isTCP, add Init(cfg) to configure the global manager, support TCP vs UDS dialing, expose IsAvailable, and centralize timeout handling. Update tests accordingly. - Update Python executor server to accept a listen address (supports TCP or UDS), cleanup stale UDS files, and improve logging. - Update Python executor main to read TOML config (tomli/tomllib), add --listen flag and config-aware defaults; add tomli to requirements for Python <3.11. - Prevent overriding the executor listen address in docker-entrypoint; adjust docker-compose (comment out admin/metrics port mappings and set GATEWAY_CONTROLLER_HOST=host.docker.internal). - Skip common Python virtualenv/cache/.git directories when copying policy sources in the builder. These changes allow running the Python executor over TCP (useful for containerized setups) or UDS, provide configurable timeouts, improve startup robustness, and keep tests and tooling aligned.
Add a comprehensive PYTHON_DEBUG_GUIDE.md for running and debugging the Python Executor locally (TCP mode, VS Code/pdb tips, steps to build, run, and cleanup). Change pythonbridge factory to resolve the global StreamManager at GetPolicy time instead of carrying it on BridgeFactory: remove the StreamManager field, call GetStreamManager() in GetPolicy, and pass that instance to InitPolicy and the created bridge. Update the plugin registry template to stop injecting StreamManager into the generated BridgeFactory. This ensures the singleton StreamManager is configured after pythonbridge.Init(cfg) in main before use.
Add a new VS Code debug configuration for the Python Executor (debugpy) in .vscode/launch.json to launch the executor module with appropriate args and PYTHONPATH. Overhaul gateway/PYTHON_DEBUG_GUIDE.md: clarify the host-process debugging workflow, update the mermaid diagram, add warnings about Go module resolution and Rancher Desktop, instruct using host.docker.internal (or host IP) and specific docker-compose changes (set GATEWAY_CONTROLLER_HOST and comment Policy Engine ports), renumber and clarify steps, provide updated Docker Compose commands (run from gateway/), add cleanup guidance, and update the ports table and assorted wording tweaks.
Remove TOML config parsing and dependency; rely on environment variables for defaults instead. main.py no longer reads a --config TOML file at startup and uses PYTHON_EXECUTOR_LISTEN and PYTHON_POLICY_TIMEOUT (env) as defaults; help text updated to clarify listen can be a UDS path or host:port. requirements.txt drops tomli. VS Code launch.json entries no longer pass --config. PYTHON_DEBUG_GUIDE.md updated with Rancher Desktop (Lima) instructions to export HOST_IP and apply it to the relevant steps, plus adjusted docs to reflect that the executor binds to localhost:9010 when using TCP. These changes simplify startup and remove the toml parsing requirement.
📝 WalkthroughSummaryAdds dual-mode gRPC transport (Unix Domain Sockets and TCP) between the Go Policy Engine and the Python Executor, enabling the Python Executor to run as a standalone host process and improving local debugging and developer workflows. Core Changes
Documentation & Developer Experience
Miscellaneous
ImpactEnables running and debugging the Python Executor outside containers using TCP while preserving UDS behavior by default for production. Improves configuration flexibility, startup robustness, and developer tooling for local policy development. WalkthroughThis PR adds configurable Python Executor connectivity (UDS or TCP) and runtime/debugging support. It introduces PythonExecutor configuration types and validation, initializes the pythonbridge from configuration, and changes StreamManager to support address+isTCP with Init/IsAvailable/GetStreamManager APIs. The Python Executor and its CLI now accept a listen address (UDS path or host:port). The build generator ignores common venv/cache dirs and the policy factory template no longer embeds a StreamManager. A host-debugging guide and VS Code debug configuration were added; the Docker entrypoint blocks overriding the executor listen address. Sequence DiagramsequenceDiagram
participant Cfg as Config
participant PE as Policy Engine
participant Bridge as pythonbridge
participant SM as StreamManager
participant Executor as Python Executor
Cfg->>PE: Load full config (python_executor.server, timeout)
PE->>Bridge: Init(cfg.PythonExecutor)
activate Bridge
Bridge->>Bridge: determine configuredAddress & configuredIsTCP
Bridge->>SM: NewStreamManager(configuredAddress, configuredIsTCP)
deactivate Bridge
PE->>Bridge: IsAvailable(cfg.PythonExecutor)
alt mode == "tcp"
Bridge-->>PE: true
else mode == "uds"
Bridge->>Bridge: stat socket path
Bridge-->>PE: true/false
end
PE->>SM: GetStreamManager()
activate SM
SM->>Executor: Connect()
alt isTCP == true
SM->>Executor: Dial TCP (host:port)
else
SM->>Executor: Dial Unix socket (path)
end
Executor-->>SM: gRPC connection established
deactivate SM
PE->>SM: InitPolicy(ctx, req)
SM->>Executor: ExecuteStream RPC
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.1)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain modules listed in go.work or their selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
gateway/gateway-runtime/docker-entrypoint.sh (1)
126-127:⚠️ Potential issue | 🟠 Major | ⚡ Quick winExport
PYTHON_EXECUTOR_SOCKETand pass it explicitly to the Python Executor to prevent configuration drift and environment variable bypasses.
PYTHON_EXECUTOR_SOCKETis set as a local shell variable (Line 126) but never exported. The Python Executor is launched at Line 188 without an explicit--listenargument, relying instead on a hard-coded default path and thePYTHON_EXECUTOR_LISTENenvironment variable fallback. While the defaults currently match, this creates fragility: if the socket path configuration ever diverges, the socket-wait loop (Lines 197–208) will timeout silently.More critically, the CLI guard at Lines 8–10 blocks
--py.listenflag overrides, but an operator can bypass this by setting thePYTHON_EXECUTOR_LISTENenvironment variable, causing the Python Executor to listen on an unintended path and making the socket-wait loop hang indefinitely.Proposed fix
-PYTHON_EXECUTOR_SOCKET="/var/run/api-platform/python-executor.sock" +export PYTHON_EXECUTOR_SOCKET="/var/run/api-platform/python-executor.sock"python3 /app/python-executor/main.py "${PY_ARGS[@]}" \ + --listen "${PYTHON_EXECUTOR_SOCKET}" \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/gateway-runtime/docker-entrypoint.sh` around lines 126 - 127, PYTHON_EXECUTOR_SOCKET is defined but not exported and the Python Executor is started without an explicit --listen, allowing env var bypass; export the PYTHON_EXECUTOR_SOCKET variable (so it becomes part of the environment), ensure the Python Executor startup command uses that variable explicitly as the --listen argument (use PYTHON_EXECUTOR_SOCKET rather than relying on defaults or PYTHON_EXECUTOR_LISTEN), and update the socket-wait loop to check the same exported PYTHON_EXECUTOR_SOCKET value; optionally clear or ignore PYTHON_EXECUTOR_LISTEN before launch to prevent operators from overriding the intended socket path.
🧹 Nitpick comments (1)
gateway/gateway-runtime/policy-engine/internal/config/config_test.go (1)
65-71: ⚡ Quick winMissing test cases for the new
python_executorvalidation rules.
config.gointroducespython_executor.server.mode,python_executor.server.port, andpython_executor.timeoutvalidation, but no test cases exercise these paths. Consider adding aTestValidate_PythonExecutorConfigtable test covering at minimum:
mode = "tcp"with valid portmode = "tcp"with out-of-range port (0, 65536)mode = "invalid"(unknown string)timeout = 0/ negative (should error)This follows the same pattern as the existing
TestValidate_ServerModeandTestValidate_ExtProcPorttests.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/gateway-runtime/policy-engine/internal/config/config_test.go` around lines 65 - 71, Add a new table-driven test TestValidate_PythonExecutorConfig in config_test.go that exercises the validation rules for PythonExecutorConfig/PythonExecutorServerConfig: create cases for server.mode="tcp" with a valid port, server.mode="tcp" with out-of-range ports (0 and 65536) expecting errors, server.mode="invalid" expecting an unknown-mode error, and timeout values of 0 and a negative duration expecting validation errors; for each case construct a config with PythonExecutor: PythonExecutorConfig{Server: PythonExecutorServerConfig{Mode: "<case>", Port: <port>}, Timeout: <timeout>} and call the existing Validate() (or the package's validation entrypoint used by other tests like TestValidate_ServerMode/TestValidate_ExtProcPort) asserting error vs no-error per case and matching error messages where applicable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/gateway-runtime/policy-engine/internal/config/config.go`:
- Around line 399-407: The validation for c.PythonExecutor.Server currently only
checks Mode and Port but does not ensure Server.Host is non-empty for tcp mode;
update the switch branch for case "tcp" in the validation function to
additionally check that c.PythonExecutor.Server.Host is not an empty string and
return a descriptive error (e.g., "invalid python_executor.server.host: must be
non-empty when mode = 'tcp'") if it is; reference the
c.PythonExecutor.Server.Mode, c.PythonExecutor.Server.Host and
c.PythonExecutor.Server.Port fields and mirror the same non-empty host check
used by the PolicyEngine.Server validation block for consistency.
In `@gateway/gateway-runtime/policy-engine/internal/pythonbridge/client.go`:
- Around line 485-513: Init, GetStreamManager and IsAvailable currently always
use DefaultSocketPath for UDS so non-default UDS listen addresses are ignored;
update the code to read the UDS path from the PythonExecutorConfig (e.g.,
cfg.Server.Listen or add a new cfg.Server.Path) when mode != "tcp" and pass that
value into NewStreamManager (and into any calls inside
GetStreamManager/IsAvailable) instead of DefaultSocketPath; ensure the
streamManagerOnce initialization and any checks reference the configured address
variable so a Python executor started with a custom UDS path is reachable.
In `@gateway/PYTHON_DEBUG_GUIDE.md`:
- Around line 147-163: The controller launch snippet omits
APIP_GW_ROUTER_POLICY__ENGINE_HOST so the router (running in Docker) cannot
reach the host Policy Engine; update the example environment block to include
APIP_GW_ROUTER_POLICY__ENGINE_HOST (e.g., host.docker.internal or the host IP)
alongside APIP_GW_ROUTER_POLICY__ENGINE_MODE=tcp so the router can connect to
the host Policy Engine when using TCP.
- Around line 257-260: Replace the hardcoded credentials in the curl example
(the `-u admin:admin` portion of the shown curl command) with a placeholder or
environment-variable expansion so secrets aren't committed; for example change
it to use env vars like `-u "$ADMIN_USER:$ADMIN_PASS"` or a generic placeholder
such as `-u "<USERNAME>:<PASSWORD>"` and update the surrounding text/examples to
document setting those env vars.
---
Outside diff comments:
In `@gateway/gateway-runtime/docker-entrypoint.sh`:
- Around line 126-127: PYTHON_EXECUTOR_SOCKET is defined but not exported and
the Python Executor is started without an explicit --listen, allowing env var
bypass; export the PYTHON_EXECUTOR_SOCKET variable (so it becomes part of the
environment), ensure the Python Executor startup command uses that variable
explicitly as the --listen argument (use PYTHON_EXECUTOR_SOCKET rather than
relying on defaults or PYTHON_EXECUTOR_LISTEN), and update the socket-wait loop
to check the same exported PYTHON_EXECUTOR_SOCKET value; optionally clear or
ignore PYTHON_EXECUTOR_LISTEN before launch to prevent operators from overriding
the intended socket path.
---
Nitpick comments:
In `@gateway/gateway-runtime/policy-engine/internal/config/config_test.go`:
- Around line 65-71: Add a new table-driven test
TestValidate_PythonExecutorConfig in config_test.go that exercises the
validation rules for PythonExecutorConfig/PythonExecutorServerConfig: create
cases for server.mode="tcp" with a valid port, server.mode="tcp" with
out-of-range ports (0 and 65536) expecting errors, server.mode="invalid"
expecting an unknown-mode error, and timeout values of 0 and a negative duration
expecting validation errors; for each case construct a config with
PythonExecutor: PythonExecutorConfig{Server: PythonExecutorServerConfig{Mode:
"<case>", Port: <port>}, Timeout: <timeout>} and call the existing Validate()
(or the package's validation entrypoint used by other tests like
TestValidate_ServerMode/TestValidate_ExtProcPort) asserting error vs no-error
per case and matching error messages where applicable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 97500baf-241f-4e12-a913-fad44f91cf50
📒 Files selected for processing (15)
.vscode/launch.jsongateway/PYTHON_DEBUG_GUIDE.mdgateway/configs/config-template.tomlgateway/gateway-builder/internal/policyengine/generator.gogateway/gateway-builder/templates/plugin_registry.go.tmplgateway/gateway-runtime/docker-entrypoint.shgateway/gateway-runtime/policy-engine/cmd/policy-engine/main.gogateway/gateway-runtime/policy-engine/internal/config/config.gogateway/gateway-runtime/policy-engine/internal/config/config_test.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/bridge_test.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/client.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/client_test.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/factory.gogateway/gateway-runtime/python-executor/executor/server.pygateway/gateway-runtime/python-executor/main.py
💤 Files with no reviewable changes (1)
- gateway/gateway-builder/templates/plugin_registry.go.tmpl
Add support for configuring the Python executor Unix socket path and TCP mode, and wire that through runtime, config validation, tests, docs and the Docker entrypoint. - config: add PythonExecutorServer.Path, validate host when mode=tcp and ensure timeout > 0 - pythonbridge: use a configured socket/path and TCP flag for StreamManager and availability checks (fall back to default path when unset) - entrypoint: export PYTHON_EXECUTOR_SOCKET and start the python executor with --listen using the socket; unset PYTHON_EXECUTOR_LISTEN before launching - tests: add TestValidate_PythonExecutorConfig covering UDS/tcp, host/port validation and timeouts - docs: update PYTHON_DEBUG_GUIDE to clarify controller env var usage and authentication placeholder for API deploy These changes allow switching between UDS and TCP modes and customizing the UDS path reliably across components.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/gateway-runtime/policy-engine/internal/pythonbridge/client.go`:
- Around line 500-505: The TCP address construction in the mode=="tcp" branch
(building the variable address) incorrectly uses fmt.Sprintf("%s:%d", host,
cfg.Server.Port) which breaks IPv6 literals; change it to use
net.JoinHostPort(host, strconv.Itoa(cfg.Server.Port)) to correctly format IPv4,
IPv6 and hostnames, and add imports for net and strconv if not already present;
keep the existing host defaulting logic (host == "" -> "localhost") and assign
the result back to address.
In `@gateway/PYTHON_DEBUG_GUIDE.md`:
- Around line 67-75: The guide incorrectly instructs the reader to run a second
"cd gateway" in Step 8 which will end up in gateway/gateway; update Step 8 to
either remove the redundant "cd gateway" or add a clear sentence telling the
reader to return to the repository root before running the commands so the
subsequent relative paths (e.g., "go run ./gateway-builder/cmd/builder ..." and
the later compose commands at lines 248-250) resolve correctly; apply the same
clarification or removal for the repeated "cd gateway" instances at 248-250.
- Around line 181-187: Update the two fenced code blocks that show Python
Executor logs (the blocks starting with "Python Executor starting
(listen=localhost:9010, workers=4, ...)" and the other block around "Python
Executor ready on localhost:9010") to include a language tag (e.g., change ```
to ```text or ```log) so markdownlint no longer flags them; make the same change
for the other occurrence noted (the snippet around the "Python Executor ready"
lines).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7a3109e5-7441-4d9f-976e-c6612c551cfa
📒 Files selected for processing (15)
.vscode/launch.jsongateway/PYTHON_DEBUG_GUIDE.mdgateway/configs/config-template.tomlgateway/gateway-builder/internal/policyengine/generator.gogateway/gateway-builder/templates/plugin_registry.go.tmplgateway/gateway-runtime/docker-entrypoint.shgateway/gateway-runtime/policy-engine/cmd/policy-engine/main.gogateway/gateway-runtime/policy-engine/internal/config/config.gogateway/gateway-runtime/policy-engine/internal/config/config_test.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/bridge_test.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/client.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/client_test.gogateway/gateway-runtime/policy-engine/internal/pythonbridge/factory.gogateway/gateway-runtime/python-executor/executor/server.pygateway/gateway-runtime/python-executor/main.py
💤 Files with no reviewable changes (1)
- gateway/gateway-builder/templates/plugin_registry.go.tmpl
Use net.JoinHostPort with strconv.Itoa for building the host:port address in the pythonbridge client (adds strconv import). Update PYTHON_DEBUG_GUIDE.md by changing fenced blocks to text and removing redundant `cd gateway` lines and minor whitespace cleanup to improve clarity.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@gateway/PYTHON_DEBUG_GUIDE.md`:
- Around line 29-33: Remove the blank line between the adjacent admonition
blocks so the two blockquotes "[!NOTE]" and "[!WARNING]" are consecutive; edit
the markdown in PYTHON_DEBUG_GUIDE.md to delete the empty line separating the
NOTE block and the WARNING block (the lines starting with "[!NOTE]" and
"[!WARNING]") so MD028 is satisfied.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f5d7a9cc-5413-494d-9edc-89324b709c10
📒 Files selected for processing (2)
gateway/PYTHON_DEBUG_GUIDE.mdgateway/gateway-runtime/policy-engine/internal/pythonbridge/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
- gateway/gateway-runtime/policy-engine/internal/pythonbridge/client.go
Fix:
Purpose
Previously, communication between the Go Policy Engine and the Python Executor was hardcoded to use Unix Domain Sockets (UDS). This architectural constraint prevented the Python Executor from running in a separate container (e.g., for future GPU-isolated workloads) and made local development and debugging of Python policies extremely difficult, as developers could not easily attach IDE debuggers (like VS Code) or use tools like
pdbwithout complex container modifications.Goals
Approach
main.pyandserver.pyto accept a flexible--listenargument that automatically detects and binds to either a UDS path or a TCP address (e.g.,localhost:9010).--listen,--workers,--timeout, etc.) and environment variables.client.goto dial dynamically based on the configuration provided inconfig.toml([python_executor.server]).generator.goand the Python registry continue to use major-version keys (e.g.,prompt-compressor:v0) to maintain consistency with the Go Policy Engine's internal registry conventions.PYTHON_DEBUG_GUIDE.mddetailing the "Option 2" style local process debugging setup.Documentation
Created
gateway/PYTHON_DEBUG_GUIDE.mdto document the new local debugging workflow.Security checks
Test environment