feat: Node I/O Inspector and Robust Large Payload Handling #204
💡 Codex Review
studio/worker/src/temporal/input-resolver.ts
Lines 94 to 97 in ac99037
if (portMetadata?.dataType) {
  const coercion = coerceValueForPort(portMetadata.dataType, resolved);
  if (coercion.ok) {
    params[targetKey] = coercion.value;
When an upstream output is spilled, resolveInputValue returns a spill-marker object (with __spilled__/storageRef). The subsequent coerceValueForPort(...) call runs before the activity unspills, so any port whose type is not json or any (e.g., text, number, list) receives an object and fails coercion. The input then produces only a warning and ultimately a “missing required inputs” error, so large outputs (> spill threshold) can no longer flow to typed ports. Consider bypassing coercion when the resolved value is a spill marker (or deferring coercion until after unspill in the activity).
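The first suggestion (bypass coercion for spill markers) could look roughly like the sketch below. It assumes the marker is a plain object carrying `__spilled__: true` and a string `storageRef`. `coerceValueForPort` here is a simplified stand-in covering two port types, not the function in input-resolver.ts, and `resolveForPort` is a hypothetical wrapper name.

```typescript
// Assumed marker shape: only __spilled__ and storageRef are named in the review.
interface SpillMarker {
  __spilled__: true;
  storageRef: string;
}

function isSpillMarker(value: unknown): value is SpillMarker {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v["__spilled__"] === true && typeof v["storageRef"] === "string";
}

type Coercion = { ok: true; value: unknown } | { ok: false; error: string };

// Simplified stand-in for the real coerceValueForPort (hypothetical behaviour).
function coerceValueForPort(dataType: string, value: unknown): Coercion {
  if (dataType === "any" || dataType === "json") return { ok: true, value };
  if (dataType === "text") {
    if (typeof value === "string") return { ok: true, value };
    return { ok: false, error: "expected text, got " + typeof value };
  }
  if (dataType === "number") {
    const n = typeof value === "string" ? Number(value) : value;
    if (typeof n === "number" && !Number.isNaN(n)) return { ok: true, value: n };
    return { ok: false, error: "expected number, got " + typeof value };
  }
  return { ok: false, error: "unsupported data type " + dataType };
}

// Spill markers pass through untouched so the activity can unspill
// (and coerce, if desired) once the real value is available.
function resolveForPort(dataType: string, resolved: unknown): Coercion {
  if (isSpillMarker(resolved)) return { ok: true, value: resolved };
  return coerceValueForPort(dataType, resolved);
}
```

With this guard, a typed port receives the marker unchanged instead of failing coercion. The trade-off is that type checking for spilled values moves to wherever the unspill happens.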
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ection Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
… fix variable editor types Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…or node I/O Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…e I/O
- Add downloadFilePreview to StorageService for truncated previews
- Add full=true query param to fetch complete spilled data
- Add log message chunking in KafkaLogAdapter (100k chars per chunk)
- Add E2E test for node-io-spilling verification
- Fix organizationId handling in NodeIOIngestService for local dev
- Add INodeIOService interface to component-sdk
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
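The log chunking item can be illustrated with a small sketch. This is not the KafkaLogAdapter code: `chunkLogMessage` and the `LogChunk` shape are hypothetical, and only the 100k-character chunk size comes from the commit message.

```typescript
// 100k characters per chunk, as stated in the commit message.
const LOG_CHUNK_SIZE_CHARS = 100000;

// Hypothetical chunk envelope so a consumer can reassemble messages in order.
interface LogChunk {
  index: number;
  total: number;
  message: string;
}

function chunkLogMessage(
  message: string,
  chunkSize: number = LOG_CHUNK_SIZE_CHARS
): LogChunk[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  // Short messages are returned as a single chunk.
  if (message.length <= chunkSize) return [{ index: 0, total: 1, message }];

  const total = Math.ceil(message.length / chunkSize);
  const chunks: LogChunk[] = [];
  for (let i = 0; i < total; i++) {
    // slice() counts UTF-16 code units, so a surrogate pair may straddle
    // a chunk boundary; joining the chunks in order restores the original.
    chunks.push({
      index: i,
      total,
      message: message.slice(i * chunkSize, (i + 1) * chunkSize),
    });
  }
  return chunks;
}
```

A 250,000-character log line would be emitted as three chunks of 100,000, 100,000, and 50,000 characters.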
…for Docker
- Mount /shipsec-output in containers for structured output
- Components write results to SHIPSEC_OUTPUT_PATH instead of stdout
- Stdout now used purely for logs (goes to log system with chunking)
- Remove stdout->trace progress events (fixes trace pollution)
- Remove RESULT_START/RESULT_END marker parsing hack
This fixes the issue where large outputs created 90+ NODE_PROGRESS trace events by streaming stdout chunks to the trace system.
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Add shared constants.ts with KAFKA_SPILL_THRESHOLD_BYTES (100KB), TEMPORAL_SPILL_THRESHOLD_BYTES (2MB), LOG_CHUNK_SIZE_CHARS (100k)
- Add standardized SpilledDataMarker interface with __spilled__ flag
- Add isSpilledDataMarker() type guard and createSpilledMarker() helper
- Update KafkaNodeIOAdapter to use standardized marker format
- Update NodeIOService to detect both new and legacy marker formats
- Update activity to use shared constant for temporal spill
- Update log adapter to use shared chunk size constant
- Clean up verbose debug console.log statements in activity
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
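A rough sketch of how the shared constants and marker helpers might fit together. The constant and helper names come from the commit message, while everything else is assumed for illustration: the 1024-based byte values, the `originalSize` field, the helper signatures, and the `spillIfLarge` and `utf8ByteLength` functions. The real upload to object storage is presumably asynchronous; a synchronous callback keeps the sketch short.

```typescript
// Values from the commit message; 1024-based units are an assumption.
const KAFKA_SPILL_THRESHOLD_BYTES = 100 * 1024;
const TEMPORAL_SPILL_THRESHOLD_BYTES = 2 * 1024 * 1024;

interface SpilledDataMarker {
  __spilled__: true;
  storageRef: string;
  originalSize: number; // assumed field, not confirmed by the source
}

function isSpilledDataMarker(value: unknown): value is SpilledDataMarker {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return v["__spilled__"] === true && typeof v["storageRef"] === "string";
}

function createSpilledMarker(storageRef: string, originalSize: number): SpilledDataMarker {
  return { __spilled__: true, storageRef, originalSize };
}

// UTF-8 size of a string, assuming well-formed UTF-16 input.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes one 4-byte code point
      i++;        // skip the low surrogate
    } else bytes += 3;
  }
  return bytes;
}

// Hypothetical helper: replace a payload with a marker once its serialized
// size exceeds the threshold; `upload` stores the data and returns a reference.
function spillIfLarge(
  payload: unknown,
  thresholdBytes: number,
  upload: (serialized: string) => string
): unknown {
  if (payload === undefined) return payload;
  const serialized = JSON.stringify(payload);
  const size = utf8ByteLength(serialized);
  if (size <= thresholdBytes) return payload;
  return createSpilledMarker(upload(serialized), size);
}
```

Under these assumptions, the Kafka adapter would call `spillIfLarge(payload, KAFKA_SPILL_THRESHOLD_BYTES, upload)` and the activity would use `TEMPORAL_SPILL_THRESHOLD_BYTES`, so both paths produce the same marker format.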
Previously, preview fetch errors were silently swallowed. Now we log a warning so issues can be diagnosed. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Update readOutputFromFile to attempt file read first, then fall back to stdout
- Allows legacy components (e.g. security tools writing to stdout) to still function
- Captures stdout in both standard and PTY modes
- Returns raw string if stdout is not JSON, simplifying parser logic for tools like subfinder
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
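The file-first, stdout-fallback order described above can be sketched as a pure function. This is not the actual readOutputFromFile: `resolveComponentOutput` and `parseOrRaw` are hypothetical names, the caller is assumed to have already read the output file (passing `null` when it is missing), and applying the raw-string fallback to file content as well as stdout is an assumption.

```typescript
// Parse JSON when possible; otherwise hand back the raw text unchanged.
function parseOrRaw(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return text;
  }
}

// fileContent: contents of the structured output file, or null if absent.
// stdout: captured standard output of the container.
function resolveComponentOutput(fileContent: string | null, stdout: string): unknown {
  // Prefer the structured output file when it exists and is non-empty.
  if (fileContent !== null && fileContent.trim() !== "") {
    return parseOrRaw(fileContent);
  }
  // Legacy components that only print to stdout still produce a result.
  return parseOrRaw(stdout.trim());
}
```

A tool that prints one hostname per line would come back as a plain string for the component's own parser to split, while a component that writes JSON to the output file is unaffected by whatever it logs to stdout.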
…ctor layout This commit adds a 'View Full' functionality for large/spilled node inputs and outputs using a MessageModal, updates the backend and API client to support full data retrieval, and improves the NodeIOInspector UI with a light-themed, single-column full-width layout. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
This commit ensures strict chronological ordering and database persistence for node I/O events. It uses the runId as a Kafka key for partition affinity, awaits recording operations in the worker, and implements conditional status logic in the repository to prevent out-of-order events from leaving nodes in a perceived 'running' state. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
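The conditional status logic can be pictured as a rule that never lets a late event move a node backwards. The sketch below is illustrative only: the status names and `nextStatus` are assumptions, and the repository presumably enforces the same rule inside its database update instead of in memory.

```typescript
// Assumed status values; the source only names 'running' explicitly.
type NodeStatus = "running" | "completed" | "failed";

function isTerminal(status: NodeStatus): boolean {
  return status === "completed" || status === "failed";
}

// Decide which status to persist when a new event arrives.
// A late 'running' event must not overwrite a terminal status.
function nextStatus(current: NodeStatus | undefined, incoming: NodeStatus): NodeStatus {
  if (current !== undefined && isTerminal(current) && incoming === "running") {
    return current;
  }
  return incoming;
}
```

If a completion event is recorded before the matching start event, the node still ends up as completed: `nextStatus("completed", "running")` keeps `"completed"`. Using runId as the Kafka key reduces how often such reordering happens, and a guard like this covers the remaining cases.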
…e-based inputs
- Fix node-pty spawn-helper permissions on macOS
- Implement 'unspilling' logic in runComponentActivity to automatically resolve large outputs stored in object storage
- Switch logic-script component to read inputs from a mounted file instead of env/stdin to avoid OS argument limits (E2BIG)
- Add resolveDockerPath utility for more robust Docker detection
- Enhance runner with support for explicit stdin disabling
Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
1f90383 to 4a796fe
ShipSec Improvements: Node I/O Inspector & Workflow Overhaul
This branch (feature/node-io-inspector-and-fixes) introduces the Node I/O Inspector, a major enhancement to workflow observability, along with critical robustness fixes for handling large data payloads.
1. Node I/O Inspector & Registry
- node_io database schema to store inputs and outputs for every node execution.
- NodeIOService to manage the recording and retrieval of node data.
2. Robust Large Payload Handling (Data Spilling)
3. Docker Runner & Reliability Fixes
- …input.json). This resolves E2BIG: argument list too long errors when passing massive datasets (e.g., thousands of Prowler findings) to components like the Script node.
- …node-pty that prevented terminal log streaming.
- …stdin pollution in TTY mode to ensure clean logs.
4. Component Enhancements
5. Maintenance
- …node-pty, @ai-sdk, and @aws-sdk.