feat: Node I/O Inspector and Robust Large Payload Handling #204

betterclever · 2026-01-06T16:29:57Z

ShipSec Improvements: Node I/O Inspector & Workflow Overhaul

This branch (feature/node-io-inspector-and-fixes) introduces the Node I/O Inspector, a major enhancement to workflow observability, along with critical robustness fixes for handling large data payloads.

1. Node I/O Inspector & Registry

Persistent Telemetry: Added a dedicated node_io database schema to store inputs and outputs for every node execution.
Backend Service: Developed a new NodeIOService to manage the recording and retrieval of node data.
Inspector API: Implemented REST endpoints to allow the frontend to visualize exactly what data passed through each node.
Optimized Tracing: Refined the trace recording logic to provide real-time updates on node status (Started, Completed, Failed).

2. Robust Large Payload Handling (Data Spilling)

Automatic Spilling: Implemented a "spill" mechanism for inputs and outputs exceeding 2MB. Large payloads are automatically moved to object storage (MinIO/S3), replacing the data in Temporal with a lightweight marker.
Transparent Rehydration: Updated the activity runner to automatically download and "unspill" data before passing it to downstream components. Components receive the full data without needing to know it was spilled.
UI Previews: Large spilled outputs now show truncated previews in the UI, with a "View Full Output" modal for deep inspection.
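The spill and rehydrate round trip described above can be sketched roughly as follows. This is a minimal illustration and not the code from this branch: the in-memory store stands in for MinIO/S3, the `originalSize` field and the `spillIfLarge`/`unspill` helper names are hypothetical, and only the `__spilled__` flag, `storageRef`, and the 2MB threshold come from the PR text.

```typescript
// Threshold from the PR description (2MB for payloads passed through Temporal).
const SPILL_THRESHOLD_BYTES = 2 * 1024 * 1024;

// Assumed marker shape: only __spilled__ and storageRef are named in the PR.
interface SpilledDataMarker {
  __spilled__: true;
  storageRef: string;
  originalSize: number; // hypothetical extra field
}

// Minimal object-store interface standing in for MinIO/S3.
interface ObjectStore {
  put(key: string, body: string): void;
  get(key: string): string;
}

class InMemoryStore implements ObjectStore {
  private readonly objects = new Map<string, string>();
  put(key: string, body: string): void {
    this.objects.set(key, body);
  }
  get(key: string): string {
    const body = this.objects.get(key);
    if (body === undefined) throw new Error(`missing object: ${key}`);
    return body;
  }
}

// Counts the bytes a string occupies once encoded as UTF-8.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function isSpilledDataMarker(value: unknown): value is SpilledDataMarker {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return candidate.__spilled__ === true && typeof candidate.storageRef === "string";
}

// Returns the value unchanged when small, otherwise stores it and returns a marker.
function spillIfLarge(
  value: unknown,
  key: string,
  store: ObjectStore,
  threshold: number = SPILL_THRESHOLD_BYTES,
): unknown {
  const serialized: string | undefined = JSON.stringify(value);
  if (serialized === undefined) return value; // nothing serializable to spill
  const size = utf8ByteLength(serialized);
  if (size <= threshold) return value;
  store.put(key, serialized);
  const marker: SpilledDataMarker = { __spilled__: true, storageRef: key, originalSize: size };
  return marker;
}

// Replaces a marker with the stored payload; any other value passes through.
function unspill(value: unknown, store: ObjectStore): unknown {
  return isSpilledDataMarker(value) ? JSON.parse(store.get(value.storageRef)) : value;
}
```

In this sketch the downstream component only ever sees the result of `unspill`, which mirrors the claim that components do not need to know whether data was spilled.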

3. Docker Runner & Reliability Fixes

Volume-Based Input Delivery: Switched from environment variables to file-based input delivery (input.json). This resolves E2BIG: argument list too long errors when passing massive datasets (e.g., thousands of Prowler findings) to components like the Script node.
PTY Streaming Fixes:
- Resolved a critical macOS permission issue for node-pty that prevented terminal log streaming.
- Fixed stdin pollution in TTY mode to ensure clean logs.
Robust Docker Detection: Added a utility that resolves the Docker binary path up front, so the runner works with multiple Docker backends (OrbStack, Homebrew, etc.).
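To illustrate the volume-based input delivery, here is a hedged sketch of staging a payload as input.json and mounting it read-only, so the payload never appears in the environment or the argument list. The mount point `/shipsec-input`, the temp-directory prefix, and the function names are assumptions made for this example, not the runner's actual layout.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical in-container mount point; the PR only names the file, input.json.
const CONTAINER_INPUT_DIR = "/shipsec-input";

interface StagedInput {
  hostDir: string;
  inputFile: string;
  dockerArgs: string[];
}

// Writes the payload to <tmp>/input.json and builds `docker run` arguments
// that mount the directory read-only instead of passing the data via -e.
function stageInputForContainer(image: string, payload: unknown): StagedInput {
  const hostDir = mkdtempSync(join(tmpdir(), "node-input-"));
  const inputFile = join(hostDir, "input.json");
  writeFileSync(inputFile, JSON.stringify(payload), "utf8");
  const dockerArgs = ["run", "--rm", "-v", `${hostDir}:${CONTAINER_INPUT_DIR}:ro`, image];
  return { hostDir, inputFile, dockerArgs };
}

// What a component would do inside the container, shown here against the host path.
function readStagedInput(inputFile: string): unknown {
  return JSON.parse(readFileSync(inputFile, "utf8"));
}
```

Because the payload lives in a file, its size is bounded by disk space rather than by the kernel's combined limit on arguments and environment, which is what triggers E2BIG.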

4. Component Enhancements

Prowler Scan: Integrated formal AWS credential contracts and optimized permissions for output volumes.
Logic Script: Upgraded the script runner to handle massive JSON inputs via mounted volumes, enabling complex post-processing of large security scans.

5. Maintenance

Standardized spill markers across the codebase.
Updated core dependencies: node-pty, @ai-sdk, and @aws-sdk.

chatgpt-codex-connector

💡 Codex Review

studio/worker/src/temporal/input-resolver.ts

Lines 94 to 97 in ac99037

    
    if (portMetadata?.dataType) {
      const coercion = coerceValueForPort(portMetadata.dataType, resolved);
      if (coercion.ok) {
        params[targetKey] = coercion.value;

Skip coercion for spilled markers before unspill

When an upstream output is spilled, resolveInputValue returns a spill-marker object (with __spilled__/storageRef). The subsequent coerceValueForPort(...) call runs before the activity unspills the data, so any port typed as something other than JSON or any (e.g., text, number, list) receives an object and fails coercion. The failed coercion is recorded as a warning and ultimately surfaces as a “missing required inputs” error, so outputs larger than the spill threshold can no longer flow to typed ports. Consider bypassing coercion when the resolved value is a spill marker, or deferring coercion until after the activity unspills it.
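One way to express the suggested bypass is sketched below. The `coerceValueForPort` here is a simplified stand-in with an assumed result shape, and `assignInput` and `isSpillMarker` are hypothetical helpers; only the marker fields and the idea of skipping coercion for markers come from the review.

```typescript
type PortDataType = "text" | "number" | "json" | "any";
type CoercionResult = { ok: true; value: unknown } | { ok: false; error: string };

// Assumed marker check based on the __spilled__/storageRef fields named in the review.
function isSpillMarker(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return candidate.__spilled__ === true && typeof candidate.storageRef === "string";
}

// Simplified stand-in for the real coerceValueForPort; the actual rules may differ.
function coerceValueForPort(dataType: PortDataType, value: unknown): CoercionResult {
  switch (dataType) {
    case "text":
      return typeof value === "string"
        ? { ok: true, value }
        : { ok: false, error: "expected text" };
    case "number": {
      const parsed =
        typeof value === "number"
          ? value
          : typeof value === "string" && value.trim() !== ""
            ? Number(value)
            : NaN;
      return Number.isFinite(parsed)
        ? { ok: true, value: parsed }
        : { ok: false, error: "expected number" };
    }
    default:
      return { ok: true, value }; // json and any accept every value
  }
}

// Assigns a resolved input, deferring coercion when the value is still spilled.
function assignInput(
  params: Record<string, unknown>,
  warnings: string[],
  targetKey: string,
  dataType: PortDataType | undefined,
  resolved: unknown,
): void {
  if (isSpillMarker(resolved)) {
    params[targetKey] = resolved; // the activity unspills this later
    return;
  }
  if (dataType) {
    const coercion = coerceValueForPort(dataType, resolved);
    if (coercion.ok) {
      params[targetKey] = coercion.value;
    } else {
      warnings.push(`${targetKey}: ${coercion.error}`);
    }
    return;
  }
  params[targetKey] = resolved;
}
```

A trade-off of this approach is that the unspilled value reaches the component without type coercion unless coercion is repeated after rehydration.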


…e I/O

- Add downloadFilePreview to StorageService for truncated previews
- Add full=true query param to fetch complete spilled data
- Add log message chunking in KafkaLogAdapter (100k chars per chunk)
- Add E2E test for node-io-spilling verification
- Fix organizationId handling in NodeIOIngestService for local dev
- Add INodeIOService interface to component-sdk

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
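The log message chunking mentioned in this commit could look roughly like the sketch below. The function name and signature are hypothetical; the 100k-character chunk size is the figure given in the commit message.

```typescript
// Chunk size from the commit message (100k chars per chunk).
const LOG_CHUNK_SIZE_CHARS = 100_000;

// Splits a log message into pieces of at most chunkSize UTF-16 code units.
// Note: a plain slice can split a surrogate pair at a chunk boundary.
function chunkLogMessage(message: string, chunkSize: number = LOG_CHUNK_SIZE_CHARS): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (message.length <= chunkSize) return [message];
  const chunks: string[] = [];
  for (let start = 0; start < message.length; start += chunkSize) {
    chunks.push(message.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Concatenating the chunks in order reproduces the original message, so a consumer can reassemble long log lines if it needs to.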

…for Docker

- Mount /shipsec-output in containers for structured output
- Components write results to SHIPSEC_OUTPUT_PATH instead of stdout
- Stdout now used purely for logs (goes to log system with chunking)
- Remove stdout->trace progress events (fixes trace pollution)
- Remove RESULT_START/RESULT_END marker parsing hack

This fixes the issue where large outputs created 90+ NODE_PROGRESS trace events by streaming stdout chunks to the trace system.

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

- Add shared constants.ts with KAFKA_SPILL_THRESHOLD_BYTES (100KB), TEMPORAL_SPILL_THRESHOLD_BYTES (2MB), LOG_CHUNK_SIZE_CHARS (100k)
- Add standardized SpilledDataMarker interface with __spilled__ flag
- Add isSpilledDataMarker() type guard and createSpilledMarker() helper
- Update KafkaNodeIOAdapter to use standardized marker format
- Update NodeIOService to detect both new and legacy marker formats
- Update activity to use shared constant for temporal spill
- Update log adapter to use shared chunk size constant
- Clean up verbose debug console.log statements in activity

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

- Update readOutputFromFile to attempt file read first, then fall back to stdout
- Allows legacy components (e.g. security tools writing to stdout) to still function
- Captures stdout in both standard and PTY modes
- Returns raw string if stdout is not JSON, simplifying parser logic for tools like subfinder

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
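The file-first, stdout-fallback order described in this commit can be sketched as a pure function. The name `resolveComponentOutput` and its signature are hypothetical, and treating an unparseable output file as absent is an assumption of this sketch, since the commit message does not say how that case is handled.

```typescript
type ParseResult = { ok: true; value: unknown } | { ok: false };

function tryParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// fileContent is the output file's text, or null when no file was written.
function resolveComponentOutput(fileContent: string | null, stdout: string): unknown {
  // 1. Prefer the structured output file when it exists and parses.
  if (fileContent !== null && fileContent.trim() !== "") {
    const fromFile = tryParseJson(fileContent);
    if (fromFile.ok) return fromFile.value;
  }
  // 2. Fall back to stdout for legacy components.
  const trimmed = stdout.trim();
  const fromStdout = tryParseJson(trimmed);
  // 3. Non-JSON stdout is returned as trimmed raw text.
  return fromStdout.ok ? fromStdout.value : trimmed;
}
```

With this shape, a tool that prints one hostname per line still yields a usable string output, while components that write the output file are unaffected by whatever they log to stdout.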

…ctor layout

This commit adds a 'View Full' functionality for large/spilled node inputs and outputs using a MessageModal, updates the backend and API client to support full data retrieval, and improves the NodeIOInspector UI with a light-themed, single-column full-width layout.

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

This commit ensures strict chronological ordering and database persistence for node I/O events. It uses the runId as a Kafka key for partition affinity, awaits recording operations in the worker, and implements conditional status logic in the repository to prevent out-of-order events from leaving nodes in a perceived 'running' state.

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
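The conditional status logic can be illustrated with a small in-memory sketch. The status names, the `nodeRef` field, and the rule that a later terminal event replaces an earlier one are assumptions for illustration; the commit only states that out-of-order events must not leave a node looking 'running'.

```typescript
type NodeStatus = "running" | "completed" | "failed";

interface NodeIOEvent {
  runId: string;
  nodeRef: string; // hypothetical node identifier
  status: NodeStatus;
}

const TERMINAL_STATUSES: ReadonlySet<NodeStatus> = new Set<NodeStatus>(["completed", "failed"]);

// A late 'running' event never overwrites a terminal status already recorded.
function nextStatus(current: NodeStatus | undefined, incoming: NodeStatus): NodeStatus {
  if (current !== undefined && TERMINAL_STATUSES.has(current) && incoming === "running") {
    return current;
  }
  return incoming;
}

// Folds events into a per-node status map, standing in for the repository upsert.
function applyEvents(events: NodeIOEvent[]): Map<string, NodeStatus> {
  const statuses = new Map<string, NodeStatus>();
  for (const event of events) {
    const key = `${event.runId}:${event.nodeRef}`;
    statuses.set(key, nextStatus(statuses.get(key), event.status));
  }
  return statuses;
}
```

Keying Kafka messages by runId keeps one run's events on a single partition, which preserves their order there; a guard like this covers the remaining cases where a start event is still applied after its completion.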

…e-based inputs

- Fix node-pty spawn-helper permissions on macOS
- Implement 'unspilling' logic in runComponentActivity to automatically resolve large outputs stored in object storage
- Switch logic-script component to read inputs from a mounted file instead of env/stdin to avoid OS argument limits (E2BIG)
- Add resolveDockerPath utility for more robust Docker detection
- Enhance runner with support for explicit stdin disabling

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

chatgpt-codex-connector bot reviewed Jan 6, 2026

betterclever added 16 commits January 7, 2026 07:11

feat(backend): add node_io table schema for storing node inputs/outputs

617fe33

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

feat(backend): add NodeIO service and API endpoints for node I/O inspection

5647763

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

feat(worker): implement node I/O persistence and trace optimization

6fa1454

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

feat(telemetry): implement node I/O persistence and inspection

9e3e04a

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

feat(security): integrate AWS credential contract in prowler-scan and fix variable editor types

447b160

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

feat(telemetry): implement large payload spilling to object storage for node I/O

d5d05d9

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

fix(node-io): add warning logs for preview fetch failures

86ce5ce

Previously, preview fetch errors were silently swallowed. Now we log a warning so issues can be diagnosed. Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

chore: Update node-pty, @ai-sdk, and @aws-sdk dependencies.

70892d9

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

fix(worker): bypass input coercion for spilled data markers

4a796fe

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>

betterclever force-pushed the feature/node-io-inspector-and-fixes branch from 1f90383 to 4a796fe January 7, 2026 01:41

betterclever merged commit 963b482 into main Jan 8, 2026
3 checks passed
