@betterclever
Contributor

ShipSec Improvements: Node I/O Inspector & Workflow Overhaul

This branch (feature/node-io-inspector-and-fixes) introduces the Node I/O Inspector, a major enhancement to workflow observability, along with critical robustness fixes for handling large data payloads.

1. Node I/O Inspector & Registry

  • Persistent Telemetry: Added a dedicated node_io database schema to store inputs and outputs for every node execution.
  • Backend Service: Developed a new NodeIOService to manage the recording and retrieval of node data.
  • Inspector API: Implemented REST endpoints to allow the frontend to visualize exactly what data passed through each node.
  • Optimized Tracing: Refined the trace recording logic to provide real-time updates on node status (Started, Completed, Failed).

2. Robust Large Payload Handling (Data Spilling)

  • Automatic Spilling: Implemented a "spill" mechanism for inputs and outputs exceeding 2MB. Large payloads are automatically moved to object storage (MinIO/S3), replacing the data in Temporal with a lightweight marker.
  • Transparent Rehydration: Updated the activity runner to automatically download and "unspill" data before passing it to downstream components. Components receive the full data without needing to know it was spilled.
  • UI Previews: Large spilled outputs now show truncated previews in the UI, with a "View Full Output" modal for deep inspection.
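The spill and rehydration flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the in-memory store stands in for MinIO/S3, and `spillIfLarge`, `unspill`, `InMemoryObjectStore`, and the `originalSize` field are hypothetical names. Only the `__spilled__` flag, the `storageRef` field, and the 2MB threshold come from the text of this PR.

```typescript
// 2MB threshold, as stated in the PR description.
const TEMPORAL_SPILL_THRESHOLD_BYTES = 2 * 1024 * 1024;

// Marker shape is an assumption beyond __spilled__ and storageRef.
interface SpilledDataMarker {
  __spilled__: true;
  storageRef: string;
  originalSize: number;
}

// Hypothetical stand-in for object storage so the sketch runs on its own.
class InMemoryObjectStore {
  private objects = new Map<string, string>();
  private nextId = 0;

  put(body: string): string {
    const ref = `spill-${this.nextId++}`;
    this.objects.set(ref, body);
    return ref;
  }

  get(ref: string): string {
    const body = this.objects.get(ref);
    if (body === undefined) throw new Error(`missing object: ${ref}`);
    return body;
  }
}

function isSpilledDataMarker(value: unknown): value is SpilledDataMarker {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return candidate.__spilled__ === true && typeof candidate.storageRef === "string";
}

// Replace a value with a lightweight marker when its JSON form exceeds the threshold.
function spillIfLarge(
  value: unknown,
  store: InMemoryObjectStore,
  thresholdBytes: number = TEMPORAL_SPILL_THRESHOLD_BYTES,
): unknown {
  const serialized = JSON.stringify(value);
  if (serialized === undefined) return value; // e.g. undefined has no JSON form
  const size = new TextEncoder().encode(serialized).length;
  if (size <= thresholdBytes) return value;
  const marker: SpilledDataMarker = {
    __spilled__: true,
    storageRef: store.put(serialized),
    originalSize: size,
  };
  return marker;
}

// Download and parse the original value if given a marker; pass anything else through.
function unspill(value: unknown, store: InMemoryObjectStore): unknown {
  return isSpilledDataMarker(value) ? JSON.parse(store.get(value.storageRef)) : value;
}
```

With this shape, a downstream component would call `unspill` on every input before use, so it never has to distinguish spilled from inline data.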

3. Docker Runner & Reliability Fixes

  • Volume-Based Input Delivery: Switched from environment variables to file-based input delivery (input.json). This resolves E2BIG: argument list too long errors when passing massive datasets (e.g., thousands of Prowler findings) to components like the Script node.
  • PTY Streaming Fixes:
    • Resolved a critical macOS permission issue for node-pty that prevented terminal log streaming.
    • Fixed stdin pollution in TTY mode to ensure clean logs.
  • Robust Docker Detection: Added a utility that resolves the Docker binary path up front, so the runner works with multiple Docker backends (OrbStack, Homebrew, etc.).
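As a rough illustration of the file-based input delivery described above: environment variables count toward the operating system's limit on argument and environment size at process start, which is why very large inputs trigger E2BIG, while a mounted file has no such limit. The sketch below is an assumption-heavy example, not the runner's actual code: `stageInputFile` and the `/shipsec-input` container path are hypothetical, and only the `input.json` file name comes from the PR description.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface StagedInput {
  hostDir: string;
  dockerArgs: string[];
}

// Write inputs to input.json in a fresh temp directory and return
// the `docker run` arguments that mount it read-only.
function stageInputFile(inputs: unknown): StagedInput {
  const hostDir = fs.mkdtempSync(path.join(os.tmpdir(), "shipsec-input-"));
  fs.writeFileSync(path.join(hostDir, "input.json"), JSON.stringify(inputs), "utf8");
  // "/shipsec-input" is a hypothetical mount point chosen for this sketch.
  return { hostDir, dockerArgs: ["-v", `${hostDir}:/shipsec-input:ro`] };
}
```

The component inside the container would then read `/shipsec-input/input.json` instead of an environment variable, and the caller removes `hostDir` once the container exits.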

4. Component Enhancements

  • Prowler Scan: Integrated formal AWS credential contracts and optimized permissions for output volumes.
  • Logic Script: Upgraded the script runner to handle massive JSON inputs via mounted volumes, enabling complex post-processing of large security scans.

5. Maintenance

  • Standardized spill markers across the codebase.
  • Updated core dependencies: node-pty, @ai-sdk, and @aws-sdk.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

if (portMetadata?.dataType) {
  const coercion = coerceValueForPort(portMetadata.dataType, resolved);
  if (coercion.ok) {
    params[targetKey] = coercion.value;

P1: Skip coercion for spilled markers before unspill

When an upstream output is spilled, resolveInputValue returns a spill-marker object (with __spilled__/storageRef). The subsequent coerceValueForPort(...) call runs before the activity unspills the data, so any port not typed as JSON or any (e.g., text, number, list) receives an object and fails coercion. The failed coercion is reported as a warning and the input is dropped, which ultimately surfaces as a “missing required inputs” error. As a result, outputs larger than the spill threshold can no longer flow to typed ports. Consider bypassing coercion when the resolved value is a spill marker, or deferring coercion until after unspill in the activity.
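The first suggestion (bypassing coercion for spill markers) might look roughly like the sketch below. It is illustrative only: this `coerceValueForPort` is a simplified stand-in written for the example, not the project's real function, and `isSpillMarker` and `resolveParam` are hypothetical names.

```typescript
type CoercionResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Simplified stand-in: only "text" and "number" are checked; other types pass through.
function coerceValueForPort(dataType: string, value: unknown): CoercionResult {
  if (dataType === "text") {
    return typeof value === "string"
      ? { ok: true, value }
      : { ok: false, error: "expected text" };
  }
  if (dataType === "number") {
    return typeof value === "number"
      ? { ok: true, value }
      : { ok: false, error: "expected number" };
  }
  return { ok: true, value };
}

// Detects the marker shape named in the review (__spilled__ plus storageRef).
function isSpillMarker(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return candidate.__spilled__ === true && typeof candidate.storageRef === "string";
}

// Pass spill markers through untouched; the activity unspills them later.
function resolveParam(dataType: string | undefined, resolved: unknown): CoercionResult {
  if (!dataType || isSpillMarker(resolved)) {
    return { ok: true, value: resolved };
  }
  return coerceValueForPort(dataType, resolved);
}
```

One trade-off of this approach is that the unspilled value is never checked against the port type. Deferring coercion until after unspill, the review's second option, would keep that check.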

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ection

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
… fix variable editor types

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…or node I/O

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…e I/O

- Add downloadFilePreview to StorageService for truncated previews
- Add full=true query param to fetch complete spilled data
- Add log message chunking in KafkaLogAdapter (100k chars per chunk)
- Add E2E test for node-io-spilling verification
- Fix organizationId handling in NodeIOIngestService for local dev
- Add INodeIOService interface to component-sdk
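The log message chunking mentioned in this commit could be as simple as the following sketch. The function name `chunkLogMessage` is hypothetical; the 100k character chunk size comes from the commit message. Note that slicing by character index works on UTF-16 code units, so a chunk boundary can split a surrogate pair. That is an accepted simplification here.

```typescript
// 100k characters per chunk, per the commit message.
const LOG_CHUNK_SIZE_CHARS = 100000;

// Split a long log message into consecutive chunks of at most chunkSize characters.
function chunkLogMessage(message: string, chunkSize: number = LOG_CHUNK_SIZE_CHARS): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (message.length <= chunkSize) return [message];
  const chunks: string[] = [];
  for (let start = 0; start < message.length; start += chunkSize) {
    chunks.push(message.slice(start, start + chunkSize));
  }
  return chunks;
}
```

Joining the chunks in order reproduces the original message, so a consumer can reassemble it if needed.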

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…for Docker

- Mount /shipsec-output in containers for structured output
- Components write results to SHIPSEC_OUTPUT_PATH instead of stdout
- Stdout now used purely for logs (goes to log system with chunking)
- Remove stdout->trace progress events (fixes trace pollution)
- Remove RESULT_START/RESULT_END marker parsing hack

This fixes the issue where large outputs created 90+ NODE_PROGRESS
trace events by streaming stdout chunks to the trace system.

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Add shared constants.ts with KAFKA_SPILL_THRESHOLD_BYTES (100KB),
  TEMPORAL_SPILL_THRESHOLD_BYTES (2MB), LOG_CHUNK_SIZE_CHARS (100k)
- Add standardized SpilledDataMarker interface with __spilled__ flag
- Add isSpilledDataMarker() type guard and createSpilledMarker() helper
- Update KafkaNodeIOAdapter to use standardized marker format
- Update NodeIOService to detect both new and legacy marker formats
- Update activity to use shared constant for temporal spill
- Update log adapter to use shared chunk size constant
- Clean up verbose debug console.log statements in activity

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
Previously, preview fetch errors were silently swallowed. Now we
log a warning so issues can be diagnosed.

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
- Update readOutputFromFile to attempt file read first, then fall back to stdout
- Allows legacy components (e.g. security tools writing to stdout) to still function
- Captures stdout in both standard and PTY modes
- Returns raw string if stdout is not JSON, simplifying parser logic for tools like subfinder
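The fallback order described in this commit can be sketched as a pure function. This is an approximation under stated assumptions: `resolveComponentOutput` and `tryParseJson` are hypothetical names, the caller is assumed to pass `null` when the output file is missing, and treating an unparseable output file the same as a missing one is a choice made for this sketch, not something the commit message specifies.

```typescript
type ParseResult = { ok: true; value: unknown } | { ok: false };

function tryParseJson(text: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Prefer the output file; fall back to stdout; return raw stdout if it is not JSON.
function resolveComponentOutput(fileContents: string | null, stdout: string): unknown {
  if (fileContents !== null) {
    const fromFile = tryParseJson(fileContents);
    if (fromFile.ok) return fromFile.value;
  }
  const fromStdout = tryParseJson(stdout);
  return fromStdout.ok ? fromStdout.value : stdout;
}
```

Returning the raw string in the last case is what lets line-oriented tools such as subfinder work without a dedicated parser in the runner.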

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…ctor layout

This commit adds a 'View Full' functionality for large/spilled node inputs and outputs using a MessageModal, updates the backend and API client to support full data retrieval, and improves the NodeIOInspector UI with a light-themed, single-column full-width layout.

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
This commit ensures strict chronological ordering and database persistence for node I/O events. It uses the runId as a Kafka key for partition affinity, awaits recording operations in the worker, and implements conditional status logic in the repository to prevent out-of-order events from leaving nodes in a perceived 'running' state.
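The conditional status logic can be illustrated with a small state function. This is only a sketch of the idea: the status names and the rule that a terminal status is never overwritten are assumptions made for the example, and the real repository presumably enforces its rule in the database update.

```typescript
type NodeStatus = "running" | "completed" | "failed";

const TERMINAL_STATUSES = new Set<NodeStatus>(["completed", "failed"]);

// Keep a terminal status once it is recorded, so a late "running" event cannot revert it.
function nextStatus(current: NodeStatus | undefined, incoming: NodeStatus): NodeStatus {
  if (current !== undefined && TERMINAL_STATUSES.has(current)) {
    return current;
  }
  return incoming;
}

// Fold a sequence of events, in arrival order, into the final stored status.
function applyEvents(events: NodeStatus[]): NodeStatus | undefined {
  return events.reduce<NodeStatus | undefined>(
    (current, incoming) => nextStatus(current, incoming),
    undefined,
  );
}
```

Under this rule, a "completed" event that arrives before its delayed "running" event still leaves the node marked as completed.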

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
…e-based inputs

- Fix node-pty spawn-helper permissions on macOS
- Implement 'unspilling' logic in runComponentActivity to automatically resolve large outputs stored in object storage
- Switch logic-script component to read inputs from a mounted file instead of env/stdin to avoid OS argument limits (E2BIG)
- Add resolveDockerPath utility for more robust Docker detection
- Enhance runner with support for explicit stdin disabling

Signed-off-by: betterclever <paliwal.pranjal83@gmail.com>
@betterclever force-pushed the feature/node-io-inspector-and-fixes branch from 1f90383 to 4a796fe on January 7, 2026 01:41
@betterclever merged commit 963b482 into main on Jan 8, 2026
3 checks passed