Skip to content

Blob storage artifact-data keys, binary sanitizer, and observability stripping#2744

Open
mike-inkeep wants to merge 1 commit intostack/history_shapefrom
stack/artifact_binary_sanitizer
Open

Blob storage artifact-data keys, binary sanitizer, and observability stripping#2744
mike-inkeep wants to merge 1 commit intostack/history_shapefrom
stack/artifact_binary_sanitizer

Conversation

@mike-inkeep
Copy link
Contributor

No description provided.

@changeset-bot
Copy link

changeset-bot bot commented Mar 18, 2026

⚠️ No Changeset found

Latest commit: c7043e0

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link

vercel bot commented Mar 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-manage-ui Ready Ready Preview, Comment Mar 18, 2026 2:48pm

Request Review

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(6) Total Issues | Risk: Medium

🟠⚠️ Major (2) 🟠⚠️

🟠 1) artifact-binary-sanitizer.ts:53 Missing error handling for storage upload failures

Issue: The uploadInlinePart function calls storage.upload() without a try-catch block. If the blob storage upload fails (network error, quota exceeded, permission denied), the promise rejection propagates and could fail the entire artifact processing pipeline.

Why: The existing pattern in image-upload.ts (lines 90-98) gracefully catches upload failures and drops the problematic part while preserving the rest of the content. This inconsistency creates a reliability gap where a transient storage failure could cause complete artifact processing failure rather than graceful degradation.

Fix: Add error handling around the upload call that either (1) returns the original part with base64 data intact as a fallback, or (2) replaces the data with an error placeholder. The first option preserves data at the cost of larger observability payloads; the second maintains observability cleanliness but loses the data.

Refs:


🟠 2) sanitizeArtifactBinaryData Exported function is not used in production code

Issue: The PR adds two functions: stripBinaryDataForObservability (used in 4 places in AgentSession.ts) and sanitizeArtifactBinaryData (an async function that uploads binary data to blob storage). However, sanitizeArtifactBinaryData is exported and tested but never imported or called anywhere in the codebase outside its test file.

Why: This means the artifact-data storage key category is added but no code path actually uploads artifacts using it. The infrastructure exists but is not wired up to any data flow. This may be intentional preparatory work, but it's unclear from the PR.

Fix: Either (1) defer adding sanitizeArtifactBinaryData until the consuming code is ready, (2) document in the PR description that this is preparatory infrastructure with follow-up work planned, or (3) wire up the function to the appropriate data flow (e.g., artifact persistence in ArtifactService).

Refs:

Inline Comments:

  • 🟠 Major: artifact-binary-sanitizer.ts:53 Missing error handling for storage upload failures

🟡 Minor (4) 🟡

🟡 1) storage-keys.ts:37-45 Missing unit tests for new artifact-data storage key category

Issue: The existing storage-keys.test.ts tests buildStorageKey for the media category but was not updated to test the new artifact-data category.
Why: The storage key format is critical for data integrity and multi-tenant isolation. A regression in the key format could cause lookup failures or tenant data leakage.
Fix: Add a test case in storage-keys.test.ts:

it('builds versioned artifact-data storage key', () => {
  const key = buildStorageKey({
    category: 'artifact-data',
    tenantId: 'tenant-1',
    projectId: 'project-1',
    artifactId: 'artifact-1',
    contentHash: 'abc123',
    ext: 'png',
  });
  expect(key).toBe('v1/t_tenant-1/artifact-data/p_project-1/a_artifact-1/sha256-abc123.png');
});

🟡 2) artifact-binary-sanitizer.ts:27 Magic number threshold for binary detection

Issue: The 100 character threshold for detecting binary data is hardcoded without explanation.
Why: This represents only ~75 decoded bytes. Without documentation, it's unclear if this was deliberately chosen or if a larger threshold would be more appropriate.
Fix: Extract to a named constant: const BINARY_DATA_MIN_LENGTH = 100; // ~75 decoded bytes

🟡 3) artifact-binary-sanitizer.test.ts Missing test for data: URI handling

Issue: The implementation checks !v.data.startsWith('data:') but there's no explicit test.
Why: Data URIs are common for inline images; without a test, this check could be accidentally removed.
Fix: Add test case for data URIs.

🟡 4) artifact-binary-sanitizer.test.ts Missing test for upload failure scenario

Issue: Tests mock upload to always succeed but never test the failure path.
Why: Masks the missing error handling in the implementation.
Fix: Add failure test case.

Inline Comments:

  • 🟡 Minor: artifact-binary-sanitizer.ts:19 Magic number threshold lacks documentation
  • 🟡 Minor: artifact-binary-sanitizer.test.ts:109 Missing test for upload failure handling
  • 🟡 Minor: artifact-binary-sanitizer.test.ts:57 Missing test for data: URI handling

💭 Consider (2) 💭

💭 1) storage-keys.ts Missing companion parse/prefix functions for artifact-data category

Issue: The media category has parseMediaStorageKey and buildMediaStorageKeyPrefix helpers, but artifact-data only has the buildStorageKey case.
Why: May be needed later for cleanup, authorization checks, or debugging. If not needed, this is acceptable.
Fix: Add a comment explaining why parse/prefix helpers are not needed, or add them for consistency.

💭 2) AgentSession.ts Pretty-print JSON in span attributes

Issue: Uses JSON.stringify(..., null, 2) for span attributes which adds unnecessary whitespace.
Why: Span attributes are stored and indexed; compact format reduces storage overhead.
Fix: Remove null, 2 arguments: JSON.stringify(stripBinaryDataForObservability(artifactData.data))


💡 APPROVE WITH SUGGESTIONS

Summary: This PR adds valuable infrastructure for handling binary data in artifacts. The stripBinaryDataForObservability function is correctly integrated and will improve observability by preventing large binary payloads in spans. However, two issues warrant attention: (1) the upload function lacks error handling unlike the existing image-upload.ts pattern, and (2) sanitizeArtifactBinaryData appears to be unused — clarification on whether this is intentional preparatory work would be helpful. Adding tests for the new storage key category would also strengthen the PR.

Discarded (3)
Location Issue Reason Discarded
artifact-binary-sanitizer.test.ts Missing test for missing mimeType fallback Low severity (INFO level); the behavior is implicitly covered and mimeType presence is common
artifact-binary-sanitizer.test.ts Missing boundary test at exactly 100 chars Nice-to-have but unlikely to cause real bugs; current tests cover above/below threshold adequately
AgentSession.ts spans Redundant JSON.stringify pretty-print Moved to Consider as it's a minor optimization, not a correctness issue
Reviewers (6)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-precision 3 1 1 0 0 0 1
pr-review-tests 6 2 0 0 2 0 2
pr-review-sre 4 1 1 0 1 0 1
pr-review-consistency 2 1 0 0 0 0 1
pr-review-standards 0 0 0 0 0 0 0
pr-review-types 0 0 0 0 0 0 0
Total 15 5 2 0 3 0 5

Note: Some findings were merged across reviewers (e.g., error handling raised by both tests and SRE reviewers).

ext,
});

await storage.upload({ key, data: buffer, contentType: mimeType });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Missing error handling for storage upload failures

Issue: The uploadInlinePart function calls storage.upload() without a try-catch. If the upload fails (network error, quota exceeded, permission denied), the error propagates and could fail the entire artifact processing pipeline.

Why: The existing pattern in image-upload.ts (lines 90-98) catches upload failures gracefully. This inconsistency could cause unhandled errors and partial artifact state when storage is temporarily unavailable.

Fix: Consider wrapping the upload in a try-catch:

try {
  await storage.upload({ key, data: buffer, contentType: mimeType });
  return { ...part, data: toBlobUri(key) };
} catch (error) {
  // Option 1: Return original part (fallback to base64)
  return part;
  // Option 2: Return placeholder
  // return { ...part, data: `[upload failed, binary ~${buffer.length} bytes]` };
}

Refs:

[key: string]: unknown;
};

const CIRCULAR_REFERENCE_PLACEHOLDER = '[Circular Reference]';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Magic number threshold lacks documentation

Issue: The 100 character threshold for detecting binary data is hardcoded without explanation. This represents only ~75 decoded bytes, which may be smaller than intended.

Why: Without documentation, it's unclear whether this threshold was deliberately chosen or if a larger value (e.g., 1KB) would better distinguish binary content from short encoded strings.

Fix:

Suggested change
const CIRCULAR_REFERENCE_PLACEHOLDER = '[Circular Reference]';
const BINARY_DATA_MIN_LENGTH = 100; // ~75 decoded bytes; data below this is likely metadata, not binary payload

Then use the constant in the check below.

download: vi.fn(),
delete: vi.fn(),
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Missing test for upload failure handling

Issue: The test suite mocks upload to always succeed but never tests the failure path.

Why: Given that image-upload.test.ts has a test for 'drops file part when storage.upload throws' (line 148), this gap masks the missing error handling in the implementation.

Fix: Add a test case:

it('propagates upload errors when blob storage fails', async () => {
  mockUpload.mockRejectedValueOnce(new Error('Storage quota exceeded'));
  const input = { type: 'image', data: LARGE_BASE64, mimeType: 'image/png' };
  await expect(sanitizeArtifactBinaryData(input, CTX)).rejects.toThrow('Storage quota exceeded');
});

Refs:

it('leaves http URLs untouched', () => {
const input = { type: 'image', data: 'https://example.com/img.png', mimeType: 'image/png' };
const result = stripBinaryDataForObservability(input) as any;
expect(result.data).toBe('https://example.com/img.png');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Missing test for data: URI handling

Issue: Line 30 of the implementation checks !v.data.startsWith('data:') but there's no explicit test verifying data URIs are left untouched.

Why: Data URIs are a common format for inline images. Without a test, a refactor could accidentally remove the check, causing data URIs to be re-uploaded unnecessarily.

Fix: Add test cases:

it('leaves data: URIs untouched', () => {
  const dataUri = 'data:image/png;base64,' + LARGE_BASE64;
  const input = { type: 'image', data: dataUri, mimeType: 'image/png' };
  const result = stripBinaryDataForObservability(input) as any;
  expect(result.data).toBe(dataUri);
});

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant