Blob storage artifact-data keys, binary sanitizer, and observability stripping#2744
Blob storage artifact-data keys, binary sanitizer, and observability stripping#2744mike-inkeep wants to merge 1 commit intostack/history_shapefrom
Conversation
|
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
PR Review Summary
(6) Total Issues | Risk: Medium
🟠⚠️ Major (2) 🟠⚠️
🟠 1) artifact-binary-sanitizer.ts:53 Missing error handling for storage upload failures
Issue: The uploadInlinePart function calls storage.upload() without a try-catch block. If the blob storage upload fails (network error, quota exceeded, permission denied), the promise rejection propagates and could fail the entire artifact processing pipeline.
Why: The existing pattern in image-upload.ts (lines 90-98) gracefully catches upload failures and drops the problematic part while preserving the rest of the content. This inconsistency creates a reliability gap where a transient storage failure could cause complete artifact processing failure rather than graceful degradation.
Fix: Add error handling around the upload call that either (1) returns the original part with base64 data intact as a fallback, or (2) replaces the data with an error placeholder. The first option preserves data at the cost of larger observability payloads; the second maintains observability cleanliness but loses the data.
Refs:
🟠 2) sanitizeArtifactBinaryData Exported function is not used in production code
Issue: The PR adds two functions: stripBinaryDataForObservability (used in 4 places in AgentSession.ts) and sanitizeArtifactBinaryData (an async function that uploads binary data to blob storage). However, sanitizeArtifactBinaryData is exported and tested but never imported or called anywhere in the codebase outside its test file.
Why: This means the artifact-data storage key category is added but no code path actually uploads artifacts using it. The infrastructure exists but is not wired up to any data flow. This may be intentional preparatory work, but it's unclear from the PR.
Fix: Either (1) defer adding sanitizeArtifactBinaryData until the consuming code is ready, (2) document in the PR description that this is preparatory infrastructure with follow-up work planned, or (3) wire up the function to the appropriate data flow (e.g., artifact persistence in ArtifactService).
Refs:
Inline Comments:
- 🟠 Major:
artifact-binary-sanitizer.ts:53Missing error handling for storage upload failures
🟡 Minor (4) 🟡
🟡 1) storage-keys.ts:37-45 Missing unit tests for new artifact-data storage key category
Issue: The existing storage-keys.test.ts tests buildStorageKey for the media category but was not updated to test the new artifact-data category.
Why: The storage key format is critical for data integrity and multi-tenant isolation. A regression in the key format could cause lookup failures or tenant data leakage.
Fix: Add a test case in storage-keys.test.ts:
it('builds versioned artifact-data storage key', () => {
const key = buildStorageKey({
category: 'artifact-data',
tenantId: 'tenant-1',
projectId: 'project-1',
artifactId: 'artifact-1',
contentHash: 'abc123',
ext: 'png',
});
expect(key).toBe('v1/t_tenant-1/artifact-data/p_project-1/a_artifact-1/sha256-abc123.png');
});🟡 2) artifact-binary-sanitizer.ts:27 Magic number threshold for binary detection
Issue: The 100 character threshold for detecting binary data is hardcoded without explanation.
Why: This represents only ~75 decoded bytes. Without documentation, it's unclear if this was deliberately chosen or if a larger threshold would be more appropriate.
Fix: Extract to a named constant: const BINARY_DATA_MIN_LENGTH = 100; // ~75 decoded bytes
🟡 3) artifact-binary-sanitizer.test.ts Missing test for data: URI handling
Issue: The implementation checks !v.data.startsWith('data:') but there's no explicit test.
Why: Data URIs are common for inline images; without a test, this check could be accidentally removed.
Fix: Add test case for data URIs.
🟡 4) artifact-binary-sanitizer.test.ts Missing test for upload failure scenario
Issue: Tests mock upload to always succeed but never test the failure path.
Why: Masks the missing error handling in the implementation.
Fix: Add failure test case.
Inline Comments:
- 🟡 Minor:
artifact-binary-sanitizer.ts:19Magic number threshold lacks documentation - 🟡 Minor:
artifact-binary-sanitizer.test.ts:109Missing test for upload failure handling - 🟡 Minor:
artifact-binary-sanitizer.test.ts:57Missing test for data: URI handling
💭 Consider (2) 💭
💭 1) storage-keys.ts Missing companion parse/prefix functions for artifact-data category
Issue: The media category has parseMediaStorageKey and buildMediaStorageKeyPrefix helpers, but artifact-data only has the buildStorageKey case.
Why: May be needed later for cleanup, authorization checks, or debugging. If not needed, this is acceptable.
Fix: Add a comment explaining why parse/prefix helpers are not needed, or add them for consistency.
💭 2) AgentSession.ts Pretty-print JSON in span attributes
Issue: Uses JSON.stringify(..., null, 2) for span attributes which adds unnecessary whitespace.
Why: Span attributes are stored and indexed; compact format reduces storage overhead.
Fix: Remove null, 2 arguments: JSON.stringify(stripBinaryDataForObservability(artifactData.data))
💡 APPROVE WITH SUGGESTIONS
Summary: This PR adds valuable infrastructure for handling binary data in artifacts. The stripBinaryDataForObservability function is correctly integrated and will improve observability by preventing large binary payloads in spans. However, two issues warrant attention: (1) the upload function lacks error handling unlike the existing image-upload.ts pattern, and (2) sanitizeArtifactBinaryData appears to be unused — clarification on whether this is intentional preparatory work would be helpful. Adding tests for the new storage key category would also strengthen the PR.
Discarded (3)
| Location | Issue | Reason Discarded |
|---|---|---|
artifact-binary-sanitizer.test.ts |
Missing test for missing mimeType fallback | Low severity (INFO level); the behavior is implicitly covered and mimeType presence is common |
artifact-binary-sanitizer.test.ts |
Missing boundary test at exactly 100 chars | Nice-to-have but unlikely to cause real bugs; current tests cover above/below threshold adequately |
AgentSession.ts spans |
Redundant JSON.stringify pretty-print |
Moved to Consider as it's a minor optimization, not a correctness issue |
Reviewers (6)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-precision |
3 | 1 | 1 | 0 | 0 | 0 | 1 |
pr-review-tests |
6 | 2 | 0 | 0 | 2 | 0 | 2 |
pr-review-sre |
4 | 1 | 1 | 0 | 1 | 0 | 1 |
pr-review-consistency |
2 | 1 | 0 | 0 | 0 | 0 | 1 |
pr-review-standards |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
pr-review-types |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 15 | 5 | 2 | 0 | 3 | 0 | 5 |
Note: Some findings were merged across reviewers (e.g., error handling raised by both tests and SRE reviewers).
| ext, | ||
| }); | ||
|
|
||
| await storage.upload({ key, data: buffer, contentType: mimeType }); |
There was a problem hiding this comment.
🟠 MAJOR: Missing error handling for storage upload failures
Issue: The uploadInlinePart function calls storage.upload() without a try-catch. If the upload fails (network error, quota exceeded, permission denied), the error propagates and could fail the entire artifact processing pipeline.
Why: The existing pattern in image-upload.ts (lines 90-98) catches upload failures gracefully. This inconsistency could cause unhandled errors and partial artifact state when storage is temporarily unavailable.
Fix: Consider wrapping the upload in a try-catch:
try {
await storage.upload({ key, data: buffer, contentType: mimeType });
return { ...part, data: toBlobUri(key) };
} catch (error) {
// Option 1: Return original part (fallback to base64)
return part;
// Option 2: Return placeholder
// return { ...part, data: `[upload failed, binary ~${buffer.length} bytes]` };
}Refs:
| [key: string]: unknown; | ||
| }; | ||
|
|
||
| const CIRCULAR_REFERENCE_PLACEHOLDER = '[Circular Reference]'; |
There was a problem hiding this comment.
🟡 Minor: Magic number threshold lacks documentation
Issue: The 100 character threshold for detecting binary data is hardcoded without explanation. This represents only ~75 decoded bytes, which may be smaller than intended.
Why: Without documentation, it's unclear whether this threshold was deliberately chosen or if a larger value (e.g., 1KB) would better distinguish binary content from short encoded strings.
Fix:
| const CIRCULAR_REFERENCE_PLACEHOLDER = '[Circular Reference]'; | |
| const BINARY_DATA_MIN_LENGTH = 100; // ~75 decoded bytes; data below this is likely metadata, not binary payload |
Then use the constant in the check below.
| download: vi.fn(), | ||
| delete: vi.fn(), | ||
| }); | ||
| }); |
There was a problem hiding this comment.
🟡 Minor: Missing test for upload failure handling
Issue: The test suite mocks upload to always succeed but never tests the failure path.
Why: Given that image-upload.test.ts has a test for 'drops file part when storage.upload throws' (line 148), this gap masks the missing error handling in the implementation.
Fix: Add a test case:
it('propagates upload errors when blob storage fails', async () => {
mockUpload.mockRejectedValueOnce(new Error('Storage quota exceeded'));
const input = { type: 'image', data: LARGE_BASE64, mimeType: 'image/png' };
await expect(sanitizeArtifactBinaryData(input, CTX)).rejects.toThrow('Storage quota exceeded');
});Refs:
| it('leaves http URLs untouched', () => { | ||
| const input = { type: 'image', data: 'https://example.com/img.png', mimeType: 'image/png' }; | ||
| const result = stripBinaryDataForObservability(input) as any; | ||
| expect(result.data).toBe('https://example.com/img.png'); |
There was a problem hiding this comment.
🟡 Minor: Missing test for data: URI handling
Issue: Line 30 of the implementation checks !v.data.startsWith('data:') but there's no explicit test verifying data URIs are left untouched.
Why: Data URIs are a common format for inline images. Without a test, a refactor could accidentally remove the check, causing data URIs to be re-uploaded unnecessarily.
Fix: Add test cases:
it('leaves data: URIs untouched', () => {
const dataUri = 'data:image/png;base64,' + LARGE_BASE64;
const input = { type: 'image', data: dataUri, mimeType: 'image/png' };
const result = stripBinaryDataForObservability(input) as any;
expect(result.data).toBe(dataUri);
});
No description provided.