micahbf (Contributor) commented on Aug 7, 2025

Adds support for working with the V2 records API using fancy records, a.k.a. "items", a.k.a. records from the @flatfile/records package.

Instead of bringing in the records package as a dependency, I just copied over the FlatfileRecord implementation, and made some small changes to how "dirtiness" works to simplify creating new records.

More context on that:
FlatfileRecord has a concept of "dirty", meaning the record object has changes that need to be synced to the server with a write. New records which you intend to create are dirty by definition, since they don't exist on the server yet and therefore need to be written. In the existing implementation, you had to pass a special flag to the constructor (new FlatfileRecord(data, true)) to mark a record as new/dirty.

I simplified this so that the absence of a __k id parameter signifies a new record.
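Concretely, the before/after looks roughly like this:

// Before: an explicit constructor flag marked a record as new/dirty
const record = new FlatfileRecord({ firstName: 'Ada' }, true);

// After: a record constructed without a __k id is treated as new, and
// therefore dirty; a record with a __k id is an existing, clean record
const created = new FlatfileRecord({ firstName: 'Ada' });
const existing = new FlatfileRecord({ __k: 'us_rc_123', firstName: 'Ada' });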

Don't take it from me; here's what amp has to say:

Add FlatfileRecord-based v2 Records API with Rich Object Interface

Overview

This PR introduces a new v2 Records API that provides a rich object-oriented interface for working with Flatfile records through the FlatfileRecord class. This complements the existing raw JSONL API with a more developer-friendly approach that includes automatic type casting, validation message handling, and intelligent change tracking.

Key Features

New API Methods

  • get() and getStreaming(): Fetch records as FlatfileRecord objects instead of raw JSONL
  • write() and writeStreaming(): Write FlatfileRecord objects with automatic changeset detection
  • Existing raw methods preserved: getRaw(), getRawStreaming(), writeRaw(), writeRawStreaming()

Detailed Method Documentation

get(sheetId, options?, requestOptions?)

Fetches records from a sheet and returns them as FlatfileRecord objects with rich manipulation capabilities.

Parameters:

  • sheetId: Flatfile.SheetId - Target sheet identifier
  • options?: GetRecordsRequestOptions - Query options (filters, pagination, etc.)
  • requestOptions?: RequestOptions - HTTP request configuration

Returns: Promise<FlatfileRecord[]>

Key Features:

  • Automatic conversion from JSONL to FlatfileRecord objects
  • Preserves all metadata (timestamps, validation messages, links)
  • Graceful error handling (skips malformed JSONL lines)
  • Rich object interface immediately available

Example:

const records = await recordsV2.get('us_sh_123', {
  includeTimestamps: true,
  pageSize: 100,
  filter: 'valid'
});

for (const record of records) {
  console.log(`ID: ${record.id}, Name: ${record.str('firstName')}`);
  
  if (record.hasError()) {
    console.log('Validation errors:', record.errorFields());
  }
  
  // Immediate manipulation
  record.set('processed', true);
  record.info('status', 'Processed by API');
}

getStreaming(sheetId, options?, requestOptions?)

Streams records as FlatfileRecord objects for memory-efficient processing of large datasets.

Parameters:

  • Same as get() method

Returns: AsyncGenerator<FlatfileRecord, void, unknown>

Key Features:

  • Memory-efficient streaming (processes one record at a time)
  • Automatic JSONL parsing and conversion
  • Supports both ReadableStream (modern browsers) and fallback mechanisms
  • Perfect for large datasets that don't fit in memory

Example:

let processedCount = 0;
const batchSize = 1000;
const batch: FlatfileRecord[] = [];

for await (const record of recordsV2.getStreaming('us_sh_123', {
  includeTimestamps: true
})) {
  // Process and modify records
  if (record.str('email').includes('invalid')) {
    record.err('email', 'Invalid email format');
  }
  
  record.set('lastProcessed', new Date().toISOString());
  batch.push(record);
  
  // Write in batches
  if (batch.length >= batchSize) {
    await recordsV2.write(batch);
    batch.length = 0; // Clear batch
    processedCount += batchSize;
    console.log(`Processed ${processedCount} records`);
  }
}

// Write remaining records
if (batch.length > 0) {
  await recordsV2.write(batch);
}

write(records, options?, requestOptions?)

Writes an array of FlatfileRecord objects to Flatfile with intelligent changeset detection.

Parameters:

  • records: FlatfileRecord[] - Array of records to write
  • options?: WriteRecordsRequestOptions - Write configuration
  • requestOptions?: RequestOptions - HTTP request configuration

Returns: Promise<WriteRecordsResponse>

Write Modes:

Changeset Mode (Default)

await recordsV2.write(records);

  • Only writes records where isDirty() returns true
  • Transmits only changed fields for existing records (with __k)
  • Transmits all data for new records (without __k)
  • Optimal performance and minimal data transfer
  • Throws error if no dirty records found

Truncate Mode

await recordsV2.write(records, { truncate: true });

  • Writes all provided records completely (ignores dirty state)
  • Replaces entire dataset with provided records
  • Filters out temporary deleted records (TEMP_* IDs)
  • Useful for complete data replacement scenarios

Changeset Logic Examples:

// New record - includes all data in changeset
const newRecord = new FlatfileRecord({
  firstName: 'John',
  lastName: 'Doe'
});
console.log(newRecord.changeset());
// Output: { firstName: 'John', lastName: 'Doe' }

// Existing record - only includes changes
const existingRecord = new FlatfileRecord({
  __k: 'us_rc_123',
  firstName: 'John',
  lastName: 'Doe'
});
existingRecord.set('email', 'john@example.com');
console.log(existingRecord.changeset());
// Output: { __k: 'us_rc_123', email: 'john@example.com' }

Error Handling:

try {
  const result = await recordsV2.write(records);
  console.log(`Created: ${result.created}, Updated: ${result.updated}`);
} catch (error) {
  if (error.message.includes('No changes made')) {
    console.log('All records are already up to date');
  } else {
    console.error('Write failed:', error);
  }
}

Post-Write Behavior:

  • All records are automatically committed (commit() called)
  • Dirty state is cleared for all records
  • Validation messages are serialized into record data
  • Records are ready for the next modification cycle

writeStreaming(recordsStream, options?, requestOptions?)

Streams FlatfileRecord objects directly to Flatfile using HTTP body streaming for memory-efficient writes.

Parameters:

  • recordsStream: AsyncIterable<FlatfileRecord> - Async generator/iterator of records
  • options?: WriteStreamingOptions - Write configuration
  • requestOptions?: RequestOptions - HTTP request configuration

Returns: Promise<WriteRecordsResponse>

Key Features:

  • Memory-efficient streaming (never loads all records into memory)
  • Supports same changeset vs truncate logic as write()
  • Real-time processing and transmission
  • Perfect for ETL pipelines and large data imports

Example:

async function* processAndStreamRecords() {
  for await (const rawRecord of someDataSource()) {
    const record = new FlatfileRecord(rawRecord);
    
    // Apply business logic
    if (record.str('status') === 'pending') {
      record.set('status', 'processed');
      record.set('processedAt', new Date().toISOString());
      record.info('processor', 'Automated validation');
    }
    
    // Validate email
    if (!record.str('email').includes('@')) {
      record.err('email', 'Invalid email format');
    }
    
    yield record;
  }
}

// Stream directly to Flatfile
const result = await recordsV2.writeStreaming(
  processAndStreamRecords(),
  { sheetId: 'us_sh_123' }
);

Advanced Streaming Pattern:

// Transform and stream with batching
async function* batchedTransform(source: AsyncIterable<any>, batchSize = 100) {
  const batch: FlatfileRecord[] = [];
  
  for await (const item of source) {
    const record = new FlatfileRecord(item);
    
    // Apply transformations
    record.set('transformed', true);
    record.set('transformedAt', Date.now());
    
    batch.push(record);
    
    if (batch.length >= batchSize) {
      // Yield batch and clear
      for (const r of batch) yield r;
      batch.length = 0;
    }
  }
  
  // Yield remaining records
  for (const r of batch) yield r;
}

await recordsV2.writeStreaming(
  batchedTransform(largeDataSource),
  { truncate: true, sheetId: 'us_sh_123' }
);

Method Comparison

| Feature | get() | getStreaming() | write() | writeStreaming() |
| --- | --- | --- | --- | --- |
| Memory Usage | High (loads all) | Low (one at a time) | Medium (batch) | Low (streaming) |
| Best For | Small-medium datasets | Large datasets | Batch operations | ETL pipelines |
| Return Type | FlatfileRecord[] | AsyncGenerator | WriteRecordsResponse | WriteRecordsResponse |
| Changeset Support | N/A | N/A | ✓ | ✓ |
| Truncate Support | N/A | N/A | ✓ | ✓ |
| Auto-commit | N/A | N/A | ✓ | ✓ |

FlatfileRecord Class Features

  • Type-safe field access: str(), num(), bool(), date() with automatic casting
  • Validation handling: err(), warn(), info() for managing field-level messages
  • Smart change tracking: Only dirty records are written, with intelligent changeset detection
  • Rich API: has(), isEmpty(), flag(), unflag(), delete(), and more
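For example, a typical pass over a record might look like this (a sketch: err() and info() take a field key plus a message as shown in the examples above, and warn() is assumed to match):

const record = new FlatfileRecord({ __k: 'us_rc_123', age: '42', email: '' });

const age = record.num('age'); // 42, cast from the stored string value

if (record.has('email') && record.isEmpty('email')) {
  record.err('email', 'Email is required');
}

record.warn('age', 'Please double-check this value');
record.info('source', 'Processed by the v2 API');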

Implementation Details

Browser Compatibility

The FlatfileRecord class has been adapted from an existing implementation to be fully browser-compatible:

  • Removed Node.js-specific dependencies (util, crypto.randomUUID)
  • Added browser-compatible alternatives for UUID generation and object inspection
  • Created utility functions for type casting (asString, asBool, asNumber, asDate)
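For instance, a browser-safe UUID helper along these lines (an illustrative sketch, not necessarily the exact code in this PR) replaces the Node-only crypto.randomUUID import:

// Illustrative browser-compatible UUID fallback
function generateId(): string {
  // Use the native implementation where available (modern browsers, Node 19+)
  if (typeof crypto !== 'undefined' && typeof crypto.randomUUID === 'function') {
    return crypto.randomUUID();
  }
  // Otherwise build a v4-style UUID from crypto.getRandomValues
  const bytes = crypto.getRandomValues(new Uint8Array(16));
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // set version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // set RFC 4122 variant
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join('-');
}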

JsonlRecord Type Integration

Enhanced the JsonlRecord interface to support link metadata:

export interface JsonlRecord {
    __l?: Array<{ __x?: string; [key: string]: any }>;
    // ... other fields
}

This allows the getLinks() method to filter by link type without complex union types.
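As a sketch of the idea (the standalone helper name here is hypothetical; the PR implements this inside FlatfileRecord.getLinks()):

// Hypothetical standalone version of the getLinks() filtering logic
function linksOfType(record: JsonlRecord, type?: string) {
  const links = record.__l ?? [];
  // __x carries the link type; omitting the type argument returns all links
  return type === undefined ? links : links.filter((link) => link.__x === type);
}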

Smart Dirty State Management

Implemented sophisticated dirty state tracking that ensures semantic alignment between isDirty() and changeset():

Original Issue

The initial implementation had a critical flaw where new records (without __k) would always return true from isDirty(), even after being committed. This meant:

  • Records would be unnecessarily re-written on every save operation
  • No way to determine if a new record actually had pending changes
  • Semantic mismatch between "dirty" state and actual need to write

Solution

Added a _committed flag to track commit state:

  • New records: isDirty() returns true until commit() is called
  • After commit: isDirty() returns false for clean records
  • State modifications: Any change resets _committed = false

Methods that affect dirty state:

  • set() - field changes
  • err(), warn(), info() - validation messages
  • delete() - record deletion
  • setDirty() - manual dirty marking
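Putting those rules together, the lifecycle looks like this:

const record = new FlatfileRecord({ firstName: 'Ada' }); // no __k: a new record

record.isDirty(); // true: new records are dirty until committed
record.commit();
record.isDirty(); // false: committed, nothing pending

record.set('firstName', 'Grace'); // any modification resets _committed
record.isDirty(); // true: the record needs to be written again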

Changeset Logic

The changeset() method provides intelligent diff detection:

  • New records (no __k): Returns all data fields
  • Existing records: Returns only changed fields
  • Consistent with dirty state: Only includes data that needs to be written

Manual Dirty Marking

Added setDirty() method for edge cases:

record.setDirty(); // Force record to be written on next save

Useful when you suspect external modifications or need to force a full record write.

Usage Examples

Basic Usage

import { FlatfileApi } from '@flatfile/api';
import { FlatfileRecord } from '@flatfile/api/v2';

const recordsV2 = new FlatfileApi().v2.records;

// Get records as FlatfileRecord objects
const records = await recordsV2.get('sheet_id');
for (const record of records) {
    console.log(`Name: ${record.str('firstName')} ${record.str('lastName')}`);
    if (record.hasError()) {
        console.log('Record has validation errors');
    }
}

// Modify and write back
records[0].set('email', 'new@example.com');
records[0].err('email', 'Invalid format');
await recordsV2.write(records); // Only writes dirty records

Streaming

// Stream large datasets efficiently
for await (const record of recordsV2.getStreaming('sheet_id')) {
    record.set('processed', true);
    if (record.isDirty()) {
        await recordsV2.write([record]);
    }
}

Write Modes

Changeset Mode (Default)

await recordsV2.write(records); // Only writes dirty records with changesets

Truncate Mode

await recordsV2.write(records, { truncate: true }); // Writes all records completely

Technical Implementation Details

Method Architecture

All four new methods are built on top of the existing raw JSONL methods, providing a rich wrapper layer:

// get() wraps getRaw()
public async get(sheetId, options, requestOptions) {
    const rawRecords = await this.getRaw(sheetId, options, requestOptions);
    return rawRecords.map(record => new FlatfileRecord(record));
}

// write() wraps writeRaw() with intelligent filtering
public async write(records, options, requestOptions) {
    const dirtyRecords = records.filter(r => r.isDirty());
    const jsonlBody = dirtyRecords.map(r => JSON.stringify(r.changeset())).join('\n');
    const result = await this.writeRaw(jsonlBody, options, requestOptions);
    records.forEach(r => r.commit()); // Clear dirty state
    return result;
}

Streaming Implementation

The streaming methods use modern web APIs with graceful fallbacks:

// Browser-compatible streaming with ReadableStream
private async *_streamJsonlResponse(response: Response): AsyncGenerator<JsonlRecord> {
    if (!response.body) throw new Error("Response body is null");
    
    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';
    
    try {
        while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            
            buffer += decoder.decode(value, { stream: true });
            const lines = buffer.split('\n');
            buffer = lines.pop() || ''; // Keep incomplete line
            
            for (const line of lines) {
                if (line.trim()) {
                    try {
                        yield JSON.parse(line);
                    } catch (error) {
                        // Skip malformed JSONL lines
                        console.warn('Skipping malformed JSONL line:', line);
                    }
                }
            }
        }
    } finally {
        reader.releaseLock();
    }
}

Error Handling Strategy

Each method implements comprehensive error handling:

  1. Network Errors: Propagated with context
  2. Malformed Data: Gracefully skipped with warnings
  3. Validation Errors: Collected and provided in response
  4. Business Logic Errors: Clear error messages with actionable guidance

// Example error handling in write()
if (records.length === 0) {
    throw new Error("No records provided to write.");
}

const dirtyRecords = records.filter(r => r.isDirty());
if (dirtyRecords.length === 0 && !options?.truncate) {
    throw new Error("No changes made to the records that would need to be written.");
}

Performance Optimizations

Memory Management

  • get(): Processes records in single batch (suitable for <10k records)
  • getStreaming(): One-record-at-a-time processing (unlimited scale)
  • write(): Batches changesets efficiently
  • writeStreaming(): Zero-copy writes via HTTP body streaming

Network Efficiency

  • Changeset mode: Only transmits changed fields
  • Compression: Leverages existing HTTP compression
  • Connection reuse: HTTP/2 compatible streaming
  • Batch processing: Optimal request grouping

Data Processing

// Efficient JSONL processing
const jsonlBody = dirtyRecords
    .map(r => {
        const changeset = r.changeset();
        if (options.sheetId && !changeset.__s) {
            changeset.__s = options.sheetId; // Add sheet ID efficiently
        }
        return JSON.stringify(changeset);
    })
    .join('\n');

API Design Decisions

Why separate methods instead of options?

  • Clear intent: get() vs getRaw() makes the return type obvious
  • Type safety: No runtime type checking needed
  • Backward compatibility: Existing getRaw() methods unchanged
  • Performance: No runtime type conversion overhead
  • Developer experience: IDE autocomplete shows exact return types

Why smart changeset detection?

  • Performance: Only transmits changed data (can be 90%+ reduction)
  • Conflict prevention: Reduces chance of overwriting external changes
  • Intuitive: Matches developer expectations about ORM-style behavior
  • Bandwidth efficiency: Critical for mobile and low-bandwidth scenarios
  • Server load: Reduces processing overhead on Flatfile servers

Why streaming methods?

  • Scalability: Handle datasets that exceed memory limits
  • Real-time processing: Start processing before full download completes
  • Memory efficiency: Constant memory usage regardless of dataset size
  • ETL pipelines: Perfect for data transformation workflows
  • Progress tracking: Can provide real-time progress updates

Why auto-commit after write?

  • State consistency: Records reflect server state after successful write
  • Performance: Prevents unnecessary re-writes of same data
  • Developer intuition: Matches database transaction behavior
  • Error prevention: Eliminates common bug of writing same changes repeatedly

Why manual dirty marking (setDirty())?

  • External changes: Handle cases where another process modified data
  • Force refresh: Useful for data synchronization scenarios
  • Developer control: Provides escape hatch when automatic detection insufficient
  • Conflict resolution: Explicit control over what gets written
  • Edge cases: Handles scenarios like clock skew or concurrent updates

Breaking Changes

None. This is purely additive - all existing APIs remain unchanged.

Testing

  • Comprehensive test suite covering all new methods
  • Edge case testing for dirty state management
  • Streaming behavior validation
  • Error handling verification
  • Both browser and Node.js compatibility tested

Files Changed

  • src/v2/records/FlatfileRecord.ts - Core record class implementation
  • src/v2/records/index.ts - New v2 API methods
  • src/v2/records/types.ts - Enhanced JsonlRecord interface
  • src/v2/records/utils.ts - Browser-compatible utilities
  • src/v2/index.ts - V2 namespace exports
  • src/index.ts - Main export updates
  • tests/bun/records.test.ts - Comprehensive test coverage
  • examples/v2-records-usage.ts - Usage examples and documentation

Migration Path

Developers can adopt the new API incrementally:

  1. Continue using existing getRaw()/writeRaw() methods
  2. Try get() for read-only operations
  3. Gradually adopt write() for change tracking benefits
  4. Full migration when ready for the rich object interface

This provides a smooth transition path while unlocking powerful new capabilities for record manipulation.

micahbf changed the title from "Mbf/fancy records" to "V2 Fancy Records" on Aug 7, 2025
micahbf closed this on Oct 8, 2025