stream-csv-as-json

stream-csv-as-json is a micro-library of Node.js stream components for creating custom CSV processing pipelines with a minimal memory footprint. It can parse CSV files far exceeding available memory, streaming individual primitives using a SAX-inspired API.

stream-csv-as-json is a companion project for stream-json and stream-chain. It uses the same token protocol ({name, value} tokens) and works seamlessly with stream-json filters, streamers, and general infrastructure. This means you can combine CSV parsing with stream-json utilities like streamValues, Filter, Pick, and Ignore for powerful data processing pipelines.

Components

  • parser — streaming CSV parser producing a SAX-like token stream.
    • Optionally packs values into single tokens or streams them piece-wise.
    • The main module provides a convenience factory with event emission.
  • asObjects — uses the first row as field names, converts subsequent rows to object tokens.
  • stringer — converts a CSV token stream back to CSV text.

All components are building blocks for flexible pipelines. They can be combined with custom functions, stream-chain, and stream-json utilities.
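To make the token protocol concrete, here is a minimal, library-free sketch of what a consumer of the SAX-like stream deals with. The token names follow the stream-json protocol referenced above, but the sample token array and the `collectValues` helper are purely illustrative, not actual library output or API:

```javascript
// Illustrative token stream for one CSV row with fields 'a' and 'hello'
// (a sketch of the {name, value} protocol, not captured library output)
const tokens = [
  {name: 'startArray'},                 // a row as an array of strings
  {name: 'startString'},
  {name: 'stringChunk', value: 'a'},
  {name: 'endString'},
  {name: 'stringValue', value: 'a'},    // packed form (when value packing is on)
  {name: 'startString'},
  {name: 'stringChunk', value: 'hel'},
  {name: 'stringChunk', value: 'lo'},   // long values may arrive in several chunks
  {name: 'endString'},
  {name: 'stringValue', value: 'hello'},
  {name: 'endArray'}
];

// Hypothetical helper: reassemble full values from streamed chunks.
const collectValues = tokens => {
  const values = [];
  let buffer = null;
  for (const t of tokens) {
    switch (t.name) {
      case 'startString': buffer = ''; break;
      case 'stringChunk': buffer += t.value; break;
      case 'endString': values.push(buffer); buffer = null; break;
    }
  }
  return values;
};

console.log(collectValues(tokens)); // ['a', 'hello']
```

The chunked form is what keeps memory usage flat: a consumer never needs a whole field, let alone a whole file, in memory at once unless it opts into the packed tokens.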

Installation

npm install stream-csv-as-json

Quick start

Examples use ESM (import). CommonJS (require) is also supported — see Modules below.

import fs from 'node:fs';
import zlib from 'node:zlib';
import chain from 'stream-chain';
import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';

const pipeline = chain([
  fs.createReadStream('sample.csv.gz'),
  zlib.createGunzip(),
  parser(),
  asObjects(),
  // plain functions participate in the chain; keep only the tokens
  // needed downstream: packed field values and row ends
  data => data.name === 'stringValue' || data.name === 'endObject' ? data : null
]);

let counter = 0, rowMatches = false;
pipeline.on('data', data => {
  // a row matches when any of its field values is 'accounting'
  if (data.name === 'stringValue' && data.value === 'accounting') rowMatches = true;
  else if (data.name === 'endObject') {
    if (rowMatches) ++counter;
    rowMatches = false;
  }
});
pipeline.on('end', () => console.log(`Found ${counter} matching rows.`));

Using .withParser() for a combined pipeline

import fs from 'node:fs';
import chain from 'stream-chain';
import asObjects from 'stream-csv-as-json/as-objects.js';

const pipeline = chain([fs.createReadStream('data.csv'), asObjects.withParser()]);

pipeline.on('data', token => console.log(token));

Using .asStream() for direct piping

import fs from 'node:fs';
import parser from 'stream-csv-as-json/parser.js';

fs.createReadStream('data.csv')
  .pipe(parser.asStream())
  .on('data', token => console.log(token));

Modules

This package supports both ESM (import) and CommonJS (require).

ESM (recommended):

import {parser} from 'stream-csv-as-json';
import asObjects from 'stream-csv-as-json/as-objects.js';
import stringer from 'stream-csv-as-json/stringer.js';

CommonJS:

const {parser} = require('stream-csv-as-json');
const asObjects = require('stream-csv-as-json/as-objects.js');
const stringer = require('stream-csv-as-json/stringer.js');

See the full documentation in the Wiki.

API at a glance

Module                            Factory             Stream wrapper
stream-csv-as-json                make(options)       Returns a Duplex stream with event emission
stream-csv-as-json/parser.js      parser(options)     parser.asStream(options)
stream-csv-as-json/stringer.js    stringer(options)   stringer.asStream(options)
stream-csv-as-json/as-objects.js  asObjects(options)  asObjects.asStream(options)

parser options

Option                        Default  Description
packStrings / packValues      true     Emit stringValue tokens with the complete value
streamStrings / streamValues  true     Emit startString/stringChunk/endString tokens
separator                     ','      Field separator character
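To illustrate what the separator option controls, here is a deliberately simplified, quote-aware splitter for a single line. It is a stand-in for explanation only, not the library's streaming parser:

```javascript
// Minimal quote-aware field splitter for one line — an illustration of the
// separator option, not the library's implementation (which also handles
// streaming input and multi-line quoted fields).
const splitLine = (line, separator = ',') => {
  const fields = [];
  let field = '', inQuotes = false;
  for (let i = 0; i < line.length; ++i) {
    const c = line[i];
    if (inQuotes) {
      if (c === '"') {
        if (line[i + 1] === '"') { field += '"'; ++i; } // doubled quote escapes a quote
        else inQuotes = false;
      } else field += c;
    } else if (c === '"') inQuotes = true;
    else if (c === separator) { fields.push(field); field = ''; }
    else field += c;
  }
  fields.push(field);
  return fields;
};

console.log(splitLine('a,"b,c",d'));  // ['a', 'b,c', 'd']
console.log(splitLine('a;b;c', ';')); // ['a', 'b', 'c']
```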

stringer options

Option                       Default  Description
useStringValues / useValues  false    Use packed stringValue tokens instead of streamed chunks
separator                    ','      Field separator character
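The output direction can be sketched the same way. This hand-rolled row formatter only illustrates the CSV quoting rules a stringer must apply; the actual component consumes the token stream described above rather than arrays:

```javascript
// Quote a field when it contains the separator, a quote, or a newline —
// a sketch of CSV output rules, not the stringer's actual code.
const formatRow = (values, separator = ',') =>
  values
    .map(v =>
      /["\r\n]/.test(v) || v.includes(separator)
        ? `"${v.replace(/"/g, '""')}"`
        : v
    )
    .join(separator);

console.log(formatRow(['a', 'b,c', 'say "hi"']));
// a,"b,c","say ""hi"""
```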

asObjects options

Option                       Default  Description
packKeys / packValues        true     Emit keyValue tokens
streamKeys / streamValues    true     Emit startKey/stringChunk/endKey tokens
useStringValues / useValues  false    Use packed stringValue tokens for header collection
fieldPrefix                  'field'  Prefix for unnamed/extra fields
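The first-row-as-keys conversion can be sketched on plain arrays. This helper only illustrates the semantics; the numbering scheme used for generated names (fieldPrefix plus a 1-based column index) is an assumption for illustration, and the real component operates on the token stream:

```javascript
// Convert rows to objects using the first row as keys — a simplified sketch,
// not the asObjects implementation. Columns without a header name fall back
// to a generated name; the 1-based index here is assumed for illustration.
const rowsToObjects = (rows, fieldPrefix = 'field') => {
  const [header, ...rest] = rows;
  return rest.map(row =>
    Object.fromEntries(
      row.map((value, i) => [header[i] || `${fieldPrefix}${i + 1}`, value])
    )
  );
};

const rows = [
  ['name', 'dept'],
  ['Alice', 'accounting', 'extra'] // third column has no header
];
console.log(rowsToObjects(rows));
// [{name: 'Alice', dept: 'accounting', field3: 'extra'}]
```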

TypeScript

TypeScript declarations (.d.ts) are included and provide full type information for all modules.

License

BSD-3-Clause

Release history

  • 2.0.1 Added a direct dependency on stream-chain. Documentation updates.
  • 2.0.0 Major rewrite: functional API (stream-chain 3.x), source in src/, TypeScript declarations, tape-six tests. See the Migration guide.
  • 1.0.5 Technical release: updated deps.
  • 1.0.4 Technical release: updated deps.
  • 1.0.3 Technical release: updated deps.
  • 1.0.2 Technical release: updated deps, updated the license's year.
  • 1.0.1 Minor readme tweaks, added TypeScript typings and the badge.
  • 1.0.0 The first 1.0 release.
