codex2parquet

A command-line tool to convert Codex session logs to Parquet format for data analysis and AI applications.

Installation

npm install -g codex2parquet

Usage

# Export Codex logs for current directory to codex_logs.parquet
codex2parquet

# Export logs from all projects
codex2parquet --all

# Export to custom filename
codex2parquet --output logs.parquet

# Export logs for a specific project directory
codex2parquet --project ~/code/myapp

# Read from a non-default Codex data directory
codex2parquet --codex-dir ~/.codex

What Gets Exported

Codex stores local data under ~/.codex by default. This tool reads:

  • ~/.codex/sessions/**/*.jsonl: current Codex rollout logs. Each line is a JSON object with timestamp, type, and payload.
  • ~/.codex/sessions/rollout-*.json: legacy rollout logs. Each file contains a session object and an items array.
  • ~/.codex/state_5.sqlite: thread metadata, including cwd, title, model, model provider, CLI version, sandbox policy, approval mode, token totals, git metadata, dynamic tools, and subagent parent/child edges.
  • ~/.codex/history.jsonl: prompt history rows with session_id, Unix timestamp, and text.
  • ~/.codex/logs_2.sqlite: diagnostic/runtime log rows when the current Node.js runtime includes node:sqlite.
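Each rollout JSONL line can be inspected with nothing beyond the standard library. A minimal sketch (the sample line below is illustrative, not a real log entry):

```python
import json

# One line of a current-format rollout log: a JSON object with
# timestamp, top-level type, and a nested payload.
line = (
    '{"timestamp": "2024-01-01T00:00:00Z", "type": "event_msg",'
    ' "payload": {"type": "agent_message", "message": "hi"}}'
)

event = json.loads(line)
top_level_type = event["type"]        # e.g. event_msg
event_type = event["payload"]["type"] # e.g. agent_message
```

These two fields map directly onto the `top_level_type` and `event_type` columns described under Output Schema below.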

The SQLite sources are optional. The exporter reads them through Node's native node:sqlite module and does not require a system sqlite3 command. If the SQLite files are missing or unreadable, the exporter still writes rollout and history rows.
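The exporter itself reads these databases through Node's `node:sqlite`, but any SQLite client works for ad-hoc inspection. A hedged sketch using Python's stdlib `sqlite3` that lists whatever tables `state_5.sqlite` contains, without assuming a particular schema:

```python
import sqlite3
from pathlib import Path

# Inspect the thread-metadata database, if present. Table names are
# not assumed here; we only query sqlite_master for what exists.
db_path = Path.home() / ".codex" / "state_5.sqlite"

if db_path.exists():
    con = sqlite3.connect(db_path)
    tables = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    print(tables)
    con.close()
```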

Output Schema

The generated Parquet file is an event table: one row per rollout event, legacy item, history prompt, or diagnostic log entry.

Important columns:

  • source_kind: rollout, history, or diagnostic_log
  • project: Project name derived from cwd
  • session_id: Codex thread/session identifier
  • item_index: Event index within its source
  • timestamp: ISO timestamp when available
  • rollout_path: Source rollout file path
  • top_level_type: Current JSONL top-level type, such as session_meta, event_msg, response_item, or turn_context
  • event_type: Nested event type for event_msg payloads
  • item_type: Response item type, such as message, reasoning, function_call, or function_call_output
  • role, name, status, call_id, item_id, turn_id: Common message and tool-call identifiers
  • text: The primary readable body for messages, user prompts, tool results, agent messages, and diagnostics
  • tool_input_json, tool_output: Tool/function call inputs and decoded outputs
  • model, model_provider, reasoning_effort, cwd, title, source, cli_version: Thread/session metadata
  • approval_mode, sandbox_policy, tokens_used, git_sha, git_branch, git_origin_url: Execution metadata from state_5.sqlite
  • input_tokens, cached_input_tokens, output_tokens, reasoning_output_tokens, total_tokens: Token usage when present in event payloads
  • rate_limits_json, metadata_json, content_json, payload_json, raw_json: Metadata and raw JSON preservation columns

All Parquet columns are written as strings to keep the schema stable across Codex log format changes. Rare or source-specific details, such as diagnostic log module paths, dynamic tools, and subagent metadata, are preserved in metadata_json instead of becoming mostly-empty top-level columns.
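The all-strings convention can be sketched as a small flattening step. This is an illustration of the idea, not the exporter's actual code: scalars become their string form, and nested objects are serialized as JSON so they survive schema drift:

```python
import json

def to_string_columns(event: dict) -> dict:
    """Coerce every value to a string (or None) so the Parquet
    schema stays stable even when Codex adds or changes fields."""
    out = {}
    for key, value in event.items():
        if value is None:
            out[key] = None
        elif isinstance(value, str):
            out[key] = value
        else:
            # Numbers, booleans, and nested objects/arrays are
            # serialized, matching the *_json preservation columns.
            out[key] = json.dumps(value)
    return out

row = to_string_columns({
    "session_id": "abc",
    "tokens_used": 1234,
    "metadata": {"git_branch": "main"},
})
```

With this scheme a reader never hits a type mismatch between files; consumers cast columns like `tokens_used` back to numbers at query time.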

Options

  • --output <file>, -o <file>: Output parquet filename (default: codex_logs.parquet)
  • --project <path>: Filter logs to a specific project directory
  • --all: Export logs from all Codex projects
  • --codex-dir <path>: Codex data directory (default: ~/.codex)
  • --no-history: Skip prompt history rows
  • --no-diagnostics: Skip diagnostic log rows
  • --help, -h: Show help message

Requirements

  • Node.js 22.5.0 or newer. SQLite enrichment uses native node:sqlite; no sqlite3 CLI is required.
  • Codex local data in ~/.codex

Use Cases

  • Analyzing Codex usage patterns across projects
  • Building datasets from human-agent coding sessions
  • Auditing tool calls, command outputs, and runtime diagnostics
  • Creating dashboards over models, projects, token usage, and git branches

Hyperparam

Hyperparam is a tool for exploring and curating AI datasets, such as those produced by codex2parquet.
