feat(forms): ingest Section 16 (3/4/5) and Form 144 ownership filings #116
Implement parsing and storage for the EDGAR ownershipDocument schema shared by Forms 3, 4, 5 and their /A amendments. Adds a shared TypeBox schema, parsers on Form_3/4/5, a section16 storage tier (filing, transaction, holding repos), and wires the new "3"/"4"/"5" extractors into the dispatch task, extractor registry, and DI containers. Reporting owners are classified person-vs-company using the relationship flags (directors/officers are always individuals) with a cleaned-name company-ending fallback, since EDGAR carries no explicit entity flag and the generic name heuristics misfire on insider name formats.
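A minimal sketch of the owner classification described above, assuming hypothetical field names and an illustrative (non-exhaustive) list of company endings; the PR's actual helper names and ending list are not shown in this thread.

```typescript
// Hypothetical shape: field names are illustrative, not the PR's schema.
interface OwnerRelationship {
  isDirector: boolean;
  isOfficer: boolean;
  isTenPercentOwner: boolean;
  isOther: boolean;
}

type OwnerKind = "person" | "company";

// Assumed, non-exhaustive list of company-style name endings.
const COMPANY_ENDINGS = ["INC", "CORP", "LLC", "LP", "LTD", "TRUST", "FUND", "PARTNERS"];

// Uppercase, drop punctuation, and collapse whitespace before matching.
function cleanName(name: string): string {
  return name
    .toUpperCase()
    .replace(/[^A-Z0-9 ]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

function classifyOwner(name: string, rel: OwnerRelationship): OwnerKind {
  // Directors and officers are always individuals.
  if (rel.isDirector || rel.isOfficer) return "person";
  // 10% owners and "other" filers can be either: fall back to the name ending.
  const words = cleanName(name).split(" ");
  const lastWord = words[words.length - 1];
  return COMPANY_ENDINGS.indexOf(lastWord) >= 0 ? "company" : "person";
}
```

Checking the relationship flags first means a director whose name happens to end in a company-like token is still treated as a person; the name fallback only decides the cases the flags leave open.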
…le rows

Address code-review findings on the Section 16 ingestion:
- setupAllDatabases() never created the section16_filings/transactions/holdings tables, so `sec db setup` left them missing and the first real Form 3/4/5 would hit "no such table" in production (in-memory tests masked it). Register the three setupDatabase() calls.
- Derive the observation extractor_id from formToExtractorId(form), the same mapping the dispatch task records extractor_runs against, instead of re-deriving it from the XML documentType, so the two never diverge.
- Clear a filing's transaction/holding rows before re-inserting; the positional (accession, index) keys would otherwise leave orphans when a re-extraction yields fewer rows.
- Surface the three section16 tables in `sec db stats`.
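The orphan problem with positional keys can be illustrated with a small in-memory stand-in. This is a sketch only: `DetailTable` and `saveDetails` are hypothetical names, not the repos added in this PR.

```typescript
interface DetailRow {
  accession: string;
  index: number; // position of the row within the filing
  payload: string;
}

// Hypothetical in-memory table keyed by (accession, index).
class DetailTable {
  private rows = new Map<string, DetailRow>();

  upsert(row: DetailRow): void {
    this.rows.set(`${row.accession}#${row.index}`, row);
  }

  // Remove every row belonging to one filing.
  clear(accession: string): void {
    this.rows.forEach((row, key) => {
      if (row.accession === accession) this.rows.delete(key);
    });
  }

  count(accession: string): number {
    let n = 0;
    this.rows.forEach((row) => {
      if (row.accession === accession) n++;
    });
    return n;
  }
}

// Clear-before-write: without the clear(), a re-extraction that yields fewer
// rows would leave the higher-index rows from the previous run behind.
function saveDetails(table: DetailTable, accession: string, payloads: string[]): void {
  table.clear(accession);
  payloads.forEach((payload, index) => table.upsert({ accession, index, payload }));
}
```

Upserting alone is idempotent only when the row count never shrinks; clearing first makes the stored rows match the latest extraction exactly.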
Follow-up review fixes:
- issuer_cik now comes solely from the XML <issuerCik>; drop the fallback to the filing's own CIK. Ownership filings are ingested from a submission feed that may belong to the reporting owner rather than the issuer, so the fallback could stamp the owner's CIK as the issuer (and propagate it to every transaction/holding row).
- period_of_report stores null when <periodOfReport> is absent instead of substituting filing_date, which fabricated a plausible-but-wrong period.
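A sketch of the "no fallback" rule, under stated assumptions: `ParsedOwnershipDoc`, `FilingHeader`, `parseCik`, and `buildHeader` are hypothetical names, and returning null for a missing issuer CIK is a choice made for this illustration. The thread does not say how the PR handles a filing with no <issuerCik>.

```typescript
// Hypothetical parsed-XML shape; only the two fields discussed above.
interface ParsedOwnershipDoc {
  issuerCik?: string;
  periodOfReport?: string;
}

interface FilingHeader {
  issuer_cik: number | null;
  period_of_report: string | null;
}

// Accept only 1-10 digit strings; anything else is treated as absent.
function parseCik(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const trimmed = raw.trim();
  return /^\d{1,10}$/.test(trimmed) ? Number(trimmed) : null;
}

// The feed CIK and filing date are passed in but deliberately NOT used as
// fallbacks: the feed may belong to the reporting owner, and the filing date
// is not the period of report.
function buildHeader(
  doc: ParsedOwnershipDoc,
  _feedCik: number,
  _filingDate: string,
): FilingHeader {
  return {
    issuer_cik: parseCik(doc.issuerCik),
    period_of_report: doc.periodOfReport ?? null,
  };
}
```

Storing null keeps "unknown" distinguishable from a real value, whereas a substituted value would look valid in every downstream row.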
Add parsing and storage for Form 144 / 144/A (Notice of Proposed Sale of Securities under Rule 144), filed electronically as XML since 2022.
- Form_144.schema.ts models the edgarSubmission shape (issuer info, the single securitiesInformation/broker block, repeating securitiesToBeSold acquisition lots, and securitiesSoldInPast3Months sales); dates are kept as the US-format strings EDGAR emits.
- New form144 storage tier: filing header (folding in the 1:1 proposed sale), acquisitions, and recent-sales tables, with orphan-safe clear-before-write on the two detail tables.
- Observes the issuer and broker as companies and the account holder as a person via EntityObserver.
- Wires the "144" extractor into the dispatch task, extractor registry, both DI containers, setupAllDatabases (table DDL), and db stats. Updates the existing extractor-count assertions (now 9 extractors).
Review follow-ups for the Form 144 ingestion:
- Type numeric leaves (units, market value, proceeds, amounts) as raw
strings and coerce in storage. Typed as Type.Number(), Value.Convert
turned an empty element ("") into a fabricated 0, indistinguishable from
a real zero. Storage num() maps "" -> null. Adds a regression test.
- is_gift: treat a present-but-empty <isGiftTransaction/> as null (unknown)
rather than false.
- Stop reusing the trailing-3-month sellerDetails address for the account
holder: that seller can be a different party and its name is in reversed
order, so it can't be matched safely.
- Guard securitiesInformation/broker reads with firstOf() so a filing that
ever repeats the (normally singular) block degrades to the first entry
instead of nulling the whole proposed-sale block.
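The three null-preserving helpers named in the list above could look roughly like this. It is a sketch: only `num()` and `firstOf()` are named in the commit message, their real signatures are not shown, and `triStateFlag` plus the accepted flag spellings are assumptions.

```typescript
// Map an empty or missing numeric leaf to null instead of a fabricated 0.
function num(raw: string | null | undefined): number | null {
  if (raw === null || raw === undefined) return null;
  const trimmed = raw.trim();
  if (trimmed === "") return null;
  const n = Number(trimmed);
  return Number.isFinite(n) ? n : null;
}

// Tri-state flag: a present-but-empty element is "unknown", not false.
// The accepted spellings below are assumptions for this sketch.
function triStateFlag(raw: string | null | undefined): boolean | null {
  if (raw === null || raw === undefined) return null;
  const v = raw.trim().toLowerCase();
  if (v === "y" || v === "true" || v === "1") return true;
  if (v === "n" || v === "false" || v === "0") return false;
  return null;
}

// If a normally singular block is ever repeated, take the first entry
// rather than discarding the whole block.
function firstOf<T>(value: T | T[] | undefined): T | undefined {
  return Array.isArray(value) ? value[0] : value;
}
```

Note that `Number("")` evaluates to 0 in JavaScript, which is why the empty-string check has to come before the conversion.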
Pull request overview
Adds end-to-end ingestion for SEC insider-ownership XML filings by introducing a shared ownershipDocument parser for Forms 3/4/5 (and amendments), persisting extracted data into new Section 16 storage tables, and wiring new extractors into the processing/versioning infrastructure. It also introduces separate parsing/storage for Form 144.
Changes:
- Implement shared TypeBox schema + parsers for Forms 3/4/5 ownershipDocument, plus fixtures and tests.
- Add new storage tier and repos for Section 16 filings/transactions/holdings and wire them into DI + DB setup.
- Register new extractor IDs (3/4/5/144), dispatch them in the processing task, and update related tests/CLI DB status reporting.
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/task/forms/ProcessAccessionDocFormTask.ts | Dispatch storage handling for Forms 3/4/5 and 144 (+ amendments). |
| src/storage/versioning/extractorIds.ts | Add extractor IDs + form→extractor mapping for 3/4/5/144. |
| src/storage/versioning/extractorIds.test.ts | Update extractor ID expectations and add mapping tests for 3/4/5/144. |
| src/storage/versioning/componentRegistry.test.ts | Update registry expectations for new extractors and total component count. |
| src/storage/section16/Section16Schema.ts | Introduce TypeBox storage schemas + DI tokens for Section 16 tables. |
| src/storage/section16/Section16Repo.ts | Add repo wrapper for Section 16 filing/transaction/holding persistence. |
| src/storage/form144/Form144Schema.ts | Introduce TypeBox storage schemas + DI tokens for Form 144 tables. |
| src/storage/form144/Form144Repo.ts | Add repo wrapper for Form 144 filing/acquisition/recent-sale persistence. |
| src/sec/forms/insider-trading/OwnershipDocument.test.ts | Add parsing tests across real Form 3/4/5 fixtures and edge-cases. |
| src/sec/forms/insider-trading/OwnershipDocument.storage.ts | Implement Section 16 storage extraction + EntityObserver integration. |
| src/sec/forms/insider-trading/OwnershipDocument.storage.test.ts | Add integration tests for Section 16 storage + entity classification + idempotency. |
| src/sec/forms/insider-trading/OwnershipDocument.schema.ts | Add shared ownershipDocument TypeBox schema (Forms 3/4/5). |
| src/sec/forms/insider-trading/mock_data/form-5/000158064226003205-primary_doc.xml | Add Form 5 fixture. |
| src/sec/forms/insider-trading/mock_data/form-5/000119312526225124-primary_doc.xml | Add Form 5 fixture. |
| src/sec/forms/insider-trading/mock_data/form-4/000149315226025476-primary_doc.xml | Add Form 4 fixture (derivative + non-derivative). |
| src/sec/forms/insider-trading/mock_data/form-4/000090266426002604-primary_doc.xml | Add multi-owner Form 4 fixture for owner classification. |
| src/sec/forms/insider-trading/mock_data/form-4-a/000089706926001273-primary_doc.xml | Add Form 4/A fixture. |
| src/sec/forms/insider-trading/mock_data/form-3/000212961626000003-primary_doc.xml | Add Form 3 fixture. |
| src/sec/forms/insider-trading/mock_data/form-3-a/000095010326007758-primary_doc.xml | Add Form 3/A fixture (holdings + footnote-only leaves). |
| src/sec/forms/insider-trading/mock_data/form-144/000195917326004038-primary_doc.xml | Add Form 144 fixture. |
| src/sec/forms/insider-trading/mock_data/form-144/000195917326004030-primary_doc.xml | Add Form 144 fixture. |
| src/sec/forms/insider-trading/mock_data/form-144/000166326626000003-primary_doc.xml | Add Form 144 fixture (repeating tables + recent sales). |
| src/sec/forms/insider-trading/mock_data/form-144-a/000166326626000004-primary_doc.xml | Add Form 144/A fixture. |
| src/sec/forms/insider-trading/Form_5.ts | Update Form 5 parser to use shared OwnershipDocument schema + Value.Convert. |
| src/sec/forms/insider-trading/Form_4.ts | Update Form 4 parser to use shared OwnershipDocument schema + Value.Convert. |
| src/sec/forms/insider-trading/Form_3.ts | Update Form 3 parser to use shared OwnershipDocument schema + Value.Convert. |
| src/sec/forms/insider-trading/Form_144.ts | Update Form 144 parser implementation + refresh description text. |
| src/sec/forms/insider-trading/Form_144.test.ts | Add parsing tests for Form 144 fixtures and coercion/array behavior. |
| src/sec/forms/insider-trading/Form_144.storage.ts | Implement Form 144 storage extraction + EntityObserver integration. |
| src/sec/forms/insider-trading/Form_144.storage.test.ts | Add integration tests for Form 144 storage + idempotency + null numeric handling. |
| src/sec/forms/insider-trading/Form_144.schema.ts | Add TypeBox schema for Form 144 edgarSubmission. |
| src/config/TestingDI.ts | Register new in-memory storages for Section 16 and Form 144 in tests. |
| src/config/setupAllDatabases.ts | Ensure new storages are setup during DB initialization. |
| src/config/DefaultDI.ts | Register production storages for Section 16 and Form 144 tables. |
| src/cli/queries/DbStatus.ts | Include new tables in DB status query output. |
| src/cli/groups/version.test.ts | Update expected extractor IDs in version CLI tests. |
    export const Section16FilingSchema = Type.Object({
      accession_number: Type.String({ maxLength: 25, description: "EDGAR accession number" }),
      form: Type.String({ maxLength: 10, description: "Form symbol (3, 4, 5, 3/A, ...)" }),
      document_type: Type.String({ maxLength: 10, description: "ownershipDocument documentType" }),
      issuer_cik: Type.Integer({ minimum: 0, description: "Issuer CIK" }),
      issuer_name: Type.String({ maxLength: 150, description: "Issuer name" }),

Applied in a9bd996 — switched issuer_cik on all three Section 16 tables (filing, transaction, holding) to TypeSecCik(). The Codec accepts the existing numeric values from parseCikSafely without ripple changes; tsc and the full suite stay green.
Generated by Claude Code
    export const Form144FilingSchema = Type.Object({
      accession_number: Type.String({ maxLength: 25 }),
      form: Type.String({ maxLength: 10 }),
      submission_type: TypeNullable(Type.String({ maxLength: 10 })),
      issuer_cik: Type.Integer({ minimum: 0 }),
      issuer_name: Type.String({ maxLength: 150 }),

Applied in a9bd996 — issuer_cik on all three Form 144 tables (filing, acquisition, recent-sale) now uses TypeSecCik().
Generated by Claude Code
        await processOwnershipForm({ ...storageArgs, form: form!, doc: parsed });
        break;
      case "144":
      case "144/A":
        await processForm144({ ...storageArgs, form: form!, doc: parsed });
        break;

Updated the PR title and description to reflect that Form 144 / 144-A is now included in this PR (not a follow-on).
Generated by Claude Code
Align Section16 and Form144 schemas with the newer observation/filing/issuer schemas that use the TypeSecCik() codec (format: cik, max 9999999999) instead of plain Type.Integer for CIK fields. Per inline review on PR #116.
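For readers unfamiliar with the bound, the constraint amounts to "a non-negative integer of at most ten digits". The sketch below mirrors only that stated bound in plain TypeScript; it is not the TypeSecCik() codec, whose implementation is not shown in this thread, and `isValidCik` / `formatCik` are hypothetical names.

```typescript
// Upper bound stated in the commit message: ten decimal digits.
const MAX_CIK = 9999999999;

// A CIK is a non-negative integer no larger than MAX_CIK. The bound exceeds
// the 32-bit integer range but is well within JavaScript's safe integers.
function isValidCik(value: unknown): value is number {
  return (
    typeof value === "number" &&
    Number.isInteger(value) &&
    value >= 0 &&
    value <= MAX_CIK
  );
}

// EDGAR commonly renders CIKs zero-padded to ten digits.
function formatCik(cik: number): string {
  if (!isValidCik(cik)) throw new RangeError(`invalid CIK: ${cik}`);
  return ("0000000000" + String(cik)).slice(-10);
}
```

A plain `Type.Integer({ minimum: 0 })` accepts the same valid values but also lets through integers that can never be a CIK, which is the gap the shared codec closes.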
… distinction (#116 follow-up)
…3/4/5 (#116 follow-up) (#117)
* fix(section16): type VALUE_NUMBER as string to preserve empty-vs-zero distinction (#116 follow-up)
* test(section16): regression tests for null-vs-0 on empty numeric leaves
* test(section16): expect string-valued numeric leaves after VALUE_NUMBER schema change
* refactor(ownership): reorganize imports and clean up code structure in OwnershipDocument.storage.ts
  - Consolidated and reordered import statements for better readability.
  - Adjusted the extractor version to align with recent changes.
  - Improved formatting of function parameters for clarity.
  - Ensured consistent handling of transaction and holding saving logic.
* remove
Summary
Implements parsing and storage for SEC insider-ownership filings, covering the EDGAR `ownershipDocument` schema shared by Forms 3 / 4 / 5 (+ `/A` amendments) and the separate Form 144 / 144-A "Notice of Proposed Sale" schema.

Section 16 (Forms 3 / 4 / 5)
- `OwnershipDocument.schema.ts` (TypeBox); `Form_3/4/5.parse()` delegate to it. Array paths are declared so single-element tables still parse as arrays, and `{ value }`-wrapped leaves coerce automatically.
- `src/storage/section16/` tier:
  - `section16_filings`: one row per filing
  - `section16_transactions`: Form 4 / 5 transactions, `is_derivative` flag, full transaction + underlying-security detail
  - `section16_holdings`: Form 3 / 5 holdings
- `clearTransactions` / `clearHoldings` on the positional-key detail tables for idempotent re-extraction.
- `EntityObserver` integration. Owners are classified on the relationship flags (directors/officers are always individuals) with a cleaned-name fallback for the 10%-owner / other cases.

Form 144 / 144-A
- `Form_144.schema.ts` models the `edgarSubmission` shape: issuer info + relationships, the (1:1) `securitiesInformation` / broker block, repeating `securitiesToBeSold` (acquisition lots), and `securitiesSoldInPast3Months` (recent sales). Dates are kept as the US-format strings EDGAR emits; numeric leaves are kept as strings and coerced in storage so an empty element doesn't become a fabricated 0.
- `src/storage/form144/` tier: `form144_filings` (with the proposed-sale block folded in), `form144_acquisitions`, `form144_recent_sales`, plus `clear*` for idempotent re-extraction.
- `EntityObserver` integration.

Wiring
- `3`, `4`, `5`, `144` added to `extractorIds`, dispatched in `ProcessAccessionDocFormTask`, registered in both DI containers, set up in `setupAllDatabases`, listed in `db stats`.

Notable decisions
- `issuer_cik` comes solely from the XML `<issuerCik>`; no fallback to the filing's own CIK. The ingestion path may carry the reporting owner's CIK, which would otherwise contaminate the issuer column.
- Numeric leaves are kept as strings (`num()` coerces), avoiding `Value.Convert("")` → 0, which would fabricate $0 market values for empty elements.

Test plan
- `tsc --noEmit` clean, `bun run build` clean
- `sec db setup` creates all 6 new tables (`section16_*`, `form144_*`)

https://claude.ai/code/session_017BajKRHwBkxAyPYDyDhgLz