feat(forms): ingest Section 16 (3/4/5) and Form 144 ownership filings #116

Merged
sroussey merged 6 commits into main from claude/gifted-bardeen-QeoFt
May 28, 2026

Conversation

Contributor

@sroussey sroussey commented May 28, 2026

Summary

Implements parsing and storage for SEC insider-ownership filings, covering the EDGAR ownershipDocument schema shared by Forms 3 / 4 / 5 (plus their /A amendments) and the separate Form 144 / 144/A "Notice of Proposed Sale" schema.

Section 16 (Forms 3 / 4 / 5)

  • Shared OwnershipDocument.schema.ts (TypeBox); Form_3/4/5.parse() delegate to it. Array paths are declared so single-element tables still parse as arrays, and { value }-wrapped leaves coerce automatically.
  • New src/storage/section16/ tier:
    • section16_filings — one row per filing
    • section16_transactions — Form 4 / 5 transactions, is_derivative flag, full transaction + underlying-security detail
    • section16_holdings — Form 3 / 5 holdings
  • Orphan-safe clearTransactions / clearHoldings on the positional-key detail tables for idempotent re-extraction.
  • Reporting owners + issuer flow through EntityObserver. Owners are classified on the relationship flags (directors/officers are always individuals) with a cleaned-name fallback for the 10%-owner / other cases.
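
The array-path and { value }-leaf handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's schema code: `asArray` and `unwrapValue` are hypothetical helper names, and the sample object is a simplified stand-in for a parsed ownershipDocument.

```typescript
// Hypothetical helpers for illustration; the PR declares array paths in its
// TypeBox schema instead, and its real helper names are not shown in the source.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

/** Wrap a lone element in an array so table paths always yield arrays. */
function asArray<T>(node: T | T[] | undefined | null): T[] {
  if (node === undefined || node === null) return [];
  return Array.isArray(node) ? node : [node];
}

/** Unwrap an EDGAR-style `{ value: ... }` leaf; pass other nodes through. */
function unwrapValue(node: Json | undefined): Json | undefined {
  if (node !== null && typeof node === "object" && !Array.isArray(node) && "value" in node) {
    return node.value;
  }
  return node;
}

// An XML table with a single row typically parses to an object, not an array.
const parsed = {
  nonDerivativeTable: {
    nonDerivativeTransaction: {
      securityTitle: { value: "Common Stock" },
      transactionDate: { value: "2026-05-01" },
    },
  },
};

const transactions = asArray(parsed.nonDerivativeTable.nonDerivativeTransaction);
const firstDate = unwrapValue(transactions[0].transactionDate);
```

With these helpers, downstream code can iterate `transactions` the same way whether a filing reports one row or many.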

Form 144 / 144/A

  • Form_144.schema.ts models the edgarSubmission shape: issuer info + relationships, the (1:1) securitiesInformation / broker block, repeating securitiesToBeSold (acquisition lots), and securitiesSoldInPast3Months (recent sales). Dates are kept as the US-format strings EDGAR emits; numeric leaves are kept as strings and coerced in storage so an empty element doesn't become a fabricated 0.
  • New src/storage/form144/ tier: form144_filings (with the proposed-sale block folded in), form144_acquisitions, form144_recent_sales, plus clear* for idempotent re-extraction.
  • Issuer + broker observed as companies; account holder observed as a person via EntityObserver.
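
To illustrate why numeric leaves stay as strings until storage, here is a minimal sketch of a `num()`-style coercion. The PR names a storage `num()` helper but does not show its body, so the implementation below (including the comma stripping) is an assumption.

```typescript
/**
 * Assumed sketch of a storage-side numeric coercion: empty or non-numeric
 * text becomes null instead of a fabricated 0.
 */
function num(raw: string | null | undefined): number | null {
  if (raw === null || raw === undefined) return null;
  const trimmed = raw.trim();
  if (trimmed === "") return null; // note: Number("") would evaluate to 0
  // Assumption: thousands separators may appear and are safe to drop.
  const parsed = Number(trimmed.replace(/,/g, ""));
  return Number.isFinite(parsed) ? parsed : null;
}

const emptyMarketValue = num("");        // null: unknown, not $0
const realZero = num("0");               // 0: a genuine zero survives
const marketValue = num("1,250.50");     // 1250.5
```

The point of the design is that `null` and `0` remain distinguishable in the stored row, which a schema-level number conversion of `""` would not preserve.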

Wiring

  • Extractors 3, 4, 5, 144 added to extractorIds, dispatched in ProcessAccessionDocFormTask, registered in both DI containers, set up in setupAllDatabases, listed in db stats.
  • Updated the extractor-count assertions in the versioning tests.

Notable decisions

  • Person/entity classification (no explicit flag in EDGAR ownership): directors/officers ⇒ person; otherwise a cleaned-name company-ending test (strips trailing punctuation + lone trailing initial). Verified on real multi-owner filings.
  • issuer_cik comes solely from the XML <issuerCik>; no fallback to the filing's own CIK — the ingestion path may carry the reporting owner's CIK, which would otherwise contaminate the issuer column.
  • Form 144 numeric leaves are typed as strings in the parse schema (storage num() coerces), avoiding Value.Convert("") → 0 which would fabricate $0 market values for empty elements.
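
The person/entity classification rule in the first bullet above might look something like the sketch below. All names here (`classifyOwner`, `cleanName`, `COMPANY_ENDINGS`) are hypothetical, the list of company endings is an assumed, non-exhaustive sample, and punctuation handling is simplified to dropping periods, commas, and semicolons from each token.

```typescript
interface OwnerRelationship {
  isDirector: boolean;
  isOfficer: boolean;
  isTenPercentOwner: boolean;
  isOther: boolean;
}

// Assumed sample of company-style endings; the PR's real list is not shown.
const COMPANY_ENDINGS = ["INC", "CORP", "LLC", "LP", "LTD", "TRUST", "FUND", "PARTNERS", "HOLDINGS"];

/** Upper-case, drop punctuation, and remove a lone trailing initial. */
function cleanName(name: string): string[] {
  const tokens = name
    .toUpperCase()
    .split(/\s+/)
    .map((t) => t.replace(/[.,;]/g, ""))
    .filter((t) => t.length > 0);
  const last = tokens[tokens.length - 1];
  // "DOE JANE Q" -> ["DOE", "JANE"]: a single trailing letter is a middle initial.
  if (tokens.length > 1 && last !== undefined && last.length === 1) tokens.pop();
  return tokens;
}

function classifyOwner(name: string, rel: OwnerRelationship): "person" | "company" {
  // Directors and officers are always individuals, regardless of name shape.
  if (rel.isDirector || rel.isOfficer) return "person";
  const tokens = cleanName(name);
  const last = tokens[tokens.length - 1];
  return last !== undefined && COMPANY_ENDINGS.indexOf(last) !== -1 ? "company" : "person";
}

const tenPercent: OwnerRelationship = { isDirector: false, isOfficer: false, isTenPercentOwner: true, isOther: false };
const director: OwnerRelationship = { isDirector: true, isOfficer: false, isTenPercentOwner: false, isOther: false };

const fund = classifyOwner("Acme Capital Partners, L.P.", tenPercent);
const insider = classifyOwner("DOE JANE Q", tenPercent);
const boardMember = classifyOwner("Smith Holdings LLC", director);
```

Checking the relationship flags first means the name heuristic only runs for the 10%-owner and "other" cases, where no flag settles the question.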

Test plan

  • Real EDGAR fixtures: Forms 3 / 3/A (with holdings) / 4 (incl. derivative) / 4/A / 5 / 144 / 144/A (incl. nothing-to-report)
  • Parse + storage tests, owner classification, orphan-clear regression, empty-numeric regression
  • Full suite: 753 pass / 0 fail, tsc --noEmit clean, bun run build clean
  • Verified real sec db setup creates all 6 new tables (section16_*, form144_*)

https://claude.ai/code/session_017BajKRHwBkxAyPYDyDhgLz

claude added 5 commits May 28, 2026 00:02
Implement parsing and storage for the EDGAR ownershipDocument schema
shared by Forms 3, 4, 5 and their /A amendments. Adds a shared TypeBox
schema, parsers on Form_3/4/5, a section16 storage tier (filing,
transaction, holding repos), and wires the new "3"/"4"/"5" extractors
into the dispatch task, extractor registry, and DI containers.

Reporting owners are classified person-vs-company using the relationship
flags (directors/officers are always individuals) with a cleaned-name
company-ending fallback, since EDGAR carries no explicit entity flag and
the generic name heuristics misfire on insider name formats.
…le rows

Address code-review findings on the Section 16 ingestion:

- setupAllDatabases() never created the section16_filings/transactions/
  holdings tables, so `sec db setup` left them missing and the first real
  Form 3/4/5 would hit "no such table" in production (in-memory tests
  masked it). Register the three setupDatabase() calls.
- Derive the observation extractor_id from formToExtractorId(form) — the
  same mapping the dispatch task records extractor_runs against — instead
  of re-deriving it from the XML documentType, so the two never diverge.
- Clear a filing's transaction/holding rows before re-inserting; the
  positional (accession, index) keys would otherwise leave orphans when a
  re-extraction yields fewer rows.
- Surface the three section16 tables in `sec db stats`.
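
The clear-before-write fix for positional keys can be illustrated with a small in-memory sketch. `InMemoryTransactionRepo` and `storeTransactions` are hypothetical stand-ins; the PR's actual repos sit on the project's storage layer, which is not shown here.

```typescript
interface TransactionRow {
  accession_number: string;
  index: number; // positional key within the filing
  shares: number | null;
}

/** Hypothetical in-memory repo keyed by (accession_number, index). */
class InMemoryTransactionRepo {
  private rows = new Map<string, TransactionRow>();

  clearTransactions(accession: string): void {
    this.rows.forEach((row, key) => {
      if (row.accession_number === accession) this.rows.delete(key);
    });
  }

  save(row: TransactionRow): void {
    this.rows.set(`${row.accession_number}#${row.index}`, row);
  }

  count(accession: string): number {
    let n = 0;
    this.rows.forEach((row) => {
      if (row.accession_number === accession) n += 1;
    });
    return n;
  }
}

/** Re-extraction: clear the filing's rows first so stale indexes cannot linger. */
function storeTransactions(repo: InMemoryTransactionRepo, accession: string, shares: Array<number | null>): void {
  repo.clearTransactions(accession);
  shares.forEach((s, index) => repo.save({ accession_number: accession, index, shares: s }));
}

const repo = new InMemoryTransactionRepo();
storeTransactions(repo, "0000000000-26-000001", [100, 200, 300]); // first extraction: 3 rows
storeTransactions(repo, "0000000000-26-000001", [100, 200]);      // re-extraction: 2 rows
const remaining = repo.count("0000000000-26-000001");
```

Without the `clearTransactions` call, the second run would overwrite indexes 0 and 1 and leave the row at index 2 behind as an orphan.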

Follow-up review fixes:

- issuer_cik now comes solely from the XML <issuerCik>; drop the fallback
  to the filing's own CIK. Ownership filings are ingested from a submission
  feed that may belong to the reporting owner rather than the issuer, so the
  fallback could stamp the owner's CIK as the issuer (and propagate it to
  every transaction/holding row).
- period_of_report stores null when <periodOfReport> is absent instead of
  substituting filing_date, which fabricated a plausible-but-wrong period.
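
A rough sketch of the "no fallback" rule for both fields follows. The interfaces and `buildFilingRow` are hypothetical, and returning null for a missing `<issuerCik>` is an assumption made for illustration; the source does not say how the real code handles that case.

```typescript
interface OwnershipXmlHeader {
  issuerCik?: string;
  periodOfReport?: string;
}

interface IngestionContext {
  feedCik: number;    // CIK of the submission feed; may be the reporting owner's
  filingDate: string; // known from the filing index, not from the XML
}

interface FilingRow {
  issuer_cik: number | null;
  period_of_report: string | null;
}

function parseCik(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const trimmed = raw.trim();
  return /^\d{1,10}$/.test(trimmed) ? Number(trimmed) : null;
}

/** The context is accepted but deliberately never used as a fallback source. */
function buildFilingRow(xml: OwnershipXmlHeader, _ctx: IngestionContext): FilingRow {
  return {
    issuer_cik: parseCik(xml.issuerCik),
    period_of_report: xml.periodOfReport?.trim() || null,
  };
}

const ctx: IngestionContext = { feedCik: 1214156, filingDate: "2026-05-28" };
const withIssuer = buildFilingRow({ issuerCik: "0000320193" }, ctx);
const withoutPeriod = buildFilingRow({ issuerCik: "0000320193", periodOfReport: "" }, ctx);
```

Neither `feedCik` nor `filingDate` ever reaches the row, so a missing value stays visibly missing rather than being replaced by a plausible but wrong one.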

Add parsing and storage for Form 144 / 144/A (Notice of Proposed Sale of
Securities under Rule 144), filed electronically as XML since 2022.

- Form_144.schema.ts models the edgarSubmission shape (issuer info, the
  single securitiesInformation/broker block, repeating securitiesToBeSold
  acquisition lots, and securitiesSoldInPast3Months sales); dates are kept
  as the US-format strings EDGAR emits.
- New form144 storage tier: filing header (folding in the 1:1 proposed
  sale), acquisitions, and recent-sales tables, with orphan-safe
  clear-before-write on the two detail tables.
- Observes the issuer and broker as companies and the account holder as a
  person via EntityObserver.
- Wires the "144" extractor into the dispatch task, extractor registry,
  both DI containers, setupAllDatabases (table DDL), and db stats.

Updates the existing extractor-count assertions (now 9 extractors).

Review follow-ups for the Form 144 ingestion:

- Type numeric leaves (units, market value, proceeds, amounts) as raw
  strings and coerce in storage. When they were typed as Type.Number(),
  Value.Convert turned an empty element ("") into a fabricated 0,
  indistinguishable from a real zero. Storage num() maps "" -> null. Adds
  a regression test.
- is_gift: treat a present-but-empty <isGiftTransaction/> as null (unknown)
  rather than false.
- Stop reusing the trailing-3-month sellerDetails address for the account
  holder: that seller can be a different party and its name is in reversed
  order, so it can't be matched safely.
- Guard securitiesInformation/broker reads with firstOf() so a filing that
  ever repeats the (normally singular) block degrades to the first entry
  instead of nulling the whole proposed-sale block.
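
Two of the follow-ups above, the `firstOf()` guard and the tri-state `is_gift` handling, can be sketched as below. The commit names `firstOf()` but does not show it, so this body is an assumption, and `parseFlag` with its accepted spellings is entirely hypothetical.

```typescript
/** Assumed shape of a firstOf() guard: tolerate a repeated block by taking the first entry. */
function firstOf<T>(node: T | T[] | null | undefined): T | null {
  if (node === null || node === undefined) return null;
  if (Array.isArray(node)) return node[0] ?? null;
  return node;
}

/**
 * Hypothetical tri-state flag parser: present-but-empty means unknown (null),
 * not false. The accepted spellings are an assumption.
 */
function parseFlag(raw: string | null | undefined): boolean | null {
  const text = (raw ?? "").trim().toLowerCase();
  if (text === "y" || text === "true" || text === "1") return true;
  if (text === "n" || text === "false" || text === "0") return false;
  return null; // empty or unrecognised
}

interface BrokerBlock {
  name: string;
}

const single: BrokerBlock = { name: "Example Broker LLC" };
const repeated: BrokerBlock[] = [{ name: "First Broker" }, { name: "Second Broker" }];

const fromSingle = firstOf<BrokerBlock>(single);     // the block itself
const fromRepeated = firstOf<BrokerBlock>(repeated); // first entry of the repeated block
const emptyGiftFlag = parseFlag("");                 // null: <isGiftTransaction/> with no text
```

Degrading to the first entry keeps the rest of the proposed-sale block usable when a normally singular element repeats.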
Contributor

Copilot AI left a comment

Pull request overview

Adds end-to-end ingestion for SEC insider-ownership XML filings by introducing a shared ownershipDocument parser for Forms 3/4/5 (and amendments), persisting extracted data into new Section 16 storage tables, and wiring new extractors into the processing/versioning infrastructure. It also introduces separate parsing/storage for Form 144.

Changes:

  • Implement shared TypeBox schema + parsers for Forms 3/4/5 ownershipDocument, plus fixtures and tests.
  • Add new storage tier and repos for Section 16 filings/transactions/holdings and wire them into DI + DB setup.
  • Register new extractor IDs (3/4/5/144), dispatch them in the processing task, and update related tests/CLI DB status reporting.
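
As a rough illustration of the form-to-extractor mapping mentioned above, the sketch below routes amendments to the same extractor as their base form. `formToExtractorId` is named in the PR, but this body is an assumption, and the ID list covers only the four extractors this PR adds.

```typescript
// Only the extractor IDs added by this PR; the registry's other IDs are omitted.
const NEW_EXTRACTOR_IDS = ["3", "4", "5", "144"] as const;
type NewExtractorId = (typeof NEW_EXTRACTOR_IDS)[number];

/** Assumed mapping: strip a trailing "/A" so amendments share the base form's extractor. */
function formToExtractorId(form: string): NewExtractorId | null {
  const base = form.trim().toUpperCase().replace(/\/A$/, "");
  const ids: readonly string[] = NEW_EXTRACTOR_IDS;
  return ids.indexOf(base) !== -1 ? (base as NewExtractorId) : null;
}

const amended = formToExtractorId("4/A");  // "4"
const notice = formToExtractorId("144");   // "144"
const unrelated = formToExtractorId("10-K"); // null: not handled by these extractors
```

Using one mapping for both dispatch and observation records is what keeps the recorded extractor ID from diverging from the one that actually ran.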

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/task/forms/ProcessAccessionDocFormTask.ts | Dispatch storage handling for Forms 3/4/5 and 144 (+ amendments). |
| src/storage/versioning/extractorIds.ts | Add extractor IDs + form→extractor mapping for 3/4/5/144. |
| src/storage/versioning/extractorIds.test.ts | Update extractor ID expectations and add mapping tests for 3/4/5/144. |
| src/storage/versioning/componentRegistry.test.ts | Update registry expectations for new extractors and total component count. |
| src/storage/section16/Section16Schema.ts | Introduce TypeBox storage schemas + DI tokens for Section 16 tables. |
| src/storage/section16/Section16Repo.ts | Add repo wrapper for Section 16 filing/transaction/holding persistence. |
| src/storage/form144/Form144Schema.ts | Introduce TypeBox storage schemas + DI tokens for Form 144 tables. |
| src/storage/form144/Form144Repo.ts | Add repo wrapper for Form 144 filing/acquisition/recent-sale persistence. |
| src/sec/forms/insider-trading/OwnershipDocument.test.ts | Add parsing tests across real Form 3/4/5 fixtures and edge-cases. |
| src/sec/forms/insider-trading/OwnershipDocument.storage.ts | Implement Section 16 storage extraction + EntityObserver integration. |
| src/sec/forms/insider-trading/OwnershipDocument.storage.test.ts | Add integration tests for Section 16 storage + entity classification + idempotency. |
| src/sec/forms/insider-trading/OwnershipDocument.schema.ts | Add shared ownershipDocument TypeBox schema (Forms 3/4/5). |
| src/sec/forms/insider-trading/mock_data/form-5/000158064226003205-primary_doc.xml | Add Form 5 fixture. |
| src/sec/forms/insider-trading/mock_data/form-5/000119312526225124-primary_doc.xml | Add Form 5 fixture. |
| src/sec/forms/insider-trading/mock_data/form-4/000149315226025476-primary_doc.xml | Add Form 4 fixture (derivative + non-derivative). |
| src/sec/forms/insider-trading/mock_data/form-4/000090266426002604-primary_doc.xml | Add multi-owner Form 4 fixture for owner classification. |
| src/sec/forms/insider-trading/mock_data/form-4-a/000089706926001273-primary_doc.xml | Add Form 4/A fixture. |
| src/sec/forms/insider-trading/mock_data/form-3/000212961626000003-primary_doc.xml | Add Form 3 fixture. |
| src/sec/forms/insider-trading/mock_data/form-3-a/000095010326007758-primary_doc.xml | Add Form 3/A fixture (holdings + footnote-only leaves). |
| src/sec/forms/insider-trading/mock_data/form-144/000195917326004038-primary_doc.xml | Add Form 144 fixture. |
| src/sec/forms/insider-trading/mock_data/form-144/000195917326004030-primary_doc.xml | Add Form 144 fixture. |
| src/sec/forms/insider-trading/mock_data/form-144/000166326626000003-primary_doc.xml | Add Form 144 fixture (repeating tables + recent sales). |
| src/sec/forms/insider-trading/mock_data/form-144-a/000166326626000004-primary_doc.xml | Add Form 144/A fixture. |
| src/sec/forms/insider-trading/Form_5.ts | Update Form 5 parser to use shared OwnershipDocument schema + Value.Convert. |
| src/sec/forms/insider-trading/Form_4.ts | Update Form 4 parser to use shared OwnershipDocument schema + Value.Convert. |
| src/sec/forms/insider-trading/Form_3.ts | Update Form 3 parser to use shared OwnershipDocument schema + Value.Convert. |
| src/sec/forms/insider-trading/Form_144.ts | Update Form 144 parser implementation + refresh description text. |
| src/sec/forms/insider-trading/Form_144.test.ts | Add parsing tests for Form 144 fixtures and coercion/array behavior. |
| src/sec/forms/insider-trading/Form_144.storage.ts | Implement Form 144 storage extraction + EntityObserver integration. |
| src/sec/forms/insider-trading/Form_144.storage.test.ts | Add integration tests for Form 144 storage + idempotency + null numeric handling. |
| src/sec/forms/insider-trading/Form_144.schema.ts | Add TypeBox schema for Form 144 edgarSubmission. |
| src/config/TestingDI.ts | Register new in-memory storages for Section 16 and Form 144 in tests. |
| src/config/setupAllDatabases.ts | Ensure new storages are set up during DB initialization. |
| src/config/DefaultDI.ts | Register production storages for Section 16 and Form 144 tables. |
| src/cli/queries/DbStatus.ts | Include new tables in DB status query output. |
| src/cli/groups/version.test.ts | Update expected extractor IDs in version CLI tests. |

Comment on lines +16 to +21
export const Section16FilingSchema = Type.Object({
  accession_number: Type.String({ maxLength: 25, description: "EDGAR accession number" }),
  form: Type.String({ maxLength: 10, description: "Form symbol (3, 4, 5, 3/A, ...)" }),
  document_type: Type.String({ maxLength: 10, description: "ownershipDocument documentType" }),
  issuer_cik: Type.Integer({ minimum: 0, description: "Issuer CIK" }),
  issuer_name: Type.String({ maxLength: 150, description: "Issuer name" }),
Contributor Author

Applied in a9bd996 — switched issuer_cik on all three Section 16 tables (filing, transaction, holding) to TypeSecCik(). The Codec accepts the existing numeric values from parseCikSafely without ripple changes; tsc and the full suite stay green.


Generated by Claude Code

Comment on lines +18 to +23
export const Form144FilingSchema = Type.Object({
  accession_number: Type.String({ maxLength: 25 }),
  form: Type.String({ maxLength: 10 }),
  submission_type: TypeNullable(Type.String({ maxLength: 10 })),
  issuer_cik: Type.Integer({ minimum: 0 }),
  issuer_name: Type.String({ maxLength: 150 }),
Contributor Author

Applied in a9bd996: issuer_cik on all three Form 144 tables (filing, acquisition, recent-sale) now uses TypeSecCik().


Generated by Claude Code

Comment on lines +194 to +199
    await processOwnershipForm({ ...storageArgs, form: form!, doc: parsed });
    break;
  case "144":
  case "144/A":
    await processForm144({ ...storageArgs, form: form!, doc: parsed });
    break;
Contributor Author

Updated the PR title and description to reflect that Form 144 / 144/A is now included in this PR (not a follow-on).


Generated by Claude Code

Align Section16 and Form144 schemas with the newer
observation/filing/issuer schemas that use the TypeSecCik() codec
(format: cik, max 9999999999) instead of plain Type.Integer for CIK
fields. Per inline review on PR #116.
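
For context on the CIK constraint mentioned in this commit, here is a minimal sketch of the kind of check a CIK type implies. It is not the project's `TypeSecCik()` codec, whose implementation is not shown; `isValidCik` and `formatCik` are hypothetical names, and only the upper bound of 9999999999 comes from the commit message.

```typescript
const MAX_CIK = 9999999999; // upper bound quoted in the commit message

/** Hypothetical validator: a CIK is a non-negative integer of at most 10 digits. */
function isValidCik(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0 && value <= MAX_CIK;
}

/** EDGAR conventionally writes CIKs zero-padded to 10 digits. */
function formatCik(cik: number): string {
  if (!isValidCik(cik)) throw new RangeError(`not a valid CIK: ${cik}`);
  return String(cik).padStart(10, "0");
}

const padded = formatCik(320193);            // "0000320193"
const tooLarge = isValidCik(10000000000);    // false: 11 digits
const fractional = isValidCik(1.5);          // false: not an integer
```

A plain integer type with only `minimum: 0` would accept the 11-digit value, which is the gap a dedicated CIK type closes.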
@sroussey sroussey changed the title from "feat(forms): ingest Section 16 ownership Forms 3/4/5" to "feat(forms): ingest Section 16 (3/4/5) and Form 144 ownership filings" May 28, 2026
@sroussey sroussey merged commit 0a7e6ce into main May 28, 2026
1 check passed
sroussey added a commit that referenced this pull request May 28, 2026
sroussey added a commit that referenced this pull request May 28, 2026
…3/4/5 (#116 follow-up) (#117)

* fix(section16): type VALUE_NUMBER as string to preserve empty-vs-zero distinction (#116 follow-up)

* test(section16): regression tests for null-vs-0 on empty numeric leaves

* test(section16): expect string-valued numeric leaves after VALUE_NUMBER schema change

* refactor(ownership): reorganize imports and clean up code structure in OwnershipDocument.storage.ts

- Consolidated and reordered import statements for better readability.
- Adjusted the extractor version to align with recent changes.
- Improved formatting of function parameters for clarity.
- Ensured consistent handling of transaction and holding saving logic.

* remove
3 participants