Add TypeScript type-checking with .d.ts declarations for every module #744

Open
JamieMagee wants to merge 8 commits into master from typescript-migration-foundation

@JamieMagee

This wires up tsc as a type-checker for the crawler codebase, following the same approach used in the service repo: .d.ts sidecar files next to .js modules, JSDoc annotations where helpful, no build step.

What changed

tsc now runs as part of npm run lint (and therefore npm test). Every source module (97 .js files) has a .d.ts file sitting next to it with type declarations. A handful of core files in lib and lib are also type-checked directly via JSDoc annotations; the rest have declarations available for consumers but their internals aren't checked yet.

The tsconfig extends @tsconfig/strictest, @tsconfig/node24, and @tsconfig/node-ts. We relax strictNullChecks, exactOptionalPropertyTypes, and noPropertyAccessFromIndexSignature because they don't work well with the existing JavaScript.

New tests can be .ts files — the mocha glob picks up test/unit/**/*.{js,ts}.

What got added

  • 97 .d.ts sidecar files (lib/, ghcrawler/, providers/, config/)
  • 4 ambient declarations in types/ for packages without DefinitelyTyped coverage (@clearlydefined/spdx, painless-config, throat, geit)
  • 18 @types/ packages for dependencies that do have them
  • Shared interfaces: CrawlQueue, DocStore, StoredDocument, Handler, Locker, Logger
  • JSDoc annotations on ~11 core .js files
  • typescript-migration.md explaining the approach and how to contribute

What didn't change

No .js files were rewritten. The only behavioral change is a bug fix in TraversalPolicy.getShortForm() that was referencing this.policy.freshness instead of this.freshness — caught by the type-checker, which is a nice advertisement for the whole exercise.
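
This class of bug is easy to reproduce in miniature. The sketch below is not the real TraversalPolicy: the class name, constructor, and return format are hypothetical, and only the freshness property and the getShortForm() name come from this PR. It shows why a declared class shape is enough for tsc to flag the mistake.

```typescript
// Hypothetical stand-in for TraversalPolicy. Only `freshness` and
// `getShortForm` are names taken from the PR description.
class TraversalPolicySketch {
  freshness: string;

  constructor(freshness: string) {
    this.freshness = freshness;
  }

  getShortForm(): string {
    // The buggy form would read `this.policy.freshness`. Because the class
    // declares no `policy` property, tsc rejects that access at check time
    // instead of leaving it to fail when the method is eventually called.
    return this.freshness; // assumption: the real method may format this differently
  }
}

const policy = new TraversalPolicySketch("always");
console.log(policy.getShortForm());
```

Uncommenting a `this.policy.freshness` access inside the method is a quick way to see the error tsc reports.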

Remaining any

18 across all .d.ts files. They're things like payload (genuinely arbitrary queue data), body (HTTP request body), adopt() which mutates __proto__ on an arbitrary object, event emitter callback args, and index signatures on objects that get properties added dynamically in JS. I looked at each one and couldn't narrow it further without lying about the types.
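
To make the adopt() case concrete, here is a minimal sketch of why `any` is the honest parameter type there. The RequestSketch class, its fields, and the sample payload are hypothetical, and the sketch uses Object.setPrototypeOf where the description mentions mutating __proto__.

```typescript
// Hypothetical class for illustration; not the PR's actual Request declaration.
class RequestSketch {
  type = "";
  url = "";

  describe(): string {
    return `${this.type}@${this.url}`;
  }
}

// The input can be any plain object pulled off a queue, so nothing narrower
// than `any` describes it truthfully.
function adopt(object: any): RequestSketch {
  // Swap the prototype so the plain object gains RequestSketch's methods.
  Object.setPrototypeOf(object, RequestSketch.prototype);
  return object;
}

// Hypothetical payload, as it might look after JSON deserialization.
const raw = JSON.parse('{"type":"npm","url":"example-url"}');
const request = adopt(raw);
console.log(request.describe());
```

Typing the parameter as `unknown` would be stricter, but the function body would then need a cast before returning, which only moves the `any` somewhere less visible.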

@JamieMagee requested a review from qtomlinson on March 22, 2026 19:46

tsc now runs as part of `npm run lint`. It doesn't check anything
yet (include array just has types/**/*) but it will once .d.ts
sidecars start landing.

Also adds @tsconfig/node-ts for native type stripping support,
sets verbatimModuleSyntax: false so CommonJS require() calls aren't
errors, and installs @types/ packages for our dependencies.

Four packages that aren't on DefinitelyTyped get hand-written
declarations in types/: @clearlydefined/spdx, painless-config,
throat, geit. These are copied from the service repo.

Covers Request, TraversalPolicy, VisitorMap, EntitySpec, SourceSpec,
FetchResult, BaseHandler, and the utility modules (fetch, utils,
memoryCache, sourceDiscovery). JSDoc annotations in the .js files
fill in parameter types so tsc's noImplicitAny is satisfied.

sourceSpec.js and sourceDiscovery.js have declarations but aren't
in the include list yet -- they pull in provider files that don't
have types, so checking them would cascade. They'll get added when
providers are typed.

Fixes a bug: TraversalPolicy.getShortForm() was reading
this.policy.freshness instead of this.freshness.

Replaces any with narrower types: Promise<void> for results nobody
awaits, Request[] for FetchResult's
dependents list, Record<string, unknown> for metadata bags, specific
types for document IDs and HTTP headers, CacheClass<string, unknown>
for the memory-cache constructor, and a tighter function signature
on MapNode. Pulls the logger shape into a Logger interface so it
isn't redefined in three places.

The remaining any usages are things like payload/body (actually
unconstrained), index signatures on extensible JS objects, and
adopt() which mutates __proto__ on an arbitrary object.

Covers the crawling engine (Crawler, CrawlerService, CrawlerFactory),
the queuing layer (QueueSet, ScopedQueueSets, NestedQueue,
AttenuatedQueue, InMemoryCrawlQueue), storage (InmemoryDocStore),
middleware, routes, and entry points.

Introduces a CrawlQueue interface that the queue implementations
share, and a DocStore interface for the storage layer. These make
it possible to type the Crawler class properly -- it depends on
both.
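
A rough sketch of how two such interfaces let a consumer be typed without naming a concrete queue or store. The method names, the RequestLike shape, and the in-memory classes are assumptions made for illustration, not the declarations in this commit.

```typescript
// Hypothetical, trimmed-down shapes. Method names are assumptions made for
// this sketch, not the declarations added in the PR.
interface RequestLike {
  type: string;
  url: string;
}

interface CrawlQueueSketch {
  push(requests: RequestLike[]): Promise<void>;
  pop(): Promise<RequestLike | null>;
}

interface DocStoreSketch {
  upsert(key: string, document: Record<string, unknown>): Promise<void>;
  get(key: string): Promise<Record<string, unknown> | undefined>;
}

class MemoryQueueSketch implements CrawlQueueSketch {
  private items: RequestLike[] = [];

  async push(requests: RequestLike[]): Promise<void> {
    this.items.push(...requests);
  }

  async pop(): Promise<RequestLike | null> {
    return this.items.shift() ?? null;
  }
}

class MemoryStoreSketch implements DocStoreSketch {
  private docs = new Map<string, Record<string, unknown>>();

  async upsert(key: string, document: Record<string, unknown>): Promise<void> {
    this.docs.set(key, document);
  }

  async get(key: string): Promise<Record<string, unknown> | undefined> {
    return this.docs.get(key);
  }
}

// The consumer depends only on the two interfaces, so any queue or store
// implementation that satisfies them can be passed in.
async function processOne(queue: CrawlQueueSketch, store: DocStoreSketch): Promise<boolean> {
  const request = await queue.pop();
  if (request === null) return false; // queue was empty
  await store.upsert(request.url, { type: request.type });
  return true;
}
```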

Only the .d.ts files are in the include list, not the .js files.
The implementations have too many untyped internal functions to
check right now without a lot of JSDoc work that can happen later.

52 files covering fetch (abstractFetch + dispatcher + 15 concrete
fetchers), process (abstractProcessor + abstractClearlyDefined +
23 concrete processors), store (dispatcher, attachment, azqueue,
webhook), filter, logging (insights, logger, loggerUtils), and the
providers/index re-export.

The concrete fetchers and processors are repetitive -- each one
is a factory function returning an AbstractFetch or
AbstractProcessor. The .d.ts files reflect that.
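
The repeated pattern looks roughly like the following. Everything here is a hypothetical stand-in: the options shape, the canHandle and describe methods, and the npm example are assumptions, and only the idea of a factory function returning the abstract type comes from this commit.

```typescript
// Hypothetical options and base class for illustration only.
interface FetcherOptionsSketch {
  name: string;
}

abstract class AbstractFetchSketch {
  protected readonly options: FetcherOptionsSketch;

  constructor(options: FetcherOptionsSketch) {
    this.options = options;
  }

  // Each concrete fetcher decides which request types it accepts.
  abstract canHandle(requestType: string): boolean;

  describe(): string {
    return `fetcher:${this.options.name}`;
  }
}

class NpmFetchSketch extends AbstractFetchSketch {
  override canHandle(requestType: string): boolean {
    return requestType === "npm";
  }
}

// The shape every sidecar would repeat: a factory typed to return the
// abstract base, so callers never depend on the concrete class.
type FetcherFactorySketch = (options: FetcherOptionsSketch) => AbstractFetchSketch;

const createNpmFetch: FetcherFactorySketch = (options) => new NpmFetchSketch(options);

const fetcher = createNpmFetch({ name: "npm" });
console.log(fetcher.describe(), fetcher.canHandle("npm"));
```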

tsconfig now uses providers/**/*.d.ts instead of listing them
individually.

Introduces StoredDocument (document with _metadata) so DocStore
methods aren't just Record<string, any> everywhere. Factory
functions now take BaseHandlerOptions instead of Record<string, any>.
FetchDispatcher caches use MemoryCache and FetchResult. Locker
uses string|null for lock tokens. InMemoryCrawlQueue stores
Request[] not Record[]. CrawlerFactory and index.d.ts use
Record<string, unknown> and proper provider search path types.
sourceFinder callbacks get their real signature.
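
As a sketch of what two of these tighter types buy, assuming hypothetical field names inside _metadata and a hypothetical in-memory locker. Treating null as "lock not acquired" is an assumption of this example, not something the commit states.

```typescript
// Hypothetical field names. Only StoredDocument, _metadata, Locker, and the
// string|null token type come from the commit message.
interface StoredDocumentSketch {
  _metadata: { type: string; url: string };
  [key: string]: unknown; // crawled content stays open-ended
}

interface LockerSketch {
  // Assumption for this sketch: null means the lock was not acquired.
  lock(key: string): Promise<string | null>;
  unlock(token: string): Promise<void>;
}

class MemoryLockerSketch implements LockerSketch {
  private held = new Map<string, string>(); // key -> token
  private counter = 0;

  async lock(key: string): Promise<string | null> {
    if (this.held.has(key)) return null;
    this.counter += 1;
    const token = `token-${this.counter}`;
    this.held.set(key, token);
    return token;
  }

  async unlock(token: string): Promise<void> {
    for (const [key, heldToken] of this.held) {
      if (heldToken === token) this.held.delete(key);
    }
  }
}

function describeDocument(doc: StoredDocumentSketch): string {
  // A typo such as `doc._metadata.ulr` is a compile error here, which
  // Record<string, any> would have let through.
  return `${doc._metadata.type}:${doc._metadata.url}`;
}
```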

Down from 115 to 18 any usages in .d.ts files. The 18 that
remain are event handler callbacks, adopt() for object
rehydration, index signatures on extensible JS objects,
and genuinely unconstrained values like payload and body.

Fills the last 16 gaps: config files, root index.js, ghcrawler's
storage queues (StorageQueue, StorageQueueManager, StorageBackedQueue,
StorageBackedInMemoryQueueManager), factories (memoryFactory,
storageQueueFactory, azureBlobFactory, webhookFactory), FileStore,
StorageDocStore, and the ghcrawler providers index.

Every .js source file now has a .d.ts sidecar -- 97 of 97.

Also updates the mocha glob to test/unit/**/*.{js,ts} so new tests
can be written in TypeScript, adds test/**/*.ts to tsconfig's
include, and rewrites the migration doc to reflect the current
state instead of the blank-slate it described before.

@JamieMagee force-pushed the typescript-migration-foundation branch from 0dc672c to cc15b04 on March 23, 2026 19:18