Add TypeScript type-checking with .d.ts declarations for every module#744
Open
JamieMagee wants to merge 8 commits intomasterfrom
Open
Add TypeScript type-checking with .d.ts declarations for every module#744JamieMagee wants to merge 8 commits intomasterfrom
JamieMagee wants to merge 8 commits intomasterfrom
Conversation
tsc now runs as part of \`npm run lint\`. It doesn't check anything yet (include array just has types/**/*) but it will once .d.ts sidecars start landing. Also adds @tsconfig/node-ts for native type stripping support, sets verbatimModuleSyntax: false so CommonJS require() calls aren't errors, and installs @types/ packages for our dependencies. Four packages that aren't on DefinitelyTyped get hand-written declarations in types/: @clearlydefined/spdx, painless-config, throat, geit. These are copied from the service repo.
Covers Request, TraversalPolicy, VisitorMap, EntitySpec, SourceSpec, FetchResult, BaseHandler, and the utility modules (fetch, utils, memoryCache, sourceDiscovery). JSDoc annotations in the .js files fill in parameter types so tsc's noImplicitAny is satisfied. sourceSpec.js and sourceDiscovery.js have declarations but aren't in the include list yet -- they pull in provider files that don't have types, so checking them would cascade. They'll get added when providers are typed. Fixes a bug: TraversalPolicy.getShortForm() was reading this.policy.freshness instead of this.freshness.
Promise<void> for results nobody awaits, Request[] for FetchResult's dependents list, Record<string, unknown> for metadata bags, specific types for document IDs and HTTP headers, CacheClass<string, unknown> for the memory-cache constructor, and a tighter function signature on MapNode. Pulls the logger shape into a Logger interface so it isn't redefined in three places. The remaining any usages are things like payload/body (actually unconstrained), index signatures on extensible JS objects, and adopt() which mutates __proto__ on an arbitrary object.
Covers the crawling engine (Crawler, CrawlerService, CrawlerFactory), the queuing layer (QueueSet, ScopedQueueSets, NestedQueue, AttenuatedQueue, InMemoryCrawlQueue), storage (InmemoryDocStore), middleware, routes, and entry points. Introduces a CrawlQueue interface that the queue implementations share, and a DocStore interface for the storage layer. These make it possible to type the Crawler class properly -- it depends on both. Only the .d.ts files are in the include list, not the .js files. The implementations have too many untyped internal functions to check right now without a lot of JSDoc work that can happen later.
52 files covering fetch (abstractFetch + dispatcher + 15 concrete fetchers), process (abstractProcessor + abstractClearlyDefined + 23 concrete processors), store (dispatcher, attachment, azqueue, webhook), filter, logging (insights, logger, loggerUtils), and the providers/index re-export. The concrete fetchers and processors are repetitive -- each one is a factory function returning an AbstractFetch or AbstractProcessor. The .d.ts files reflect that. tsconfig now uses providers/**/*.d.ts instead of listing them individually.
Introduces StoredDocument (document with _metadata) so DocStore methods aren't just Record<string, any> everywhere. Factory functions now take BaseHandlerOptions instead of Record<string, any>. FetchDispatcher caches use MemoryCache and FetchResult. Locker uses string|null for lock tokens. InMemoryCrawlQueue stores Request[] not Record[]. CrawlerFactory and index.d.ts use Record<string, unknown> and proper provider search path types. sourceFinder callbacks get their real signature. Down from 115 to 18 any usages in .d.ts files. The 18 that remain are event handler callbacks, adopt() for object rehydration, index signatures on extensible JS objects, and genuinely unconstrained values like payload and body.
Fills the last 16 gaps: config files, root index.js, ghcrawler's
storage queues (StorageQueue, StorageQueueManager, StorageBackedQueue,
StorageBackedInMemoryQueueManager), factories (memoryFactory,
storageQueueFactory, azureBlobFactory, webhookFactory), FileStore,
StorageDocStore, and the ghcrawler providers index.
Every .js source file now has a .d.ts sidecar -- 97 of 97.
Also updates the mocha glob to test/unit/**/*.{js,ts} so new tests
can be written in TypeScript, adds test/**/*.ts to tsconfig's
include, and rewrites the migration doc to reflect the current
state instead of the blank-slate it described before.
0dc672c to
cc15b04
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This wires up
tscas a type-checker for the crawler codebase, following the same approach used in the service repo:.d.tssidecar files next to.jsmodules, JSDoc annotations where helpful, no build step.What changed
tscnow runs as part ofnpm run lint(and thereforenpm test). Every source module (97.jsfiles) has a.d.tsfile sitting next to it with type declarations. A handful of core files in lib and lib are also type-checked directly via JSDoc annotations; the rest have declarations available for consumers but their internals aren't checked yet.The tsconfig extends
@tsconfig/strictest,@tsconfig/node24, and@tsconfig/node-ts. We relaxstrictNullChecks,exactOptionalPropertyTypes, andnoPropertyAccessFromIndexSignaturebecause they don't work well with the existing JavaScript.New tests can be
.tsfiles — the mocha glob picks uptest/unit/**/*.{js,ts}.What got added
.d.tssidecar files (lib/, ghcrawler/, providers/, config/)@clearlydefined/spdx,painless-config,throat,geit)@types/packages for dependencies that do have themCrawlQueue,DocStore,StoredDocument,Handler,Locker,Logger.jsfilesWhat didn't change
No
.jsfiles were rewritten. The only behavioral change is a bug fix inTraversalPolicy.getShortForm()that was referencingthis.policy.freshnessinstead ofthis.freshness— caught by the type-checker, which is a nice advertisement for the whole exercise.Remaining
any18 across all
.d.tsfiles. They're things likepayload(genuinely arbitrary queue data),body(HTTP request body),adopt()which mutates__proto__on an arbitrary object, event emitter callback args, and index signatures on objects that get properties added dynamically in JS. I looked at each one and couldn't narrow it further without lying about the types.