feat: add article parsing and handling to timeline and scraper modules by fibonacci998 · Pull Request #195 · the-convocation/twitter-scraper

fibonacci998 · 2026-05-25T02:52:58Z

feat: add article parsing and handling to timeline and scraper modules
test: add article parsing coverage to tweets.test.ts

Ports the article-extraction work from PR the-convocation#146 (LiamVDB1) onto current main. The previous PR drifted behind main since 2025-07-11 and never got its requested tests; this commit applies the same diff cleanly and follow-up commits add the tests + drop the unrelated `prepare` script change that broke CI for downstream consumers.

Adds support for X "Articles" (long-form posts) inside the timeline data structure:

* `ArticleRaw`, `ArticleResultRaw`, `ArticleContentStateRaw` interfaces in src/timeline-v1.ts representing the raw article payload, including metadata, media, and content state.
* `parseArticleToMarkdown` and `parseArticle` in src/timeline-v2.ts that walk `content_state.blocks` and produce markdown (handling text, links, bold/italic, headers, lists, and inline media).
* `parseResult` now detects `result.article.article_results.result` and, when present, sets `tweet.isArticle = true`, populates `tweet.article`, and overwrites `tweet.text` with the rendered markdown (since `legacy.full_text` for an Article tweet is just the t.co URL stub).
* `Tweet` interface gains optional `isArticle` and `article` fields.

Co-authored-by: LiamVDB1 <liam.van.den.berge@hotmail.com>
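For readers unfamiliar with the block-walking approach described above, here is a minimal sketch of the idea, not the PR's actual `parseArticleToMarkdown`. It assumes a Draft.js-style content state: the block type names (`header-one`, `unordered-list-item`, ...), the style names (`BOLD`, `ITALIC`), and the helper names `applyInlineStyles` and `blocksToMarkdown` are all assumptions for illustration. Links and inline media are left out, and inline style ranges are assumed not to overlap.

```typescript
// Hypothetical, simplified shapes. The real ArticleContentStateRaw in the PR
// may differ; field names here follow Draft.js conventions as an assumption.
interface InlineStyleRange {
  offset: number;
  length: number;
  style: 'BOLD' | 'ITALIC';
}

interface Block {
  type: string;
  text: string;
  inlineStyleRanges?: InlineStyleRange[];
}

interface ContentState {
  blocks: Block[];
}

// Wrap each styled range in markdown markers. Ranges are applied from the
// end of the string backwards so earlier offsets stay valid after insertion.
// Assumes ranges do not overlap.
function applyInlineStyles(text: string, ranges: InlineStyleRange[] = []): string {
  const sorted = [...ranges].sort((a, b) => b.offset - a.offset);
  let out = text;
  for (const r of sorted) {
    const marker = r.style === 'BOLD' ? '**' : '*';
    const start = r.offset;
    const end = r.offset + r.length;
    out = out.slice(0, start) + marker + out.slice(start, end) + marker + out.slice(end);
  }
  return out;
}

// Walk the blocks in order and emit one markdown chunk per block.
function blocksToMarkdown(state: ContentState): string {
  const chunks: string[] = [];
  let ordinal = 0; // running number for consecutive ordered-list items
  for (const block of state.blocks) {
    const body = applyInlineStyles(block.text, block.inlineStyleRanges);
    ordinal = block.type === 'ordered-list-item' ? ordinal + 1 : 0;
    switch (block.type) {
      case 'header-one':
        chunks.push(`# ${body}`);
        break;
      case 'header-two':
        chunks.push(`## ${body}`);
        break;
      case 'unordered-list-item':
        chunks.push(`- ${body}`);
        break;
      case 'ordered-list-item':
        chunks.push(`${ordinal}. ${body}`);
        break;
      default:
        // Unknown or unstyled blocks fall through as plain paragraphs.
        chunks.push(body);
    }
  }
  return chunks.join('\n\n');
}

// Made-up sample input, not real API data.
const sampleState: ContentState = {
  blocks: [
    { type: 'header-one', text: 'Title' },
    {
      type: 'unstyled',
      text: 'Hello bold world',
      inlineStyleRanges: [{ offset: 6, length: 4, style: 'BOLD' }],
    },
    { type: 'unordered-list-item', text: 'item' },
  ],
};

const markdown = blocksToMarkdown(sampleState);
```

Applying the ranges back to front avoids recomputing offsets after each insertion, which keeps the helper short; a production version would also need to handle overlapping ranges and entity ranges for links.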

Addresses karashiiro's review request on PR the-convocation#146. Two tests against the public article tweet 2053808119709659225 (subnetamplify):

* isArticle flag is set, article.id matches the article rest_id (not the tweet id; they are distinct), and content_state is populated.
* tweet.text is replaced with the rendered markdown body, far larger than the t.co URL stub and starting with an H1 of the article title.

Co-authored-by: LiamVDB1 <liam.van.den.berge@hotmail.com>
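The properties those tests check can be illustrated offline with a hand-built payload. The sketch below is not the PR's `parseResult` and not the contents of tweets.test.ts. Only the path `result.article.article_results.result`, `legacy.full_text`, and the article `rest_id` come from the description above; the `*Sketch` type names, the `body` field (standing in for the rendered markdown), and the sample values are assumptions made up for illustration.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ArticleSketch {
  rest_id: string;
  title: string;
  body: string; // stand-in for the markdown rendered from content_state
}

interface ResultSketch {
  rest_id: string;
  legacy: { full_text: string };
  article?: { article_results?: { result?: ArticleSketch } };
}

interface TweetSketch {
  id: string;
  text: string;
  isArticle?: boolean;
  article?: { id: string; title: string };
}

function parseResultSketch(result: ResultSketch): TweetSketch {
  // Start from the legacy text, which for an Article tweet is only a URL stub.
  const tweet: TweetSketch = { id: result.rest_id, text: result.legacy.full_text };

  // Detect the nested article payload; optional chaining keeps ordinary
  // tweets (no article key) on the unchanged path.
  const article = result.article?.article_results?.result;
  if (article) {
    tweet.isArticle = true;
    tweet.article = { id: article.rest_id, title: article.title };
    // Overwrite the stub with the rendered body, led by an H1 title.
    tweet.text = `# ${article.title}\n\n${article.body}`;
  }
  return tweet;
}

// Made-up payload, not real API data.
const samplePayload: ResultSketch = {
  rest_id: '111',
  legacy: { full_text: 'https://t.co/example' },
  article: {
    article_results: {
      result: { rest_id: '222', title: 'Hello', body: 'Long-form body text.' },
    },
  },
};

const parsed = parseResultSketch(samplePayload);
const plain = parseResultSketch({ rest_id: '333', legacy: { full_text: 'just a tweet' } });
```

Here `parsed.article.id` is the article's `rest_id` rather than the tweet's, and `parsed.text` begins with the H1 title, mirroring the two assertions listed above; `plain` shows that a tweet without an article payload keeps its original text.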

fibonacci998 and others added 2 commits May 25, 2026 09:48

fibonacci998 mentioned this pull request May 25, 2026

feat: add article parsing and handling to timeline and scraper modules #146

Open

fibonacci998 wants to merge 2 commits into the-convocation:main from fibonacci998:feat/article-parsing
