Closed
548 commits
8698fc9
chore: implement tests for new methods
daveschumaker Apr 1, 2022
5df46fc
chore: create new methods split from standardizeArticle
daveschumaker Apr 1, 2022
b407e2e
chore: add new methods to parseFromHtml and delete unused util
daveschumaker Apr 1, 2022
60f5728
chore: update dist with latest build
daveschumaker Apr 1, 2022
bd04210
chore: update param documentation
daveschumaker Apr 1, 2022
a3b8c6d
Merge pull request #256 from daveschumaker/chore/transform-refactor
SettingDust Apr 3, 2022
4b02a34
chore: update dependencies (#257)
SettingDust Apr 6, 2022
c000fb6
v6.0.0
Apr 8, 2022
dd96b6c
Merge pull request #258 from ndaidong/v6.0.0
Apr 8, 2022
4442450
v6.0.1
May 28, 2022
93ff3e5
Merge pull request #263 from ndaidong/v6.0.1
May 28, 2022
b39c29b
fix: can't fetch html from document on browser
SettingDust Jun 30, 2022
1160ab9
Merge pull request #265 from SettingDust/main
Jul 1, 2022
7862d23
v6.0.2
Jul 1, 2022
b82c385
Merge pull request #267 from ndaidong/v6.0.2
Jul 1, 2022
effe71c
v6.0.2
Jul 1, 2022
d9d5ab3
Merge pull request #268 from ndaidong/v6.0.2
Jul 1, 2022
ee5e1c5
chore: update `urlpattern-polyfill`
SettingDust Jul 2, 2022
36aa2b5
Merge pull request #269 from SettingDust/fix/update_url_pattern_polyfill
Jul 3, 2022
2bf6fa4
v6.0.3
Jul 3, 2022
f8c3a47
v6.0.3 - Rebuild
Jul 3, 2022
e36d7a2
Merge pull request #270 from ndaidong/v6.0.3
Jul 3, 2022
b7f9024
v6.0.4
Jul 4, 2022
6b92b45
Merge pull request #271 from ndaidong/6.0.4
Jul 4, 2022
dd02490
v6.0.4
Jul 4, 2022
672f2ca
v6.0.4
Jul 4, 2022
25b4b25
Merge pull request #272 from ndaidong/6.0.4
Jul 4, 2022
cf64c25
v6.0.4
Jul 4, 2022
58a759b
Merge pull request #273 from ndaidong/6.0.4
Jul 4, 2022
a32d2d4
Update README
Jul 4, 2022
fb32038
Merge pull request #274 from ndaidong/6.0.4
Jul 4, 2022
1c932cb
Update README
Jul 4, 2022
f3560b4
v6.0.5
Jul 4, 2022
5074746
v6.0.5
Jul 4, 2022
0a70987
Merge pull request #275 from ndaidong/6.0.5
Jul 4, 2022
dd3434e
v6.0.6
Jul 5, 2022
e9e5492
Merge pull request #276 from ndaidong/6.0.6
Jul 5, 2022
0a37b26
v7.0.0rc1
Jul 9, 2022
36f74c7
Update README.md
Jul 9, 2022
f6e3ffd
v7.0.0rc2
Jul 9, 2022
a78e306
Merge pull request #278 from ndaidong/v7.x.x
Jul 9, 2022
5c6856d
v7.0.0rc3
Jul 11, 2022
df62e0b
v7.0.0rc3
Jul 11, 2022
c6394e6
Merge pull request #279 from ndaidong/7.0.0
Jul 11, 2022
cf786d5
Change method to deal with `source` and `description`
Jul 19, 2022
fc1e720
Merge pull request #280 from ndaidong/v7.x.x
Jul 19, 2022
67f8af1
v7.0.0
Jul 27, 2022
d415e55
Merge pull request #283 from ndaidong/v7.x.x
Jul 27, 2022
c3c5227
v7.0.1
Aug 12, 2022
fb45060
Update README
Aug 12, 2022
bc25c16
Merge pull request #285 from ndaidong/v7.x.x
Aug 12, 2022
8f2824b
v7.0.2
Sep 3, 2022
1151830
v7.0.2
Sep 3, 2022
864ae92
Merge pull request #289 from ndaidong/7.0.2
Sep 3, 2022
57716fb
v7.0.3
Sep 16, 2022
da8df03
Merge pull request #291 from ndaidong/7.0.3
Sep 16, 2022
e7935a6
v7.1.0 - To work with `bun` and `deno`
Sep 17, 2022
b5875c7
Update types definition
Sep 17, 2022
2ab8a99
Merge pull request #292 from ndaidong/7.1.0
Sep 17, 2022
0f0a39a
v7.1.1
Sep 17, 2022
13e1a10
v7.1.1
Sep 17, 2022
78774d1
v7.2.0rc1
Sep 17, 2022
411174f
Merge pull request #294 from ndaidong/7.2.0rc1
Sep 17, 2022
49d2551
Update README refer links
Sep 17, 2022
c80bfd0
v7.2.0rc2 - Rebuild
Sep 17, 2022
6a3b1d3
Update README
Sep 17, 2022
9c5ffd6
Merge pull request #295 from ndaidong/7.2.0rc2
Sep 17, 2022
f0aaa55
v7.2.0rc3
Sep 17, 2022
ed3b22c
Merge pull request #296 from ndaidong/7.2.0rc3
Sep 17, 2022
a8e6820
v7.2.0rc4
Sep 17, 2022
791038d
Merge pull request #297 from ndaidong/7.2.0rc4
Sep 17, 2022
6457a05
v7.2.0rc5
Sep 17, 2022
f7824a3
Merge pull request #298 from ndaidong/7.2.0rc5
Sep 17, 2022
1f168fa
Add examples with node, deno, bun, tsnode
Sep 17, 2022
17381e2
Remove bun.lockb
Sep 17, 2022
0b57d4d
Rebuild
Sep 17, 2022
d896abd
Merge pull request #299 from ndaidong/7.2.0rc6
Sep 17, 2022
f2ee351
v7.2.0
Sep 17, 2022
90692fd
Merge pull request #300 from ndaidong/7.2.0
Sep 17, 2022
6f4a342
Update examples
Sep 19, 2022
bcd6d36
Merge pull request #302 from ndaidong/7.2.0
Sep 19, 2022
bee67d2
v7.2.1-rc1
Sep 20, 2022
38b8c10
v7.2.1
Sep 20, 2022
64e308d
Merge pull request #303 from ndaidong/7.2.1
Sep 20, 2022
285f166
v7.2.2-rc1
Sep 23, 2022
d77afed
Update dependencies
Sep 23, 2022
df20aaf
Update README
Sep 23, 2022
32e46cb
v7.2.2-rc2
Sep 23, 2022
d96bddc
v7.2.2
Sep 23, 2022
22f4dab
Merge pull request #305 from ndaidong/7.2.2
Sep 23, 2022
226ea8e
v7.2.3
Sep 23, 2022
c938584
Merge pull request #306 from ndaidong/7.2.3
Sep 23, 2022
07fc012
Update README
Sep 23, 2022
4c6abe1
Merge pull request #307 from ndaidong/update-readme
Sep 23, 2022
30c3c2c
Add option to keep/remove line breaks
Sep 24, 2022
3e86b80
v7.2.4
Sep 24, 2022
0e80ba3
Merge pull request #308 from ndaidong/7.2.4
Sep 24, 2022
813cae1
v7.2.5
Nov 13, 2022
f915f00
Update README
Nov 13, 2022
96d2488
Add more specs for meta data extraction
Nov 13, 2022
efb6606
Add security policy
Nov 13, 2022
9c0e985
Add ci test with node 19.x
Nov 13, 2022
26a13e4
Update security policy.
Nov 13, 2022
ff452c6
Update security contact
Nov 13, 2022
cf0508e
Merge pull request #315 from ndaidong/7.2.5
Nov 13, 2022
e17ee38
Add contributing guide
Nov 13, 2022
b8f10d6
Merge pull request #316 from ndaidong/improve-docs
Nov 13, 2022
27fec5f
Update README
Nov 13, 2022
4da216b
Merge pull request #317 from ndaidong/update-readme
Nov 13, 2022
81febf4
Update SECURITY.md
Nov 13, 2022
8acc3d3
Merge pull request #318 from ndaidong/ndaidong-patch-1
Nov 13, 2022
42d661a
v7.2.6 - Migrate to extractus org
Nov 30, 2022
d41967b
Update README
Nov 30, 2022
edfcc1d
Update coveralls github action
Nov 30, 2022
f31c80f
Merge pull request #323 from extractus/7.2.6
Nov 30, 2022
47656e3
v7.2.7
Dec 2, 2022
a2c0c5a
Update CI settings
Dec 2, 2022
acf7749
Update CI config
Dec 2, 2022
0ca5f82
Fix CI settings
Dec 2, 2022
5fc37be
Update CI settings
Dec 2, 2022
ad69c4c
Update README
Dec 2, 2022
33008bb
Add image to docs
Dec 2, 2022
e4e3585
Update README
Dec 2, 2022
aabc0e7
Merge pull request #324 from extractus/7.2.7
Dec 2, 2022
e527003
v7.2.8
Jan 11, 2023
27afe61
Update README
Jan 11, 2023
4e3debb
Merge pull request #327 from extractus/7.2.8
Jan 11, 2023
7ead9a2
v7.2.9
Feb 20, 2023
ba62bad
Merge pull request #330 from extractus/7.2.9
Feb 20, 2023
3aa1d8f
v7.2.10
Mar 7, 2023
f727040
Merge pull request #332 from extractus/7.2.10
Mar 7, 2023
a1cbc21
Add null to response types
willwashburn Mar 10, 2023
2a0e5d1
Merge pull request #333 from willwashburn/main
Mar 12, 2023
0a923b8
v7.2.11
Mar 12, 2023
a79aa7f
Merge pull request #334 from extractus/7.2.11
Mar 12, 2023
bde8630
v7.2.12
Mar 28, 2023
88348a1
Update ci config
Mar 28, 2023
db5e9ce
Merge pull request #336 from extractus/7.2.12
Mar 28, 2023
59a49d6
v7.2.13rc1
Apr 11, 2023
7126aa7
v7.2.13
Apr 11, 2023
62de704
Merge pull request #337 from extractus/7.2.13
Apr 11, 2023
3831b1d
Rebuild v7.2.13
Apr 11, 2023
2715cd9
Merge pull request #338 from extractus/7.2.13
Apr 11, 2023
8b3d594
v7.2.14
Apr 18, 2023
87a9708
Merge pull request #340 from extractus/7.2.14
Apr 18, 2023
3c7d166
Change string array to dictionary
mphill Apr 27, 2023
d588ff8
Merge pull request #341 from mphill/main
Apr 28, 2023
472a3af
v7.2.15
May 6, 2023
e539578
Merge pull request #342 from extractus/7.2.15
May 6, 2023
1d56244
Merge pull request #343 from extractus/dev
May 6, 2023
c47cc1f
v7.2.15
May 6, 2023
7a51b44
Merge pull request #344 from extractus/7.2.15
May 6, 2023
1a50c6e
v7.2.16
May 21, 2023
575e911
Merge pull request #348 from extractus/7.2.16
May 21, 2023
8e7f229
Add favicon to meta data
LarchLiu Jun 30, 2023
a7a5f58
Merge pull request #350 from LarchLiu/main
Jul 1, 2023
3a57a91
GNU nano 6.4 …
Jul 1, 2023
e4065e0
Merge pull request #351 from extractus/dev
Jul 1, 2023
340a082
v7.2.17
Jul 1, 2023
f781bc4
v7.2.17
Jul 1, 2023
0e39547
v7.2.17
Jul 1, 2023
3e47e87
Merge pull request #352 from extractus/dev
Jul 1, 2023
a84705a
v7.2.18
Jul 5, 2023
4c1a49c
Merge pull request #353 from extractus/7.2.18
Jul 5, 2023
bfca881
v7.3.0
Jul 8, 2023
2ad0579
Update README
Jul 8, 2023
d335cc8
Merge pull request #354 from extractus/7.3.0
Jul 8, 2023
c538ef0
v8.0.0 - Bump version
Jul 12, 2023
a1c949a
Update README
Jul 12, 2023
9a6f4be
Merge pull request #355 from extractus/8.0.0
Jul 12, 2023
d6903b2
Update README
Jul 12, 2023
aaa1996
Merge pull request #356 from extractus/8.0.0
Jul 12, 2023
959a166
v8.0.1
Jul 22, 2023
fc63f10
Merge pull request #358 from extractus/8.0.1
Jul 22, 2023
6f55d94
Update dependencies
Aug 16, 2023
c0c7c1d
Use `childNodes` instead of `children`
Aug 16, 2023
9355b4e
Update README
Aug 16, 2023
b9cb636
Merge pull request #361 from extractus/8.0.2
Aug 16, 2023
736027d
Fix ParserOptions typing
ranmocy Oct 4, 2023
bd877f3
Merge pull request #369 from ranmocy/patch-1
Oct 5, 2023
0cda796
v8.0.3
Oct 5, 2023
65efbec
Stop ci test with node < 16 because EOL
Oct 5, 2023
22e04cc
Merge pull request #370 from extractus/v8.0.3
Oct 5, 2023
2fe4d72
Feat: extract pagetype from og:type or ld+json
andremacola Dec 4, 2023
f84aec2
Merge pull request #374 from andremacola/main
Dec 5, 2023
0fd6c66
v8.0.8
Dec 5, 2023
2ec2573
Update examples
Dec 5, 2023
986a409
Merge pull request #375 from extractus/dev
Dec 5, 2023
901d1cf
v8.0.5
Jan 22, 2024
9050184
Fix CI issue with coverall
Jan 22, 2024
3159ad8
Fix CI issue
Jan 22, 2024
84d5bd0
Fix CI problem
Jan 22, 2024
5b50e44
Change ci event
Jan 22, 2024
a11aa53
Update CI event
Jan 22, 2024
376e8bd
Fix CI problem
Jan 22, 2024
260934b
Fix CI issue
Jan 22, 2024
b438895
Fix CI coverall
Jan 22, 2024
660bc92
Merge pull request #379 from extractus/dev
Jan 22, 2024
d46be89
v8.0.6
Feb 13, 2024
05a6cf1
Merge pull request #381 from extractus/dev
Feb 13, 2024
a8a0e51
v8.0.7
Mar 29, 2024
d2d3834
Merge pull request #385 from extractus/8.0.7
Mar 29, 2024
d616100
v8.0.8
Apr 26, 2024
52bcdbf
Add node 22 to ci
Apr 26, 2024
1d216af
Merge pull request #387 from extractus/8.0.8
Apr 26, 2024
32a17a2
Update examples & test with pupperteer
May 7, 2024
b68c8db
v8.0.9
May 7, 2024
5b0d741
Merge pull request #390 from extractus/8.0.9
May 7, 2024
17c3bb3
v8.0.10
May 7, 2024
4f0e78d
Merge pull request #391 from extractus/8.0.10
May 7, 2024
f631b60
chore: Improvements in handling LD+JSON data
andremacola Oct 14, 2024
bf44188
v8.0.11
Oct 14, 2024
34c58f1
Merge pull request #401 from extractus/8.0.11
Oct 14, 2024
2585443
Merge pull request #400 from andremacola/jdon-ld-improvements
Oct 14, 2024
7b5eb42
Add test coverage
Oct 14, 2024
588f6ff
Merge pull request #402 from extractus/8.0.11
Oct 14, 2024
64c1c0c
fix: Cannot read properties of undefined in ld+json
andremacola Oct 15, 2024
eb52a78
fix: more tests on ld+json
andremacola Oct 15, 2024
a7c00d2
Merge branch 'extractus:main' into jdon-ld-improvements
andremacola Oct 15, 2024
37cae76
Merge pull request #403 from andremacola/jdon-ld-improvements
Oct 15, 2024
b7de5c4
v8.0.12
Oct 15, 2024
39616dc
Merge pull request #404 from extractus/8.0.12
Oct 15, 2024
2c9ee23
Improvements to find dates
andremacola Oct 18, 2024
ec121f9
Merge pull request #405 from andremacola/find-date
Oct 18, 2024
cc89afc
v8.0.13
Oct 18, 2024
5accfa6
Merge pull request #406 from extractus/8.0.13
Oct 18, 2024
2fee686
v8.0.14
Oct 19, 2024
6355d23
Merge pull request #408 from extractus/8.0.14
Oct 19, 2024
ad42f58
fix: adjustment of poorly formatted ldjson error
andremacola Oct 25, 2024
20f34a8
Merge pull request #410 from andremacola/fix/ldjson/null
Oct 26, 2024
bc624f2
v8.0.15
Oct 26, 2024
e15b3af
Merge pull request #411 from extractus/8.0.15
Oct 26, 2024
bc631c5
v8.0.16
Nov 9, 2024
197a2b5
Merge pull request #413 from extractus/8.0.16
Nov 9, 2024
0884cc1
v8.0.17
Feb 9, 2025
2868193
Update eval script
Feb 9, 2025
e877084
Merge pull request #415 from extractus/8.0.17
Feb 9, 2025
e2e2257
8.0.18
May 4, 2025
938aef4
Update README
May 4, 2025
c3607df
Update README
May 4, 2025
1da50a5
Merge pull request #417 from extractus/8.0.18
May 4, 2025
1e401a9
v8.0.19
May 14, 2025
648d2ed
Add test with node 24
May 14, 2025
0dd6739
Merge pull request #418 from extractus/8.0.19
May 14, 2025
77fe8a8
v8.0.20 - Update dependencies
Sep 4, 2025
26778c1
Remove examples
Sep 4, 2025
69baf8d
v8.0.20 - Update packages
Sep 4, 2025
9fa9d66
Merge pull request #422 from extractus/8.0.20
Sep 4, 2025
c3f8ab6
Merge extractus/main into merge-extractus-main, preferring upstream c…
Nov 26, 2025
dd233e3
chore: rename package to @arbitral/article-parser and tidy eslint glo…
Nov 26, 2025
21 changes: 7 additions & 14 deletions .github/workflows/ci-test.yml
@@ -8,40 +8,33 @@ on: [push, pull_request]
jobs:
test:

runs-on: ubuntu-20.04
runs-on: ubuntu-latest

strategy:
matrix:
node_version: [14.x, 15.x, 16.x, 17.x, 18.x]
node_version: [20.x, 22.x, 24.x]

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: setup Node.js v${{ matrix.node_version }}
uses: actions/setup-node@v2
uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node_version }}

- name: run npm scripts
env:
PROXY_SERVER: ${{ secrets.PROXY_SERVER }}
run: |
npm i -g standard
npm install
npm run lint
npm run build --if-present
npm run test

- name: sync to coveralls
uses: coverallsapp/github-action@v1.1.2
with:
github-token: ${{ secrets.GITHUB_TOKEN }}

- name: cache node modules
uses: actions/cache@v2
uses: actions/cache@v4
with:
path: ~/.npm
key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-



2 changes: 1 addition & 1 deletion .github/workflows/codeql-analysis.yml
@@ -38,7 +38,7 @@ jobs:

steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
5 changes: 4 additions & 1 deletion .gitignore
@@ -15,5 +15,8 @@ coverage
yarn.lock
coverage.lcov
pnpm-lock.yaml
lcov.info

dist/
deno.lock

evaluation
23 changes: 6 additions & 17 deletions .npmignore
@@ -1,18 +1,7 @@
node_modules/
src/
test-data/
.idea/
coverage/
.vscode/

.DS_Store
yarn.lock
coverage.lcov
node_modules
coverage
.github
pnpm-lock.yaml

*.js
*.cjs
*.js.map

!dist/**/*.js
!index.js
examples
test-data
lcov.info
71 changes: 71 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,71 @@
# Contributing to `@extractus/article-extractor`

Glad to see you here.

Collaboration and pull requests are always welcome, though larger proposals should be discussed first.

As an open-source project, it aims to follow the Unix philosophy: "do one thing and do it well".

## Third-party libraries

Please avoid using libraries other than those available in the standard library, unless necessary.

This library needs to stay simple and flexible enough to run on multiple platforms such as Deno, Bun, or even the browser.


## Coding convention

Make sure your code lints before opening a pull request.


```bash
cd article-extractor

# check coding convention issue
npm run lint

# auto fix coding convention issue
npm run lint:fix
```

*When you run `npm test`, the linter runs first.*


## Testing

Be sure to run the unit test suite before opening a pull request. An example test run is shown below.

```bash
cd article-extractor
npm test
```

![article-extractor unit test](https://i.imgur.com/TbRCUSS.png?110222)

If test coverage decreases, please review the test scripts and try to bring it back up.


## Documentation

If you've changed APIs, please update README and [the examples](examples).


## Clean commit histories

When you open a pull request, please ensure the commit history is clean.
Squash the commits into logical blocks, perhaps a single commit if that makes sense.

What you want to avoid is commits such as "WIP" and "fix test" in the history.
This is so we keep history on master clean and straightforward.

For people new to git, please refer to the following guides:

- [Writing good commit messages](https://github.com/erlang/otp/wiki/writing-good-commit-messages)
- [Commit Message Guidelines](https://gist.github.com/robertpainsi/b632364184e70900af4ab688decf6f53)


## License

By contributing to `@extractus/article-extractor`, you agree that your contributions will be licensed under its [MIT license](LICENSE).

---
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
The MIT License (MIT)

Copyright (c) 2016 Dong Nguyen
Copyright (c) 2016 Extractus

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal