The number of identified authors can be counted with P50:
$ zcat 20171120/wikidata-20171120-publications.ndjson.gz | \
    jq -r '.claims.P50[]?' | sort -u | wc -l
120821
This takes 7 minutes to run on my machine. Indexing the whole dataset in a database should be faster and more flexible for additional analytics. For instance, the number of identified author statements:
$ zcat 20171120/wikidata-20171120-publications.ndjson.gz | jq -r '.claims.P50[]?' | wc -l
974191
The number of unidentified author statements with P2093 can be counted in the same way:
$ zcat 20171120/wikidata-20171120-publications.ndjson.gz | jq -r '.claims.P2093[]?' | wc -l
43206518
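A rough sketch of the database idea, for discussion only. It assumes each line of the dump is a JSON object with an `id` field and simplified `claims` (property ID mapped to a list of plain values), which is what the `jq` output above suggests but is not verified here. The table name `author_claims` and the helper names `index_authors`, `author_counts` and `build` are made up for illustration, and SQLite is only a stand-in for whichever database ends up being used.

```python
import gzip
import json
import sqlite3

# Properties of interest: P50 (author) and P2093 (author name string).
AUTHOR_PROPERTIES = ("P50", "P2093")


def index_authors(lines, db):
    """Load P50 / P2093 claim values from NDJSON lines into one SQLite table."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS author_claims "
        "(item TEXT, property TEXT, value TEXT)"
    )
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entity = json.loads(line)
        claims = entity.get("claims") or {}
        rows = []
        for prop in AUTHOR_PROPERTIES:
            for value in claims.get(prop) or []:
                # Assumption: values are plain strings; anything else is
                # stored as JSON text, similar to what `jq -r` prints.
                text = value if isinstance(value, str) else json.dumps(value)
                rows.append((entity.get("id"), prop, text))
        db.executemany("INSERT INTO author_claims VALUES (?, ?, ?)", rows)
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_property_value "
        "ON author_claims (property, value)"
    )
    db.commit()


def author_counts(db):
    """Return (distinct P50 values, P50 statements, P2093 statements)."""
    query = """
        SELECT
            COUNT(DISTINCT CASE WHEN property = 'P50' THEN value END),
            COUNT(CASE WHEN property = 'P50' THEN 1 END),
            COUNT(CASE WHEN property = 'P2093' THEN 1 END)
        FROM author_claims
    """
    return db.execute(query).fetchone()


def build(dump_path, db_path):
    """Index a gzipped NDJSON dump into a SQLite file and return the connection."""
    db = sqlite3.connect(db_path)
    with gzip.open(dump_path, "rt", encoding="utf-8") as lines:
        index_authors(lines, db)
    return db
```

Something like `author_counts(build("20171120/wikidata-20171120-publications.ndjson.gz", "authors.db"))` would then return the three numbers above in one query. The dump is read once at indexing time, and later questions (for example, authors per publication) become SQL queries instead of another full pass with `zcat` and `jq`.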