@MrElyazid

Some improvements for the database build fix:

First off, nice work! You managed to fix the build script with minimal changes. I added a few improvements on top of that work:

  • The scripts now invoke the Python interpreter as python3 rather than python, since python3 is the standard command name on many *nix machines, especially Debian-based ones.
  • One sort command used sort -S 100%, which caused the script to exit silently because the OS killed it for consuming too many resources. I changed it to 80% (though when I built the database locally, I used -S 4G for the sort commands).
  • combine_grouped_links_file.py was loading the whole text files into an in-memory dictionary. I changed it to stream the text from the two files and rely on sorting instead, which helps a lot with RAM consumption (the build process is still fast).
  • I think a True keyword was left in by mistake on line 40 of prune_pages_file.py. I removed it and corrected the condition (I generated the database with and without this fix to test, and the pages table had slightly fewer rows after correcting the condition).
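
To illustrate the streaming idea from the third point, here is a minimal sketch, not the actual code in this PR. It assumes each input is already sorted by key in plain byte order (for example with LC_ALL=C sort), that each line has the form key<TAB>value, and that keys are unique within a file. The names parse and merge_sorted are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse(line: str) -> Tuple[str, str]:
    # Assumed line format: "<key>\t<value>\n" (hypothetical, for illustration).
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def merge_sorted(left: Iterable[str], right: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (key, left_value, right_value) from two inputs sorted by key.

    Only one line per input is held in memory at a time, so memory use
    stays flat regardless of file size. A key missing on one side gets
    an empty string for that side.
    """
    left_iter = map(parse, left)
    right_iter = map(parse, right)
    l = next(left_iter, None)
    r = next(right_iter, None)
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            # Key exists only on the left side (or right is exhausted).
            yield l[0], l[1], ""
            l = next(left_iter, None)
        elif l is None or r[0] < l[0]:
            # Key exists only on the right side (or left is exhausted).
            yield r[0], "", r[1]
            r = next(right_iter, None)
        else:
            # Same key on both sides: combine and advance both.
            yield l[0], l[1], r[1]
            l = next(left_iter, None)
            r = next(right_iter, None)
```

Open file objects can be passed directly as left and right, since iterating a file yields one line at a time. Keys are compared as strings here, so the ordering used when sorting the inputs has to match Python's string comparison.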
