testing another optimization for the url checker #769
Conversation
This optimization avoids checking duplicate urls across workers, so we might see a tiny speedup given redundant urls across files! Signed-off-by: vsoch <vsoch@users.noreply.github.com>
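(For the record, here's a minimal sketch of the idea - parse everything up front, dedupe the union of urls, then fan only the unique set out to the worker pool. The `collect_urls` and `check_url` helpers are stand-ins I'm making up for illustration, not urlchecker's actual API.)

```python
# Hypothetical sketch: gather urls from every file first, dedupe
# globally, then let the pool check each unique url exactly once.
# collect_urls/check_url are stand-ins, not urlchecker internals.
from multiprocessing import Pool
from urllib.request import Request, urlopen


def collect_urls(path):
    """Parse one file and return any urls found in it (stand-in parser)."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return [token for token in handle.read().split() if token.startswith("http")]


def check_url(url):
    """HEAD-request a single url and report whether it resolved."""
    try:
        urlopen(Request(url, method="HEAD"), timeout=5)
        return url, True
    except Exception:
        return url, False


def check_files(paths, workers=8):
    # Dedupe across *all* files so no two workers check the same url
    unique_urls = set()
    for path in paths:
        unique_urls.update(collect_urls(path))
    with Pool(workers) as pool:
        return pool.map(check_url, sorted(unique_urls))
```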
@SuperKogito looks like we have a bug where it's not honoring the exclude list - I'll take a look later today!
oh okay 👍 I will go over the code again later too
Actually just found the bug! Going to rebuild now and run again!
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
okay just ran it thrice - 38, 46, and 53 seconds (the second run had a failure). For the other PR I saw 36 and 44 seconds, so arguably the change is trivial, at least for the repository here. I think this particular speedup would be more apparent for webby content with heavy duplication between files! For our USRSE site I think we have very little duplication, so any gain likely doesn't extend beyond the time of the longest worker. Overall, I'm not convinced this is an improvement - the time it takes to parse the files twice seems to make it equal if not worse! Let's keep the PR open for now; hopefully we can merge #768 and get good testing of our new multiprocessing, and depending on how that does over a few weeks we can go from there for next steps. Going back to bed! Chat later, thanks for the fun :)
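(To illustrate the tradeoff above: the alternative is to shard the files, not the urls, across workers, which parses each file only once but can re-check urls that appear in several files. A hedged sketch, reusing the same made-up helpers from the earlier snippet:)

```python
# Contrasting sketch: split files across workers instead of urls.
# Each file is parsed exactly once, but dedup only happens within a
# single file, so cross-file duplicate urls may be checked twice.
def check_one_file(path):
    return [check_url(url) for url in set(collect_urls(path))]


def check_files_per_worker(paths, workers=8):
    with Pool(workers) as pool:
        nested = pool.map(check_one_file, paths)
    return [result for per_file in nested for result in per_file]
```

Whether the global dedupe wins then comes down to whether the duplicate checks it saves outweigh the extra parsing pass - which matches the timings above for a low-duplication site.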
This optimization avoids checking duplicate urls across workers, so we might see a tiny speedup given redundant urls across files! If this works well, it will be a new urlchecker release to replace the one in #768.
Yeah, had a lot of fun this weekend and it still seems to be going strong! 😆
cc @usrse-maintainers