testing another optimization for the url checker #769
Conversation
This optimization avoids checking duplicate urls across workers, so we might see a tiny speedup given redundant urls across files! Signed-off-by: vsoch <vsoch@users.noreply.github.com>
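(For the record, here's a minimal sketch of the idea - parse everything up front, dedupe the union of urls, then fan only the unique set out to the worker pool. The `collect_urls` and `check_url` helpers are stand-ins I'm making up for illustration, not urlchecker's actual API.)

```python
# Hypothetical sketch: gather urls from every file first, dedupe
# globally, then let the pool check each unique url exactly once.
# collect_urls/check_url are stand-ins, not urlchecker internals.
from multiprocessing import Pool
from urllib.request import Request, urlopen


def collect_urls(path):
    """Parse one file and return any urls found in it (stand-in parser)."""
    with open(path, encoding="utf-8", errors="ignore") as handle:
        return [token for token in handle.read().split() if token.startswith("http")]


def check_url(url):
    """HEAD-request a single url and report whether it resolved."""
    try:
        urlopen(Request(url, method="HEAD"), timeout=5)
        return url, True
    except Exception:
        return url, False


def check_files(paths, workers=8):
    # Dedupe across *all* files so no two workers check the same url
    unique_urls = set()
    for path in paths:
        unique_urls.update(collect_urls(path))
    with Pool(workers) as pool:
        return pool.map(check_url, sorted(unique_urls))
```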
@SuperKogito looks like we have a bug where it's not honoring the exclude list - I'll take a look later today!
oh okay 👍 I will go over the code again later too
Actually just found the bug! Going to rebuild now and run again!
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
okay just ran it thrice - 38, 46, and 53 seconds (the second run had a failure). For the other PR I saw 36 and 44 seconds, so arguably the change is trivial, at least for the repository here. I think this particular speedup would be more apparent for webby content with heavy duplication between files! For our USRSE site I think we have very little duplication, so any gain likely doesn't extend beyond the time of the longest worker. Overall, I'm not convinced this is an improvement - the time it takes to parse the files twice seems to make it equal if not worse! Let's keep the PR open for now; hopefully we can merge #768 and get good testing of our new multiprocessing, and depending on how that does over a few weeks we can go from there for next steps. Going back to bed! Chat later, thanks for the fun :)
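(To illustrate the tradeoff above: the alternative is to shard the files, not the urls, across workers, which parses each file only once but can re-check urls that appear in several files. A hedged sketch, reusing the same made-up helpers from the earlier snippet:)

```python
# Contrasting sketch: split files across workers instead of urls.
# Each file is parsed exactly once, but dedup only happens within a
# single file, so cross-file duplicate urls may be checked twice.
def check_one_file(path):
    return [check_url(url) for url in set(collect_urls(path))]


def check_files_per_worker(paths, workers=8):
    with Pool(workers) as pool:
        nested = pool.map(check_one_file, paths)
    return [result for per_file in nested for result in per_file]
```

Whether the global dedupe wins then comes down to whether the duplicate checks it saves outweigh the extra parsing pass - which matches the timings above for a low-duplication site.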
This optimization avoids checking duplicate urls across workers, so we might see a tiny speedup given redundant urls across files! If this works well, it will be a new urlchecker release to replace the one in #768.
Yeah, had a lot of fun this weekend and it still seems to be going strong! 😆
cc @usrse-maintainers