-
Notifications
You must be signed in to change notification settings - Fork 4
EHR Link Crawler Test #997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ehr/test/src/org/labkey/test/tests/ehr/AbstractGenericEHRTest.java
Outdated
Show resolved
Hide resolved
ehr/test/src/org/labkey/test/tests/ehr/AbstractGenericEHRTest.java
Outdated
Show resolved
Hide resolved
ehr/test/src/org/labkey/test/tests/ehr/AbstractGenericEHRTest.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| // scope this to admin, ehr folder and subfolders | ||
| if (!decodedHref.contains(getContainerPath()) && !decodedHref.startsWith("/admin")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Crawler.ControllerActionId will parse out the controller, action, and containerPath of a URL (new Crawler.ControllerActionId(href)). Note that it does not include a leading slash like getContainerPath() does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| } | ||
|
|
||
| // scope this to admin, ehr folder and subfolders | ||
| if (!decodedHref.contains(getContainerPath()) && !decodedHref.startsWith("/admin")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this crawl admin actions? Our regular crawler should definitely hit those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to just validate that links in the EHR pointing to admin are valid URLs. I just added admin to the default urls to skip crawling, but it will continue to validate the links to there from the EHR.
ehr/test/src/org/labkey/test/tests/ehr/AbstractGenericEHRTest.java
Outdated
Show resolved
Hide resolved
ehr/test/src/org/labkey/test/tests/ehr/AbstractGenericEHRTest.java
Outdated
Show resolved
Hide resolved
ehr/test/src/org/labkey/test/tests/ehr/AbstractGenericEHRTest.java
Outdated
Show resolved
Hide resolved
|
|
||
| try | ||
| { | ||
| openLinkInNewWindowOrThrow(anchor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What sort of links don't open successfully? This seems like an error state.
If there are links that we expect to not open, we should filter them out earlier.
Waiting for a thing that won't happen can waste a lot of time (switchToWindow waits ten seconds for the new tab to appear)
Opening these links at all in these phase seems inefficient. validatePageLinks also navigates to the links, so we're hitting each link twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So our pages have lots of anchor tags on them that aren't links, and the ones that are links can have a href, onclick handler or click listener added a variety of ways in JS. Instead of trying to pre-determine which anchor tags are actual links, this clicks on the anchor tags and sees if a new window opens. That's why I added the throw immediately in openLinkInNewWindow, to not wait ten seconds for a new window.
We are hitting each link twice. I'm trying to do somewhat of a breadth first search validating the links on one page at a time (easier logging and troubleshooting) and iterating through the anchor tags to see which are actual links. Once determined if its an actual link then the URL gets queued up for crawling. I don't think it will work to queue up a bunch of anchor tag webelements then navigate away. It needs to determine what anchor tags are actual links for crawling. I'm seeing this test taking 7-12 min. A little longer but it's pretty comprehensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skipping the dataset view is making these run a couple minutes faster.
…java Co-authored-by: Trey Chadick <tchad@labkey.com>
…java Co-authored-by: Trey Chadick <tchad@labkey.com>
…java Co-authored-by: Trey Chadick <tchad@labkey.com>
| private String validLink(WebElement anchor) | ||
| { | ||
| String href = anchor.getDomAttribute("href"); | ||
| if (href != null && !href.startsWith("#") && !href.equalsIgnoreCase("undefined")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<a href="undefined"> definitely seems like an error case. Maybe the NightlyTestServer module property needs to be set or ehrBegin.html should handle it being unset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did update the WNPRC test to set NightlyTestServer, unfortunately it's not the only place where this happens. The issue here is Crawler.ControllerActionId doesn't handle this and throws a low level exception (NPE or something can't remember). This will still throw an error just doesn't try to parse apart the href.
Rationale
Core EHR test that crawls links in the EHR folder or subfolders to check for broken links. Help find CSP violations and provide a new test to ensure links expected to work do not have regressions.
Related Pull Requests
Changes