Skip to content

Conversation

@labkey-martyp
Copy link
Contributor

@labkey-martyp labkey-martyp commented Jul 28, 2025

Rationale

Core EHR test that crawls links in the EHR folder or subfolders to check for broken links. Help find CSP violations and provide a new test to ensure links expected to work do not have regressions.

Related Pull Requests

Changes

  • Add testCrawlEhrLinks
  • Add skip links known to cause trouble in test setups
  • Update EHRApp test skip links

}

// scope this to admin, ehr folder and subfolders
if (!decodedHref.contains(getContainerPath()) && !decodedHref.startsWith("/admin"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crawler.ControllerActionId will parse out the controller, action, and containerPath of a URL (new Crawler.ControllerActionId(href)). Note that it does not include a leading slash like getContainerPath() does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

}

// scope this to admin, ehr folder and subfolders
if (!decodedHref.contains(getContainerPath()) && !decodedHref.startsWith("/admin"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this crawl admin actions? Our regular crawler should definitely hit those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to just validate that links in the EHR pointing to admin are valid URLs. I just added admin to the default urls to skip crawling, but it will continue to validate the links to there from the EHR.


try
{
openLinkInNewWindowOrThrow(anchor);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of links don't open successfully? This seems like an error state.
If there are links that we expect to not open, we should filter them out earlier.
Waiting for a thing that won't happen can waste a lot of time (switchToWindow waits ten seconds for the new tab to appear)

Opening these links at all in these phase seems inefficient. validatePageLinks also navigates to the links, so we're hitting each link twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So our pages have lots of anchor tags on them that aren't links, and the ones that are links can have a href, onclick handler or click listener added a variety of ways in JS. Instead of trying to pre-determine which anchor tags are actual links, this clicks on the anchor tags and sees if a new window opens. That's why I added the throw immediately in openLinkInNewWindow, to not wait ten seconds for a new window.

We are hitting each link twice. I'm trying to do somewhat of a breadth first search validating the links on one page at a time (easier logging and troubleshooting) and iterating through the anchor tags to see which are actual links. Once determined if its an actual link then the URL gets queued up for crawling. I don't think it will work to queue up a bunch of anchor tag webelements then navigate away. It needs to determine what anchor tags are actual links for crawling. I'm seeing this test taking 7-12 min. A little longer but it's pretty comprehensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipping the dataset view is making these run a couple minutes faster.

private String validLink(WebElement anchor)
{
String href = anchor.getDomAttribute("href");
if (href != null && !href.startsWith("#") && !href.equalsIgnoreCase("undefined"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<a href="undefined"> definitely seems like an error case. Maybe the NightlyTestServer module property needs to be set or ehrBegin.html should handle it being unset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did update the WNPRC test to set NightlyTestServer, unfortunately it's not the only place where this happens. The issue here is Crawler.ControllerActionId doesn't handle this and throws a low level exception (NPE or something can't remember). This will still throw an error just doesn't try to parse apart the href.

@labkey-martyp labkey-martyp merged commit 46a1422 into release25.7-SNAPSHOT Aug 3, 2025
6 of 7 checks passed
@labkey-martyp labkey-martyp deleted the 25.7_fb_ehr_link_crawler branch August 3, 2025 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants