Page Scanner sends the wrong URL for traditional pages — uses admin origin instead of the page's host #35626

@zJaaal

Problem Statement

When the Page Scanner (Geo Check or A11y Check) is opened on a traditional page, the editor builds the URL it sends to the scanner from the dotCMS admin's origin (window.location.origin) plus the page path, rather than from the page's actual host.

In dotCMS the same path can exist on multiple hosts and represent different pages — e.g. /about-us on siteA.example.com is a different page from /about-us on siteB.example.com. The current behavior collapses every traditional page's URL to "admin origin + path", so the external scanner fetches and analyzes the wrong URL.

The bug is silent: the request succeeds, the scanner returns a real-looking report, but it's describing a different page than the one the user is editing. There's no error or warning in the UI.

Where it happens

  • core-web/libs/portlets/edit-ema/portlet/src/lib/dot-ema-shell/dot-ema-shell.component.ts — line 325, URL assembled as ${requestHostName}${currentUrl} before being passed to pageScanner.open.
  • core-web/libs/portlets/edit-ema/portlet/src/lib/utils/index.ts — line 797, getRequestHostName(params) returns params.clientHost || window.location.origin. For headless pages clientHost is set, so the URL is correct. For traditional pages clientHost is undefined and the function falls back to window.location.origin.
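To make the collapse concrete, here is a minimal sketch of the behavior described in the two bullets above. It is a simplified model, not the actual dotCMS source: `ScannerParams`, `buildScannerUrl`, and the `adminOrigin` parameter (standing in for `window.location.origin` so the snippet runs outside a browser) are assumptions for illustration. Only the `params.clientHost || window.location.origin` fallback and the `${requestHostName}${currentUrl}` concatenation come from the code referenced above.

```typescript
// Simplified model of the reported behavior; not the actual dotCMS source.
interface ScannerParams {
    // Set for headless pages; undefined for traditional pages.
    clientHost?: string;
}

// `adminOrigin` stands in for window.location.origin.
function getRequestHostName(params: ScannerParams, adminOrigin: string): string {
    return params.clientHost || adminOrigin;
}

function buildScannerUrl(params: ScannerParams, currentUrl: string, adminOrigin: string): string {
    const requestHostName = getRequestHostName(params, adminOrigin);
    return `${requestHostName}${currentUrl}`;
}

const ADMIN_ORIGIN = 'https://admin.example.com';

// Two traditional pages on different hosts that share a path.
// Neither carries a clientHost, so both collapse to the admin origin.
const siteAUrl = buildScannerUrl({}, '/about-us', ADMIN_ORIGIN);
const siteBUrl = buildScannerUrl({}, '/about-us', ADMIN_ORIGIN);

// A headless page supplies clientHost, so its URL keeps its own origin.
const headlessUrl = buildScannerUrl(
    { clientHost: 'https://app.example.com' },
    '/about-us',
    ADMIN_ORIGIN
);

console.log(siteAUrl === siteBUrl); // the scanner cannot tell the two pages apart
console.log(headlessUrl);
```

Nothing in the inputs for the traditional case identifies the page's host, which is why the scanner receives the same URL for both pages.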

Scope

  • Affects traditional pages only. Headless pages send clientHost and are not affected.
  • Affects every multi-host dotCMS install where pages share paths across hosts (very common).
  • Affects both the Geo Check and A11y Check tools, since both go through the same URL-construction path.

Browser / OS: Any.

Steps to Reproduce

  1. Set up a dotCMS instance with at least two hosts (siteA and siteB) accessed via a single admin host (e.g. admin.example.com).
  2. Create a traditional page at /about-us on siteB.
  3. Open it in the UVE editor and trigger Page Scanner → Geo Check (or A11y Check).
  4. Inspect the network request to /api/v1/page-scanner/geo/check. The url field in the body will be https://admin.example.com/about-us instead of https://siteB.example.com/about-us.
  5. Observe that the scan response describes the wrong page (whatever lives at the admin host's /about-us, or a 404).

Expected: The URL submitted to the scanner identifies the actual page being edited, including its real host — so the external scanner fetches the same page the editor is showing.

Actual: For traditional pages the URL is built from the dotCMS admin's origin, which has no relation to the page's host. Two pages on different hosts that share a path become indistinguishable to the scanner.
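One possible client-side shape for the expected behavior is sketched below. This is not a proposed patch: `PageHostInfo`, its `hostname` field, `ScanUrlInput`, and `resolveScanUrl` are hypothetical names, and the `https` default is an assumption, since the issue does not say where the page's scheme would come from. The sketch keeps the headless `clientHost` path unchanged and only falls back to the admin origin when no page host is known.

```typescript
// Hypothetical shapes for illustration; not the actual dotCMS page-asset model.
interface PageHostInfo {
    hostname: string; // e.g. "siteB.example.com"
}

interface ScanUrlInput {
    clientHost?: string;        // headless pages only
    pageHost?: PageHostInfo;    // assumed to be available from page-asset data
    path: string;               // e.g. "/about-us"
    adminOrigin: string;        // stands in for window.location.origin
    scheme?: 'http' | 'https';  // assumption: defaults to https
}

function resolveScanUrl(input: ScanUrlInput): string {
    const scheme = input.scheme || 'https';

    // Headless behavior is unchanged: clientHost wins when present.
    // Traditional pages prefer the page's own host over the admin origin.
    const origin =
        input.clientHost ||
        (input.pageHost ? `${scheme}://${input.pageHost.hostname}` : input.adminOrigin);

    // Join origin and path with exactly one slash between them.
    const trimmedOrigin = origin.replace(/\/+$/, '');
    const normalizedPath = input.path.charAt(0) === '/' ? input.path : `/${input.path}`;
    return `${trimmedOrigin}${normalizedPath}`;
}

const admin = 'https://admin.example.com';

const fixedSiteA = resolveScanUrl({
    pageHost: { hostname: 'siteA.example.com' },
    path: '/about-us',
    adminOrigin: admin,
});
const fixedSiteB = resolveScanUrl({
    pageHost: { hostname: 'siteB.example.com' },
    path: '/about-us',
    adminOrigin: admin,
});

console.log(fixedSiteA); // https://siteA.example.com/about-us
console.log(fixedSiteB); // https://siteB.example.com/about-us
```

With the page host in the input, the two pages from the repro produce distinct URLs, which is what the scanner needs to tell them apart.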

Acceptance Criteria

  • When the Page Scanner is opened on a traditional page hosted on a non-admin host, the url sent in the body of /api/v1/page-scanner/geo/check and /api/v1/page-scanner/a11y/check resolves to the page's actual host (not the dotCMS admin's origin).
  • The repro scenario above produces a scan report that describes the page being edited, not a different page on the admin host.
  • Two traditional pages with the same path on different hosts produce distinct scan reports — they are no longer indistinguishable to the scanner.
  • Headless behavior is unchanged: clientHost-based URL construction continues to work for headless pages and the URL sent matches the page being edited.
  • The fix covers both the Geo Check and A11y Check entry points, since both go through the same URL-construction path.

dotCMS Version

Latest from main branch (reproduced on issue-35514-uve-iframe-sizing-phase-1; same code path is on main).

Severity

Medium - Some functionality impacted

Links

NA

Notes for triage

  • Backend endpoint (PageScannerResource.geoCheck / a11yCheck) is a proxy: it forwards whatever URL the FE provides to the upstream scanner SaaS without modification. The current host is read from the request, but only to look up the Page Scanner app's secrets — it does not influence the URL that gets scanned.
  • Open question for the fix discussion (not part of the bug itself): whether the right URL should be constructed client-side from page-asset data, or whether the BE proxy should resolve the host server-side from a host identifier in the request body — different trade-offs around trust boundary and validation.
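To frame the open question, the two options can be sketched as request-body shapes. Everything here is illustrative: the issue only confirms that the current body carries a `url` field, so `ServerResolvedScanRequest`, `hostId`, `HostRegistry`, and `resolveOnServer` are hypothetical, and the real backend is not TypeScript. The sketch models the server-side option, where the client never supplies a host name directly and an unknown identifier is rejected.

```typescript
// Option A: the client sends a fully built URL.
// The issue confirms only that the body has a `url` field.
interface ClientResolvedScanRequest {
    url: string;
}

// Option B (hypothetical): the client sends a host identifier plus the path,
// and the server resolves the host name itself.
interface ServerResolvedScanRequest {
    hostId: string;
    path: string;
}

// Hypothetical lookup from host identifier to host name.
type HostRegistry = Record<string, string>;

function resolveOnServer(
    request: ServerResolvedScanRequest,
    hosts: HostRegistry
): ClientResolvedScanRequest {
    // Reject identifiers the server does not know about.
    if (!Object.prototype.hasOwnProperty.call(hosts, request.hostId)) {
        throw new Error(`Unknown host identifier: ${request.hostId}`);
    }
    // Require an absolute path so the result cannot point at another origin.
    if (request.path.charAt(0) !== '/') {
        throw new Error(`Path must start with "/": ${request.path}`);
    }
    // Assumption: https; the issue does not say how the scheme is determined.
    return { url: `https://${hosts[request.hostId]}${request.path}` };
}

const exampleHosts: HostRegistry = {
    'site-a-id': 'siteA.example.com',
    'site-b-id': 'siteB.example.com',
};

const resolved = resolveOnServer({ hostId: 'site-b-id', path: '/about-us' }, exampleHosts);
console.log(resolved.url); // https://siteB.example.com/about-us
```

Under Option A the server would instead have to validate a client-supplied URL, which is the trust-boundary trade-off mentioned above.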
