
Fix sitemap discovery regression #82

Merged
dacharyc merged 2 commits into main from fix/sitemap-discovery-regression
May 2, 2026


@dacharyc dacharyc commented May 2, 2026

Fixes issue #81. The sitemap discovery fallbacks were missing the sitemap-index.xml and sitemap_index.xml variants that real docs sites use; this PR adds both.
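
For illustration, a minimal sketch of what a fallback candidate list could look like. The names `SITEMAP_CANDIDATES` and `sitemapCandidates` are hypothetical and not taken from the afdocs source; only the two `sitemap-index.xml` / `sitemap_index.xml` paths come from this PR, and `/sitemap.xml` is assumed as the conventional default.

```typescript
// Hypothetical candidate list. Only sitemap-index.xml and sitemap_index.xml
// are named in this PR; /sitemap.xml is an assumed conventional default.
const SITEMAP_CANDIDATES: readonly string[] = [
  "/sitemap.xml",
  "/sitemap-index.xml",
  "/sitemap_index.xml",
];

// Build the absolute URLs to probe, in order, for the origin of a given page.
function sitemapCandidates(siteUrl: string): string[] {
  const origin = new URL(siteUrl).origin;
  return SITEMAP_CANDIDATES.map((path) => origin + path);
}
```

Under these assumptions, a caller would try each returned URL in order and stop at the first one that yields a usable sitemap.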

Also noticed that many tests were slow because unmocked URLs triggered real network calls that timed out; this PR applies mocks correctly in those cases.
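
One way to keep such tests off the network, sketched under assumptions: an injected fetcher that serves canned responses and throws immediately for any URL without a mock, so a missing mock fails fast instead of timing out. `FakeResponse`, `Fetcher`, and `makeMockFetcher` are hypothetical names for this sketch; the PR does not show how afdocs wires its mocks.

```typescript
// Minimal response shape for this sketch (not the real fetch Response type).
interface FakeResponse {
  status: number;
  body: string;
}

type Fetcher = (url: string) => Promise<FakeResponse>;

// Return a fetcher that only knows the canned URLs. Any other URL rejects
// right away rather than attempting a real network call.
function makeMockFetcher(canned: Map<string, FakeResponse>): Fetcher {
  return async (url: string): Promise<FakeResponse> => {
    const hit = canned.get(url);
    if (hit === undefined) {
      throw new Error(`No mock registered for URL: ${url}`);
    }
    return hit;
  };
}
```

The design choice here is to fail loudly: a test that forgets a mock surfaces as an immediate error naming the URL, not as a slow timeout.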

@dacharyc dacharyc merged commit f760c3a into main May 2, 2026
2 checks passed
@dacharyc dacharyc deleted the fix/sitemap-discovery-regression branch May 2, 2026 16:14
philip pushed a commit to philip/afdocs that referenced this pull request May 7, 2026
Sitemap entries are commonly published on the bare-host canonical
(e.g. https://swift.org/...) even when the served site is www.swift.org.
The strict `origin !==` comparison in shouldInclude() and scopeUrls()
discarded every such URL, causing afdocs to fall back to single-page
sampling and trigger the single-page-sample diagnostic.

PR agent-ecosystem#82 already added the right sitemap discovery candidates, so the
root sitemap was being fetched — its URLs were just being filtered
out before they could be used.

Fix: introduce isSameOriginIgnoringWww() (built on the existing
isWwwVariant helper) and use it in both filter sites. Adds tests
covering both directions of www mismatch and a regression test
confirming truly cross-host URLs are still rejected.

Fixes agent-ecosystem#83.
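
A minimal sketch of the comparison described in the commit message above, assuming both arguments are absolute URL strings. This is a self-contained illustration, not the afdocs implementation: the real `isSameOriginIgnoringWww()` builds on the existing `isWwwVariant` helper, whose signature is not shown here, and `stripWww` is a hypothetical stand-in.

```typescript
// Hypothetical stand-in for the www handling: drop one leading "www." label.
function stripWww(hostname: string): string {
  return hostname.startsWith("www.") ? hostname.slice(4) : hostname;
}

// Treat two URLs as same-origin when scheme and port match and the hosts
// differ at most by a leading "www.". Illustrative only; the real helper
// in afdocs may differ.
function isSameOriginIgnoringWww(a: string, b: string): boolean {
  const ua = new URL(a);
  const ub = new URL(b);
  return (
    ua.protocol === ub.protocol &&
    ua.port === ub.port &&
    stripWww(ua.hostname) === stripWww(ub.hostname)
  );
}
```

With this sketch, `https://swift.org/...` and `https://www.swift.org/...` compare as the same origin in either argument order, while a URL on a different host or scheme is still rejected.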