Improvement: Sitemap Checker #49

@lovestaco

Bug Description

The existing sitemap checker is not up to par and is written in JS.

The internal tool can be rewritten in Go or Python, and should do the following in addition to the current features.

Expected Behavior

- Check the response time for each URL: TTFB (Time to First Byte)

curl -o /dev/null -s -w "%{time_total}\n" https://hexmos.com/freedevtools/mcp/aggregators/1mcp--agent/
1.128393

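It should be under 800 ms. Note that %{time_total} in the command above reports the whole transfer time; curl's %{time_starttransfer} is the closer match for TTFB.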

https://web.dev/articles/ttfb#good-ttfb-score
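A rough sketch of how the TTFB check could look in Python, standard library only. The function names and the 10 second timeout are made up for illustration, and timing up to the first body byte is only an approximation of TTFB (it also includes DNS, connect, TLS and any redirects).

```python
import time
import urllib.request

# "Good" TTFB threshold from the web.dev article, in seconds.
TTFB_BUDGET_SECONDS = 0.8


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Return seconds from starting the request until the first body byte is read.

    Hypothetical helper: urlopen() returns once the response headers have
    arrived, and read(1) then waits for the first byte of the body.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)
        return time.perf_counter() - start


def is_ttfb_ok(seconds: float, budget: float = TTFB_BUDGET_SECONDS) -> bool:
    """True when the measured time is under the budget."""
    return seconds < budget
```

For example, the 1.128393 s measured by the curl command above would fail `is_ttfb_ok`. Running several samples per URL and comparing the median is probably more stable than a single request.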

- No more than 50,000 URLs per sitemap

The sitemap checker should fail any individual sitemap that holds more than 50,000 URLs.

https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap#general-guidelines
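One possible way to count the URLs, assuming the sitemap is a standard urlset document in the sitemaps.org 0.9 namespace. The helper names are hypothetical, and sitemap index files (which list sitemap entries instead of url entries) are not handled here.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS_PER_SITEMAP = 50_000


def count_urls(xml_bytes: bytes) -> int:
    """Count <url> entries in a parsed sitemap document."""
    root = ET.fromstring(xml_bytes)
    return sum(1 for _ in root.iter(f"{SITEMAP_NS}url"))


def has_too_many_urls(xml_bytes: bytes, limit: int = MAX_URLS_PER_SITEMAP) -> bool:
    """True when the sitemap holds more than `limit` URLs."""
    return count_urls(xml_bytes) > limit
```

`ET.fromstring` loads the whole document into memory; for sitemaps close to the size limit, `ET.iterparse` might be a better fit.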

- Check that the sitemap file is UTF-8 encoded

https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap#:~:text=Sitemap%20file%20encoding%20and%20location
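A minimal sketch of the encoding check: try a strict UTF-8 decode of the raw bytes and treat any decode error as a failure. `is_utf8` is a made-up name, and this only validates the bytes; it does not look at the encoding declared in the XML header.

```python
def is_utf8(data: bytes) -> bool:
    """True when the raw sitemap bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Note that plain ASCII is valid UTF-8, so an ASCII-only sitemap passes this check.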

- No more than 50 MB per sitemap, uncompressed

When uncompressed, a sitemap shouldn't take more than 50 MB.
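The size check could be sketched like this, assuming sitemaps may be served gzip-compressed (detected by the gzip magic bytes) and taking 50 MB as 52,428,800 bytes, the figure given in the sitemaps.org protocol. The helper names are hypothetical.

```python
import gzip

MAX_SITEMAP_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def uncompressed_size(data: bytes) -> int:
    """Return the size in bytes after gunzipping, if the data is gzipped."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return len(data)


def is_sitemap_too_large(data: bytes, limit: int = MAX_SITEMAP_BYTES) -> bool:
    """True when the uncompressed sitemap is larger than `limit` bytes."""
    return uncompressed_size(data) > limit
```

`gzip.decompress` inflates everything in memory, so a real checker may want to stream the decompression and stop as soon as the limit is crossed.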

- Each page should not be more than 15 MB

https://developers.google.com/search/docs/crawling-indexing/googlebot
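For the page size check, one option is to read the response in chunks and stop as soon as the limit is crossed, so a huge page is never fully downloaded. This is a sketch with made-up names; it works on any binary stream, such as the object returned by `urllib.request.urlopen`, and assumes the body is not compressed in transit.

```python
from typing import BinaryIO

MAX_PAGE_BYTES = 15 * 1024 * 1024  # 15 MB page limit from the Googlebot docs


def exceeds_limit(
    stream: BinaryIO,
    limit: int = MAX_PAGE_BYTES,
    chunk_size: int = 64 * 1024,
) -> bool:
    """Read the stream chunk by chunk; return True once more than `limit` bytes are seen."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream reached without crossing the limit.
            return False
        total += len(chunk)
        if total > limit:
            return True
```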
