Sitemap & Robots Health Check
Validate your sitemap.xml and robots.txt for AI crawler compatibility
Frequently Asked Questions
What does this tool check?
It validates your robots.txt syntax and sitemap.xml structure, checks for AI-specific directives, and verifies that your sitemap is referenced in robots.txt.
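One of those checks, finding Sitemap directives in robots.txt, can be sketched in a few lines of Python. This is a simplified illustration of the idea, not the tool's actual implementation:

```python
# Simplified sketch: scan robots.txt for "Sitemap:" directives.
# (Illustrative only; not the health check's actual code.)

def find_sitemap_urls(robots_txt: str) -> list[str]:
    """Return every URL declared via a 'Sitemap:' directive."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so URLs keep their own "://".
        key, _, value = line.partition(":")
        # The directive name is case-insensitive.
        if key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

robots = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml"""

print(find_sitemap_urls(robots))  # → ['https://example.com/sitemap.xml']
```

A robots.txt with no such directive yields an empty list, which is exactly the condition the health check flags.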
Why is a sitemap important for AI crawlers?
AI crawlers use sitemaps to discover your content efficiently. A well-structured sitemap with lastmod dates helps AI search engines index your content faster.
What is a healthy robots.txt?
A healthy robots.txt has clear bot-specific rules, references your sitemap, includes crawl-delay directives for AI bots, and explicitly opts your content in or out of AI training.
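A robots.txt along these lines covers those points. The bot names shown are illustrative; check each crawler's documented user-agent string before relying on them:

```
# Allow mainstream search crawlers
User-agent: Googlebot
Allow: /

# Explicitly opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Slow down AI bots that honor crawl-delay (a nonstandard directive)
User-agent: PerplexityBot
Crawl-delay: 10

# Default rule for everything else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```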
How many URLs should my sitemap have?
The sitemap protocol allows up to 50,000 URLs (and 50 MB uncompressed) per sitemap file. Use a sitemap index for larger sites. We sample up to 500 URLs for analysis.
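For sites above that limit, a sitemap index file lists the individual sitemaps. A minimal example (the filenames and dates are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-posts.xml</loc>
    <lastmod>2024-05-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-pages.xml</loc>
    <lastmod>2024-04-15</lastmod>
  </sitemap>
</sitemapindex>
```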
Does lastmod matter?
Yes, lastmod dates help AI crawlers prioritize fresh content. Missing lastmod dates are flagged as a health issue in our report.
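The lastmod check itself is straightforward. A sketch of the idea, assuming the standard sitemap XML namespace (again, a simplification rather than the tool's actual code):

```python
# Sketch of a lastmod completeness check: list every <url> entry
# that lacks a <lastmod> child. (Illustrative, not the tool's code.)
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_missing_lastmod(sitemap_xml: str) -> list[str]:
    """Return the <loc> of every URL entry without a <lastmod>."""
    root = ET.fromstring(sitemap_xml)
    missing = []
    for url in root.findall("sm:url", NS):
        if url.find("sm:lastmod", NS) is None:
            missing.append(url.findtext("sm:loc", default="", namespaces=NS))
    return missing

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

print(urls_missing_lastmod(sitemap))  # → ['https://example.com/about']
```

Any URLs in the returned list would be flagged in the health report.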
Should I add priority values?
While Google ignores priority values, AI crawlers may use them for content prioritization. Adding them is a low-effort improvement.
What if I have no robots.txt?
Without a robots.txt, all bots (including AI training crawlers) have full access to your site by default. We recommend creating one to manage access.
Can I validate multiple sitemaps?
We detect sitemap index files automatically and validate the referenced child sitemaps. All linked sitemaps are included in the health check.
What encoding should my sitemap use?
Sitemaps should use UTF-8 encoding and valid XML syntax. We check for XML parsing errors and report them in the health check.
How do I reference my sitemap in robots.txt?
Add "Sitemap: https://yourdomain.com/sitemap.xml" on its own line in your robots.txt. The directive is valid anywhere in the file, though it is conventionally placed at the end. This is the standard way for crawlers to discover your sitemap.