AI Bot Exposure Score
Estimate how exposed your content is to AI training crawlers
Frequently Asked Questions
What does the exposure score mean?
It estimates how likely your content is to be scraped by AI training bots, based on your robots.txt rules, sitemap freshness, and industry category. Higher means more exposed.
Is this based on actual traffic data?
No. There is no public API for third-party AI bot traffic, so this is an inferential score based on public signals and is clearly labeled as an estimate.
How can I reduce my exposure score?
Block known AI training bots in robots.txt, add an llms.txt file with clear usage guidelines, and consider WAF rules to block unauthorized crawlers.
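As a starting point, the robots.txt step can look like the sketch below. The bot names shown are user agents publicly documented by their operators; the list is illustrative, not exhaustive, and compliance is voluntary on the crawler's side.

```
# Block common AI training crawlers (illustrative list, not exhaustive)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Conventional search crawlers remain governed by your default rules
User-agent: *
Allow: /
```

Well-behaved crawlers honor these rules; persistent scrapers that ignore robots.txt are what the WAF layer is for.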
Is higher or lower better?
Lower is better if you want to protect your content from AI training. Higher is fine if you want maximum AI search visibility (e.g., appearing in ChatGPT answers).
What factors affect my score?
robots.txt coverage, sitemap freshness, llms.txt presence, industry category, HTTPS security, and how many AI bots are explicitly managed.
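To make the factor list concrete, here is a toy sketch of how such an inferential score could combine those signals. The factor names, weights, and 50-point baseline are assumptions for illustration only, not the tool's actual formula.

```python
# Illustrative exposure heuristic -- weights and baseline are assumptions,
# not the tool's real scoring formula.
WEIGHTS = {
    "robots_blocks_ai_bots": -40,   # explicit Disallow for known AI trainers
    "has_llms_txt": -10,            # published usage guidelines
    "stale_sitemap": -5,            # stale content attracts fewer recrawls
    "content_heavy_industry": 20,   # news/blog/recipe-style text
    "no_https": 5,                  # weaker signal, still counted
}

def exposure_score(signals: dict) -> int:
    """Start at a neutral 50, shift by each observed signal, clamp to 0-100."""
    score = 50 + sum(WEIGHTS[k] for k, present in signals.items() if present)
    return max(0, min(100, score))

# A content-heavy blog with no bot management scores high:
print(exposure_score({"content_heavy_industry": True}))  # 70

# The same site after blocking AI bots and adding llms.txt scores low:
print(exposure_score({"content_heavy_industry": True,
                      "robots_blocks_ai_bots": True,
                      "has_llms_txt": True}))  # 20
```

The point of the sketch is the shape of the model: each factor nudges a baseline up or down, which is why the result is directional rather than a measurement.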
Does this replace server log analysis?
No. For actual bot traffic data, you need server access logs. This tool provides a quick risk estimate without requiring server access.
Why can't you show actual bot traffic?
There is no public API that measures AI bot traffic for third-party sites, and CrUX only reports real-user Chrome data with no bot segmentation. Only server log analysis can provide actual numbers.
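If you do have server access, counting AI bot hits from logs is straightforward. The sketch below assumes the common "combined" log format, where the user agent is the last quoted field; adjust the parsing for your server's format.

```python
import re
from collections import Counter

# Known AI crawler user-agent substrings (illustrative, not exhaustive)
AI_BOTS = ("GPTBot", "CCBot", "ClaudeBot", "Google-Extended",
           "PerplexityBot", "Bytespider")

def count_ai_bot_hits(lines):
    """Tally hits per AI bot from combined-format access log lines."""
    hits = Counter()
    for line in lines:
        # The user agent is the last double-quoted field in combined format
        quoted = re.findall(r'"([^"]*)"', line)
        agent = quoted[-1] if quoted else ""
        for bot in AI_BOTS:
            if bot in agent:
                hits[bot] += 1
    return hits

sample = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "CCBot/2.0"',
]
print(count_ai_bot_hits(sample))
```

Running the same tally over a full day of logs gives the exact per-bot numbers this tool can only estimate.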
Which industries have the highest exposure?
Content-heavy sites (news, blogs, educational sites, recipe sites) typically have the highest AI training exposure, because their large volumes of well-structured text are exactly what training crawlers target.
Should e-commerce sites worry about exposure?
Usually less than publishers do: e-commerce sites often benefit from AI search visibility (e.g., product recommendations in AI answers). Focus on opting out of training while keeping search bots allowed.
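That split is possible because some operators publish separate user agents for training and for search. For example, OpenAI documents GPTBot for training and OAI-SearchBot for ChatGPT search, and Google-Extended controls Gemini training independently of Googlebot. A minimal robots.txt sketch of the split:

```
# Opt out of training while staying visible in AI and classic search.
# Bot names as published by their operators; verify current documentation.
User-agent: GPTBot            # OpenAI training crawler
Disallow: /

User-agent: Google-Extended   # Gemini training control
Disallow: /

User-agent: OAI-SearchBot     # ChatGPT search
Allow: /

User-agent: Googlebot
Allow: /
```

Product pages stay discoverable in AI-assisted search while the training crawlers are told to stay out.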
How accurate is the estimate?
It's a directional indicator, not a precise measurement. Use it to identify areas for improvement, then verify with your server logs for exact numbers.