Dark AI Crawler Detector
Discover rogue and undisclosed AI crawlers that may be scraping your content
Frequently Asked Questions
What are dark AI crawlers?
Dark AI crawlers are bots that scrape websites to collect AI training data without clearly identifying themselves or honoring robots.txt restrictions.
How can I block Bytespider?
Add the lines "User-agent: Bytespider" and "Disallow: /" to your robots.txt. Bytespider is ByteDance's aggressive crawler, used to gather AI model training data.
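Concretely, the rule looks like this in robots.txt (other crawlers are unaffected unless you add rules for them):

```
User-agent: Bytespider
Disallow: /
```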
Where does the dark bot data come from?
We aggregate data from darkvisitors.com, security research publications, and community-reported crawler databases.
Can dark crawlers ignore my robots.txt?
Yes. robots.txt is a voluntary standard, and some rogue crawlers simply ignore it. Server-level blocking (via a WAF such as Cloudflare) is more effective because it rejects requests before they reach your content.
How do I block crawlers at the server level?
Use your web server's config (nginx/Apache) or a WAF like Cloudflare to block specific user-agent strings or IP ranges associated with rogue crawlers.
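As an illustrative sketch, an nginx config can match the User-Agent header and return 403 for known dark crawlers (the bot names and pattern list here are examples, not a complete blocklist; the `map` block belongs in the `http` context):

```
# Flag requests whose User-Agent matches known dark crawlers.
map $http_user_agent $block_dark_crawler {
    default        0;
    ~*Bytespider   1;  # ByteDance AI training crawler
    ~*PetalBot     1;  # example entry; adjust to your own risk assessment
}

server {
    listen 80;
    server_name example.com;

    # Reject flagged requests before serving any content.
    if ($block_dark_crawler) {
        return 403;
    }
}
```

Unlike robots.txt, this enforcement does not depend on the crawler's cooperation, though bots that spoof a browser User-Agent will still get through; IP-range blocking at the WAF covers that case.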
Are all non-Google bots bad?
No. Many bots serve legitimate purposes. Tools like SemrushBot and AhrefsBot are standard SEO tools. Our risk assessment helps distinguish between legitimate and risky bots.
What risk levels do you use?
HIGH: Known AI training bots with poor transparency. MEDIUM: Commercial crawlers with AI training components. LOW: Legitimate SEO tools that are transparent about their purpose.
Can I report a new dark crawler?
Yes. Open a GitHub issue with the user-agent string, IP range, and any evidence of unauthorized crawling behavior. We verify each report before adding the crawler to our database.
Does this tool scan my server logs?
No. We check your public robots.txt to see if you've blocked known dark crawlers. We do not access server logs or private data.
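The same robots.txt check can be reproduced locally. This is a minimal sketch using only the Python standard library; the `is_blocked` helper and the sample robots.txt are illustrative, not our actual implementation:

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt that blocks Bytespider but allows everyone else.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

def is_blocked(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if `user_agent` is NOT allowed to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)

print(is_blocked(ROBOTS_TXT, "Bytespider"))  # True: disallowed site-wide
print(is_blocked(ROBOTS_TXT, "Googlebot"))   # False: covered by the * allow rule
```

Because this reads only the public robots.txt, it reveals nothing about your traffic; it simply tells you whether a given crawler is instructed to stay away.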
How is PetalBot related to AI?
PetalBot is Huawei's search crawler. While primarily a search engine bot, it also feeds data to Huawei's AI services, making it relevant for AI content protection.