This is just using robots.txt and asking "pretty please, don’t scrape me".
Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...
If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.
If someone has a robots.txt, and I want to request their page, but I want to do that in an automated way, should I open the browser to do it instead of issue a curl request? How about if I am going to ask claude to fetch the page for me?
Respect the robots.txt and don’t do it?
Yes, I was referring to legitimate companies, and Perplexity doesn't seem to be one of those.
Oh for sure. When he wrote of the AI companies that are "stealing/crawling/hammering", you thought he meant the legitimate ones that do honor robots.txt. That makes sense.
Actually, it looks like all the major ones do honour robots.txt including perplexity. They seemingly get around it using google serps, so theyre not actually crawling or hammering the site servers (or even cloudflare).
https://www.ailawandpolicy.com/2025/10/anti-circumvention-re...