HN via remix.js for vilnius.js

by james2doyle 11 hours ago

This is just using robots.txt and asking "pretty please, don’t scrape me".

Here is an article (from TODAY) about the case where Perplexity is being accused of ignoring robots.txt: https://www.theverge.com/news/839006/new-york-times-perplexi...

If you think a robots.txt is the answer to stopping the billion-dollar AI machine from scraping you, I don’t know what to say.

Aeolun 4 hours ago

If someone has a robots.txt, and I want to request their page in an automated way, should I open the browser to do it instead of issuing a curl request? What if I ask Claude to fetch the page for me?

kentm 2 hours ago

Respect the robots.txt and don’t do it?
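
For the curl-style case, checking robots.txt before an automated request could look roughly like the sketch below. It uses Python's standard-library urllib.robotparser. The robots.txt content, the "ExampleBot" agent, and the may_fetch helper are all made up for illustration; a real script would download the site's actual robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no HTTP call
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("curl/8.5.0", "https://example.com/index.html"),
        ("curl/8.5.0", "https://example.com/private/data"),
        ("ExampleBot", "https://example.com/index.html"),
    ]:
        verdict = "allowed" if may_fetch(agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

This only tells the script what the site asked for. Nothing in it enforces anything, which is the "pretty please" point made at the top of the thread.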

cpncrunch 5 hours ago

Yes, I was referring to legitimate companies, and Perplexity doesn't seem to be one of those.

albedoa an hour ago

Oh for sure. When he wrote of the AI companies that are "stealing/crawling/hammering", you thought he meant the legitimate ones that do honor robots.txt. That makes sense.

cpncrunch 10 minutes ago

Actually, it looks like all the major ones do honour robots.txt, including Perplexity. They seemingly get around it by using Google SERPs, so they're not actually crawling or hammering the site's servers (or even Cloudflare).

https://www.ailawandpolicy.com/2025/10/anti-circumvention-re...