Welcome to Codidact Meta!
Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.
Post History
Answer
#2: Post edited
> At what extent can we block "crawlers" and the like from stealing site content? What is technically possible?
We can block at least the OpenAI crawler and the Google-Extended crawler (for Gemini) through the `robots.txt` file. We've been discussing this in the admin room for the past few days, and although nothing has been done yet, the general sentiment has been leaning towards blocking these AI crawlers.
If the community indicates support for such a move, we'll most likely block AI crawlers to the extent possible, at least for crawlers that we're aware of and have documented methods of blocking. (We don't want to block _all_ crawlers, since that would mess up e.g. the Wayback Machine and search engines.)
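A minimal sketch of what selective blocking could look like, assuming the rules use the documented user-agent tokens `GPTBot` (OpenAI) and `Google-Extended` (Google). The `robots.txt` content below is hypothetical, not the network's actual file, and `example.com/posts/12345` is a made-up URL. The Python part uses the standard library's `urllib.robotparser` to check that the two AI tokens are disallowed while an ordinary search crawler such as `Googlebot` is still allowed. `Google-Extended` is a control token rather than a separate fetching bot, so disallowing it does not affect Googlebot's search indexing. Also keep in mind that `robots.txt` is advisory: it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out the two AI crawlers by their documented
# user-agent tokens, and leave every other crawler unrestricted
# (an empty "Disallow:" means "allow everything").
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the given user-agent token may fetch the URL under these rules."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "https://example.com/posts/12345"  # hypothetical post URL
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(f"{agent:16} allowed={is_allowed(rp, agent, page)}")
```

Pass the bare token (e.g. `GPTBot`) rather than a full User-Agent header, since `robotparser` matches on the part of the agent string before the first `/`.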
**Update:** Cloudflare [added the ability to block known LLM bots](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and we have enabled this for our network.