Codidact Meta

+10

−1

To what extent can we block "crawlers" and the like from stealing site content? What is technically possible?

We can block at least the OpenAI crawler and the Google-Extended crawler (for Gemini) through the robots.txt file. We've been discussing this in the admin room for the past few days, and while nothing has been done yet, the general sentiment has been leaning towards blocking these AI crawlers.

If the community indicates support for such a move, we'll most likely block AI crawlers to the extent possible, at least those that we're aware of and that have a documented way of being blocked. (We don't want to block all crawlers, since that would mess up e.g. the Wayback Machine and search engines.)
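
For illustration only, here is a rough sketch of what such selective robots.txt rules could look like, checked with Python's standard `urllib.robotparser`. This is not our actual configuration. `GPTBot` and `Google-Extended` are the user-agent tokens those vendors document; the rest (the `example.com` URL, and `Googlebot` and `ia_archiver` as stand-ins for crawlers we'd want to keep) are assumptions picked for the example. Keep in mind that robots.txt is advisory, so this only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny two documented AI-related tokens,
# leave every other crawler (search engines, archives) untouched.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Placeholder URL; any path on the site is treated the same by these rules.
url = "https://example.com/posts/1"
for agent in ("GPTBot", "Google-Extended", "Googlebot", "ia_archiver"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

Each blocked crawler gets its own `User-agent` group, and the wildcard group at the end keeps everything else allowed, which is the "block the ones we know about, don't break search engines or the Wayback Machine" approach described above.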

Update: Cloudflare added the ability to block known LLM bots and we have enabled this for our network.

posted about 1 year ago

CC BY-NC-SA 4.0

10mo ago by Monica Cellio‭

Mithical‭ staff



2 comment threads

So we'll block only responsible bots (3 comments)

Do it. (2 comments)


Comments on What can be done to block Codidact content from getting used by crawlers/for AI training?
