Codidact Meta

−1

There is no technical measure that could possibly guarantee that we won't get scraped. It comes down to symbolic gestures and hoping they will comply voluntarily. I do think we should do as many "symbolic gestures" as possible.

Indicate in robots.txt that we don't want AI crawlers.
If there are any heuristic services like the one Mithical mentioned for Cloudflare, enable them. I don't think it's worth putting much effort into writing our own, because the scrapers will win that arms race. But using an existing service lets us make their life harder at little cost to us.
The licensing terms should be updated to say "you may not use the answers to train AIs". This will make the bigger projects avoid us, because their legal departments will complain.

These don't actually stop anyone from scraping us, but they make us a less attractive target. While we're small, we become "small risk, tiny reward" and they'll go for other sites that are "no risk, small reward". When we're bigger, it ceases to be a technical problem, because they will attempt to bribe or coerce the site admins into handing over the content clandestinely.
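For the robots.txt gesture, here is a minimal sketch of what that could look like. It generates a robots.txt that disallows a few AI crawlers and then checks the result with Python's standard `urllib.robotparser`. The user-agent tokens listed are an assumption on my part (vendors add and rename them, so check each vendor's documentation), the helper names are made up for illustration, and `example.org` is a placeholder URL. As said above, this only expresses a preference; a crawler that ignores robots.txt is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Assumed list of AI-crawler user-agent tokens; verify against each
# vendor's own documentation before relying on it.
AI_CRAWLER_AGENTS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"]


def build_robots_txt(agents):
    """Return robots.txt text that disallows every path for each listed agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # All other crawlers stay allowed (an empty Disallow permits everything).
    blocks.append("User-agent: *\nDisallow:")
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check whether a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_CRAWLER_AGENTS)
print(robots_txt)
```

Calling `is_allowed(robots_txt, "GPTBot", "https://example.org/posts/1")` on the generated text is a quick way to confirm that the rules say what we intended before deploying the file.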

posted 10 months ago

CC BY-SA 4.0

matthewsnyder‭

1 comment thread

Types of crawlers (3 comments)

Comments on What can be done to block Codidact content from getting used by crawlers/for AI training?
