Welcome to Codidact Meta!
Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.
Could Codidact provide a data dump?
FR: Could Codidact provide a data dump (e.g., an archive with all Q&A, comments, etc.)? It could be hosted on https://archive.org/ if needed to save money.
Were you thinking of something you could request at any time, or a daily/weekly backup available to the public?
Depending on how much of the data you want at once (everything, or something narrower such as just the comments on a particular question), there may be some overlap with work on the requirements for a Codidact API. If you want to add an answer there too, you're still early, as implementation hasn't started yet.
Not suggesting a duplication: I can imagine both an API and a data dump.
Thanks, trichoplax: a regular backup available to the public.
Quarterly would probably be enough at Codidact's current scale. Monthly might be nice once post volume increases, but anything more frequent is arguably unnecessary regardless of volume, especially since the computational cost grows with the size of the database. Compression is the expensive part, and more so if it can't be multithreaded. For the record, Wikipedia (which has an insanely good data dump system[1]) produces one or two dumps per month with no real upper limit: one per month is the minimum and two per month is the usual practice.

[1] Seriously, poke around their data dump site and documentation; they even include logs for each site. This would be absolute overkill for an initial implementation, of course, but worth keeping in mind for Some Point™ in the future.
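A minimal sketch of the multithreading point, assuming a hypothetical dump made of one JSON Lines file per table (the `posts`/`comments` names and fields are illustrative, not Codidact's actual schema), each compressed with bz2 from Python's standard library. A single bz2 stream is compressed sequentially, so splitting the dump into independent files lets several files be compressed at the same time. `write_dump` and `read_table` are hypothetical helper names.

```python
import bz2
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_table(path, rows):
    """Write one table as bz2-compressed JSON Lines (one record per line)."""
    with bz2.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return str(path)


def write_dump(out_dir, tables, workers=4):
    """Write each table to its own .jsonl.bz2 file, several files at once.

    CPython's bz2 module releases the GIL inside the compressor, so threads
    can overlap the compression of different files. Returns sorted paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_table, out / f"{name}.jsonl.bz2", rows)
            for name, rows in tables.items()
        ]
        return sorted(f.result() for f in futures)


def read_table(path):
    """Read a .jsonl.bz2 file back into a list of dicts."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

Per-table files also make it easy to publish a smaller subset (for example, comments only) next to the full archive, and the same idea would let a separate images archive sit alongside a text-only one.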
If we're being futuristic and all, we might even get to include images in the dumps! :O
Good point. I guess even if we had an image data dump, we'd want to make sure that the much smaller version without images is still available for people who don't need the images and have a limited download speed or a monthly data cap.
trichoplax: yes, they can be released separately, e.g., https://github.com/Franck-Dernoncourt/Stack-Exchange-Image-Dataset
