Welcome to Codidact Meta!
Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.
Comments on Proposal: tool for user-requested import of a single question and its answers from SE
Proposal: tool for user-requested import of a single question and its answers from SE
We have a tool for doing bulk imports of questions (and answers) from SE. It has several limitations:
- Running it requires direct DB access.
- The resulting data is sometimes wrong in various ways: duplicates, answers not getting wired up to the right questions, something that sometimes blocks voting until something gets tweaked in the DB (I don't know why), and probably other things.
- Special characters/encodings tend to get messed up; all the Hebrew on Judaism and at least some of the MathJax on Scientific Speculation had to be hand-corrected.
- It's resource-intensive.
- And did I mention that it requires direct DB access, meaning only a very small number of people can do it?
In addition, we've found that bulk imports have not served us well on the communities that used them (Writing, Outdoors, Scientific Speculation). Many of us feel that bringing in a big pile of Q&A from SE, when there's no specific request and the community isn't prepared to moderate it all here, is counter-productive. This is why, on Judaism, we asked people to make specific requests, of which we've processed about 20. Even those 20 caused enough problems that we haven't done a second run.
I'd like to have a tool that users can invoke directly (perhaps gated by an ability and almost certainly rate-limited) to import one question and its answers. People might do this to bring over their own work, to add an answer here, or so that the question can serve as a duplicate target for questions asked here. These are good use cases, and they differ from the ones that drove the creation of the bulk-import tool.[1]
When a user requests an import (through some tool into which the user drops an SE URL), the following things should happen:
- The question and all answers should be brought over. We don't care about comments or full edit history. (We can add a link to the edit history on SE to our post history but needn't recreate all those events here.)
- For any post (Q or A) where the SE account is associated with an existing Codidact user, make that user the owner. This applies to both "native" users (accounts that a human created) and placeholder users created from previous imports.
- For any post where the SE account is not associated with an existing Codidact user, create a placeholder user, make the user the owner, and add the "imported" notice with attribution/license info. (See examples on any of the communities I've mentioned that have imports.)
- If the import is immediate, take the requestor to the imported post page. If it is not, notify the requestor with a link when it's done. (It's fine if imports aren't immediate for API-budget reasons, but I'd expect each one to finish within several hours.)
- Notify native (not placeholder) Codidact users when posts of theirs are imported. (It's not nice to silently attribute stuff to people, even if it's their work and they probably don't mind. Maybe they want to make edits. Maybe they forgot about that old wrong answer and want to delete it. Etc.)
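The ownership and notification rules in the list above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Codidact's implementation: the `User`, `SEPost`, `ImportedPost`, and `ImportResult` types and the `import_thread` function are hypothetical, fetching from the SE API is left out entirely, and it assumes the imported notice goes on every placeholder-owned post, including posts owned by placeholders left over from earlier imports.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set


@dataclass
class User:
    # Hypothetical stand-in for a Codidact user record.
    user_id: int
    display_name: str
    is_placeholder: bool


@dataclass
class SEPost:
    # Hypothetical stand-in for a question or answer already fetched from SE.
    se_post_id: int
    se_account_id: int
    se_display_name: str
    body: str


@dataclass
class ImportedPost:
    se_post_id: int
    owner: User
    body: str
    has_imported_notice: bool  # attribution/license notice shown on the post


@dataclass
class ImportResult:
    posts: List[ImportedPost] = field(default_factory=list)
    users_to_notify: Set[int] = field(default_factory=set)  # native users only


def import_thread(
    se_posts: List[SEPost],
    users_by_se_account: Dict[int, User],
    new_user_ids: Iterator[int],
) -> ImportResult:
    """Assign owners for a question and its answers; comments and edit history are skipped.

    users_by_se_account maps SE account IDs to existing Codidact users (native or
    placeholder) and is updated in place whenever a placeholder is created.
    """
    result = ImportResult()
    for post in se_posts:
        owner = users_by_se_account.get(post.se_account_id)
        if owner is None:
            # No associated Codidact user: create a placeholder to own the post.
            owner = User(next(new_user_ids), post.se_display_name, is_placeholder=True)
            users_by_se_account[post.se_account_id] = owner
        elif not owner.is_placeholder:
            # Native user: record them for an "your post was imported" notification.
            result.users_to_notify.add(owner.user_id)
        # Assumption: every placeholder-owned post carries the imported notice.
        result.posts.append(
            ImportedPost(post.se_post_id, owner, post.body, owner.is_placeholder)
        )
    return result
```

Collecting native users in a set means someone who wrote several of the imported posts gets one notification rather than one per post; passing an ID iterator such as `itertools.count(...)` keeps placeholder creation independent of how IDs are actually allocated.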
I'm not proposing to bump the post. If the people involved get notified and anybody edits anything, it will get bumped then; otherwise, there's probably no reason to move older posts to the top of the list.
TopAnswers has a tool like this. Our implementation languages, tech stacks, and database schemas differ, so there's little we could reuse directly, but we[2] could look at their implementation to see how they handle problems like the data corruption described above or staying within API limits. I don't remember whether their imports are immediate or queued, or, if queued, whether there's a notification.
[1] If you see a question Somewhere Else that has no good answers, and you want to answer it here, then it's better to just re-ask the question here (or, if it's yours, copy it directly). Imports should be for cases where there are good answers but we want to make improvements or otherwise use them here.
[2] For values of "we" that include PHP fluency, sadly not a set that includes me.
Post
We don't care about /.../ edit history.
I'm going to disagree here, actually. Edits can be substantial in their own right, introducing new content from users other than the original author of the post. (The golden rule for edits Somewhere Else is to always preserve and respect author intent, but there have been occasions when this still meant pretty much rewriting the post from scratch.) While this isn't attributed to each individual user in the current revision view of a post (which would get messy real fast), it is attributed in the post history to the user who contributed each piece, and that content is normally available only under an attribution-required license. (Some users write something like "my contributions can be used under License X" in their profile, but there's no standard way to indicate that, unlike on Codidact, where each individual post has an explicit, specific license associated with it.)
For that reason, while we might not care about revision summaries or timestamps or the fact that users might have changed their display names since (the current one is probably good enough for them), to the extent that the information is available, I think we should aim to preserve information about at least the edit sequence and which user made each edit.
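A minimal Python sketch of what preserving the edit sequence and the editor of each revision could look like, while dropping timestamps and revision summaries. It assumes the revisions have already been fetched from SE into records like `SERevision`; that type, its field names, and the `build_history` and `contributors` helpers are all hypothetical, not SE's API or Codidact's schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SERevision:
    # Hypothetical shape of one SE revision; the field names are assumptions.
    revision_number: int
    editor_account_id: Optional[int]  # None when the editor is unknown
    editor_display_name: str  # the editor's current name is good enough
    body: str
    timestamp: str  # fetched, but not carried over
    summary: str  # fetched, but not carried over


@dataclass(frozen=True)
class HistoryEntry:
    sequence: int
    editor_account_id: Optional[int]
    editor_display_name: str
    body: str


def build_history(revisions: List[SERevision]) -> List[HistoryEntry]:
    """Keep the edit order and the editor of each revision; drop timestamps and summaries."""
    ordered = sorted(revisions, key=lambda r: r.revision_number)
    return [
        HistoryEntry(seq, r.editor_account_id, r.editor_display_name, r.body)
        for seq, r in enumerate(ordered, start=1)
    ]


def contributors(history: List[HistoryEntry]) -> List[str]:
    """Distinct editors in order of first contribution, e.g. for an attribution notice."""
    seen = []
    for entry in history:
        key = (entry.editor_account_id, entry.editor_display_name)
        if key not in seen:
            seen.append(key)
    return [name for _, name in seen]
```

Sorting by revision number rather than trusting the fetch order keeps the sequence correct even if revisions arrive out of order.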