Welcome to Codidact Meta!

Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.

Proposal: tool for user-requested import of a single question and its answers from SE

+8
−1

We have a tool for doing bulk imports of questions (and answers) from SE. It has several limitations:

  • Running it requires direct DB access.
  • The resulting data is sometimes wrong in various ways: duplicates, answers not getting wired up to the right questions, voting that is sometimes blocked until something gets tweaked in the DB (I don't know why), and probably other things.
  • Special characters/encodings tend to get messed up; all the Hebrew on Judaism and at least some of the MathJax on Scientific Speculation had to be hand-corrected.
  • It's resource-intensive.
  • And did I mention that it requires direct DB access, meaning only a very small number of people can do it?

In addition, we've found that bulk imports have not served us well on the communities that used them (Writing, Outdoors, Scientific Speculation). Many of us feel that bringing in a big pile of Q&A from SE, when there's no specific request and the community isn't prepared to moderate it all here, is counter-productive. This is why, on Judaism, we asked people to make specific requests, of which we've processed about 20. Even those 20 caused enough problems that we haven't done a second run.

I'd like to have a tool that imports one question and its answers, that users can directly invoke (perhaps gated by an ability and almost certainly rate-limited). People might do this to bring over their own work, or to add an answer here, or because it could then be a target for duplicates for questions that have been asked here. These are good use cases that are different from the use cases that drove the creation of the bulk-import tool.[1]
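To make the "gated by an ability and rate-limited" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ability name `import_posts`, the quota of three imports per 24 hours, and the `ImportGate` class itself are hypothetical and not part of the Codidact software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ImportGate:
    """Hypothetical gate: requires an ability and enforces a per-user quota."""

    max_per_window: int = 3               # assumed quota, not from the proposal
    window_seconds: float = 24 * 60 * 60  # assumed window: one day
    _history: Dict[int, List[float]] = field(default_factory=dict)

    def may_import(self, user_id: int, abilities: Set[str], now: float) -> bool:
        # Users without the (hypothetical) ability are always refused.
        if "import_posts" not in abilities:
            return False
        # Keep only the requests that still fall inside the window.
        recent = [t for t in self._history.get(user_id, [])
                  if now - t < self.window_seconds]
        allowed = len(recent) < self.max_per_window
        if allowed:
            recent.append(now)  # record this request against the quota
        self._history[user_id] = recent
        return allowed
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the check easy to test; a real implementation would likely store the history in the database instead of in memory.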

When a user requests an import (through some tool into which the user drops an SE URL), the following things should happen:

  • The question and all answers should be brought over. We don't care about comments or full edit history. (We can add a link to the edit history on SE to our post history but needn't recreate all those events here.)
  • For any post (Q or A) where the SE account is associated with an existing Codidact user, make that user the owner. This applies to both "native" users (accounts that a human created) and placeholder users created from previous imports.
  • For any post where the SE account is not associated with an existing Codidact user, create a placeholder user, make the user the owner, and add the "imported" notice with attribution/license info. (See examples on any of the communities I've mentioned that have imports.)
  • If the import is immediate, take the requestor to the imported post page. If the import is not immediate, notify the requestor with a link when it's done. (It's fine if imports aren't immediate for API-budget reasons, but I assume it would happen within several hours.)
  • Notify native (not placeholder) Codidact users when posts of theirs are imported. (It's not nice to silently attribute stuff to people, even if it's their work and they probably don't mind. Maybe they want to make edits. Maybe they forgot about that old wrong answer and want to delete it. Etc.)
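The ownership and notification rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `SEPost`, `User`, and `ImportedPost` types and the `import_thread` function are hypothetical names, not the actual schema, and fetching from the SE API is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SEPost:
    se_user_id: int
    body: str


@dataclass
class User:
    id: int
    is_placeholder: bool
    se_user_id: Optional[int] = None


@dataclass
class ImportedPost:
    owner_id: int
    body: str
    imported_notice: bool  # attribution/license notice shown on the post


def import_thread(posts: List[SEPost],
                  users_by_se_id: Dict[int, User],
                  next_user_id: int) -> Tuple[List[ImportedPost], List[int]]:
    """Assign an owner to each post and collect native users to notify."""
    imported: List[ImportedPost] = []
    to_notify = set()
    for post in posts:
        owner = users_by_se_id.get(post.se_user_id)
        created_placeholder = owner is None
        if owner is None:
            # No linked account: create a placeholder user to own the post.
            owner = User(next_user_id, True, post.se_user_id)
            users_by_se_id[post.se_user_id] = owner
            next_user_id += 1
        elif not owner.is_placeholder:
            # Native users are told that their work was imported.
            to_notify.add(owner.id)
        # Assumption: the notice is added only when a placeholder is created
        # by this import, following the proposal's wording literally.
        imported.append(ImportedPost(owner.id, post.body, created_placeholder))
    return imported, sorted(to_notify)
```

Whether posts owned by placeholders from earlier imports should also carry the notice is not spelled out in the list, so the sketch takes the narrowest reading.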

I'm not proposing to bump the post. If the people involved get notified and anybody edits anything it will get bumped then; otherwise, there's probably no reason to move older posts to the top of the list.

TopAnswers has a tool like this. While our implementation languages, tech stacks, and database schemas are different, so there's little that could be reused, we[2] could look at their implementation to see how they handle things like the data corruption or staying within API limits. I don't remember if their imports are immediate or queued or, if queued, if there's a notification.
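I don't know how TopAnswers does it, but one simple way to stay within API limits is to queue requests and process only as many as the remaining quota allows. The sketch below assumes a fixed, made-up cost of two API calls per import; `drain_queue` and its parameters are hypothetical, and the actual API call is omitted.

```python
from collections import deque
from typing import Deque, List, Tuple


def drain_queue(pending: Deque[str],
                quota_remaining: int,
                cost_per_import: int = 2) -> Tuple[List[str], int]:
    """Process queued requests while quota allows; leave the rest queued."""
    done: List[str] = []
    while pending and quota_remaining >= cost_per_import:
        request = pending.popleft()
        # A real implementation would fetch the question and answers here.
        quota_remaining -= cost_per_import
        done.append(request)
    return done, quota_remaining
```

Anything left in `pending` waits for the next run, which is where the "notify the requestor with a link when it's done" step would come in.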

  1. If you see a question Somewhere Else that has no good answers, and you want to answer it here, then it's better to just re-ask the question here (or if it's yours, copy it directly). Imports should be for cases where there are good answers but we want to make improvements or otherwise use them here.

  2. For values of "we" that include PHP fluency, sadly not a set that includes me.

2 answers

+3
−0

"We don't care about /.../ edit history."

I'm going to disagree here, actually. Edits can be substantial in their own right, introducing new content from users other than the original author of the post. (The golden rule for edits Somewhere Else is to always preserve and respect author intent, but there have been occasions when this still meant pretty much rewriting the post from scratch.) While indeed this isn't attributed to each individual user in the current revision view of a post (which would get messy real fast), it is attributed in the post history to the user who contributed each piece, and that content is normally available only under an attribution-required license. (Some users do things like write in their profile "my contributions can be used under License X", but there's no standard way to indicate that the way there is on Codidact where each individual post has an explicit, specific license associated with it.)

For that reason, while we might not care about revision summaries or timestamps or the fact that users might have changed their display names since (the current one is probably good enough for them), to the extent that the information is available, I think we should aim to preserve information about at least the edit sequence and which user made each edit.
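As a rough illustration of how little would need to be stored, here is a sketch that keeps only the edit sequence and the editor for each revision, and derives the list of contributors from that. The `Revision` type and `contributors` function are hypothetical names for this example, not an existing schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Revision:
    sequence: int  # position in the edit sequence, starting at 1
    editor: str    # display name at import time


def contributors(revisions: List[Revision]) -> List[str]:
    """Return each editor once, ordered by their first revision."""
    seen: List[str] = []
    for rev in sorted(revisions, key=lambda r: r.sequence):
        if rev.editor not in seen:
            seen.append(rev.editor)
    return seen
```

Summaries and timestamps are deliberately left out, in line with the point that those matter less than who contributed what and in which order.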

+1
−2

If you create such a feature, at least make it optional per site with the default off. I would be strongly opposed to any direct import from anywhere else to the Electrical Engineering site, for example.
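In terms of configuration, "optional per site with the default off" could look something like the sketch below. The `SiteSettings` type and the `single_post_import_enabled` flag are made-up names for illustration, not existing site settings.

```python
from dataclasses import dataclass


@dataclass
class SiteSettings:
    name: str
    # Off by default: a community has to opt in explicitly.
    single_post_import_enabled: bool = False


def import_allowed(site: SiteSettings) -> bool:
    """Imports are refused unless the site has switched the feature on."""
    return site.single_post_import_enabled
```

With this shape, a site that never touches the setting never receives imports.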

There is really no necessity for imports. If a user wants to bring over one of their own questions, then just bring over the question and let others answer it anew here. However, the question shouldn't be just copied anyway. Since it is now better known what information is relevant, the question should be edited to be more to the point. There is always something that can be made better.

If a user wants to bring over their own answer, they should have no problem writing a targeted question in their own words. Then, as above, another edit pass can be applied to the answer.

As you said above, bulk imports have "not served us well". Importing only selected posts individually doesn't really address all the problems.
