
Welcome to Codidact Meta!

Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.

Proposal: tool for user-requested import of a single question and its answers from SE


We have a tool for doing bulk imports of questions (and answers) from SE. It has several limitations:

  • Running it requires direct DB access.
  • The resulting data is sometimes wrong in various ways: duplicates, answers not getting wired up to the right questions, something that sometimes blocks voting until something gets tweaked in the DB (I don't know why), probably other things.
  • Special characters/encodings tend to get messed up; all the Hebrew on Judaism and at least some of the MathJax on Scientific Speculation had to be hand-corrected.
  • It's resource-intensive.
  • And did I mention that it requires direct DB access, meaning only a very small number of people can do it?

In addition, we've found that bulk imports have not served us well on the communities that used them (Writing, Outdoors, Scientific Speculation). Many of us feel that bringing in a big pile of Q&A from SE, when there's no specific request and the community isn't prepared to moderate it all here, is counter-productive. This is why, on Judaism, we asked people to make specific requests, of which we've processed about 20. Even those 20 caused enough problems that we haven't done a second run.

I'd like to have a tool that imports one question and its answers, that users can directly invoke (perhaps gated by an ability and almost certainly rate-limited). People might do this to bring over their own work, or to add an answer here, or because it could then be a target for duplicates for questions that have been asked here. These are good use cases that are different from the use cases that drove the creation of the bulk-import tool.¹
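The gating and rate limit could be as simple as the check below. This is a minimal Python sketch, not Codidact code: the ability name `import_from_se` and the default of three requests per rolling day are made-up placeholders that each community would tune.

```python
from datetime import datetime, timedelta

# Hypothetical ability name; Codidact's real ability names may differ.
IMPORT_ABILITY = "import_from_se"

def can_request_import(abilities, recent_requests, now,
                       max_requests=3, window=timedelta(days=1)):
    """Return True if a user may submit another single-post import request.

    abilities: set of ability names the user holds on this community.
    recent_requests: datetimes of the user's earlier import requests.
    max_requests, window: placeholder limit (N requests per rolling window).
    """
    if IMPORT_ABILITY not in abilities:
        return False
    # Count only the requests that fall inside the rolling window.
    in_window = [t for t in recent_requests if now - t < window]
    return len(in_window) < max_requests
```

A rolling window (rather than a daily reset) keeps one user from spending the whole allowance just before and just after midnight.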

When a user requests an import (through some tool into which the user drops an SE URL), the following things should happen:

  • The question and all answers should be brought over. We don't care about comments or full edit history. (We can add a link to the edit history on SE to our post history but needn't recreate all those events here.)
  • For any post (Q or A) where the SE account is associated with an existing Codidact user, make that user the owner. This applies to both "native" users (accounts that a human created) and placeholder users created from previous imports.
  • For any post where the SE account is not associated with an existing Codidact user, create a placeholder user, make the user the owner, and add the "imported" notice with attribution/license info. (See examples on any of the communities I've mentioned that have imports.)
  • If the import is immediate, take the requestor to the imported post page. If the import is not immediate, notify the requestor with a link when it's done. (It's fine if imports aren't immediate for API-budget reasons, but I assume it would happen within several hours.)
  • Notify native (not placeholder) Codidact users when posts of theirs are imported. (It's not nice to silently attribute stuff to people, even if it's their work and they probably don't mind. Maybe they want to make edits. Maybe they forgot about that old wrong answer and want to delete it. Etc.)
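The ownership and notification rules above can be sketched in a few lines. This is a minimal in-memory Python illustration, assuming the question and answers have already been fetched from SE; the `User` and `ImportedPost` classes and the `assign_owners` function are hypothetical stand-ins, not Codidact's actual models. It also assumes that posts owned by a placeholder from an earlier import get the imported notice too.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class User:
    # Hypothetical stand-in for a Codidact user record.
    id: Optional[int]  # None until a new placeholder is saved
    name: str
    se_account_id: Optional[int] = None
    placeholder: bool = False

@dataclass
class ImportedPost:
    # Hypothetical stand-in for one SE post (question or answer) being imported.
    se_post_id: int
    se_owner_id: int
    se_owner_name: str
    body: str
    owner: Optional[User] = None
    imported_notice: bool = False

def assign_owners(posts: List[ImportedPost], users: Dict[int, User]) -> List[User]:
    """Assign owners to an imported question and its answers.

    users maps SE account id -> existing Codidact user (native or placeholder)
    and gains an entry for every placeholder created here.
    Returns the native users who should be notified, each listed once.
    """
    to_notify: List[User] = []
    for post in posts:
        user = users.get(post.se_owner_id)
        if user is None:
            # No linked account: create a placeholder owner for this SE account.
            user = User(None, post.se_owner_name, post.se_owner_id, placeholder=True)
            users[post.se_owner_id] = user
        post.owner = user
        # Assumption: every placeholder-owned post carries the imported notice.
        post.imported_notice = user.placeholder
        # Native users get one notification per import, however many posts they own.
        if not user.placeholder and not any(u is user for u in to_notify):
            to_notify.append(user)
    return to_notify
```

Because new placeholders are added to `users` as they are created, an SE user with two answers in the same thread ends up with a single placeholder account.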

I'm not proposing to bump the post. If the people involved get notified and anybody edits anything, it will get bumped then; otherwise, there's probably no reason to move older posts to the top of the list.

TopAnswers has a tool like this. Our implementation languages, tech stacks, and database schemas are different, so there's little that could be reused, but we² could look at their implementation to see how they handle things like the data corruption described above or staying within API limits. I don't remember whether their imports are immediate or queued or, if queued, whether there's a notification.
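On the API-budget side, a queued importer could decide when to make its next call from the response wrapper the Stack Exchange API returns, which includes a `quota_remaining` count and, when the API wants clients to slow down, a `backoff` value in seconds. This is a Python sketch under stated assumptions: `next_call_time` and its `reserve` parameter are hypothetical, and it assumes the daily quota resets at UTC midnight.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def next_call_time(wrapper: dict, now: int, reserve: int = 100) -> int:
    """Return the earliest Unix time at which the import worker may call the API again.

    wrapper: decoded JSON of an SE API response (uses quota_remaining and
    the optional backoff field, in seconds).
    reserve: hypothetical number of requests held back for other work.
    """
    if wrapper.get("quota_remaining", 0) <= reserve:
        # Quota nearly spent: wait for the next UTC midnight (assumed reset time).
        return (now // SECONDS_PER_DAY + 1) * SECONDS_PER_DAY
    # Otherwise honour any backoff the API asked for (0 if none).
    return now + wrapper.get("backoff", 0)
```

Treating a missing `quota_remaining` as zero is deliberately conservative: if the budget is unknown, the import waits rather than risking the shared quota.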

  1. If you see a question Somewhere Else that has no good answers, and you want to answer it here, then it's better to just re-ask the question here (or if it's yours, copy it directly). Imports should be for cases where there are good answers but we want to make improvements or otherwise use them here.

  2. For values of "we" that include PHP fluency, sadly not a set that includes me.


6 comments

I have the PHP fluency, just don't have the time to do this. But I do think it is a very good idea. In particular it could help for people who want to move from different sites - e.g., there are several different sites that might match to Software Development. ‭manassehkatz‭ 17 days ago

While this might be useful, I'd still be very restrictive with what to import. I've been re-reading various canonical posts on SE that I've used as duplicate targets there for years, and found that most of them are not actually that good. They suffer from "fragmented answers": answer A says one good thing and answer B says another, when they ought to be merged to create a really good answer. -> ‭Lundin‭ 16 days ago

So a complete re-write of these canonical "patchworks" would be ideal, not just importing them out of habit. Codidact offers a fresh start to do just that. Also, some of the old canonical posts might be outdated, particularly when it comes to tech communities. ‭Lundin‭ 16 days ago

@Lundin agreed; people need to be judicious in using the tool. Sometimes some of the answers are good (or good starting points for edits) and others should be deleted, which the community could do here but can't do there. ‭Monica Cellio‭ 16 days ago

@Monica Cellio But because of licensing, we can't really edit them to shape during import, right? Like merging two answers into one. It's either grab the answer(s) as-is or leave it be? ‭Lundin‭ 16 days ago


2 answers


We don't care about /.../ edit history.

I'm going to disagree here, actually. Edits can be substantial in their own right, introducing new content from users other than the original author of the post. (The golden rule for edits Somewhere Else is to always preserve and respect author intent, but there have been occasions when this still meant pretty much rewriting the post from scratch.) While indeed this isn't attributed to each individual user in the current revision view of a post (which would get messy real fast), it is attributed in the post history to the user who contributed each piece, and that content is normally available only under an attribution-required license. (Some users do things like write in their profile "my contributions can be used under License X", but there's no standard way to indicate that the way there is on Codidact where each individual post has an explicit, specific license associated with it.)

For that reason, while we might not care about revision summaries or timestamps or the fact that users might have changed their display names since (the current one is probably good enough for them), to the extent that the information is available, I think we should aim to preserve information about at least the edit sequence and which user made each edit.
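Keeping "the edit sequence and which user made each edit" could amount to reducing the SE revision list to ordered (revision number, editor) pairs. This Python sketch assumes input dicts shaped like the SE API's revision objects (a `revision_number` and a `user` with a `display_name`); those field names are an assumption to check against the API docs, and `edit_attribution` is a hypothetical helper.

```python
def edit_attribution(revisions):
    """Return [(revision_number, display_name), ...] in edit order.

    revisions: list of dicts shaped like SE API revision objects (assumed
    fields: revision_number, user.display_name). Entries without a
    revision_number (e.g. non-content events) are skipped.
    """
    seq = []
    for rev in revisions:
        number = rev.get("revision_number")
        if number is None:
            continue
        user = rev.get("user") or {}
        # Deleted SE accounts may lack a display name; keep a stable fallback.
        seq.append((number, user.get("display_name", "(unknown user)")))
    return sorted(seq)
```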

1 comment

Oh, good point. I said that because the license doesn't require importing the entire edit history. And maybe we don't want to recreate the full edit history here, but for imported posts we could include the original edit history as a link in the post history. That way the information is available but we don't have to recreate it here (and deal with the fact that many users there won't be here, so we'd have to special-case that). Updated. ‭Monica Cellio‭ 16 days ago
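Generating that link from the URL the user drops into the import tool is straightforward, since SE serves a post's edit history at `https://<site>/posts/<id>/revisions`. A minimal Python sketch; the set of accepted URL shapes (`/questions/<id>/...`, `/q/<id>`, `/a/<id>`) is an assumption, and `revisions_link` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

# Matches /questions/<id>/..., /q/<id> and /a/<id> paths (assumed URL forms).
_POST_PATH = re.compile(r"^/(?:questions|q|a)/(\d+)(?:/|$)")

def revisions_link(se_url):
    """Return the SE edit-history URL for the post in se_url, or None."""
    parts = urlparse(se_url)
    match = _POST_PATH.match(parts.path)
    if not parts.netloc or match is None:
        return None
    return f"https://{parts.netloc}/posts/{match.group(1)}/revisions"
```

For an answer permalink of the form `/questions/<qid>/<slug>/<aid>`, this returns the question's history; each imported answer would need its own link built from its own id.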


If you create such a feature, at least make it optional per site with the default off. I would be strongly opposed to any direct import from anywhere else to the Electrical Engineering site, for example.

There is really no necessity for imports. If a user wants to bring over one of their own questions, then just bring over the question and let others answer it anew here. Even then, the question shouldn't just be copied. Since it is now better known what information is relevant, the question should be edited to be more to the point. There is always something that can be made better.

If a user wants to bring over their own answer, they should have no problem writing a targeted question in their own words. Then, as above, another edit pass can be applied to the answer.

As you said above, bulk imports have "not served us well". Importing only selected posts individually doesn't really address all the problems.

1 comment

This would be controlled per-community, yes. A community that doesn't want imports wouldn't have to allow them, but some communities do want imports and this would let them have them. ‭Monica Cellio‭ 16 days ago
