Post History

76%
+8 −1
Q&A Proposal: tool for user-requested import of a single question and its answers from SE

We have a tool for doing bulk imports of questions (and answers) from SE. It has several limitations: Running it requires direct DB access. The resulting data is sometimes wrong in various ways -...

2 answers  ·  posted 3y ago by Monica Cellio  ·  last activity 3y ago by Olin Lathrop

#2: Post edited by Monica Cellio · 2020-10-07T13:53:36Z (over 3 years ago)
responding to an answer about edit history
We have a tool for doing bulk imports of questions (and answers) from SE. It has several limitations:

- Running it requires direct DB access.
- The resulting data is sometimes wrong in various ways -- duplicates, answers not getting wired up to the right questions, something that blocks voting sometimes until something gets tweaked in the DB (I don't know why), probably other things.
- Special characters/encodings tend to get messed up; all the Hebrew on Judaism and at least some of the MathJax on Scientific Speculation had to be hand-corrected.
- It's resource-intensive.
- And did I mention that it requires direct DB access, meaning only a very small number of people can do it?

In addition, we've found that bulk imports have not served us well on the communities that used them (Writing, Outdoors, Scientific Speculation). Many of us feel that bringing in a big pile of Q&A from SE, when there's no specific request and the community isn't prepared to moderate it all *here*, is counter-productive. This is why, on Judaism, we asked people to make specific requests, of which we've processed about 20. Even those 20 caused enough problems that we haven't done a second run.

I'd like to have a tool that imports *one* question and its answers, that users can directly invoke (perhaps gated by an ability and almost certainly rate-limited). People might do this to bring over their own work, or to add an answer here, or because it could then be a target for duplicates for questions that have been asked here. These are good use cases that are different from the use cases that drove the creation of the bulk-import tool.[^1]

When a user requests an import (through some tool into which the user drops an SE URL), the following things should happen:

- The question and all answers should be brought over. We don't care about comments or full edit history. (We can add a link to the edit history on SE to our post history but needn't recreate all those events here.)
- For any post (Q or A) where the SE account is associated with an existing Codidact user, make that user the owner. This applies to both "native" users (accounts that a human created) and placeholder users created from previous imports.
- For any post where the SE account is not associated with an existing Codidact user, create a placeholder user, make the user the owner, *and* add the "imported" notice with attribution/license info. (See examples on any of the communities I've mentioned that have imports.)
- If the import is immediate, take the requestor to the imported post page. If the import is not immediate, notify the requestor with a link when it's done. (It's fine if imports aren't immediate for API-budget reasons, but I assume it would happen within several hours.)
- Notify native (not placeholder) Codidact users when posts of theirs are imported. (It's not nice to silently attribute stuff to people, even if it's their work and they probably don't mind. Maybe they want to make edits. Maybe they forgot about that old wrong answer and want to delete it. Etc.)

I'm not proposing to bump the post. If the people involved get notified and anybody edits anything it will get bumped then; otherwise, there's probably no reason to move older posts to the top of the list.

TopAnswers has a tool like this. While our implementation languages, tech stacks, and database schemas are different, so there's little that could be reused, we[^2] *could* look at their implementation to see how they handle things like the data corruption described above or staying within API limits. I don't remember if their imports are immediate or queued or, if queued, if there's a notification.

[^1]: If you see a question Somewhere Else that has *no* good answers, and you want to answer it here, then it's better to just re-ask the question here (or if it's yours, copy it directly). Imports should be for cases where there are good answers but we want to make improvements or otherwise *use* them here.

[^2]: For values of "we" that include PHP fluency, sadly not a set that includes me.
#1: Initial revision by Monica Cellio · 2020-10-07T00:34:02Z (over 3 years ago)
Proposal: tool for user-requested import of a single question and its answers from SE
We have a tool for doing bulk imports of questions (and answers) from SE.  It has several limitations:
- Running it requires direct DB access.
- The resulting data is sometimes wrong in various ways -- duplicates, answers not getting wired up to the right questions, something that blocks voting sometimes until something gets tweaked in the DB (I don't know why), probably other things.
- Special characters/encodings tend to get messed up; all the Hebrew on Judaism and at least some of the MathJax on Scientific Speculation had to be hand-corrected.
- It's resource-intensive.
- And did I mention that it requires direct DB access, meaning only a very small number of people can do it?

In addition, we've found that bulk imports have not served us well on the communities that used them (Writing, Outdoors, Scientific Speculation).  Many of us feel that bringing in a big pile of Q&A from SE, when there's no specific request and the community isn't prepared to moderate it all *here*, is counter-productive.  This is why, on Judaism, we asked people to make specific requests, of which we've processed about 20.  Even those 20 caused enough problems that we haven't done a second run.

I'd like to have a tool that imports *one* question and its answers, that users can directly invoke (perhaps gated by an ability and almost certainly rate-limited).  People might do this to bring over their own work, or to add an answer here, or because it could then be a target for duplicates for questions that have been asked here.  These are good use cases that are different from the use cases that drove the creation of the bulk-import tool.[^1]
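
To make the "gated by an ability and rate-limited" idea concrete, here is a rough sketch in Python (not the language Codidact is written in, just convenient for illustration). Everything in it is an assumption for the sake of the example: the `ImportRequestGate` name, the rolling-window approach, and the specific limits are hypothetical, and the real thing would hook into whatever ability and rate-limit machinery the platform already has.

```python
import time
from collections import defaultdict, deque


class ImportRequestGate:
    """Hypothetical gate: accept an import request only from users who hold
    the ability, and only a limited number of times per rolling window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._recent = defaultdict(deque)  # user id -> times of accepted requests

    def allow(self, user_id, has_import_ability):
        # Users without the ability are refused before any rate accounting.
        if not has_import_ability:
            return False
        now = self.clock()
        recent = self._recent[user_id]
        # Forget accepted requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

For example, `ImportRequestGate(max_requests=2, window_seconds=3600)` would let each qualifying user request two imports per hour; the numbers are placeholders, not a recommendation.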

When a user requests an import (through some tool into which the user drops an SE URL), the following things should happen:

- The question and all answers should be brought over.  We don't care about comments or edit history.
- For any post (Q or A) where the SE account is associated with an existing Codidact user, make that user the owner.  This applies to both "native" users (accounts that a human created) and placeholder users created from previous imports.
- For any post where the SE account is not associated with an existing Codidact user, create a placeholder user, make the user the owner, *and* add the "imported" notice with attribution/license info.  (See examples on any of the communities I've mentioned that have imports.)
- If the import is immediate, take the requestor to the imported post page.  If the import is not immediate, notify the requestor with a link when it's done.  (It's fine if imports aren't immediate for API-budget reasons, but I assume it would happen within several hours.)
- Notify native (not placeholder) Codidact users when posts of theirs are imported.  (It's not nice to silently attribute stuff to people, even if it's their work and they probably don't mind.  Maybe they want to make edits.  Maybe they forgot about that old wrong answer and want to delete it.  Etc.)
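
The ownership and notification rules above could be sketched roughly like this. It is only an illustration under stated assumptions: `SEPost`, `PlannedPost`, `parse_question_url`, `plan_import`, and the integer user ids are all hypothetical names, the URL pattern covers just the `/questions/<id>` and `/q/<id>` forms, and nothing here talks to the SE API or to our database. One interpretation to flag: the sketch adds the "imported" notice whenever the owner is a placeholder, including placeholders from earlier imports, whereas the list above only says so explicitly for newly created ones.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class SEPost:
    se_user_id: int  # author's account id on SE
    body: str


@dataclass
class PlannedPost:
    owner_id: int            # Codidact user who will own the imported post
    add_import_notice: bool  # attach the attribution/license notice?


def parse_question_url(url):
    """Return (host, question id) for a question URL, or raise ValueError."""
    parsed = urlparse(url)
    match = re.match(r"^/(?:questions|q)/(\d+)", parsed.path)
    if not parsed.netloc or not match:
        raise ValueError(f"not a question URL: {url!r}")
    return parsed.netloc, int(match.group(1))


def plan_import(posts, linked_users, native_user_ids, next_placeholder_id):
    """Decide owners, notices, and notifications for one question and its answers.

    posts: the question first, then its answers.
    linked_users: SE user id -> existing Codidact user id (native or placeholder).
    native_user_ids: Codidact ids of accounts that a human created.
    next_placeholder_id: first id to give newly created placeholders (assumption).
    """
    links = dict(linked_users)  # copy so the caller's mapping is not mutated
    planned, notify = [], set()
    for post in posts:
        owner = links.get(post.se_user_id)
        if owner is None:
            # No associated account yet: create a placeholder and remember it,
            # so a second post by the same SE user reuses the same placeholder.
            owner = next_placeholder_id
            next_placeholder_id += 1
            links[post.se_user_id] = owner
        is_native = owner in native_user_ids
        planned.append(PlannedPost(owner, add_import_notice=not is_native))
        if is_native:
            notify.add(owner)  # native users get told about their imported posts
    return planned, notify
```

A caller would first run something like `parse_question_url("https://example.com/questions/12345/some-title")` (a made-up URL) to get the site and question id, fetch the posts, and then pass them to `plan_import` before writing anything. Separating the plan from the writes is a design choice of the sketch, meant to make it easy to check the result before touching the DB.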

I'm not proposing to bump the post.  If the people involved get notified and anybody edits anything it will get bumped then; otherwise, there's probably no reason to move older posts to the top of the list.

TopAnswers has a tool like this.  While our implementation languages, tech stacks, and database schemas are different, so there's little that could be reused, we[^2] *could* look at their implementation to see how they handle things like the data corruption described above or staying within API limits.  I don't remember if their imports are immediate or queued or, if queued, if there's a notification.


[^1]: If you see a question Somewhere Else that has *no* good answers, and you want to answer it here, then it's better to just re-ask the question here (or if it's yours, copy it directly).  Imports should be for cases where there are good answers but we want to make improvements or otherwise *use* them here.

[^2]: For values of "we" that include PHP fluency, sadly not a set that includes me.