
Welcome to Codidact Meta!

Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.

Comments on Let's improve how we handle duplicates

Parent

Let's improve how we handle duplicates

+11
−0

Currently, marking a question as a duplicate is part of question closure. Duplicates are a little different from other close reasons, though -- often the question itself is clear, complete, and otherwise solid, but it happens to have been asked before. Question closure can leave people feeling judged (as we learned Somewhere Else), but finding a duplicate should make the asker feel happy -- "we already have an answer for you". I've been wanting to change how we handle duplicates for a while -- the semantics are different, so why should they be part of the same workflow?

Here's a proposal; please provide feedback and help refine it.

Goals

  • Address duplicates as promptly as possible, to get askers to their answers and to reduce effort spent on what turn out to be duplicate answers.

  • Help authors to differentiate their dupe-nominated posts (if they disagree) and expedite resolution when they do.

  • Enable the community to have an ongoing evaluation by collecting all types of feedback including disagreement.

  • As already noted, counter the impression that duplicates are bad.

  • Test some ideas that would apply to closures too (which we also want to improve).

The main ideas

Someone who thinks a top-level post[1] is a duplicate can propose it, including an optional comment with the link. The suggestion is shown on the post and a comment thread is created for discussion. Other people who see this notice can agree, disagree, or propose other duplicate targets. We keep a running tally of votes in both directions, as opposed to going through close/reopen cycles.

The author is given specific editing guidance (or can accept a dupe suggestion). If the author edits in response to the dupe suggestion, and has the Edit ability, we (initially) trust that the edit resolved the issue -- clear the dupe suggestions, record everything in the history, and otherwise reset. Question: To avoid abuse or "dupe wars", should we only do this once (per post)?

If the author doesn't have the Edit ability, the edit still takes effect (you can always edit your own posts), but the dupe suggestion remains. People who can review suggested edits see a notice on the post asking them to review the edit and decide if it resolved the duplicate suggestion. If yes, proceed as for the author edit.

If "enough" people (score threshold still TBD[2]) agree that a post is a duplicate, it's marked as such. A duplicate designation can be reversed by the community.

Duplicate identification and resolution is democratized much more than other closures. I propose that anybody with the Participate Generally ability can participate in these votes.

In more detail

The following is taken from the draft specification. That spec also talks a little about closure ("hold"), which isn't very far along and will probably change so please don't focus on it.

Functional specification

Codidact supports duplicate suggestions and hold suggestions. Duplicates are not a type of hold -- the focus of a duplicate is "get to an answer more quickly" and link posts together, while hold is more about closing a question down until problems are addressed. We think the user experience of duplicates can be improved if they're not treated as closures/holds.

Duplicates are, intentionally, more "democratic"; while holds require the Curate ability, anybody with Participate Generally can participate in duplicate resolutions.

Suggesting a duplicate

Anybody with the Participate Generally ability can propose that a top-level post is a duplicate of another top-level post in the same community. (This could be a different category.) This spec also covers "superseded" or other duplicate-like phrasings -- the behavior is the same, even if a community customizes its wording.

To suggest a duplicate, any user (with the ability) can:

  • select the Tools menu under the post
  • select "suggest duplicate" from the menu (move "close" to this menu at the same time to reduce confusion)
  • fill out an in-page form with a required link and an optional comment (the comment can be helpful when it's not obvious why the other question is a duplicate)

Question: Should we disable the option if you have a suggestion pending, i.e. one suggestion per user at a time?

On submission:

  • A "Possible duplicate" comment thread is created or updated. A comment is added with the link and (if provided) additional comment. These comments are attributed (duplicate suggestions are not anonymous).
  • If there are now enough votes for the same duplicate target ("enough" to be defined), the question is marked as a duplicate. The author receives an inbox notification.
  • Otherwise, we display a notice of the suggested dupe, including links to the target and the comment thread, with action buttons (see below).
  • The author and everybody who has already answered the question receive inbox notifications of suggested duplicates.
  • State changes (marking a question as a duplicate or reversing it) are recorded in the post history.
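
As a rough illustration of the submission steps above, here is a minimal Python sketch of a running tally per suggested target. It is not taken from the draft specification: the class name `DuplicateTally`, the `vote` method, and the default threshold of 3 are all hypothetical, and it assumes that the initial suggestion counts as one "agree" vote and that the threshold is a net score, as footnote 2 suggests.

```python
from collections import defaultdict


class DuplicateTally:
    """Running net score per suggested duplicate target (hypothetical model)."""

    def __init__(self, threshold: int = 3):
        # Assumption: a low net score such as 2 or 3, set per community.
        self.threshold = threshold
        self.scores = defaultdict(int)  # target link -> net score

    def vote(self, target: str, agree: bool) -> bool:
        """Record a suggest/agree (+1) or disagree (-1) vote.

        Returns True if the target's net score has reached the threshold,
        meaning the question would now be marked as a duplicate.
        """
        self.scores[target] += 1 if agree else -1
        return self.scores[target] >= self.threshold


tally = DuplicateTally(threshold=2)
tally.vote("question-A", agree=True)   # the initial suggestion counts as a vote
tally.vote("question-A", agree=False)  # a disagreement cancels it out
tally.vote("question-A", agree=True)
print(tally.vote("question-A", agree=True))  # net score is now 2
```

Keeping one score per target means competing suggestions (A versus B) are tallied independently, which matches the idea that other people can propose other duplicate targets.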

Notice and actions

The notice is something like the following:

This question might be a duplicate of (other title with link) (could be multiple).
Community members provided the following feedback: (comment text that accompanied votes, unsigned here, and link to thread)

The author additionally sees:

Please read the linked question and its answers. If your question is different, you can edit to clarify.

And two buttons: "Yes, it's a duplicate" and "No, I will edit". See "author response" for how these buttons are handled.

Question: Should there be a third option, for "no, I disagree and don't need to edit" (spurious suggestions, etc), which would be treated as an ordinary "disagree" vote?

Everybody else who has Participate Generally sees two buttons next to each suggested duplicate: "agree" and "disagree". Choosing either prompts for a comment to add to the thread (like the initial suggestion).

Question: should each dupe suggestion show the number of suggest + agree / disagree tallies? Or should people who want to know the details have to go to the comment thread?

Answering a possible duplicate

While duplicate suggestions are pending, starting an answer generates a "hey, this might be a duplicate" alert, form and wording to be determined. This serves two purposes: (a) if you know enough to answer the question you probably know enough to contribute to the evaluation of whether it's a duplicate, and (b) you might want to answer that other question instead (or in addition).

Author response

If the author agrees it's a duplicate, the question is so marked (author's vote is binding). A notice is added and "[duplicate]" is added to the title. If there are multiple suggestions, the author selects one or more.

If the author disagrees and begins an edit (either via the button or the usual way):

  • It's the usual edit interface, except that "My question is not a duplicate of (link) because" has been inserted at the bottom and (ideally) the cursor is positioned there. If there's more than one dupe suggestion, do this for each and position cursor at the first.

  • If the author has the Edit ability, when the author submits the edit, the duplicate notice is removed from the question (for all viewers) and this review/resolution is logged in the history. (We can talk about yo-yo cases, where the author keeps rejecting duplicate votes this way, but I think it's something we should consider later. Let's not over-complicate it to start. Perhaps we only allow one author-edit resolution per question.)

  • If the author does not yet have the Edit ability, the duplicate notice remains and is updated to add a message along the lines of "thanks for your edit; the community will review to see if it's not a dupe any more" (not those words). The community sees something like "the author edited this post in response to duplicate suggestions" and, for those who can review edits, an invitation to do so.

Review: problem solved?

Users who can review edits see a notice on the post (similar to the "suggested edit pending" one) that says something like: "This question was suggested as a duplicate of (link) and the author has edited to address the suggestion. (review button)".

Entering the review shows the diff (like for a suggested edit) and includes a link to each suggested duplicate.

The options for the review are "Not a duplicate" and "Still a duplicate".

  • Choosing "still a duplicate" prompts for a comment and is treated like a duplicate vote. If there are multiple duplicate suggestions, the reviewer checks off which ones apply (maybe it's not a dupe of A any more but still is of B).

  • Choosing "not a duplicate" resolves the suggestions -- the question is reset to its "ordinary" state, with the resolution being logged in the post history, and the "possible duplicate" comment thread is archived. (Subsequent duplicate suggestions start over with a new thread.)

Reopening

If a post was marked as a duplicate, everybody sees the duplicate notice. Those with the Participate Everywhere ability also get the "disagree" button, like when duplicate votes are still pending. Here the comment is required -- explain why the duplicate status should be removed. The comment is added to a "possibly not a duplicate" thread. The duplicate notice is updated to add something like:

This question might have been incorrectly marked as a duplicate. Community members provided the following feedback: (comment text, link to thread).

Unaddressed issues

  • Retracting votes
  • Third-party edit from someone trying to help -- how does that affect the flow?
  • Vote threshold

  1. Usually questions, but there's no reason an article couldn't be a duplicate. A community that uses articles for sandboxing could mark those as duplicates of the resulting questions, clearly signaling that the sandbox phase is done and linking to the live question. ↩︎

  2. I think the score threshold -- the net score to mark a duplicate -- should be relatively low, 2 or 3. It should also be a community setting. ↩︎


2 comment threads

How does reopening work? (2 comments)
Conflict between overview and functional specification (3 comments)
Post
+7
−1

As a brief foreword, I would like to express my appreciation to you and all of the members of the community who have worked to make this site possible. I wish it the best in its growth and development, and hope to help as a part of it. Now, here's my proposal on this topic:

Duplicate proposals should be separated from closure-as-duplicate voting.

I think you've made a very good point about question closure (and the accompanying feeling of judgement—or, as I would put it, dismissal, perhaps even exclusion), and I think this insight should be applied to duplicates a step further than you suggest.

Again, as you mention, a question being marked as a duplicate (of an answered question, at least) should be a good experience for the question author if all goes well, who should then gladly close the question as a duplicate. However, I think there's a fundamental problem with a single proposal-closure duplicate system: the tendency of users viewing a question marked as a duplicate is to judge whether the question as asked is a duplicate for themselves. There's the added issue of the voting process starting immediately, not necessarily giving the author time to review the question before it's closed—and once that happens, it can easily become a fight against the perception that a question is closed for good reason.

In either case, there is an important step which is being missed in a system which treats the question, not the author: the primary purpose of this site is to answer the questions of its users, whether directly posed or through the provision of existing answered questions, and in the case of a disputed duplicate proposal, the author feels that the linked question does not answer the question the author is trying to ask. This mismatch is key to resolving good-faith duplicate disputes, and should be the primary focus of a user viewing a question with a duplicate suggestion that has not yet been closed.

This might seem like a benign issue, in which I'm putting too little faith in the users judging whether the question is a duplicate, but there's a deeper principle here: we should be maximising the ease of use for individuals acting in good faith (i.e. trying their best to follow the site's rules and principles), but an author who feels that a question has been wrongfully closed as a duplicate is quickly put in an uphill battle, being required to convince and get the attention of enough people to reach a position where the question that the author is trying to ask can be asked instead. This can be exhausting, and worse, can drive users away from trying to ask their questions, and I think there should be some safeguard against this kind of situation.

My proposal for the procedure of duplicate resolution, in full, then, is this:

  1. A user marks the question as a possible duplicate, linking to the question(s) it is believed to be a duplicate of.[1] An explanation of why the question is thought to be a duplicate should be provided such that the author has immediately actionable information.
  2. The author has 72 hours to respond to the proposal, either by accepting the proposal (closing the question as a duplicate), or by disputing the duplicate.[2] Any activity on the question by the author reduces the window to 1 hour, since its purpose is to give the author time to edit the question or otherwise address the proposal.[3]
  3. While a question is proposed as a duplicate, it is marked as such to direct user attention to comments where the proposal is hopefully being discussed between disputer, author, and other involved users. This period is designed to help the author and community resolve the misalignment between (author) intent and (community) perception of the question.
  4. Once a duplicate proposal is disputed, or the window expires[4], voting opens for closure on the basis of being a duplicate question. The question is closed as a duplicate if the net vote passes a certain positive threshold determined by the proportion of votes (e.g. +25%; i.e. at least 5:4 pro:versus[5]), with some minimum (e.g. +3 votes net) for small questions. This is important on its own (and something which I don't think I've seen discussed) because the judgement of a question should scale correctly with the popularity of the question: proportional judgement is how we do things in a democratic society, except the threshold is usually 50:50 (or 51:49)! This should also mean that cycles in closure and reopening will be less likely.
  5. As in your suggested procedure, a user (including the author) can vote to reopen a question, starting a new vote, and this vote may have a different threshold—perhaps the inverse of the threshold to close. A user with the Edit ability may reopen a question automatically with an edit of the question (presumably with the intent to differentiate it from the linked (duplicate) question). Each time a question is reopened, the window for the author to respond and the threshold to close the question (again) decrease slightly.[6] This prevents questions from being endlessly reopened, and means that an author should still be careful about the edits made before reopening the question, even with the ability to do so without approval.
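
One possible reading of the closure rule in step 4 can be sketched in Python. The function name `closes_as_duplicate` and its parameter names are hypothetical, and the sketch assumes that both conditions must hold together: the proportional margin from footnote 5 (votes for at least 25% greater than votes against) and the minimum net vote of +3. Integer arithmetic is used so the ratio check has no rounding error.

```python
def closes_as_duplicate(pro: int, versus: int,
                        margin_pct: int = 25, min_net: int = 3) -> bool:
    """Return True if a question would close under the assumed rule.

    margin_pct=25 means pro >= 1.25 * versus (at least 5:4 pro:versus).
    min_net=3 is the example floor that matters for questions with few votes.
    """
    meets_ratio = 100 * pro >= (100 + margin_pct) * versus
    meets_minimum = pro - versus >= min_net
    return meets_ratio and meets_minimum


print(closes_as_duplicate(3, 0))    # True: clear margin, net +3
print(closes_as_duplicate(5, 4))    # False: ratio is exactly 5:4 but net is only +1
print(closes_as_duplicate(15, 12))  # True: exactly 5:4 and net +3
print(closes_as_duplicate(14, 12))  # False: below the 5:4 ratio
```

With few votes the net minimum is the binding condition; once a question attracts more votes, the proportional margin takes over, which is the scaling behaviour step 4 argues for.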

I believe this proposal satisfies (almost[7]) all of the goals proposed in the question:

  • The author is notified of a duplicate proposal, which can be made as soon as the question is posted. While a duplicate proposal is awaiting resolution, a notice informs users viewing the question of the possible duplicate, meaning that they can check the linked question(s) and each make an assessment before spending any (potentially redundant) time and effort answering the question: the question should only receive answers from users who believe it not to be a duplicate.
  • Authors are provided with a question to check for duplication (and answers to their original questions), and so can immediately make the assessment of whether to approve or dispute the duplicate proposal.
  • Feedback of either kind is collected accordingly via suggestions in comments or even other duplicate proposals.
  • Presenting duplicate proposals as something which authors are at their leisure to make the choice to accept, I believe, would do a massive deal of good for the way that this would feel. An author can also feel safer knowing that the question has a period of time in which it can be judged by a reasonable sample of people before it is threatened with closure by a small sample size.

Addendum - a personal note

I understand and appreciate that this proposal may be seen as too extreme. I think that it's appropriate for me to mention that part of the reason I wanted to write this answer is because I had a pretty unpleasant time recently at Somewhere Else regarding this topic (and closure in general), and I felt, quite frankly, bullied and powerless in the way that the system was set up. I would like to ask only that you consider what cost this proposal (in its whole or as its separable components, if any) would have to the good operation of this site.

Thank you again for your role in this site, Monica. I hope that my first post on this site is received well. :)


  1. As a note, I posit that in a well-functioning duplicate resolution system, there should only be one duplicate to link: if a question is sufficiently answered by two different questions' answers, then either one of those questions should be marked as a duplicate of the other, or the question is sufficiently different to warrant its own answers tailored for the question. ↩︎

  2. Whether an edit of the question should be required is a detail for debate. I think this should be a decision made by the author in response to the assessment of the validity of the duplicate suggestion: if the question is then voted for closure in its unedited state, the author suffers the consequences for that decision, but nothing is broken in the system. ↩︎

  3. This is intended as a reasonable safeguard against abusing this window. If this suggestion is seen as reasonable, I propose that an author should ideally be warned that an action will trigger this mechanism. ↩︎

  4. One might argue that the question should instead be automatically closed as a duplicate in the case of no response from the author. However, I think that we lose very little as a community by giving the most favourable treatment of the author of a disputed question in this system. If users are participating properly, then the duplicate proposal should be naturally approved in good time by visiting users anyway, and I think that covering the edge case of an author who is unavailable for more than three days after posting a question is worth more than the cost of having a duplicate question up for as much longer as it takes for the community to close it by consensus. ↩︎

  5. To be clear about what I mean here for completeness, I mean that for a voting ratio of x:y (x pro, y versus), a threshold of +25% means that x >= 1.25 y, that is, votes for should be 25% greater than the votes versus. I realise this might be a pretty unconventional way of doing it, so perhaps defining a threshold directly (by ratio or otherwise) would be preferred, and I apologise for this eccentricity. ↩︎

  6. For the response window, I'd suggest a reduction of −12 hours per reopen suggestion, to a minimum of 32 hours, which covers one day plus a reasonable window for difference in activity time during the day. For the threshold, I think that −5% is a reasonable step, to a minimum of +5%. A question which has hit this minimum threshold can no longer be reopened without approval by a user with the Edit ability. ↩︎

  7. The last bullet point is a little vague, so I hope that not satisfying it directly doesn't detract from the quality of this suggestion. However, I believe there could be merit in having a grace window before a question can (first) be closed in general, in order to give the question time to be seen (and answered) by a proper number of users. ↩︎
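
The schedule suggested in footnote 6 can be tabulated with a short Python sketch. The function name `reopen_schedule` and its parameters are hypothetical; the starting values (72 hours, +25%) come from steps 2 and 4, and the steps and floors (12 hours down to 32, 5 points down to +5%) from footnote 6. It assumes each reopening applies one full step.

```python
from typing import Tuple


def reopen_schedule(reopens: int,
                    base_window_h: int = 72, window_step_h: int = 12,
                    min_window_h: int = 32,
                    base_margin_pct: int = 25, margin_step_pct: int = 5,
                    min_margin_pct: int = 5) -> Tuple[int, int]:
    """Return (response window in hours, closure margin in percent)
    after the given number of reopenings, clamped at the stated minimums."""
    window = max(min_window_h, base_window_h - window_step_h * reopens)
    margin = max(min_margin_pct, base_margin_pct - margin_step_pct * reopens)
    return window, margin


for n in range(6):
    print(n, reopen_schedule(n))
# 0 (72, 25)
# 1 (60, 20)
# 2 (48, 15)
# 3 (36, 10)
# 4 (32, 5)
# 5 (32, 5)
```

Under these numbers both values reach their floor at the fourth reopening, so that is the point at which a question could no longer be reopened without approval.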


1 comment thread

Lots to like here, thanks! (11 comments)
Lots to like here, thanks!
Monica Cellio wrote over 2 years ago

Thank you for this thoughtful feedback. I like the ideas you're suggesting here. One of the reasons I wanted to separate duplicate evaluation from other closure (and not even call dupes "closed") is the negative feelings that closure can carry. We certainly don't want askers to feel bullied as sometimes happens Somewhere Else. Will things always be perfect? No, of course not. Can they be a lot better? I think (and hope) so. I hadn't thought of a time-based window before, and I do see how that would reduce tension. Thank you again.

Fie wrote over 2 years ago

I'm elated to have received such a positive response. It really makes me aware of how I've been feeling, and makes me grateful for the environment that's being cultivated here. Very glad that you think this helpful; I hope that we get some more feedback to try to refine these ideas and figure out the most 'right' formula that we can manage here.

Fie wrote over 2 years ago · edited over 2 years ago

As a note to your first point, an initial draft of this post had a different approach, and did indeed stray more on the side of separating duplicates from closure. If people are interested in that premise, then something I think might be worth considering is differentiating between 'soft' and 'hard' duplicates. Soft duplicates are questions which share answers, but differ in the content and context of their asking in a way which provides information that is valuable to preserve or offer to an answer-seeker (consider searching for a particular phrasing of a question and struggling to find the 'canonical' answer, which has a different phrasing or context). We could then try to 'group' or otherwise structure a 'family' of answers on the site around a single answer pool. This way, we could try to funnel different approaches to the same core problem into the same (or similar) answer pool. Again, just food for thought (I see the problems here), but I think there might be a good idea lurking in there.

Fie wrote over 2 years ago

One final thought is that the nature of the issue behind duplicate resolution is a very 'big picture' problem: questions which are asked more specifically than necessary can end up not leaving space for a more general question-answer which would help people who might want to ask a different (also too-specific) question, such that the two questions are seen as duplicates while they both address different contexts of the same (shared) problem or misunderstanding (and likely have detail-incompatible answers for their different details). My point is that this is something which naturally happens in the absence of consideration and almost concerted effort by users to try to ask the 'right' kinds of questions: this might be something we should stress more to new users (although I believe there's a similar sentiment already expressed somewhere in the various Q&A guidelines). I don't think this would make everything perfect—just another thought...not to be sharing too many of those. :)

Monica Cellio wrote over 2 years ago

Your comments make me think of "related questions" or, if that sounds too broad, "question clusters". If people interested in question A probably also want to see question B, we should have some way (that's less ad-hoc than reading comment threads) to create those connections.

I'm not thinking of duplicates as a type of closure at all, though one effect would be similar (no new answers here, because you can go there instead). I want to move them out from under the "close" button entirely. With your idea of soft duplicates or related questions, that'd be even more important -- "soft" / related questions would still accept answers (but also direct your attention to the others), so that's clearly not closure at all.

Fie wrote over 2 years ago

I like it! How about 'similar questions'? That way, we can keep 'related questions' for what we're used to: questions on a related (but not the same) topic; whereas similar questions are...well, similar questions around the same topic. Naming aside, I think you're bang on—this is an important enough mechanism and dynamic that there should be structure built around supporting it.

I'm not so sure that duplicate-related closure should be abolished (or rather, absent) entirely—but it should certainly be reserved for those select cases where the same question has been asked, i.e. in cases demonstrating low research effort on the part of the author. These are definitely the kind of questions that we should be wanting to 'prune' as unhelpful to the body of informational resources which we're trying to cultivate. This to me is the original intention of duplicate closure as a concept, and what we should be trying to restrict as much as possible without removing it entirely.

Fie wrote over 2 years ago · edited over 2 years ago

Perhaps one way of achieving what you're talking about—and something I'd agree with—is with the use of a closure reason "low research effort/not good question", or similar: the 'implied' closure-by-duplicate for the open-and-shut cases, but with a different name, and separating the sentiments of the two as much as possible. 'Soft' duplicates are then the only thing which we actually call duplicates (in a strange turn)—or perhaps 'related' instead, working under this 'clustering' system. How does that sound?

Monica Cellio wrote over 2 years ago

"Similar" (nice suggestion) and "exact duplicate" seem like useful terms. "Similar" questions would remain answerable; exact duplicates wouldn't. We probably want them to share some workflow; I expect a lot of nominations to start as one and end up as the other. So maybe we present a list of questions that have been suggested as similar, and there's a way to say "that's more than similar, that's an exact dupe" or "yes similar, no not a dupe" (or, for completeness, "no, not similar")? Thinking out loud here -- I'm not sure what the best way is to present this to community members making suggestions, authors, and community members voting for or against.

Fie wrote over 2 years ago

That sounds about right. I think that the term 'repeat[ed] question' could be useful (to replace '[exact ]duplicate')—both to distance from the established expectations, and because I think it better carries the subtle meaning that this question is repeating another (which is more obviously undesirable), compared to a similar question, which is just that—and harder to conflate. A list of related questions based on keywords (as in Somewhere Else while drafting a question), if that functionality could eventually be added here, could be shown both while writing a question and shown on a side-bar, perhaps? While that may be a little 'noisy', or otherwise undesirable, something like that is I think how we'd present such information (if at all—rather than leave it entirely to users who have seen similar questions before). We could have a very neutral "Something wrong with this question?"-type button in which the options to suggest similar/repeated question reside.

Fie wrote over 2 years ago

As for questions marked as either, I think that the notice/banner above a question works perfectly well—especially if there's a grace period on the voting, since it only serves to guide users to making the appropriate assessment. I think that it's especially sensible to show the suggestion immediately (rather than wait until voting opens) so that users have that learnt response that such a notice is to provoke their consideration, rather than prompt a decision or (voting-based) input. How does that sound?

Monica Cellio wrote over 2 years ago

I took your proposal and these comments and Lundin's answer and posted an updated proposal.