Welcome to Codidact Meta!
Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.
Let's improve how we handle duplicates
Currently, marking a question as a duplicate is part of question closure. Duplicates are a little different from other close reasons, though -- often the question itself is clear, complete, and otherwise solid, but it happens to have been asked before. Question closure can leave people feeling judged (as we learned Somewhere Else), but finding a duplicate should make the asker feel happy -- "we already have an answer for you". I've been wanting to change how we handle duplicates for a while -- the semantics are different, so why should they be part of the same workflow?
Here's a proposal; please provide feedback and help refine it.
Goals
-
Address duplicates as promptly as possible, to get askers to their answers and to reduce effort spent on what turn out to be duplicate answers.
-
Help authors to differentiate their dupe-nominated posts (if they disagree) and expedite resolution when they do.
-
Enable the community to have an ongoing evaluation by collecting all types of feedback including disagreement.
-
As already noted, counter the impression that duplicates are bad.
-
Test some ideas that would apply to closures too (which we also want to improve).
The main ideas
Someone who thinks a top-level post[1] is a duplicate can propose it, including an optional comment with the link. The suggestion is shown on the post and a comment thread is created for discussion. Other people who see this notice can agree, disagree, or propose other duplicate targets. We keep a running tally of votes in both directions, as opposed to going through close/reopen cycles.
The author is given specific editing guidance (or can accept a dupe suggestion). If the author edits in response to the dupe suggestion, and has the Edit ability, we (initially) trust that the edit resolved the issue -- clear the dupe suggestions, record everything in the history, and otherwise reset. Question: To avoid abuse or "dupe wars", should we only do this once (per post)?
If the author doesn't have the Edit ability, the edit still takes effect (you can always edit your own posts), but the dupe suggestion remains. People who can review suggested edits see a notice on the post asking them to review the edit and decide if it resolved the duplicate suggestion. If yes, proceed as for the author edit.
If "enough" people (score threshold still TBD[2]) agree that a post is a duplicate, it's marked as such. A duplicate designation can be reversed by the community.
Duplicate identification and resolution is democratized much more than other closures. I propose that anybody with the Participate Generally ability can participate in these votes.
In more detail
The following is taken from the draft specification. That spec also talks a little about closure ("hold"), which isn't very far along and will probably change so please don't focus on it.
Functional specification
Codidact supports duplicate suggestions and hold suggestions. Duplicates are not a type of hold -- the focus of a duplicate is "get to an answer more quickly" and link posts together, while hold is more about closing a question down until problems are addressed. We think the user experience of duplicates can be improved if they're not treated as closures/holds.
Duplicates are, intentionally, more "democratic"; while holds require the Curate ability, anybody with Participate Generally can participate in duplicate resolutions.
Suggesting a duplicate
Anybody with the Participate Generally ability can propose that a top-level post is a duplicate of another top-level post in the same community. (This could be a different category.) This spec also covers "superseded" or other duplicate-like phrasings -- the behavior is the same, even if a community customizes its wording.
To suggest a duplicate, any user (with the ability) can:
- select the Tools menu under the post
- select "suggest duplicate" from the menu (move "close" to this menu at the same time to reduce confusion)
- fill out an in-page form with a required link and an optional comment (the comment can be helpful when it's not obvious why the other question is a duplicate)
Question: Should we disable the option if you have a suggestion pending, i.e. one suggestion per user at a time?
On submission:
- A "Possible duplicate" comment thread is created or updated. A comment is added with the link and (if provided) additional comment. These comments are attributed (duplicate suggestions are not anonymous).
- If there are now enough votes for the same duplicate target ("enough" to be defined), the question is marked as a duplicate. The author receives an inbox notification.
- Otherwise, we display a notice of the suggested dupe, including links to the target and the comment thread, with action buttons (see below).
- The author and everybody who has already answered the question receive inbox notifications of suggested duplicates.
- State changes (marking a question as a duplicate or reversing it) are recorded in the post history.
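The submission steps above could be sketched roughly as follows. This is only an illustration, not the Codidact implementation: `DuplicateTally`, `vote`, `net_score`, and `DUPLICATE_THRESHOLD` are hypothetical names, the threshold of 3 is a placeholder for the still-undefined "enough", and the rule that a user holds one position per target is an assumption the spec does not state.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Assumption: "enough" is undefined in the spec; 3 is only a placeholder net score.
DUPLICATE_THRESHOLD = 3


@dataclass
class DuplicateTally:
    """Running tally of attributed votes per suggested duplicate target (hypothetical)."""
    agree: dict = field(default_factory=lambda: defaultdict(set))
    disagree: dict = field(default_factory=lambda: defaultdict(set))
    history: list = field(default_factory=list)
    marked_as: Optional[str] = None

    def net_score(self, target: str) -> int:
        """Agree votes minus disagree votes for one target."""
        return len(self.agree[target]) - len(self.disagree[target])

    def vote(self, user: str, target: str, is_duplicate: bool) -> str:
        """Record one vote and return the post's resulting state."""
        same, other = (
            (self.agree, self.disagree) if is_duplicate else (self.disagree, self.agree)
        )
        other[target].discard(user)  # assumption: one position per user per target
        same[target].add(user)
        if self.marked_as is None and self.net_score(target) >= DUPLICATE_THRESHOLD:
            self.marked_as = target
            # State changes are recorded in the post history.
            self.history.append(("marked_duplicate", target))
        return "duplicate" if self.marked_as is not None else "pending"
```

Keeping separate agree and disagree sets per target is what allows a running tally in both directions, rather than a single counter that is reset by close/reopen cycles.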
Notice and actions
The notice is something like the following:
This question might be a duplicate of (other title with link) (could be multiple).
Community members provided the following feedback: (comment text that accompanied votes, unsigned here, and link to thread)
The author additionally sees:
Please read the linked question and its answers. If your question is different, you can edit to clarify.
And two buttons: "Yes, it's a duplicate" and "No, I will edit". See "author response" for how these buttons are handled.
Question: Should there be a third option, for "no, I disagree and don't need to edit" (spurious suggestions, etc), which would be treated as an ordinary "disagree" vote?
Everybody else who has Participate Generally sees two buttons next to each suggested duplicate: "agree" and "disagree". Choosing either prompts for a comment to add to the thread (like the initial suggestion).
Question: should each dupe suggestion show the number of suggest + agree / disagree tallies? Or should people who want to know the details have to go to the comment thread?
Answering a possible duplicate
While duplicate suggestions are pending, starting an answer generates a "hey, this might be a duplicate" alert, form and wording to be determined. This serves two purposes: (a) if you know enough to answer the question you probably know enough to contribute to the evaluation of whether it's a duplicate, and (b) you might want to answer that other question instead (or in addition).
Author response
If the author agrees it's a duplicate, the question is so marked (author's vote is binding). A notice is added and "[duplicate]" is added to the title. If there are multiple suggestions, the author selects one or more.
If the author disagrees and begins an edit (either via the button or the usual way):
-
It's the usual edit interface, except that "My question is not a duplicate of (link) because" has been inserted at the bottom and (ideally) the cursor is positioned there. If there's more than one dupe suggestion, do this for each and position cursor at the first.
-
If the author has the Edit ability, when the author submits the edit, the duplicate notice is removed from the question (for all viewers) and this review/resolution is logged in the history. (We can talk about yo-yo cases, where the author keeps rejecting duplicate votes this way, but I think it's something we should consider later. Let's not over-complicate it to start. Perhaps we only allow one author-edit resolution per question.)
-
If the author does not yet have the Edit ability, the duplicate notice remains and is updated to add a message along the lines of "thanks for your edit; the community will review to see if it's not a dupe any more" (not those words). The community sees something like "the author edited this post in response to duplicate suggestions" and, for those who can review edits, an invitation to do so.
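The author-response branches above could be modelled along these lines. This is a minimal sketch under stated assumptions: `Post`, `author_accepts`, `author_edits`, and the status strings are hypothetical and are not taken from the Codidact codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Hypothetical stand-in for a question with pending duplicate suggestions."""
    suggestions: list           # pending duplicate targets
    author_can_edit: bool       # whether the author holds the Edit ability
    status: str = "suggested"   # "suggested", "awaiting_review", "open" or "duplicate"
    history: list = field(default_factory=list)


def author_accepts(post: Post, chosen: list) -> None:
    """The author's agreement is binding: mark the post as a duplicate."""
    post.status = "duplicate"
    post.history.append(("author_accepted", tuple(chosen)))


def author_edits(post: Post) -> None:
    """Handle an edit the author makes in response to the suggestions."""
    if post.author_can_edit:
        # Trusted edit: the notice is removed for all viewers and the resolution is logged.
        post.history.append(("resolved_by_author_edit", tuple(post.suggestions)))
        post.suggestions = []
        post.status = "open"
    else:
        # The edit still takes effect, but the notice remains until a reviewer decides.
        post.history.append(("edit_pending_review", tuple(post.suggestions)))
        post.status = "awaiting_review"
```

A "one author-edit resolution per question" limit, if adopted, could be added by checking the history for an earlier `resolved_by_author_edit` entry before taking the trusted branch.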
Review: problem solved?
Users who can review edits see a notice on the post (similar to the "suggested edit pending" one) that says something like: "This question was suggested as a duplicate of (link) and the author has edited to address the suggestion. (review button)".
Entering the review shows the diff (like for a suggested edit) and includes a link to each suggested duplicate.
The options for the review are "Not a duplicate" and "Still a duplicate".
-
Choosing "still a duplicate" prompts for a comment and is treated like a duplicate vote. If there are multiple duplicate suggestions, the reviewer checks off which ones apply (maybe it's not a dupe of A any more but still is of B).
-
Choosing "not a duplicate" resolves the suggestions -- the question is reset to its "ordinary" state, with the resolution being logged in the post history, and the "possible duplicate" comment thread is archived. (Subsequent duplicate suggestions start over with a new thread.)
Reopening
If a post was marked as a duplicate, everybody sees the duplicate notice. Those with the Participate Everywhere ability also get the "disagree" button, like when duplicate votes are still pending. Here the comment is required -- explain why the duplicate status should be removed. The comment is added to a "possibly not a duplicate" thread. The duplicate notice is updated to add something like:
This question might have been incorrectly marked as a duplicate. Community members provided the following feedback: (comment text, link to thread).
Unaddressed issues
- Retracting votes
- Third-party edit from someone trying to help -- how does that affect the flow?
- Vote threshold
-
Usually questions, but there's no reason an article couldn't be a duplicate. A community that uses articles for sandboxing could mark those as duplicates of the resulting questions, clearly signaling that the sandbox phase is done and linking to the live question. ↩︎
-
I think the score threshold -- the net score to mark a duplicate -- should be relatively low, 2 or 3. It should also be a community setting. ↩︎
4 answers
Psychology
In sum: I think the social problems here have more to do with communication than they do with policy, and I'm extremely wary of softening policy in order to avoid hurting anyone's feelings.
To be fair, sometimes people should feel judged for posting a duplicate question - they clearly haven't made any attempt to look for the already-existing information within the site. However, it can also easily happen that someone thinks of the same problem in different terms, and thus there is a new title or description of the same problem that is also useful to have.
In most such cases, there is either no clear advantage or disadvantage to the new phrasing; the OP should simply be pointed at the old question with a note that has a positive tone: "good news, your question appears to have an answer here already:" etc. etc.
Marking duplicates, and other closures
Somewhere Else, marking something as a duplicate is treated by the system as a form of closure. Insofar as "closed" means "the state in which a question is publicly visible but may not receive new answers", I see no reason to diverge from that model. Recognizing a question as a duplicate absolutely should function as an injunction on answers. It is crucial to the value of a Q&A site that answers for a question are centralized:
-
so that someone who searches for those answers can get them all at once;
-
so that when those ideas are considered in the "marketplace", there is no friction in that system;
-
so that answers are given equal exposure, phrased in terms of the same motivating example (if there's actually a need to refer to a specific motivating example), and otherwise compared on an apples-to-apples basis.
If people react poorly to a prohibition on "their" question receiving new answers, but "their" question is in fact the same as one that does qualify for new answers, then there is a need to take ego out of the equation in some form. This is entirely a social problem, not a technical problem.
If people react poorly to "their" question being "closed", regardless of the implications of "closure", then evidently the problem is that terminology, not the implications. We should rephrase this, and I think that would be a good idea anyway. The other experimental phrasing for this idea is "On Hold"; I don't think this is really any better.
I don't have a clear-cut proposal here, but I do want to note that this phrasing interacts with policy for closing questions. If we ever reached a level of popularity, for example, where it makes sense to start out questions in a closed state just to maintain basic QC, then it would in turn make sense to use terminology like "under review", "being workshopped" etc.
Disputing closures
The procedure laid out for us by our forbears is pretty straightforward: just as the question was nominated and confirmed for closure by a certain number of votes, it can be nominated and confirmed for reopening in the same way. While the closed question cannot receive new answers, it can be edited, including unilaterally by the OP.
My contention here is that this system mostly works fine. To the extent that it fails, this is a technical problem with discoverability of the reopening process.
Perhaps OP should be able to add a privileged note explaining why the identification as a duplicate is contested - sometimes someone in OP's position won't have an easy time editing the question, but this information can make it possible for someone else to apply the edit. In other cases, it can help commenters to explain why the target question really is a duplicate.
There definitely should be a way that curation-minded users - whether or not they have an actual Curate ability yet - can easily search for recently closed questions, and specifically for recently duplicate-marked questions, in order to review those decisions.
The system definitely needs to communicate clearly to users, in the aftermath of a closure, about what they can and should do. Ideally this process is interactive, like the initial guided tour, asking the user questions like "did you understand the linked question and did it solve your problem?" (At this point we could even softly suggest adding a "worked for me" reaction...) "Is your actual question fundamentally different? If so, how?" etc. Designing the interface for something like this requires considerable thought, and should definitely not be done "by committee" through Meta.
The "help desk" question
Communities need to decide for themselves whether people who come in with a problem to solve, and who get directed to existing Q&A, are responsible for understanding how that Q&A applies to their personal situation.
If a community sees itself as offering a "help desk" to new users, then it is useful to be able to write an "answer" that references an existing Q&A and briefly explains how the existing Q&A applies to the current circumstance, perhaps showing the result of applying advice from the answers to the specific circumstances. There shouldn't ever be a need for more than one such answer, and technical measures should be used to prevent competition in that regard. Once the question is identified as, let's say, an example of a common problem, it could have an automatically generated, templated, community-wiki answer that cites the would-be "duplicate" and which can then be edited to add explanation (of the sort that, Somewhere Else, would be given in the comments if at all).
On the other hand, if such questions aren't accepted (i.e. not even in a separate category, as I've been suggesting in some other Meta Q&A), then there is no particular need for such "mercy" - just close the question as a duplicate.
As a momentary foreword, I would like to express my appreciation to you and all of the members of the community who have worked to make this site possible. I wish it the best in its growth and development, and hope to help as a part of it. Now, here's my proposal on this topic:
Duplicate proposals should be separated from closure-as-duplicate voting.
I think you've made a very good point about question closure (and the accompanying feeling of judgement—or, as I would put it, dismissal, perhaps even exclusion), and I think this insight should be applied to duplicates a step further than you suggest.
Again, as you mention, a question being marked as a duplicate (of an answered question, at least) should, if all goes well, be a good experience for the question author, who should then gladly close the question as a duplicate. However, I think there's a fundamental problem with a single proposal-closure duplicate system: users viewing a question marked as a duplicate tend to judge for themselves whether the question as asked is a duplicate. There's the added issue of the voting process starting immediately, not necessarily giving the author time to review the linked question before the new one is closed; once that happens, it can easily become a fight against the perception that a question is closed for good reason.
In either case, there is an important step which is being missed in a system which treats the question, not the author: the primary purpose of this site is to answer the questions of its users, whether directly posed or through the provision of existing answered questions, and in the case of a disputed duplicate proposal, the author feels that the linked question does not answer the question the author is trying to ask. This mismatch is key to resolving good-faith duplicate disputes, and should be the primary focus of a user viewing a question with a duplicate suggestion that has not yet been closed.
This might seem like a benign issue, one in which I'm putting too little faith in the users judging whether the question is a duplicate, but there's a deeper principle here: we should be maximising the ease of use for individuals acting in good faith (i.e. trying their best to follow the site's rules and principles). An author who feels that a question has been wrongfully closed as a duplicate is quickly put in an uphill battle, being required to get the attention of, and convince, enough people before the question that the author is actually trying to ask can be asked. This can be exhausting and, worse, can drive users away from trying to ask their questions, so I think there should be some safeguard against this kind of situation.
My proposal for the procedure of duplicate resolution, in full, then, is this:
- A user marks the question as a possible duplicate, linking to the question(s) it is believed to be a duplicate of.[1] An explanation of why the question is thought to be a duplicate should be provided such that the author has immediately actionable information.
- The author has 72 hours to respond to the proposal, either by accepting the proposal (closing the question as a duplicate), or by disputing the duplicate.[2] Any activity on the question by the author reduces the window to 1 hour, since its purpose is to give the author time to edit the question or otherwise address the proposal.[3]
- While a question is proposed as a duplicate, it is marked as such to direct user attention to comments where the proposal is hopefully being discussed between disputer, author, and other involved users. This period is designed to help the author and community resolve the misalignment between (author) intent and (community) perception of the question.
- Once a duplicate proposal is disputed, or the window expires[4], voting opens for closure on the basis of being a duplicate question. The question is closed as a duplicate if the net vote passes a certain positive threshold determined by the proportion of votes (e.g. +25%; i.e. at least 5:4 pro:versus[5]), with some minimum (e.g. +3 votes net) for small questions. This is important on its own (and something which I don't think I've seen discussed) because the judgement of a question should scale correctly with the popularity of the question: proportional judgement is how we do things in a democratic society, except the threshold is usually 50:50 (or 51:49)! This should also mean that cycles in closure and reopening will be less likely.
- As in your suggested procedure, a user (including the author) can vote to reopen a question, starting a new vote, and this vote may have a different threshold—perhaps the inverse of the threshold to close. A user with the Edit ability may reopen a question automatically with an edit of the question (presumably with the intent to differentiate it from the linked (duplicate) question). Each time a question is reopened, the window for the author to respond and the threshold to close the question (again) decrease slightly.[6] This prevents questions from being endlessly reopened, and means that an author should still be careful about the edits made before reopening the question, even with the ability to do so without approval.
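The closure rule in the fourth step could be expressed as a small check like the one below. It is a sketch only: `closes_as_duplicate` and its parameters are hypothetical names, the +25% margin and +3 net minimum are the example values from the text, and requiring both conditions at once is my reading of "with some minimum". Integer percentages are used to avoid floating-point rounding at the boundary.

```python
def closes_as_duplicate(pro: int, versus: int,
                        margin_pct: int = 25, min_net: int = 3) -> bool:
    """Return True if the vote passes both the proportional and the net-vote test.

    Proportional test: pro >= (1 + margin) * versus, written with integers.
    Net test: pro - versus >= min_net, which dominates on questions with few votes.
    """
    passes_ratio = 100 * pro >= (100 + margin_pct) * versus
    passes_minimum = pro - versus >= min_net
    return passes_ratio and passes_minimum
```

With these example values, 3 votes for and 0 against closes the question, 5 against 4 does not (the ratio holds, but the net is only +1), and 15 against 12 does, since it meets the 5:4 ratio with a net of +3.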
I believe this proposal satisfies (almost[7]) all of the goals proposed in the question:
- The author is notified of a duplicate proposal, which can be made as soon as the question is posted. While a duplicate proposal is awaiting resolution, a notice informs users viewing the question of the possible duplicate, meaning that they can check the linked question(s) and each make an assessment before spending any (potentially redundant) time and effort answering the question: the question should only receive answers from users who believe it not to be a duplicate.
- Authors are provided with a question to check for duplication (and answers to their original questions), and so can immediately make the assessment of whether to approve or dispute the duplicate proposal.
- Feedback of either kind is collected accordingly via suggestions in comments or even other duplicate proposals.
- Presenting duplicate proposals as something which authors can choose to accept at their leisure would, I believe, do a great deal of good for the way that this feels. An author can also feel safer knowing that the question has a period of time in which it can be judged by a reasonable sample of people before it is threatened with closure by a small one.
Addendum - a personal note
I understand and appreciate that this proposal may be seen as too extreme. I think that it's appropriate for me to mention that part of the reason I wanted to write this answer is because I had a pretty unpleasant time recently at Somewhere Else regarding this topic (and closure in general), and I felt, quite frankly, bullied and powerless in the way that the system was set up. I would like to ask only that you consider what cost this proposal (in its whole or as its separable components, if any) would have to the good operation of this site.
Thank you again for your role in this site, Monica. I hope that my first post on this site is received well. :)
-
As a note, I posit that in a well-functioning duplicate resolution system, there should only be one duplicate to link: if a question is sufficiently answered by two different questions' answers, then either one of those questions should be marked as a duplicate of the other, or the question is sufficiently different to warrant its own answers tailored for the question. ↩︎
-
Whether an edit of the question should be required is a detail for debate. I think this should be a decision made by the author in response to the assessment of the validity of the duplicate suggestion: if the question is then voted for closure in its unedited state, the author suffers the consequences for that decision, but nothing is broken in the system. ↩︎
-
This is intended as a reasonable safeguard against abusing this window. If this suggestion is seen as reasonable, I propose that an author should ideally be warned that an action will trigger this mechanism. ↩︎
-
One might argue that the question should instead be automatically closed as a duplicate in the case of no response from the author. However, I think that we lose very little as a community by giving the most favourable treatment of the author of a disputed question in this system. If users are participating properly, then the duplicate proposal should be naturally approved in good time by visiting users anyway, and I think that covering the edge case of an author who is unavailable for more than three days after posting a question is worth more than the cost of having a duplicate question up for as much longer as it takes for the community to close it by consensus. ↩︎
-
To be clear about what I mean here for completeness, I mean that for a voting ratio of x:y (x pro, y versus), a threshold of +25% means that x >= 1.25y, that is, votes for should be 25% greater than the votes versus. I realise this might be a pretty unconventional way of doing it, so perhaps defining a threshold directly (by ratio or otherwise) would be preferred, and I apologise for this eccentricity. ↩︎
-
For the response window, I'd suggest a reduction of −12 hours per reopen suggestion, to a minimum of 32 hours, which covers one day plus a reasonable window for difference in activity time during the day. For the threshold, I think that −5% is a reasonable step, to a minimum of +5%. A question which has hit this minimum threshold can no longer be opened without approval by a user with the Edit ability. ↩︎
-
The last bullet point is a little vague, so I hope that not satisfying it directly doesn't detract from the quality of this suggestion. However, I believe there could be merit in having a grace window before a question can (first) be closed in general, in order to give the question time to be seen (and answered) by a proper number of users. ↩︎
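The decay schedule suggested in the footnote on reopening (12 hours off the response window and 5 percentage points off the threshold each time) could be computed as below. This is a hedged sketch: `limits_after_reopens` is a hypothetical name, the starting values of 72 hours and +25% are the example figures from the proposal, and I am assuming the 32 minimum is in hours and that the count is of reopens.

```python
def limits_after_reopens(reopens: int) -> tuple:
    """Return (response window in hours, closure margin in percent) after N reopens."""
    window_hours = max(72 - 12 * reopens, 32)   # floor of 32 hours (assumed unit)
    margin_pct = max(25 - 5 * reopens, 5)       # floor of +5%
    return window_hours, margin_pct


# Print the schedule for the first few reopens.
for n in range(6):
    print(n, limits_after_reopens(n))
```

Under these assumptions both values reach their floor on the fourth reopen, which is the point at which the footnote suggests that reopening should require approval from a user with the Edit ability.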
> Question closure can leave people feeling judged (as we learned Somewhere Else)
>
> ...counter the impression that duplicates are bad
To solve that problem, one needs to address the source. Somewhere Else would instantly make a conclusion like "aha it's the evil community being rude again" and then come up with some misguided system to counter that. But by applying a slight bit of empathy, we can get to the root source:
The people who cast duplicate close votes Somewhere Else are fed up with endless duplicates. Newbies asking the same question over and over again, with little to no research effort made. Therefore the regulars get tired of that behavior and just close the posts without providing much feedback to the person who asked the question.
There exists a somewhat rare phenomenon though: sometimes when a high quality question that is a dupe gets asked, it is left open long enough for good answers to pop up. And when this happens, this new question might actually turn into the "canonical" target for duplicates. And then the old, present duplicates get closed with the new post as target. It's a very good thing when this happens. The old duplicates of diverse quality are not necessarily the best ones.
But most of the time, new questions that are duplicates just get closed with an old post as target, because that's how the system was designed.
Somewhere Else is suffering from the results of this: there's a lot of old posts with canonical status but so-so quality. Also such posts tend to attract a whole lot of answers over time, where everyone and their mother feels inclined to contribute even though they aren't adding anything new. Or in case they are adding something new, they only add that part and not a complete answer. So over time the canonical post "fragments" into several answers and the result as a whole is not very good.
It would be better if these old posts were recompiled into complete answers and one natural way to do that is to close them when something better and more complete shows up. But the duplicate system often doesn't let that happen.
We shouldn't close posts as duplicates unless there exists a high quality duplicate target. If a question has been asked before, then that alone should not be a reason to close it.
Update based on feedback
I really like the ideas proposed in this answer, which at the time of this posting has the same vote count as this question (+6, -0). The proposal in that answer creates what I think is a better experience for the asker and the community, and is less complicated from a behavior and UI perspective. Win! Some additional points came up in a discussion, and this other answer also raises important points, so I'm going to try to bring this all together here.
New proposed workflow:
- A user with the Participate Everywhere ability marks a question as either similar to or a duplicate of another linked question.
-
You can only propose a duplicate if the other question has at least one well-received answer (thanks Lundin for raising that). "Duplicate" is about guiding people to answers; if a bunch of similar questions exist but none are answered, then perhaps what we have been waiting for is the right framing, and we shouldn't shut the new question down. (Maybe the others will end up as duplicates of this one!)
-
Similar is just informational; we find a place to show the links with the question. The question remains open to receive answers. The rest of this workflow is specifically about duplicates.
- The author has 72 hours to respond, as described in the first answer I linked.
-
If the author has the Edit ability, we (once) treat the edit as resolving the duplicate suggestion. Those links are moved into the "similar to" list, and somebody might re-nominate a duplicate.
-
If the author does not have the Edit ability, or if the author has previously resolved a duplicate suggestion here with an edit, the "suggested duplicate" notice on the question is updated to add that the author edited after the suggestion was made. Anybody else with the Edit ability can confirm that the edit resolves the issue and dismiss the duplicate notice (moving it to the "similar" list as above).
-
While there's a duplicate suggestion pending, we show a notice and alert people starting to answer.
-
When the author disputes the duplicate suggestion or the window expires, the community can vote as described in the feature proposal. Instead of a net score, we're looking for a ratio, as suggested by Fie, with a minimum number -- something like "at least 3 dupe votes, and at least 60% of all votes are yes". As Fie pointed out, this allows voting to scale with question popularity, which I think will help prevent the ping-pong effects of groups of three users alternately voting yes and no.
-
Removing a duplicate linkage ("reopening") is possible; see step 5 in Fie's answer.
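The revised voting rule ("at least 3 dupe votes, and at least 60% of all votes are yes") could be checked with something like the following. It is only a sketch: `is_duplicate` and its parameter names are hypothetical, and the defaults are the "something like" figures from the workflow above, not settled values.

```python
def is_duplicate(yes: int, no: int,
                 min_yes: int = 3, min_share_pct: int = 60) -> bool:
    """True when there are enough yes votes and they form a large enough share."""
    total = yes + no
    if yes < min_yes or total == 0:
        return False
    # Integer comparison avoids floating-point rounding at exactly 60%.
    return 100 * yes >= min_share_pct * total
```

Under this rule, 3 yes and 0 no marks the duplicate, and so does 3 yes and 2 no (exactly 60%), but 3 yes and 3 no does not. A later group of three dissenters therefore removes the duplicate status only if the overall share of yes votes falls below the bar, which is how the ratio dampens the ping-pong effect on well-attended questions.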