Comments on Let's improve how we handle duplicates
Parent
Let's improve how we handle duplicates
Currently, marking a question as a duplicate is part of question closure. Duplicates are a little different from other close reasons, though -- often the question itself is clear, complete, and otherwise solid, but it happens to have been asked before. Question closure can leave people feeling judged (as we learned Somewhere Else), but finding a duplicate should make the asker feel happy -- "we already have an answer for you". I've been wanting to change how we handle duplicates for a while -- the semantics are different, so why should they be part of the same workflow?
Here's a proposal; please provide feedback and help refine it.
Goals
- Address duplicates as promptly as possible, to get askers to their answers and to reduce effort spent on what turn out to be duplicate answers.
- Help authors to differentiate their dupe-nominated posts (if they disagree) and expedite resolution when they do.
- Enable the community to have an ongoing evaluation by collecting all types of feedback including disagreement.
- As already noted, counter the impression that duplicates are bad.
- Test some ideas that would apply to closures too (which we also want to improve).
The main ideas
Someone who thinks a top-level post[1] is a duplicate can propose it, including an optional comment with the link. The suggestion is shown on the post and a comment thread is created for discussion. Other people who see this notice can agree, disagree, or propose other duplicate targets. We keep a running tally of votes in both directions, as opposed to going through close/reopen cycles.
The author is given specific editing guidance (or can accept a dupe suggestion). If the author edits in response to the dupe suggestion, and has the Edit ability, we (initially) trust that the edit resolved the issue -- clear the dupe suggestions, record everything in the history, and otherwise reset. Question: To avoid abuse or "dupe wars", should we only do this once (per post)?
If the author doesn't have the Edit ability, then the edit still takes effect (you can always edit your own posts), but the dupe suggestion remains. People who can review suggested edits see a notice on the post asking them to review the edit and decide if it resolved the duplicate suggestion. If yes, proceed as for the author edit.
If "enough" people (score threshold still TBD[2]) agree that a post is a duplicate, it's marked as such. A duplicate designation can be reversed by the community.
Duplicate identification and resolution is democratized much more than other closures. I propose that anybody with the Participate Generally ability can participate in these votes.
In more detail
The following is taken from the draft specification. That spec also talks a little about closure ("hold"), which isn't very far along and will probably change so please don't focus on it.
Functional specification
Codidact supports duplicate suggestions and hold suggestions. Duplicates are not a type of hold -- the focus of a duplicate is "get to an answer more quickly" and link posts together, while hold is more about closing a question down until problems are addressed. We think the user experience of duplicates can be improved if they're not treated as closures/holds.
Duplicates are, intentionally, more "democratic"; while holds require the Curate ability, anybody with Participate Generally can participate in duplicate resolutions.
Suggesting a duplicate
Anybody with the Participate Generally ability can propose that a top-level post is a duplicate of another top-level post in the same community. (This could be a different category.) This spec also covers "superseded" or other duplicate-like phrasings -- the behavior is the same, even if a community customizes its wording.
To suggest a duplicate, any user (with the ability) can:
- select the Tools menu under the post
- select "suggest duplicate" from the menu (move "close" to this menu at the same time to reduce confusion)
- fill out an in-page form with a required link and an optional comment (the comment can be helpful when it's not obvious why the other question is a duplicate)
Question: Should we disable the option if you have a suggestion pending, i.e. one suggestion per user at a time?
On submission:
- A "Possible duplicate" comment thread is created or updated. A comment is added with the link and (if provided) additional comment. These comments are attributed (duplicate suggestions are not anonymous).
- If there are now enough votes for the same duplicate target ("enough" to be defined), the question is marked as a duplicate. The author receives an inbox notification.
- Otherwise, we display a notice of the suggested dupe, including links to the target and the comment thread, with action buttons (see below).
- The author and everybody who has already answered the question receive inbox notifications of suggested duplicates.
- State changes (marking a question as a duplicate or reversing it) are recorded in the post history.
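The submission steps above can be sketched as a running tally per suggested target. This is only an illustration, not code from the Codidact software: `DuplicateTally` and `DUPLICATE_THRESHOLD` are hypothetical names, the threshold value is taken from the "2 or 3" suggested in the footnote, and letting a later vote replace a user's earlier vote is an assumption (vote retraction is still listed as unaddressed).

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumed value: the proposal leaves the threshold TBD and suggests 2 or 3.
DUPLICATE_THRESHOLD = 3


@dataclass
class DuplicateTally:
    """Running agree/disagree tally for each suggested duplicate target."""

    agree: dict = field(default_factory=lambda: defaultdict(set))
    disagree: dict = field(default_factory=lambda: defaultdict(set))

    def suggest(self, user: str, target: str) -> None:
        # A suggestion counts as an "agree" vote for that target.
        self.vote(user, target, agrees=True)

    def vote(self, user: str, target: str, agrees: bool) -> None:
        # Assumption: one vote per user per target; a new vote replaces the old.
        self.agree[target].discard(user)
        self.disagree[target].discard(user)
        (self.agree if agrees else self.disagree)[target].add(user)

    def net_score(self, target: str) -> int:
        return len(self.agree[target]) - len(self.disagree[target])

    def confirmed_target(self):
        """Return a target whose net score reached the threshold, else None."""
        for target in self.agree:
            if self.net_score(target) >= DUPLICATE_THRESHOLD:
                return target
        return None
```

Keeping both directions in one structure reflects the idea of a running tally rather than close/reopen cycles: a "disagree" vote lowers the net score instead of starting a separate process.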
Notice and actions
The notice is something like the following:
This question might be a duplicate of (other title with link) (could be multiple).
Community members provided the following feedback: (comment text that accompanied votes, unsigned here, and link to thread)
The author additionally sees:
Please read the linked question and its answers. If your question is different, you can edit to clarify.
And two buttons: "Yes, it's a duplicate" and "No, I will edit". See "author response" for how these buttons are handled.
Question: Should there be a third option, for "no, I disagree and don't need to edit" (spurious suggestions, etc), which would be treated as an ordinary "disagree" vote?
Everybody else who has Participate Generally sees two buttons next to each suggested duplicate: "agree" and "disagree". Choosing either prompts for a comment to add to the thread (like the initial suggestion).
Question: should each dupe suggestion show the number of suggest + agree / disagree tallies? Or should people who want to know the details have to go to the comment thread?
Answering a possible duplicate
While duplicate suggestions are pending, starting an answer generates a "hey, this might be a duplicate" alert, form and wording to be determined. This serves two purposes: (a) if you know enough to answer the question you probably know enough to contribute to the evaluation of whether it's a duplicate, and (b) you might want to answer that other question instead (or in addition).
Author response
If the author agrees it's a duplicate, the question is so marked (author's vote is binding). A notice is added and "[duplicate]" is added to the title. If there are multiple suggestions, the author selects one or more.
If the author disagrees and begins an edit (either via the button or the usual way):
- It's the usual edit interface, except that "My question is not a duplicate of (link) because" has been inserted at the bottom and (ideally) the cursor is positioned there. If there's more than one dupe suggestion, do this for each and position cursor at the first.
- If the author has the Edit ability, when the author submits the edit, the duplicate notice is removed from the question (for all viewers) and this review/resolution is logged in the history. (We can talk about yo-yo cases, where the author keeps rejecting duplicate votes this way, but I think it's something we should consider later. Let's not over-complicate it to start. Perhaps we only allow one author-edit resolution per question.)
- If the author does not yet have the Edit ability, the duplicate notice remains and is updated to add a message along the lines of "thanks for your edit; the community will review to see if it's not a dupe any more" (not those words). The community sees something like "the author edited this post in response to duplicate suggestions" and, for those who can review edits, an invitation to do so.
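The author-response rules can be read as a small state transition. The sketch below is one possible reading, not part of the draft spec: the `DupeState` names and the `author_response` function are hypothetical, and yo-yo limits (such as one author-edit resolution per question) are deliberately left out.

```python
from enum import Enum


class DupeState(Enum):
    NORMAL = "normal"                    # no pending duplicate suggestions
    SUGGESTED = "suggested"              # suggestions pending, notice shown
    AWAITING_REVIEW = "awaiting_review"  # author edited, reviewers must decide
    DUPLICATE = "duplicate"              # marked as a duplicate


def author_response(state: DupeState, accepts: bool, has_edit_ability: bool) -> DupeState:
    """Next state after the author responds to pending duplicate suggestions."""
    if state is not DupeState.SUGGESTED:
        raise ValueError("no pending duplicate suggestion to respond to")
    if accepts:
        # The author's vote is binding.
        return DupeState.DUPLICATE
    if has_edit_ability:
        # Trusted edit: suggestions are cleared and the resolution is logged.
        return DupeState.NORMAL
    # The edit still takes effect, but the notice stays until it is reviewed.
    return DupeState.AWAITING_REVIEW
```

For example, `author_response(DupeState.SUGGESTED, accepts=False, has_edit_ability=False)` yields `DupeState.AWAITING_REVIEW`, which corresponds to the "thanks for your edit" notice.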
Review: problem solved?
Users who can review edits see a notice on the post (similar to the "suggested edit pending" one) that says something like: "This question was suggested as a duplicate of (link) and the author has edited to address the suggestion. (review button)".
Entering the review shows the diff (like for a suggested edit) and includes a link to each suggested duplicate.
The options for the review are "Not a duplicate" and "Still a duplicate".
- Choosing "still a duplicate" prompts for a comment and is treated like a duplicate vote. If there are multiple duplicate suggestions, the reviewer checks off which ones apply (maybe it's not a dupe of A any more but still is of B).
- Choosing "not a duplicate" resolves the suggestions -- the question is reset to its "ordinary" state, with the resolution being logged in the post history, and the "possible duplicate" comment thread is archived. (Subsequent duplicate suggestions start over with a new thread.)
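The two review options could be handled roughly as follows. This is a hedged sketch with hypothetical names (`review_outcome` and its return keys are not from the spec); it treats an empty selection as "Not a duplicate" and any checked targets as duplicate votes, and it does not decide what happens to suggestions the reviewer left unchecked.

```python
def review_outcome(pending, still_duplicate_of):
    """Interpret a reviewer's decision on an author's edit.

    pending: targets currently suggested as duplicates.
    still_duplicate_of: targets the reviewer checked as still applying;
        empty means the reviewer chose "Not a duplicate".
    """
    pending = set(pending)
    still = set(still_duplicate_of)
    if not still <= pending:
        raise ValueError("reviewer can only confirm targets that were suggested")
    if not still:
        # "Not a duplicate": reset the post and archive the comment thread.
        return {"resolved": True, "archive_thread": True, "votes_for": set()}
    # "Still a duplicate": counts as a duplicate vote for each checked target.
    return {"resolved": False, "archive_thread": False, "votes_for": still}
```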
Reopening
If a post was marked as a duplicate, everybody sees the duplicate notice. Those with the Participate Everywhere ability also get the "disagree" button, like when duplicate votes are still pending. Here the comment is required -- explain why the duplicate status should be removed. The comment is added to a "possibly not a duplicate" thread. The duplicate notice is updated to add something like:
This question might have been incorrectly marked as a duplicate. Community members provided the following feedback: (comment text, link to thread).
Unaddressed issues
- Retracting votes
- Third-party edit from someone trying to help -- how does that affect the flow?
- Vote threshold
[1] Usually questions, but there's no reason an article couldn't be a duplicate. A community that uses articles for sandboxing could mark those as duplicates of the resulting questions, clearly signaling that the sandbox phase is done and linking to the live question.
[2] I think the score threshold -- the net score to mark a duplicate -- should be relatively low, 2 or 3. It should also be a community setting.
Post
As a brief foreword, I would like to express my appreciation to you and all of the members of the community who have worked to make this site possible. I wish it the best in its growth and development, and hope to help as a part of it. Now, here's my proposal on this topic:
Duplicate proposals should be separated from closure-as-duplicate voting.
I think you've made a very good point about question closure (and the accompanying feeling of judgement—or, as I would put it, dismissal, perhaps even exclusion), and I think this insight should be applied to duplicates a step further than you suggest.
Again, as you mention, having a question marked as a duplicate (of an answered question, at least) should be a good experience for the author, who, if all goes well, should then gladly close the question as a duplicate. However, I think there's a fundamental problem with a combined proposal-and-closure duplicate system: users viewing a question marked as a duplicate tend to judge for themselves whether the question as asked is a duplicate. There's the added issue of the voting process starting immediately, which does not necessarily give the author time to review the question before it's closed; once that happens, it can easily become a fight against the perception that the question was closed for good reason.
In either case, there is an important step which is being missed in a system which treats the question, not the author: the primary purpose of this site is to answer the questions of its users, whether directly posed or through the provision of existing answered questions, and in the case of a disputed duplicate proposal, the author feels that the linked question does not answer the question the author is trying to ask. This mismatch is key to resolving good-faith duplicate disputes, and should be the primary focus of a user viewing a question with a duplicate suggestion that has not yet been closed.
This might seem like a benign issue, in which I'm putting too little faith in the users judging whether the question is a duplicate, but there's a deeper principle here: we should be maximising the ease of use for individuals acting in good faith (i.e. trying their best to follow the site's rules and principles). An author who feels that a question has been wrongfully closed as a duplicate is quickly put in an uphill battle, needing to get the attention of, and convince, enough people before the question that the author is trying to ask can be asked at all. This can be exhausting, and worse, can drive users away from trying to ask their questions, and I think there should be some safeguard against this kind of situation.
My proposal for the procedure of duplicate resolution, in full, then, is this:
- A user marks the question as a possible duplicate, linking to the question(s) it is believed to be a duplicate of.[1] An explanation of why the question is thought to be a duplicate should be provided such that the author has immediately actionable information.
- The author has 72 hours to respond to the proposal, either by accepting the proposal (closing the question as a duplicate), or by disputing the duplicate.[2] Any activity on the question by the author reduces the window to 1 hour, since its purpose is to give the author time to edit the question or otherwise address the proposal.[3]
- While a question is proposed as a duplicate, it is marked as such to direct user attention to comments where the proposal is hopefully being discussed between disputer, author, and other involved users. This period is designed to help the author and community resolve the misalignment between (author) intent and (community) perception of the question.
- Once a duplicate proposal is disputed, or the window expires[4], voting opens for closure on the basis of being a duplicate question. The question is closed as a duplicate if the net vote passes a certain positive threshold determined by the proportion of votes (e.g. +25%; i.e. at least 5:4 pro:versus[5]), with some minimum (e.g. +3 votes net) for small questions. This is important on its own (and something which I don't think I've seen discussed) because the judgement of a question should scale correctly with the popularity of the question: proportional judgement is how we do things in a democratic society, except the threshold is usually 50:50 (or 51:49)! This should also mean that cycles in closure and reopening will be less likely.
- As in your suggested procedure, a user (including the author) can vote to reopen a question, starting a new vote, and this vote may have a different threshold—perhaps the inverse of the threshold to close. A user with the Edit ability may reopen a question automatically with an edit of the question (presumably with the intent to differentiate it from the linked (duplicate) question). Each time a question is reopened, the window for the author to respond and the threshold to close the question (again) decrease slightly.[6] This prevents questions from being endlessly reopened, and means that an author should still be careful about the edits made before reopening the question, even with the ability to do so without approval.
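The proportional threshold and the shrinking response window in the procedure above can be made concrete with a little arithmetic. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the example numbers (+25%, +3 net, 72 hours, and the per-reopen steps of 12 hours and 5% with floors of 32 hours and +5%) are the ones suggested in this answer and its footnotes, and requiring both the ratio and the net minimum at once is one interpretation of "with some minimum".

```python
def closes_as_duplicate(pro: int, versus: int, margin_pct: int = 25, min_net: int = 3) -> bool:
    """True if close votes beat opposing votes by the margin and the net minimum.

    Integer arithmetic avoids floating-point surprises at the boundary:
    pro >= (1 + margin_pct / 100) * versus is checked as
    pro * 100 >= (100 + margin_pct) * versus.
    """
    meets_ratio = pro * 100 >= (100 + margin_pct) * versus
    meets_minimum = pro - versus >= min_net
    return meets_ratio and meets_minimum


def response_window_hours(reopens: int) -> int:
    """Author response window: 72 hours, minus 12 per reopen, floor of 32."""
    return max(72 - 12 * reopens, 32)


def margin_percent(reopens: int) -> int:
    """Close threshold: +25%, minus 5 points per reopen, floor of +5%."""
    return max(25 - 5 * reopens, 5)
```

Under these assumptions a 5:4 vote meets the ratio but not the net minimum, so `closes_as_duplicate(5, 4)` is false, while 15:12 satisfies both. After one reopen, the same question would be judged with `margin_percent(1)` and a window of `response_window_hours(1)` hours.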
I believe this proposal satisfies (almost[7]) all of the goals proposed in the question:
- The author is notified of a duplicate proposal, which can be made as soon as the question is posted. While a duplicate proposal is awaiting resolution, a notice informs users viewing the question of the possible duplicate, meaning that they can check the linked question(s) and each make an assessment before spending any (potentially redundant) time and effort answering the question: the question should only receive answers from users who believe it not to be a duplicate.
- Authors are given a linked question to compare against their own (along with its answers, which may answer their original question), and so can immediately decide whether to approve or dispute the duplicate proposal.
- Feedback of either kind is collected accordingly via suggestions in comments or even other duplicate proposals.
- Presenting duplicate proposals as something which authors can choose to accept at their leisure would, I believe, do a great deal of good for the way that this feels. An author can also feel safer knowing that the question has a period of time in which it can be judged by a reasonable sample of people before it is threatened with closure by a small one.
Addendum - a personal note
I understand and appreciate that this proposal may be seen as too extreme. I think that it's appropriate for me to mention that part of the reason I wanted to write this answer is that I had a pretty unpleasant time recently at Somewhere Else regarding this topic (and closure in general), and I felt, quite frankly, bullied and powerless in the way that the system was set up. I would like to ask only that you consider what cost this proposal (as a whole or as its separable components, if any) would have to the good operation of this site.
Thank you again for your role in this site, Monica. I hope that my first post on this site is received well. :)
[1] As a note, I posit that in a well-functioning duplicate resolution system, there should only be one duplicate to link: if a question is sufficiently answered by two different questions' answers, then either one of those questions should be marked as a duplicate of the other, or the question is sufficiently different to warrant its own answers tailored for the question.
[2] Whether an edit of the question should be required is a detail for debate. I think this should be a decision made by the author in response to the assessment of the validity of the duplicate suggestion: if the question is then voted for closure in its unedited state, the author suffers the consequences for that decision, but nothing is broken in the system.
[3] This is intended as a reasonable safeguard against abusing this window. If this suggestion is seen as reasonable, I propose that an author should ideally be warned that an action will trigger this mechanism.
[4] One might argue that the question should instead be automatically closed as a duplicate in the case of no response from the author. However, I think that we lose very little as a community by giving the most favourable treatment of the author of a disputed question in this system. If users are participating properly, then the duplicate proposal should be naturally approved in good time by visiting users anyway, and I think that covering the edge case of an author who is unavailable for more than three days after posting a question is worth more than the cost of having a duplicate question up for as much longer as it takes for the community to close it by consensus.
[5] To be clear about what I mean here for completeness, I mean that for a voting ratio of $x:y$ ($x$ pro, $y$ versus), a threshold of +25% means that $x \ge 1.25y$, that is, votes for should be at least 25% greater than the votes versus. I realise this might be a pretty unconventional way of doing it, so perhaps defining a threshold directly (by ratio or otherwise) would be preferred, and I apologise for this eccentricity.
[6] For the response window, I'd suggest a reduction of −12 hours per reopen suggestion, to a minimum of 32 hours, which covers one day plus a reasonable window for difference in activity time during the day. For the threshold, I think that −5% is a reasonable step, to a minimum of +5%. A question which has hit this minimum threshold can no longer be opened without approval by a user with the Edit ability.
[7] The last bullet point is a little vague, so I hope that not satisfying it directly doesn't detract from the quality of this suggestion. However, I believe there could be merit in having a grace window before a question can (first) be closed in general, in order to give the question time to be seen (and answered) by a proper number of users.