Welcome to Codidact Meta!
Codidact Meta is the meta-discussion site for the Codidact community network and the Codidact software. Whether you have bug reports or feature requests, support questions or rule discussions that touch the whole network – this is the site for you.
What is "clutter" in the context of Codidact?
It's common to hear comments about QA sites being "cluttered", "clogged", "spammed" etc. with types of questions that the commenter doesn't want.
What does this actually mean? Is there a definition of "clutter"?
4 answers
This is not an answer, but more detail about why it confuses me. I left it out of the question to keep it succinct. You can read this as an answer to the effect of "I have searched for a meaning/definition and failed to find one".
I know I hate my house or my desk being cluttered. I am not wealthy, so space is very limited in both. When there is clutter, it is hard to find available space for the things I'm working on. Because the clutter is disorganized, it is hard to find things in it. Also, I see my entire house and my entire desk every day, clutter and all, and it bothers me. But I can't connect this to a QA site: Stack sites have millions of questions, and a page usually shows only a few dozen of them. I never see most of them. It is not hard to "find" things in "the pile" because the search and tag features usually work very well. Space is effectively unlimited (obviously storage costs money, but I assume short text questions are not many GBs each).
I can relate to "clogging" - I hate it when my toilet gets clogged. It clogs because I try to flush too much, and the pipe is of a limited size. Too much throughput. However, a QA site has effectively unbounded throughput. Every new user increases it. To be sure, users have preferences - people who like to edit and tag help with the categorization throughput (i.e., how many questions per unit of time we can effectively triage, QC and give feedback on), while people who like answering increase the answering throughput. But even question-only users are potential answerers and curators. No human is really allergic to answering; some are just less confident about it than others. But when there are enough unanswered questions floating about, even shy people tend to start venturing an attempt. So even if we started getting new users who only ask and never answer, the problem would self-correct because they would start answering. If only my sewer pipe grew wider on its own every time I flush...
Spam is another familiar problem. I have abandoned many email accounts because the sheer volume of spam makes it impossible for me to find pertinent emails. Of course, many mail providers severely limit filtering features (some want you to pay for them). The problem is that I am one person, and the rate at which I can sort emails is fixed. In fact, it probably decreases somewhat with age, as do the time and patience I have available for sorting them. However, the spam only grows, as my address gets out there and spreads among spammers. But in a QA site, as I noted in the previous paragraph, the filtering capacity organically grows without limit. Moreover, technology is very advanced now and it's not hard to do all sorts of automation, like suggesting tags based on word content. Stack Overflow did this a decade ago and it worked very well. Now that we have LLMs, I feel like the sky is the limit for categorizing questions with minimal human involvement.
I think of clutter primarily in terms of how it impacts community norms.
If your programming language Q&A site has an about page that says, ‘We welcome questions about all programming languages,’ but out of the last 100 questions, 97 of them were about SQL, you have a de facto SQL Q&A site. For a new user with a Julia question, do you think they're going to look at the about page first, conclude that this is a site where their Julia question is likely to get a good response, and ask it? Do you think they'll quickly reach for the tags feature to hide all the SQL-related questions and use that view to form their opinion of the site? Or do you think they'll conclude that the written policy is less salient than the self-evident norm that participants here are focused on SQL, and bounce off?
This likely bad outcome is despite the fact that SQL questions are on-topic for the site, as intended and as written! What to call this problem? I call it clutter.
The same concept applies to other community norms, like the one we're discussing in your post about research effort. If the last 97 questions out of 100 are low-effort and unresearched, a new user won't read our guidelines for how to write a good question; they'll just ask a low-effort, unresearched question themselves, because they correctly perceive that this is the norm. Any remaining users who would prefer a different norm will find that this is no longer the community they were originally interested in joining; no amount of tagging or AI filtering will restore to them the flow of new users who are now being trained to the low-effort norm. They have been cluttered out.
Clutter refers to the unwanted or uninteresting stuff you have to dispense with before getting to what you want. On a Q&A site, that usually means low quality questions. These questions clutter up the site so that when you look around for good questions the low quality ones get in the way and make the process less efficient and more annoying.
Clutter is also more than just in the way when you are looking for something good. It degrades the apparent value of the whole site when someone is first looking around. In that sense, it's not as much clutter as junk that lowers the average quality of whatever part of the site is investigated.
And no, good search doesn't really solve the problem. The search algorithm can be distracted by clutter just like a human. A search might identify 5 questions out of 5000 that match your criteria, but several of those 5 could easily be clutter.
One thing we do to combat clutter is voting. Clutter would usually be voted down, while good content is voted up. Differentiating questions and answers by vote score does help. However, sometimes we don't want something else making decisions for us about what we want to see. Then clutter gets in the way again.
> It's common to hear comments about QA sites being "cluttered", "clogged", "spammed" etc.
Even disregarding advertising / off-topic astroturfing etc. (the sort of thing that the "It's spam" flag is intended for), I think that "spammed" has a subtly but fundamentally different meaning from "cluttered"/"clogged" here.
In my mind, "spam" in this context means large amounts of self-similar content, in a context where that is unwanted or undesirable. There's a specific issue here that the axis of "similarity" might not be recognizable until after the content is already there. This seems to be the problem that r~~ is getting at.
On the other hand, "clutter" is anything that gets in the way when you're trying to find something else.
Spammed content (say, questions about a particular sub-topic) could potentially have that effect, in a limited way: if the front page of the hypothetical Pets community is constantly filled with questions about dogs, then owners of other sorts of pets might not only feel unwelcome, but experts on the care of those pets will find it that much harder to access questions where they can share knowledge. Of course the search is available to everyone, but only the dog experts will see dog questions immediately.
However, in my mind, there are two much greater categories of clutter: low-quality questions (the problem Olin Lathrop points out), and superficially similar, but actually unrelated questions. The latter can be hard to avoid for technical sites, for a variety of reasons (jargon might have heavily overloaded meanings; important keywords like "in" or "and" could be common English words that search engines ignore or treat specially; a given error can have multiple unrelated causes in different contexts; etc.), but efforts should still be made to ensure that "FAQs" can be easily found - both by users of external search, and by site curators.
When I have called Stack Overflow "cluttered" (or thought of it as being so), it's because I'm trying to close a duplicate question that I know is asked constantly, but I can't find a proper duplicate target - and my attempts turn up a large volume of totally unsuitable candidates.
There are a few tasks which I think are vital to avoiding such a mess (this list is what comes to mind immediately, but I'm sure there are more):
- Experienced users should try to preempt bad beginner questions (based on the common gotchas that they know about from experience - I have written about this here before) by identifying the underlying source of confusion (etc.) and asking a good question about it instead.
- Duplicates need to be closed swiftly and with a fairly liberal interpretation of "duplicate". If a target is a near-exact match, consider whether the target's scope is inappropriately narrow, and whether the question can be improved to avoid focusing on irrelevant details.
- For computer-related sites, a question title should not be allowed to consist of only an error message. (There might be an analogous prescription for other sites, at least the relatively technical ones, but I don't know offhand what it is.) The title should at least include a half-sentence summary of the context in which the error occurs, or indicate that the question is fundamentally about diagnosing the underlying problem, rather than fixing one specific such problem.
- Ideally, every question would be approached from a mindset of "could this question, at least in theory, be used to close someone else's question as a duplicate some day?" If it's too specific to a single person's misunderstanding (or even just a simple oversight), then maybe it doesn't have value - at least in a main Q&A space; perhaps some Codidact communities will want a separate category for more personalized help.
If a question seems simply too low quality to use as a duplicate target, then improve the quality if at all possible. Hold high standards and keep on top of the quality issue from the get-go. We have seen how things play out on Stack Overflow for 15 years now; we have seen what they learned about question quality; we don't need to rediscover it.