Post History

75%

+4 −0

Q&A Should we start displaying the score of a post instead of the raw votes?

posted 1y ago by Jon Ericson‭

Answer

#1: Initial revision by

Jon Ericson‭ · 2023-11-30T18:24:11Z (over 1 year ago)

## Why show scores at all?

When I was at Stack Exchange, we spent a good deal of time discussing sort orders in the context of [obsolete answers](https://meta.stackexchange.com/questions/264045/lets-move-some-negatively-scored-answers-from-the-top-spot). One suggestion was to change the sort order to use Wilson scoring. The objections were:

1. The algorithm is confusing so will raise more questions than shed light.
2. For many questions sorting by the sum of positive and negative votes gives the same result as using a more complicated sorting mechanism.
3. Answer scores are part of the Stack Overflow brand.

The final point doesn't apply here, except to the degree that Codidact's strategy might be to either distance itself from or embrace a connection to Stack Overflow.

The goal of a number is to give people a quick view into how good the answer is. If nobody has voted, the answer is `¯\_(ツ)_/¯`. The more people who have voted, the greater confidence we have in the voting. For instance, which toaster would you buy?

![Toaster with 5 stars and 1 rating](https://meta.codidact.com/uploads/nqxx65yr93n9sunuhqjiknnuoafo)
![Toaster with 4 1/2 stars and 46 ratings](https://meta.codidact.com/uploads/6q1wahujrg93kercwzr0xj6euu62)

Even though we would normally pick a 5-star toaster over a 4½ star toaster, we know that 46 ratings probably means more than 1 rating. The single 5-star rating could very well have been from someone associated with the product.

There are some other things it would be helpful to know, such as how old the ratings are and whether the product has changed after getting a handful of ratings. This also applies to answers. An answer with many upvotes might not be good anymore if the world has changed and the answer hasn't. Or maybe someone edited the answer in a way that would have caused early voters to vote differently if the edit had been part of the original.

We have some intuition when it comes to the 5-star review system Amazon uses. The simple sum of positive and negative votes is also easy to understand. Codidact's display isn't intuitive for most people. Looking at answers to this very question right at the moment, I see:
* 6/1 
* 3/0
* 6/2

With some help from [a calculator](https://www.statskingdom.com/proportion-confidence-interval-calculator.html), I can see that's the right sorting:
* 0.4869
* 0.4385
* 0.4093
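
The three figures above are consistent with the lower bound of a 95% Wilson score interval (z ≈ 1.96) on the fraction of upvotes. Here is a minimal Python sketch under that assumption. The function name `wilson_lower_bound` is illustrative and not taken from Codidact's code.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction.

    z = 1.96 corresponds to a 95% interval (an assumption of this sketch).
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no votes yet, so there is nothing to be confident about
    p = upvotes / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


# The up/down counts quoted above, sorted best-first by the lower bound.
votes = [(6, 1), (3, 0), (6, 2)]
for up, down in sorted(votes, key=lambda v: wilson_lower_bound(*v), reverse=True):
    print(f"{up}/{down}: {wilson_lower_bound(up, down):.4f}")
```

Note that a plain net score (upvotes minus downvotes) would put 6/2 (net 4) above 3/0 (net 3), so this is a case where the two orderings differ.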

But it kinda breaks my brain to think about it. I just don't have the right intuition. (Yet?)

## Show rank instead?

A simple change would be to display ranks instead of scores:

> 1. This is the top ranked answer by Wilson score, so it's #1!
> 2. Still a good answer.
> 3. Not great, but better than #4.
> 4. Just good enough to avoid getting deleted.

If you click on the rank, you could see details such as the number of up/down votes, the confidence interval and maybe some indication of age of votes. It really doesn't matter how you calculate rank as long as it's documented somewhere. That's helpful because it could let you explore using other signals.

As Evan Miller wrote in ["Bayesian Average Ratings"](https://www.evanmiller.org/bayesian-average-ratings.html):

> Bayesian average ratings are an excellent way to sort items with up-votes and down-votes, and lets us incorporate a desired level of caution directly into the model. It is eminently “hackable” in the sense of affording opportunities for adjusting the model to accommodate new features (prior beliefs, belief decay) without violating the model’s consistency. As long as we make a judicious choice of belief structure (namely a beta distribution), it is feasible to compute.
> 
> As with other hackable systems, it is possible to take the Bayesian framework and produce something utterly useless. But as long as you set aside time for a little experimentation, I think you’ll eventually arrive at a sorting method that helps people find the good stuff quickly.

Now you do lose something, because there's still a vast difference between an answer scored +10/-0 and one scored +0/-10, but with only two answers the second one will be ranked second. (By definition!) Sometimes second answers are great, but not in this case. So maybe some indication of the strength of the system's belief in the quality of the answer would be helpful. (I'm partial to [Isaac Moses' signal strength indicator](https://meta.codidact.com/posts/276749/276751#answer-276751).) But the primary signal is simply where the answer is placed on the page and maybe a number showing that rank.
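
To make "strength of belief" concrete, here is a minimal sketch of the beta-distribution belief structure mentioned in the quotation. It assumes a uniform Beta(1, 1) prior and summarizes each answer by its posterior mean and standard deviation. Both function names and the choice of prior are assumptions for illustration. This is not necessarily the ranking criterion the quoted article recommends, and it is not Codidact's implementation.

```python
import math


def posterior_mean(upvotes: int, downvotes: int,
                   prior_up: float = 1.0, prior_down: float = 1.0) -> float:
    """Mean of the Beta(prior_up + upvotes, prior_down + downvotes) posterior."""
    a = prior_up + upvotes
    b = prior_down + downvotes
    return a / (a + b)


def posterior_std(upvotes: int, downvotes: int,
                  prior_up: float = 1.0, prior_down: float = 1.0) -> float:
    """Standard deviation of the same posterior; smaller means a firmer belief."""
    a = prior_up + upvotes
    b = prior_down + downvotes
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Two answers that would be ranked #1 and #2, plus one with no votes at all.
for up, down in [(10, 0), (0, 10), (0, 0)]:
    mean = posterior_mean(up, down)
    std = posterior_std(up, down)
    print(f"+{up}/-{down}: mean={mean:.3f} std={std:.3f}")
```

A rank alone would show the first two answers as #1 and #2. The posterior mean shows how far apart they are, and the standard deviation shows how little is known about the unvoted answer.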

## What about random, weighted placement?

If you remove the numbers (whether score or rank), that opens up another possibility: randomly place each answer on the page, with better answers weighted more heavily to be on top. When there are several answers with no votes, this is an honest method of display since the system can't know which is the best. Chronological sorting is usually a default, but is the best answer the first or the last? On a programming site, the more recent answer might have incorporated updates to the tools. But the older answer might be standard and the newer answer a speculative variant. Only a human can tell.

Once votes start to roll in, the system can estimate which answer is better, but it's still an estimate. Bumping answers up based on upvotes and down based on downvotes gives a more accurate view, of course. But random placement gives downvoted answers a chance of getting support and avoids the [fastest gun bias](https://meta.stackexchange.com/questions/225033/how-big-is-the-fastest-gun-in-the-west-bias/225034#225034).
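
One way to get this behaviour is to draw a random quality value for each answer from a beta distribution built from its votes, then sort by the draws. That is a Thompson-sampling-style technique swapped in here for illustration, not something the post or Codidact specifies. The sketch below assumes a uniform Beta(1, 1) prior, and `sample_order` is a hypothetical name.

```python
import random


def sample_order(answers: dict[str, tuple[int, int]], rng: random.Random) -> list[str]:
    """Return answer names in one randomly drawn, vote-weighted order.

    Each answer gets a draw from Beta(upvotes + 1, downvotes + 1); answers
    without votes draw uniformly, so their relative order is pure chance.
    """
    draws = {
        name: rng.betavariate(up + 1, down + 1)
        for name, (up, down) in answers.items()
    }
    return sorted(answers, key=lambda name: draws[name], reverse=True)


# Hypothetical answers: name -> (upvotes, downvotes).
answers = {"A": (6, 1), "B": (3, 0), "C": (6, 2), "D": (0, 0)}
rng = random.Random(42)  # seeded only so the sketch is repeatable

# Count how often each answer lands in the top spot over many page loads.
top_counts = {name: 0 for name in answers}
for _ in range(10_000):
    top_counts[sample_order(answers, rng)[0]] += 1
print(top_counts)
```

Well-voted answers land on top most often, but every answer keeps some chance of being seen first.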

Random sorting has an impact on performance because of caching. You either need to rebuild the page every time it loads or be stuck with just one random sorting until something triggers a rebuild. Building this is **probably not worthwhile unless there can be some experimentation showing that random order is useful**.

## How about bins?

Another option would be to group answers in "quality bins". Pick a Wilson score threshold to be "great answers" and put them in a section nearest the top of the page. Answers the system is less confident about would be in a lower group. Maybe you only need two groups:

* Good answers
* Other
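
The two-group version can be sketched as below. The threshold value is hypothetical and chosen only for illustration, the score is assumed to be the lower bound of a 95% Wilson interval, and the names `bin_answers` and `GOOD_THRESHOLD` are not from Codidact.

```python
import math

GOOD_THRESHOLD = 0.45  # hypothetical cut-off; a real value would need tuning


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    z2 = z * z
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (p + z2 / (2 * n) - margin) / (1 + z2 / n)


def bin_answers(votes: dict[str, tuple[int, int]],
                threshold: float = GOOD_THRESHOLD) -> dict[str, list[str]]:
    """Split answers into 'Good answers' and 'Other' by Wilson lower bound."""
    bins: dict[str, list[str]] = {"Good answers": [], "Other": []}
    for name, (up, down) in votes.items():
        score = wilson_lower_bound(up, down)
        label = "Good answers" if score >= threshold else "Other"
        bins[label].append(name)
    return bins


# Hypothetical answers: name -> (upvotes, downvotes).
print(bin_answers({"A": (6, 1), "B": (3, 0), "C": (6, 2)}))
```

With so few votes, where the threshold sits decides everything, which is part of why the cut-off would need experimentation.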

Or maybe a bin for bad answers that aren't yet deleted. I'm less sure that's helpful though, because:

## Most readers want the top answer and maybe the second

Based on my experience, the key indicator of answer quality for visitors is the top answer. (I thought I did a study at Stack Exchange, but I can't find any evidence. Maybe it was internal?) This might be less true on more philosophical communities (like meta), but people don't have much patience for reading past one or maybe two answers. That means sort order is _really important_. I find the current +/- display distracting. I want a discoverable way to see the breakout, but my intuition about what the numbers mean (perhaps from years of seeing scores) isn't formed yet. Sort order probably tells me all I need anyway.
