
Ratings play an enormous role in our lives. Ratings made by critics, judges, and evaluators determine a range of outcomes, from the seemingly trivial (which wine you pick for dinner or which products you buy from Amazon) to the more consequential (which athletes win Olympic gold or which students get into top universities).

But how reliable are these ratings? How well do they hold up over time?

We thought about this when we learned about speculation over wine-rating inflation. When Robert Parker introduced his 100-point rating system for wine decades ago, the highest score he gave that year was 91 points. Now many wines each year receive perfect scores from his publication, the Wine Advocate. Similarly, in 2000 just 15% of wines rated by Wine Spectator received a score above 90. By 2015 that share had more than doubled: Nearly a third of all wines reviewed received a score above 90.

We wanted to know what was going on here, and whether people have a bias toward giving higher ratings over time.

Ratings Rise with Experience

In eight studies, recently published in Psychological Science, we captured more than 12,000 sequential evaluations to see whether ratings changed as the rater gained more experience. The evaluations covered much territory: judges’ scores on the TV show Dancing with the Stars, student grades from university professors, and ratings for short stories and photographs by college students. We also analyzed thousands of Amazon product reviews by devoted reviewers.

In one study, we analyzed 5,511 scores from the same panel of judges on Dancing with the Stars. Across 20 seasons, the more evaluations the judges made, the higher the ratings they gave. This held even when we controlled for other factors, such as whether the professional partners were actually improving or whether more-skilled dancers appeared in later seasons.

We followed that with a different study that examined student grades over a 10-year period in 991 courses that were taught several times by the same professors. As with the dance-competition judges, the more times an instructor taught a course, the higher the grades they gave. Again, we wondered whether other factors could account for these results: perhaps students were improving over time, grades were rising across all courses, courses that awarded higher grades were more likely to be offered again, or professors were getting better at teaching. Even after controlling for these possibilities, we found the same result: When professors taught the same course many times, they tended to give higher grades.

To rule out alternative explanations, we also tested for this pattern in a controlled experiment in which people evaluated short stories. We asked 168 college undergraduates to rate one short story per day for 10 days. By the end of the study, all participants had rated the same 10 stories, but they each saw them in a different randomized order. Randomizing allowed us to isolate the influence of order (day 1, day 2, and so on) on evaluations. In other words, does making more evaluations, regardless of what people are evaluating, make ratings go up? As before, we found that the more stories a person rated, the higher ratings they gave. Consequently, the 10th story was rated higher, on average, than the first.

Across the board, we found the same result.

More Evaluating Makes Evaluation Feel Easier

Why might ratings rise over time?

We wondered whether the process of evaluating might feel easier the more you do it, and whether that feeling might influence how positively you rate something. In a follow-up study, we asked 362 people in an online panel to rate one randomly selected story per day over 10 days. Each day we also asked questions such as "How easy was it to evaluate this story?" As the days progressed, participants reported that rating each story felt easier and more enjoyable. These feelings, in turn, led to higher ratings for the stories over time.

The findings suggest that biased evaluations result from a misattribution process: If something feels easier to evaluate, people believe it must actually be better. In other words, they misattribute their own feelings about the act of evaluating (this feels easy) to the merits of the thing being evaluated (this must deserve a higher rating). This was true even though each person's sequence of stories was randomized.

When we asked whether they thought their ratings were rising over time, however, participants said they were not. This suggests that most people are unaware that such a bias might be influencing their judgments.

Product Ratings, Promotions, and Performance Feedback: How Trustworthy Are They?

Why do our findings matter for managers and organizations? One practical implication speaks to organizations that seek customer reviews. In a supplementary study, we found that reviewers on Amazon give higher product ratings the more reviews they give. For example, if someone makes an evaluation for the first or second time, she might give a lower star rating — regardless of the product — than if this is her 20th evaluation. If crowdsourced information is a key feature of an organization’s business model and a driver of consumer choice, biases like this would be important for business leaders to consider and for consumers to be aware of.

Our recent findings also raise an exciting, open question for managers: How might this bias in evaluations affect hiring, promotion, and performance reviews? Despite attempts to make accurate and fair assessments, our findings suggest that evaluation processes will tend to favor candidates who are interviewed by a recruiter who has been making evaluations for a longer time. We are studying this next and seeking organizations to partner with.

We are also interested in whether similar results play out in promotion decisions and in sequential annual 360-degree feedback processes. If they do, the impact of these biases could be widespread, affecting much of the current and prospective labor force.

We are also eager to find ways to mitigate this bias and thereby make hiring assessments, performance reviews, and promotions more accurate. In our studies, most people seemed unaware that the bias was influencing their decisions over time, so one possible remedy is simply to make evaluators aware of it. We are also trying to better understand other situational variables that may play a role.

There are limitations to our studies worth noting. First, although we found the bias in every context we studied, many other factors contribute to evaluation decisions; positive drift over time is just one. Second, there is some evidence that, under certain conditions, evaluations may instead become more negative over time. The factors and conditions that determine whether evaluations drift more positive or more negative remain an open question.

Perhaps the next time you post on Yelp or spend time interviewing candidates, you’ll consider how many evaluations you have already completed and how your current assessment might drift more positively. Doing so could help you assess more accurately.

Conversely, when you depend on others' numerical evaluations, keep in mind that a rating reflects not only a product's inherent quality but also, potentially, how many evaluations the rater has already made. Indeed, it may be worthwhile to buy that older, lower-scored bottle of wine.
