A Statistical Analysis of Pitchfork’s Ratings pt. 2

About a year ago, I whipped together a quick look at Pitchfork’s album rating system in order to give artists who have been reviewed by the site a better understanding of what their numerical score actually means. To do so, I compiled a more-or-less complete breakdown of p4k’s album reviews of original music (no reissues, no live albums, no “greatest hits”) from February 24, 2009 to February 24, 2010 and expressed the scores as more meaningful percentiles. I also made a series of observations concerning their “Best New Music” designation which caused a bit of a stir.

Now, a year later, I decided it would be interesting to see whether the dynamics of Pitchfork’s rating system have changed over time by briefly analyzing album ratings from February 25, 2010 to February 25, 2011 against my previous data set. Again, I focused my attention on original content, which meant going through each album review individually to see if it met my criteria for inclusion. In addition to the items mentioned above, soundtracks, label compilations, and comedy albums were excluded, but EPs, remix albums, and mixtapes were deemed OK, as I wanted to maintain consistency with the previous year’s analysis. The first astonishing thing I noticed when putting together the data sets was that the total number of original albums reviewed by the site was nearly identical from 2009 to 2010 (1025 and 1027, respectively)!
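
If you wanted to script the same pass over the reviews, the inclusion filter boils down to something like the minimal sketch below; the "release_type" field and the sample records are made up for illustration, since I assembled the real data set by checking each review by hand:

```python
# A minimal sketch of the inclusion filter described above. The
# "release_type" field and the sample records are hypothetical; the real
# data set was assembled by checking each review by hand.
EXCLUDED_TYPES = {"reissue", "live album", "greatest hits",
                  "soundtrack", "label compilation", "comedy album"}

def is_original_content(review):
    # EPs, remix albums, and mixtapes stay in; anything excluded is dropped.
    return review["release_type"] not in EXCLUDED_TYPES

sample_reviews = [
    {"artist": "Band A", "release_type": "album",   "score": 7.4},
    {"artist": "Band B", "release_type": "reissue", "score": 9.0},
    {"artist": "Band C", "release_type": "mixtape", "score": 6.8},
]
kept = [r for r in sample_reviews if is_original_content(r)]
print(len(kept))  # 2 of the 3 sample records survive the filter
```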

So without further ado, let’s get to some plots (click on the images to view a slightly larger size):

The above histogram shows the distribution of the scores for each year. Glancing at the figures, you can easily tell that there is good agreement between the two years (in fact, the correlation coefficient is .9336, indicating that the two distributions are strongly correlated; a sketch of this calculation follows the list below). This was expected, as any long-time p4k reader can attest that:

  • Pitchfork tends not to review low-performing albums (which is why the plots are negatively skewed).
  • A majority of album ratings fall within the 6.0 – 8.0 range.
  • An extremely small number are of exceptionally high quality, explaining the dearth of albums achieving 9.0 and above.
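
For anyone who wants to reproduce a correlation figure like this from the raw data, one reasonable approach is to bin both years’ scores identically and correlate the two frequency vectors. The sketch below does exactly that, with short placeholder lists standing in for the real ~1,025-review samples:

```python
# One way to arrive at a correlation figure like the one above: bin both
# years' scores identically and correlate the two frequency vectors. The
# short lists below are placeholders for the real ~1,025-review samples.
import numpy as np

scores_2009 = [5.9, 6.8, 7.0, 7.0, 7.4, 8.1]   # placeholder values
scores_2010 = [5.8, 6.9, 7.2, 7.5, 7.5, 8.0]   # placeholder values

bins = np.arange(0.0, 10.2, 0.1)               # one bin per possible 0.1 step
counts_2009, _ = np.histogram(scores_2009, bins=bins)
counts_2010, _ = np.histogram(scores_2010, bins=bins)

r = np.corrcoef(counts_2009, counts_2010)[0, 1]
print(round(r, 4))   # .9336 on the full data set, not on these placeholders
```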

However, a more detailed inspection of the two histograms shows a couple of interesting differences. For one, the most recent data set (I’ll refer to it as “albums from 2010” from now on, which is somewhat of a misnomer but close enough for our purposes) shows ratings centered at a higher value than the previous year’s data (which I’ll refer to as “albums from 2009”). This can be seen more clearly by looking at the two histograms overlaid on each other. What this implies is that the albums from 2010 were rated more favorably than the albums from 2009.
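
The overlay itself is easy to recreate; here is a rough matplotlib sketch on the same kind of placeholder data (the actual figures in this post were generated separately):

```python
# A minimal sketch of the overlaid histogram, again on placeholder data;
# the actual figures in this post were produced separately.
import matplotlib.pyplot as plt

scores_2009 = [5.9, 6.8, 7.0, 7.0, 7.4, 8.1]   # placeholder values
scores_2010 = [5.8, 6.9, 7.2, 7.5, 7.5, 8.0]   # placeholder values

bins = [round(0.1 * i, 1) for i in range(0, 102)]   # 0.0 through 10.1
plt.hist(scores_2009, bins=bins, alpha=0.5, label="Feb 2009 - Feb 2010")
plt.hist(scores_2010, bins=bins, alpha=0.5, label="Feb 2010 - Feb 2011")
plt.xlabel("Album rating")
plt.ylabel("Number of reviews")
plt.legend()
plt.show()
```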

Another interesting finding when you compare the two histograms is that the most frequent album rating has changed from 7.0 to 7.5 over the past year. Not only that, but the number of occurrences of the most common rating has increased by roughly 20% (51 vs. 61 albums). Relatedly, the 2010 histogram is noticeably more concentrated in the 6.5–8.2 range than the 2009 plot. This implies that Pitchfork did not distribute its scores as widely in 2010 as it did in 2009. It’s common knowledge that Pitchfork doesn’t fully utilize its 100 potential rating options in the most effective manner (see: normal distribution); however, the fact that they are packing more albums into a narrower range is disheartening at best and troubling at worst. I call this trend toward rating homogeneity the “Rolling Stone Effect.”
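
If you want to check the modal rating yourself from the raw data, a quick sketch (again with placeholder lists) looks like this:

```python
# A minimal sketch of the modal-rating comparison, on placeholder data; on
# the real data the most common score moves from 7.0 (51 albums) to 7.5
# (61 albums), an increase of roughly 20%.
from collections import Counter

scores_2009 = [5.9, 6.8, 7.0, 7.0, 7.4, 8.1]   # placeholder values
scores_2010 = [5.8, 6.9, 7.2, 7.5, 7.5, 8.0]   # placeholder values

mode_2009, n_2009 = Counter(scores_2009).most_common(1)[0]
mode_2010, n_2010 = Counter(scores_2010).most_common(1)[0]
print(mode_2009, n_2009)                            # most common 2009 rating and its count
print(mode_2010, n_2010)                            # most common 2010 rating and its count
print(f"{(n_2010 - n_2009) / n_2009:+.0%} change")  # roughly +20% on the real counts
```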

Looking at the box plots of the data confirms both of these conclusions:

Notice that the median score, denoted by the red vertical line in the box, shows an increase of .2 points between the two years (7.0 to 7.2), along with a lower-quartile increase of .2 points and an upper-quartile increase of .1 points. This confirms that, according to Pitchfork, albums from 2010 were generally better than albums from 2009. Also, the difference between the lower and upper quartiles has shrunk by .1 (from 1.5 to 1.4), and it would have been reduced by a further tenth of a point if not for an uncharacteristically high number of albums garnering a 5.8 rating in 2010 (notice the spike in the histogram at that value). This indicates that the range where the majority of albums score is indeed shrinking.
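
The quartile figures above can be recomputed from the raw data with a couple of lines of numpy; a minimal sketch on placeholder score lists:

```python
# A minimal sketch of the box-plot statistics quoted above, on placeholder
# data; on the real data the medians are 7.0 and 7.2 and the interquartile
# range shrinks from 1.5 to 1.4.
import numpy as np

scores_2009 = np.array([5.9, 6.8, 7.0, 7.0, 7.4, 8.1])   # placeholder values
scores_2010 = np.array([5.8, 6.9, 7.2, 7.5, 7.5, 8.0])   # placeholder values

for year, scores in (("2009", scores_2009), ("2010", scores_2010)):
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    print(year, "median:", median, "IQR:", round(q3 - q1, 2))
```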

Here are the percentile breakdowns for 2009 and 2010 so that any artist or band who’s been fortunate enough to get reviewed by the site can see how they stack up against other albums released within the same year. These percentiles also show that in 2010 an artist had to score a higher value in order to remain in the same percentile, further suggesting that p4k viewed 2010 as a better year in music than 2009:
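
And if you want to place a single album within a year’s distribution without consulting the table, the percentile lookup is essentially a one-liner; a minimal sketch on placeholder data:

```python
# A minimal sketch of the percentile lookup, on placeholder data: for a
# given album score, report what fraction of that year's reviews it meets
# or beats.
import numpy as np

scores_2010 = np.array([5.8, 6.9, 7.2, 7.5, 7.5, 8.0])   # placeholder values
album_score = 7.5

percentile = 100 * np.mean(scores_2010 <= album_score)
print(f"a {album_score} sits at roughly the {percentile:.0f}th percentile of this sample")
```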

Switching gears and looking at albums that achieved the “Best New Music” designation, many of the same complaints from last year still apply:

After last year’s post, a lot of people expressed the view that the “Best New Music” category is meant for high-quality albums that are easily accessible to the average music listener. As a result, this would automatically disqualify genres such as metal, electronic offshoots, and jazz. Even granting this stance, I personally feel that these high-scoring albums from “unfamiliar” genres should at least be better represented in the year-end lists (Kylesa landed at #44 and Forest Swords at #48, with Actress and Guido getting Honorable Mentions).

Well, that does it for this year! If you want to run some other stats of your own, you can download my raw data here. I have a lot of ideas on where to take this project next (dependent, of course, on whether I have the time). Oh, one last thing I’d like to point out before saying adieu: of all the hyperlinked items and searchable content on the Pitchfork site (artists, albums, labels, etc.), I find it incredibly surprising that you cannot search for album reviews by rating or by writer. It wouldn’t be terribly complicated to code these features, and they would be incredibly helpful for site readers (especially data miners like myself). OK, on to the discussion!


15 Responses to “A Statistical Analysis of Pitchfork’s Ratings pt. 2”

  1. Korman says:

    Very well done

  2. ISLAND says:

    thanks, this was a great read!

  3. Josh says:

    You can search by review scores here: http://pitchfork.com/search/#advanced-search (look for the review scores tab)

  4. Mark says:

    I’d like to see how scores ranked against sales..!

  5. eighty says:

    well at least you tried hard?

  6. freddie hubbard says:

    is this for real??!

  7. Shaz says:

    Did someone actually pay you to do this?
    Or you have too much free time on your hands?

  8. Adam says:

    Great post. Totally agree about the travesty of Forest Swords not getting BNM, and the general principle that BNM is for mainstream/familiar genres.

  9. awesome read thanks! Nice to see some other Utah music bloggers.

    I usually have a pretty hard time with Pitchfork’s reviews, I just feel like they are the rich snotty kids that when they turn 16 their parents buy them a new beamer.

    I do read the reviews quite often at pitchfork and agree with about half of them.

  10. Ronnie Dawson says:

    what percentage of artists scoring 8.0 or above wear those silly sunglasses and ridiculous boat shoes? 100%

  11. Thanks for this. Crotchfork always seemed gutless. Would love to see ratings according to age of act; as in, how rarely does pitchfork give higher than a 7 to a band/artist recording since 1995 or earlier, no matter how good the record. In their defense, I suppose it’s tough to be cool.

  12. NINa says:

    Good reading, thanx! A decline of meaningful albums is visible in most music styles without specialized math ;) If reviews are honest then the average rating would be somewhere in the middle. However, if a reviewer tends to encourage a band regardless of poorly composed parts of some of the songs, then the rating will still be higher. If bands are eager to pay for reviews, that can slightly raise the rating as well. Journalism isn’t the best-paid job no matter the hard brain work (no copy-paste).
    That was a lot of work you did!

  13. Julien says:

    Thank you for this very interesting analysis! I like the “Rolling Stone effect” but I have to ask if you think 2 data points are enough to demonstrate that it’s really happening. Like you say, the box plot shows differences of .1 (quartiles) or .2 (median); those numbers are rather low compared to the whole 10-point scale, so I suppose it would be wise to wait for a third year of data, or maybe to run the analysis on 6-month periods? In any case this post is terrific. Cheers!

  14. Great article! Coming from a former psychology statistics student, I can appreciate the nuances of analyzing a data set. Thanks for sharing! Shoot me an e-mail if you’re interested in a link swap.

    Dylan

  15. [...] days later an album with an 8.1 receives it. Some Pitchfork readers, such as the author of the blog Part-Time Music, have done detailed statistical analyses of many scores and confirmed that the only truly concrete [...]

