Posts Tagged ‘Pitchfork’

A Statistical Analysis of Pitchfork’s Ratings pt. 2

Monday, March 21st, 2011

About a year ago, I whipped together a quick look into Pitchfork’s album rating system to give artists who have been reviewed by the site a better understanding of what their numerical score actually meant. To do so, I compiled a more-or-less complete breakdown of p4k’s album reviews of original music (no reissues, no live albums, no “greatest hits”) from February 24, 2009 to February 24, 2010 and displayed the scores in more meaningful percentiles. I also made a series of observations concerning their “Best New Music” designation, which produced a little bit of a stir.

Now, a year later, I decided it would be interesting to see whether the dynamic of Pitchfork’s rating system has changed over time by briefly analyzing album ratings from February 25, 2010 to February 25, 2011 against my previous data set. Again, I focused my attention on original content, which meant I had to go through each album review individually and see if it met my criteria for inclusion. In addition to the items mentioned above, soundtracks, label compilations, and comedy albums were excluded, but EPs, remix albums, and mixtapes were deemed OK as I wanted to maintain consistency with the previous year’s analysis. The first astonishing thing I noticed when putting together the data sets was that the total number of original albums reviewed by the site was nearly identical from 2009 to 2010 (1,025 and 1,027, respectively)!

So without further ado, let’s get to some plots (click on the images to view a slightly larger size):

The above histograms show the distribution of the scores for each year. Glancing at the figures, you can easily tell that there is good agreement between the two years (in fact, the correlation coefficient is .9336, indicating that the data is strongly correlated). This was expected, as any long-time p4k reader can attest to the fact that:

  • Pitchfork tends not to review low-performing albums (the reason why the plots are negatively skewed).
  • A majority of album ratings fall within the 6.0 – 8.0 range.
  • An extremely small number are of high quality, explaining the dearth of albums achieving 9.0 and above.
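
For anyone who wants to try this kind of comparison themselves, here is a minimal Python sketch. It assumes the correlation coefficient is taken between the per-rating counts of the two histograms (one bin per tenth of a point), which is only one plausible reading, and the scores below are made-up toy values rather than the real data sets.

```python
from collections import Counter
from math import sqrt

def rating_counts(scores):
    """Count how many albums received each rating from 0.0 to 10.0 (101 bins)."""
    tally = Counter(round(s * 10) for s in scores)  # work in integer tenths
    return [tally.get(t, 0) for t in range(101)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy scores, NOT the real data sets.
scores_2009 = [6.0, 6.5, 7.0, 7.0, 7.0, 7.5, 8.0, 8.2]
scores_2010 = [6.2, 6.5, 7.0, 7.5, 7.5, 7.5, 8.0, 8.2]

r = pearson(rating_counts(scores_2009), rating_counts(scores_2010))
print(f"correlation between histograms: {r:.4f}")
```

Working in integer tenths avoids floating-point surprises when scores like 7.1 are used as bin keys.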

However, a more detailed inspection of the two histograms shows a couple of interesting differences. For one, the most recent data set (I’ll refer to it as “albums from 2010” from now on, which is somewhat of a misnomer but close enough for our purposes) shows ratings centered at a higher value than the previous year’s data (which I’ll refer to as “albums from 2009”). This can be seen more clearly by looking at the two histograms overlaid on each other. What this implies is that the albums from 2010 were rated more favorably than albums from 2009.

Another interesting find when you compare the two histograms is that the most frequent album rating has changed from 7.0 to 7.5 over the past year. Not only that, but the number of occurrences of the most common rating has increased by about 20% (51 vs. 61 albums). On a related point, the 2010 histogram is noticeably more concentrated in the 6.5-8.2 range than the 2009 plot. This implies that Pitchfork did not distribute its scores as evenly in 2010 as in 2009. It’s common knowledge that Pitchfork doesn’t fully utilize the 100 potential rating options in the most effective manner (see: normal distribution). However, the fact that they are squeezing more albums into a narrower range is disheartening at best and troubling at worst. I call this trend toward homogeneity in album ratings the “Rolling Stone Effect.”
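
A rough Python sketch of how these two checks could be run: find the most common rating, and measure what share of albums lands inside a given range. The function names and the toy scores are illustrative assumptions; only the 51 and 61 counts come from the analysis above.

```python
from collections import Counter

def most_common_rating(scores):
    """Return (rating, count) for the most frequent score."""
    tenths, count = Counter(round(s * 10) for s in scores).most_common(1)[0]
    return tenths / 10, count

def share_in_range(scores, low, high):
    """Fraction of scores with low <= score <= high (compared in tenths)."""
    lo, hi = round(low * 10), round(high * 10)
    inside = sum(1 for s in scores if lo <= round(s * 10) <= hi)
    return inside / len(scores)

# Hypothetical toy scores, NOT the real data set.
toy_scores = [6.0, 6.5, 7.0, 7.5, 7.5, 8.2, 9.0]
print(most_common_rating(toy_scores))
print(f"{share_in_range(toy_scores, 6.5, 8.2):.1%} of albums fall in 6.5-8.2")

# Counts reported above for the most common rating: 51 (2009) vs. 61 (2010).
increase = (61 - 51) / 51
print(f"increase in modal count: {increase:.1%}")  # 19.6%
```

If two ratings tie for most common, `most_common` returns whichever it encountered first, so a real analysis may want to report ties explicitly.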

Looking at the box plots of the data confirms both of these conclusions:

Notice that the median score, denoted by the red vertical line in the box, shows an increase of .2 points between the two years (7.0 to 7.2), along with a lower quartile increase of .2 points and an upper quartile increase of .1 points. This confirms that, according to Pitchfork, albums from 2010 were generally better than albums from 2009. Also, the difference between the lower and upper quartiles has shrunk by .1 (from 1.5 to 1.4), and it would have shrunk by a further tenth of a point if not for an uncharacteristically high number of albums garnering a 5.8 rating in 2010 (notice the spike on the histogram at that value). This indicates that the range in which the middle half of albums score is indeed narrowing.
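
The numbers behind a box plot are easy to compute directly. The following is a minimal Python sketch using the standard library; the quartile convention chosen here ("inclusive") is an assumption and may not match the one used to draw the plots, and the scores are toy values.

```python
from statistics import quantiles

def box_summary(scores):
    """Return (Q1, median, Q3, IQR) for a list of scores."""
    # 'inclusive' treats the data as the whole population; other quartile
    # conventions can give slightly different cut points.
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return q1, median, q3, q3 - q1

# Hypothetical toy scores, NOT the real data sets.
toy_2009 = [5.0, 6.0, 7.0, 8.0, 9.0]
toy_2010 = [5.5, 6.5, 7.5, 8.0, 9.0]

for label, scores in (("2009", toy_2009), ("2010", toy_2010)):
    q1, median, q3, iqr = box_summary(scores)
    print(f"{label}: Q1={q1:.1f} median={median:.1f} Q3={q3:.1f} IQR={iqr:.1f}")
```

A higher median together with a smaller IQR in the second year is the same pattern described above: scores shifting upward while bunching closer together.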

Here are the percentile breakdowns for 2009 and 2010 so that any artist or band who’s been fortunate enough to have gotten reviewed by the site can see how they stack up against other albums released within the same year. These percentiles also show how in 2010 an artist had to score a higher value in order to remain in the same percentile, further convincing us that p4k viewed 2010 as a better year in music than 2009:
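
A percentile breakdown like this can be rebuilt from the raw scores in a few lines. This is a minimal Python sketch, assuming cutoffs at every 10th percentile and the standard library's "inclusive" interpolation; the toy scores are evenly spaced placeholders, not the real data.

```python
from statistics import quantiles

def decile_table(scores):
    """Return [(percentile, score cutoff), ...] for the 10th through 90th percentiles."""
    cutoffs = quantiles(scores, n=10, method="inclusive")
    return [(10 * (i + 1), round(c, 1)) for i, c in enumerate(cutoffs)]

# Hypothetical toy scores (4.0, 4.1, ..., 9.0), NOT the real data sets.
toy_scores = [t / 10 for t in range(40, 91)]

for pct, cutoff in decile_table(toy_scores):
    print(f"{pct}th percentile: {cutoff}")
```

Running the same function on each year's scores and putting the two columns side by side shows how far a given percentile's cutoff moved from one year to the next.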

Switching gears and looking at albums that achieved the “Best New Music” designation, many of the same complaints from last year still apply:

After last year’s post, a lot of people expressed the position that the “Best New Music” category was meant for high quality albums that are easily accessible to the average music listener. As a result, this would automatically disqualify genres such as metal, electronic offshoots, and jazz. Even if taking this stance, I personally feel that these high scoring albums from “unfamiliar” genres should at least be better represented in the year-end lists (Kylesa landed at #44 and Forest Swords at #48 with Actress and Guido getting Honorable Mentions).

Well, that does it for this year! If you want to run some other stats of your own, you can download my raw data here. I have a lot of ideas on where to go further with this project (dependent, of course, on whether I have the time). Oh, one last thing I’d like to point out before saying adieu: of all the hyperlinked items and searchable content on the Pitchfork site (artists, albums, labels, etc.), I find it incredibly surprising that you cannot search for album reviews by rating or writer. It isn’t too terribly complicated to code these features, and they would be incredibly helpful for site readers (especially data miners like myself). OK, on to the discussion!

The Knife // Silent Shout: An Audio/Visual Experience

Saturday, November 20th, 2010

Highly, HIGHLY recommend watching the video. One of my favorite groups of the past decade doing what they do best. Fucking amazing. (Available for one week via P4k)

The Knife // Wanting to Kill

The Knife // Forest Families

Pitchfork // A Statistical Look at Their Ratings

Thursday, February 25th, 2010

About a week or so ago, there was a hearty discussion on Twitter among well-known music bloggers about Pitchfork’s controversial 7.6 rating of Toro y Moi’s excellent debut LP Causers of This. Since I am guilty of being more of a mathematician than a writer, I decided that this was a great opportunity to dive right into the numbers, do a brief statistical study of Pitchfork’s ratings over one complete year, and see exactly where Chaz Bundick’s 7.6 grade stacked up in comparison to his peers. After sifting through the data most of yesterday afternoon, I have to say there are some pretty interesting finds (including some statistical anomalies) behind Pitchfork’s rating system for albums.

Before beginning, I feel I should briefly mention how the data was collected. Initially, I was going to write a script to go through Pitchfork’s Record Reviews, logging each numbered grade between February 24, 2009 and February 24, 2010. However, knowing that p4k has an affinity for rating reissues and compilations very favorably (an unbelievable 30 reissued albums scored higher than the highest rated contemporary album; chalk that up to the Beatles, Neil Young, and Radiohead re-releases), I figured the only surefire way to get accurate data on non-reissued material was to look into each review, see if it fit my criteria for a new release, and jot down the score. A cumbersome process to say the least! There were several things I decided to omit when classifying an album as “original”: soundtracks, label compilations, live recordings, and of course reissues. This left a relatively large sample of 1,025 newly released, original albums to run analysis on. Is this result error free? Of course not. No doubt I tallied a handful of albums as “original” when they weren’t and vice versa. However, with a sample this large and my propensity to err small, any stray mistakes should have a negligible effect on the results. The following is a histogram plotting the number of occurrences of each rating (click for larger view):

If you are a frequent follower of p4k, then most of the plot doesn’t come as a surprise. The bulk of the histogram centers around the 6.5-8.5 range, with a score of 7.0 being the most common rating (51 times). Also, because Pitchfork tends not to publish reviews of horrendously bad albums, it’s a no-brainer that the plot is significantly negatively skewed. Similarly, exceptionally performing albums (i.e. 8.7 and above) are also relatively rare events.

Probably one of the most interesting results of the histogram is seeing whole number ratings occur significantly more often than their x.9 and x.1 neighbors, in fact often enough to be considered a statistical anomaly. Notice how the peaks at 6.0, 7.0, and 8.0 are noticeably higher (almost twice as high in some instances) than 5.9, 6.9, and 7.9 respectively. My theory behind this is that when it comes to “on the fence” reviews, p4k tends to give the benefit of the doubt to the artist. Knowing that, perceptually, a rating whose whole number is one unit higher looks more impressive (this also explains why things are priced $6.99 rather than $7.00: we subconsciously think it is a lot less), they tend to bump up the score more often to show a more positive review. Now if it is true that individual critics are responsible for giving an album a score, rather than a collective following a loose outline of established “rules”, then this result is very interesting from both a mathematical and a sociological point of view.
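
One simple way to check for this kind of bump is to line up each whole-number rating against the x.9 just below it and the x.1 just above it. The following is a minimal Python sketch of that comparison; the function name and the toy scores are illustrative assumptions, not the real data.

```python
from collections import Counter

def whole_vs_neighbors(scores):
    """For each whole-number rating, return (rating, count at x.9 below, count at x.0, count at x.1 above)."""
    tally = Counter(round(s * 10) for s in scores)  # integer tenths as keys
    rows = []
    for whole in range(10, 100, 10):  # 1.0, 2.0, ..., 9.0 expressed in tenths
        rows.append((whole / 10, tally[whole - 1], tally[whole], tally[whole + 1]))
    return rows

# Hypothetical toy scores, NOT the real data set.
toy_scores = [6.9, 7.0, 7.0, 7.0, 7.1, 7.1, 7.9, 8.0, 8.0]

for rating, below, at, above in whole_vs_neighbors(toy_scores):
    if at:
        print(f"{rating}: {below} at x.9 below, {at} at the whole number, {above} at x.1 above")
```

A `Counter` returns 0 for ratings that never occur, so missing neighbors need no special handling.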

To get a better idea of the breakdown of scores and a loose determination of percentiles, I made a box plot (click for larger view):

This plot tells us a couple of things, most notably establishing a line between OK albums and great albums. One can see from the plot that the upper (3rd) quartile, which marks off the “top” 25% of ratings, occurs at the 7.6 line. What this means is that our beloved Toro y Moi album would be statistically defined as on the border of the upper tier. Confirming our natural inclination that a majority of albums are rated around the “7” mark, the box of the boxplot, representing the middle 50% of scores, spans 6.1 – 7.6. The final interesting part is that if an album scores below 3.9, it’s considered a statistical outlier (meaning Lil Wayne can breathe easy knowing his rock album just made the cut). Refining the results further into 10% increments, the following is established:
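
The 3.9 outlier cutoff follows from the edges of the box. Here is a short worked check in Python, assuming the box plot uses the usual 1.5 × IQR whisker rule (an assumption, though a common default):

```python
# Box edges read off the plot, expressed in tenths of a point to keep the arithmetic exact.
q1_tenths, q3_tenths = 61, 76          # lower and upper quartiles: 6.1 and 7.6
iqr_tenths = q3_tenths - q1_tenths     # interquartile range: 15 tenths = 1.5 points

# Assumed rule: anything more than 1.5 * IQR outside the box is an outlier.
lower_fence = (q1_tenths - 1.5 * iqr_tenths) / 10
upper_fence = (q3_tenths + 1.5 * iqr_tenths) / 10

print(f"lower fence: {lower_fence:.2f}")  # 3.85
print(f"upper fence: {upper_fence:.2f}")  # 9.85
```

Since scores come in tenths, a lower fence of 3.85 means a 3.9 is the lowest score that still counts as "normal," while a 3.8 or below is flagged as an outlier.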

In my opinion, the above table gives bands a better way to determine the meaning of their p4k rating than the numerical score alone can provide. Take for example a hypothetical review of 7.7. Without any context, it is a rather meaningless number which invokes a wide range of opinions (C-grade, “better than most”, underwhelming, etc.). However, when you compare it to a large sample of past albums’ ratings and see that it is in the 60th percentile, meaning it is better than 60% of the albums they’ve graded, you understand the score a lot better.
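
Turning a single score into a percentile is a one-liner once you have the full list of ratings. This is a minimal Python sketch, assuming "percentile" means the share of albums rated strictly lower; the ten toy scores are invented so the arithmetic is easy to follow and are not the real data.

```python
def percentile_rank(score, scores):
    """Percentage of albums in `scores` rated strictly lower than `score`."""
    target = round(score * 10)  # compare in integer tenths
    below = sum(1 for s in scores if round(s * 10) < target)
    return 100 * below / len(scores)

# Hypothetical toy scores, NOT the real data set.
toy_scores = [5.0, 6.0, 6.5, 7.0, 7.2, 7.5, 7.7, 8.0, 8.3, 9.0]

print(percentile_rank(7.7, toy_scores))  # 60.0
```

Other definitions count ties as half or use "at or below" instead, which shifts the result slightly when many albums share the same score.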

The final thing I’ll mention is a couple of points when looking over their Best New Music selections and the seemingly arbitrary way they assign the label. With how much significance is attached to a BNM nod (record sales, exposure, tour upgrades), it was rather unsettling noticing some trends that seemed to pop up:

  • All albums scoring 8.6 and higher were automatically made Best New Music.
  • If you are a metal fan, you’ve gotten royally screwed over and overlooked by p4k. Only two metal albums were selected for BNM within the past year: Sunn O)))’s Monoliths & Dimensions and Isis’s Wavering Radiant (both with scores of 8.5). Adding insult to injury, out of the 15 albums that scored an 8.5, 11 made BNM. Two of the four that didn’t make the cut were metal-related records (Baroness’s Blue Record and Converge’s Axe to Fall), both reviewed on days when no other record made BNM.
  • Another one of the four albums that scored 8.5 and was not stamped with a BNM was contemporary jazz musician Jon Hassell’s LP, verbosely entitled Last Night the Moon Came Dropping Its Clothes in the Street, supplying another example of a high-performing album from a more obscure genre getting the shaft. In p4k’s defense, Yacht’s superb See Mystery Lights was BNMed that day, which leads me to my next point…
  • If you release a great record, make sure you don’t get reviewed on the same day as another great record. I don’t have an individual statistic for this, but I often saw high-scoring albums (8.2-8.5) not get a BNM because another even better (or same score, just more hyped) album was reviewed the same day.
  • If you are a hyped record or an established act, you have a better shot at getting a Best New Music when you are on the cusp. Now this seems kind of obvious, but there were some egregious instances where this occurred. Of the 41 albums that scored an 8.1 or 8.2, five were chosen as BNM: Surfer Blood’s Astro Coast, Atlas Sound’s Logos, Cass McCombs’s Catacombs, Bill Callahan’s Sometimes I Wish We Were An Eagle, and Wavves’s S/T.
  • Yeah, I have no idea what they were thinking BNM-ing that Mos Def record (the lowest score and, out of 36 records that scored an 8.0, it was the only one to get BNM-ed).
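
To put the counts above on a common footing, here is a small Python sketch that converts them into BNM rates per score band. The counts are the ones quoted in the list; the dictionary layout and function name are just illustrative choices.

```python
# (albums given BNM, total albums) per score band, as quoted in the list above.
bnm_counts = {
    "8.5": (11, 15),
    "8.1-8.2": (5, 41),
    "8.0": (1, 36),
}

def bnm_rate(bnm, total):
    """Fraction of albums in a score band that received Best New Music."""
    return bnm / total

for band, (bnm, total) in bnm_counts.items():
    print(f"{band}: {bnm}/{total} = {bnm_rate(bnm, total):.1%}")
# 8.5: 11/15 = 73.3%
# 8.1-8.2: 5/41 = 12.2%
# 8.0: 1/36 = 2.8%
```

The drop from roughly three in four at 8.5 to about one in eight just a few tenths lower shows how sharp the cliff is for albums on the cusp.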

This was a fun project which allowed me to brush up on some of my Matlab skillz. In the future, I would like to dive deeper and provide a more detailed analysis, but that will have to wait until I get some free time. If you would like to speculate on p4k’s ratings, or if you have any insight on how they are determined (individual vs. collective), just leave a comment. If you want a copy of my data so you can run your own analysis, I would be happy to supply it to you (EDIT :: You can download the data set here).

Pitchfork’s ’09 Review // Guest List

Monday, January 18th, 2010

Now that the dust has settled on the year that may or may not be the end of a decade, no doubt a lot of best-of lists have been popping up around the web (yours truly included). Among them, shoved in the corner of Pitchfork’s ridiculous amount of year-end coverage, is a four-page article highlighting artists’ personal favorites of 2009.

Ranging from hyperactive electronic performer Dan Deacon to Canadian hard-rockers Fucked Up and pretty much everything in between, Pitchfork provides an opportunity for the music fan to see not only what their favorite group has been spinning all year but also their potential influences, something that is pretty hard to do unless you personally know the band. Now I know there are a bunch of P4K haters in the crowd, but I think we can all agree that this compilation sure beats developing carpal tunnel trying to track down interviews through repeated Google searches.

Below are just some of the many interesting tidbits I gleaned going through the article:

  • There’s some love for little-known Detroit punk-rock pioneers Death (well, little-known before this article) as LA garage rockers No Age listed them as their favorite.
  • There was a mutual love-affair between tour mates HEALTH and Pictureplane, as they both mentioned each other as “Best of the Year”.
  • Unfortunately Anand Wilder of Yeasayer didn’t take the list seriously, opting to enumerate the Top Ten diseases of the year.
  • Best write-ups go to New York indie bands Pains of Being Pure at Heart and Cymbals Eat Guitars, The Thermals’ bassist Kathy Foster, sample-master Girl Talk, and Alan Palomo of Vega/Neon Indian fame, who could probably get a job as a writer if this whole “music thing” falls through.
  • El Perro del Mar and I seem to have identical tastes (Fuck Buttons, jj, The xx, Nite Jewel, Fever Ray, etc…)
  • Paul Collins of Beirut gets the prize for the most eclectic mix, with Sunn O))) and the Jewels of the 78 RPM Era 1918 to 1951 compilation going 1-2.
  • I have a hard time believing that Langhorne Slim listens to the metal band Russian Circles, but then again I thought the same about John Darnielle. Regardless, what he says about Dawes being damn good live is 100% true.

These are just a sample of the great things mentioned in the article, so check it out!