Do LLMs have good music taste?

08-28

TL;DR: No.

I made frontier models rank Kanye West albums. I asked them to compare his first ten solo albums pairwise, which comes to 45 questions (10 choose 2). Here is the prompt format I used for each album pair:

> Pick your favorite Kanye West Album between "X" and "Y". You have to pick one. Respond with just their name.
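
For the curious, here is a minimal sketch of how the 45 prompts could be generated. It is not necessarily the exact script behind this post: the album list is my assumption of what "first ten solo albums" means (the ten solo studio albums in release order), and `ALBUMS`, `PROMPT`, and `build_prompts` are names made up for illustration.

```python
from itertools import combinations

# Assumption: the post does not list the albums; these are the ten solo
# studio albums in release order, used here only for illustration.
ALBUMS = [
    "The College Dropout",
    "Late Registration",
    "Graduation",
    "808s & Heartbreak",
    "My Beautiful Dark Twisted Fantasy",
    "Yeezus",
    "The Life of Pablo",
    "ye",
    "Jesus Is King",
    "Donda",
]

# The prompt text is copied from the format quoted above.
PROMPT = (
    'Pick your favorite Kanye West Album between "{x}" and "{y}". '
    "You have to pick one. Respond with just their name."
)

def build_prompts(albums):
    """Return one (x, y, prompt) triple per unordered album pair."""
    return [(x, y, PROMPT.format(x=x, y=y)) for x, y in combinations(albums, 2)]

prompts = build_prompts(ALBUMS)
print(len(prompts))  # 45
```

Each pair appears once, in a single order. If you wanted to check whether a model favors whichever album is named first, you could ask every pair in both orders, at the cost of doubling the question count.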

I turned these binary preferences into per-album scores by fitting a Bradley-Terry model, which gives each album a strength such that album i beats album j with probability p_i / (p_i + p_j). For the reference ranking, I went with Rate Your Music, the canonical website for music ratings. Here's what the models think:

Model rankings of Kanye West albums
The lack of love for "Jesus Is King" is saddening
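
If you want to try the Bradley-Terry step yourself, here is a rough sketch in plain Python. It uses the standard minorization-maximization update for Bradley-Terry strengths, which may or may not match the fitting routine used for the chart above. The `prior` pseudo-count is my own assumption: with only one answer per pair, an album that wins every matchup would otherwise get an unbounded score. `fit_bradley_terry` and its parameters are hypothetical names.

```python
from collections import defaultdict

def fit_bradley_terry(items, results, prior=0.1, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns a dict mapping each item to a strength; strengths sum to 1.
    `prior` adds a small pseudo-win in both directions of every pair
    (an assumption, so that undefeated items keep finite scores).
    """
    wins = defaultdict(float)
    for winner, loser in results:
        wins[(winner, loser)] += 1.0
    for a in items:
        for b in items:
            if a != b:
                wins[(a, b)] += prior

    # Start from uniform strengths and apply the MM update repeatedly.
    p = {a: 1.0 / len(items) for a in items}
    for _ in range(iters):
        new_p = {}
        for a in items:
            others = [b for b in items if b != a]
            total_wins = sum(wins[(a, b)] for b in others)
            denom = sum(
                (wins[(a, b)] + wins[(b, a)]) / (p[a] + p[b]) for b in others
            )
            new_p[a] = total_wins / denom
        norm = sum(new_p.values())
        p = {a: v / norm for a, v in new_p.items()}
    return p

# Toy example with three albums: A beats B and C, B beats C.
scores = fit_bradley_terry(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Sorting albums by fitted strength gives the model's ranking. A larger `prior` pulls all scores toward each other, so it is worth checking that the ranking is stable across a few values.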

I used the Kendall tau distance to measure how far each model's ranking was from the reference (lower means closer). Opus 4.1 comes out on top:

Kendall tau distance bars by model
Surprisingly on-point eval for taste, I think!
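
For reference, the Kendall tau distance between two rankings is the number of item pairs that the two rankings put in opposite order. Here is a small sketch of that definition; `kendall_tau_distance` is a name I made up, and whether the chart above shows raw or normalized distances is not something this snippet settles.

```python
from itertools import combinations

def kendall_tau_distance(ranking_a, ranking_b):
    """Count the pairs ordered differently by two rankings of the same items."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        # A pair is discordant when the position differences have opposite signs.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

reference = ["A", "B", "C"]
print(kendall_tau_distance(reference, ["A", "B", "C"]))  # 0
print(kendall_tau_distance(reference, ["B", "A", "C"]))  # 1
print(kendall_tau_distance(reference, ["C", "B", "A"]))  # 3
```

With 10 albums the distance ranges from 0 (identical rankings) to 45 (exactly reversed), so dividing by 45 gives a normalized score between 0 and 1.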

This was a fun experiment, and the method generalizes to many more than 10 albums and far beyond music taste. If you want to give me some API credits or compute to run experiments on a grander scale, reach out via email!