đź•’ This report is more than 5 years old (Published Oct 6, 2018).
Because I usually consume music by listening to full-length albums, I wanted to see whether my music album preferences—expressed through music album scores—can be quantitatively explained, and possibly predicted, by specific album features.
And, because I also read many music publications, I wanted to see whether my personal album scores correlate to those of the music magazines I enjoy reading.
This turned out to be one of my side projects, which, albeit simplistic, revealed some interesting findings about my affinity toward music.
Mathematical representation of my taste in music through album scores
The dataset in this analysis contains information on 97 albums that I have listened to in their entirety and for which I have consequently formed a strong opinion. Since I read a lot of album reviews, I was interested in understanding whether my album scores were correlated to album scores of several music publications — AllMusic, musicOMH, Pitchfork, The Guardian — and one aggregator website, Metacritic. I was also interested in finding out whether my album score could be predicted using information on album length, year of release, type of artist, and music genre.
Obviously, a significant drawback of this dataset stems from the fact that my system of evaluating album scores might be vastly different from that of music publications. I purposefully kept my method of scoring consistent across all albums using the following formula:

There is no guarantee, of course, that music publications used the same method. And, it’s naturally possible that, within the same music publication, scoring methodologies were inconsistent.
I first explored the data by looking at basic descriptive and inferential statistics, and by visualizing the frequency distribution of scores across different categories and publishers.
Descriptive and inferential statistics applied to my taste in music
I started by exploring the relative frequency of albums per year of release, type of artist, and genre. Since certain albums belonged to niche or cross-genre categories (such as hip-hop for Lauryn Hill’s The Miseducation of Lauryn Hill and classical + pop for Benjamin Clementine’s At Least for Now), I unfortunately had to file each of them under one of the major genres. The major category was picked based on my subjective evaluation of the “closest” genre of music. Those were:
- Pop: pop and pop-based dance music.
- Electronic: ambient and electronic dance music such as house, techno, and EDM.
- R&B: R&B, soul, and hip-hop.
- Rock.
- Experimental: cross- and multi-categorical genres that are not easily defined.
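In code, this subjective mapping reduces to a lookup table. The sub-genre keys below are illustrative examples only, not the full list used in the analysis:

```python
# Illustrative sub-genre -> major-category mapping (keys are examples only)
GENRE_MAP = {
    "dance-pop": "Pop",
    "ambient": "Electronic",
    "house": "Electronic",
    "techno": "Electronic",
    "soul": "R&B",
    "hip-hop": "R&B",
}

GENRE_MAP["hip-hop"]  # "R&B"
```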


As seen above, 35% of the albums in my playlist were released in 2009, 2011, and 2013. Interestingly, though not surprisingly given my pop- and dance-oriented ear, almost 80% of the albums in this dataset were released by solo female artists. More than 70% of the albums belong to the pop, electronic, and experimental genres.
I then plotted the distribution of albums per length and album score so that I could perform some basic inferential statistics.


If we assume these 97 albums represent my future listening profile, we can treat these data points as a statistically representative sample of the entire population of albums, current and future. With that in mind, I calculated the mean length and score of albums in my collection, along with 95% confidence intervals. Since the variance of the entire population is unknown, I opted for the t-statistic.
```python
from scipy import stats

# `data` is the albums dataframe; column names are assumed
def confidence_interval(sample, confidence=0.95):
    # t-based interval, since the population variance is unknown
    return stats.t.interval(confidence, len(sample) - 1,
                            loc=sample.mean(), scale=stats.sem(sample))

# 95% CONFIDENCE INTERVAL FOR LENGTH
print(confidence_interval(data.length))
# 95% CONFIDENCE INTERVAL FOR SCORE
print(confidence_interval(data.my_score))
```
These numbers indicate we can be 95% confident that the true population mean of all albums, current and future ones, will fall within the following ranges:
- (1) For length, the true population mean will be between 46.4 and 50.4 minutes.
- (2) For score, the true population mean will be between 66.0 and 73.4.
Put in less technical terms: if we assume my taste in music doesn’t notably change, albums that secure a permanent spot on my playlist will, on average, be of typical LP length (40–50 minutes) and, given the 65–75 score bracket, will not be outstanding albums.
Although this makes sense, I still found the latter conclusion surprising. It signals that many of the albums that I regularly listen to contain songs that have not grown on me. This means, furthermore, that I consider a notable chunk of my favorite albums to be “very good” or only “good” according to this scoring methodology.
Correlation between my album scores and those of influential music publications
Before comparing my scores with those of other music publications, it was important to note that musicOMH, The Guardian, and AllMusic assign scores on a 0–5 scale in increments of 0.5, which translates to multiples of 10 on a 0–100 scale. Meanwhile, Pitchfork assigns scores on a 0–10 scale in increments of 0.1, while Metacritic assigns scores on a 0–100 scale in increments of 1.
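Since all three scales translate cleanly onto 0–100, the conversion is a one-liner; `to_percent` is my name for it, not something from the original analysis:

```python
def to_percent(score, scale_max):
    # Map a score on a 0..scale_max scale onto 0..100
    return score * 100.0 / scale_max

# musicOMH / The Guardian / AllMusic grade out of 5, Pitchfork out of 10
to_percent(4.5, 5)    # 90.0
to_percent(8.2, 10)   # approximately 82.0
```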
These discrepancies can be clearly seen in the distribution histogram plots below.

Looking at these plots, we can see that the other publications’ scores tend to skew higher than mine. The frequency distribution of my scores, however, also has a long left tail. In general, one can notice that the albums in the dataset have good (60+) scores, an expected finding if we assume that:
- Albums that I listen to repeatedly tend to be those that I like and therefore score higher, and that
- Music journalists will, on average, agree on the overall quality of these albums, therefore also assigning higher scores.
This was further corroborated by the mean scores and sample standard deviations, which show that my distribution of album scores has the lowest mean and the largest standard deviation. MusicOMH’s distribution, on the other hand, has the highest mean, while Metacritic has the lowest sample standard deviation. The latter, in particular, was expected since Metacritic is an aggregator website.
As a side note, for simplicity and relevance, I did not calculate confidence intervals for the album scores of music publications.
```python
# Mean and sample standard deviation per score column
# (`scores` holds one column per publication plus my own; names assumed)
print(scores.mean())
print(scores.std())  # pandas uses ddof=1, i.e. the sample standard deviation
```
Going back to the comparison of scoring methodologies, I first rounded my scores to the nearest multiple of 10 when comparing against musicOMH, The Guardian, and AllMusic. So, for example, Grimes’ Art Angels album, which had a score of 86, was rounded to 90. The data remained untransformed for comparisons against Pitchfork and Metacritic, since their scores share my scale.
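The rounding itself can be done with integer arithmetic. This sketch assumes integer scores, and that halves round up (the original analysis may have broken ties differently):

```python
def round_to_ten(score):
    # Nearest multiple of 10 for an integer 0-100 score; halves round up
    return (score + 5) // 10 * 10

round_to_ten(86)  # 90, as in the Art Angels example
```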

```python
# Correlation coefficients (Pearson) between my scores and each
# publication's; dataframe column names below are assumed, and rounded
# scores are used where the publication grades in multiples of 10
correlation_OMH = data.my_score_rounded.corr(data.musicomh)
correlation_pitchfork = data.my_score.corr(data.pitchfork)
correlation_theguardian = data.my_score_rounded.corr(data.theguardian)
correlation_metacritic = data.my_score.corr(data.metacritic)
correlation_allmusic = data.my_score_rounded.corr(data.allmusic)

coefficients = [correlation_OMH, correlation_pitchfork, correlation_theguardian,
                correlation_metacritic, correlation_allmusic]
```
I expected to see a somewhat stronger linear correlation between these scores, but overall, my album scores were weakly correlated with those of the other music publications. The highest coefficients corresponded to the correlations with AllMusic, musicOMH, and The Guardian, for which my scores had been rounded to the nearest multiple of 10. Had the data not been transformed, these coefficients would have been slightly lower.
Generally, I was most surprised by the near-zero correlation coefficient with Pitchfork’s scores, indicating no linear relationship between our album scores. I considered transforming some of the variables (for example, onto a logarithmic scale), but I saw no solid theoretical ground for assuming a non-linear relationship between these scores.
Can regression models be used to predict my preference for music albums?
I then wanted to see whether my album score was, to some extent, influenced by album length, type of artist, genre, and year of release. Although such a model is a notably simplified representation of reality, I was interested in seeing whether regression could explain my taste in music.
Before doing any calculations, I defined a function that removes outliers based on the sample standard deviation, in case such an operation proved necessary. My reasoning was that the outliers among my scores were albums I really didn’t like (as a fun fact, those were Lady Gaga’s Artpop, Kanye West’s Yeezus, and The Knife’s Shaking the Habitual). As these scores would negatively impact the regression models, I wanted to delete them. Additionally, since I didn’t need the music publications’ scores for this analysis, I reduced the dataframe to the essential columns.
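The outlier filter might look like the sketch below; the cutoff of two sample standard deviations is my assumption, since the original threshold isn’t shown:

```python
import numpy as np
import pandas as pd

def remove_outliers(df, column, n_std=2):
    # Keep rows whose value in `column` lies within n_std sample
    # standard deviations of the column mean (cutoff assumed)
    return df[np.abs(column - column.mean()) <= n_std * column.std()]
```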
```python
# Removing the outliers using the remove_outliers function
data_filtered = remove_outliers(data_reduced, data_reduced.my_score)
```
Linear Regression Model
As with every linear regression model, I wanted to check for the following criteria before fitting the data:
- (1) Linearity
- (2) No endogeneity of regressors
- (3) Normality and homoscedasticity
- (4) No autocorrelation
- (5) No multicollinearity
* Linearity
Since genre and type of artist are categorical variables that would be assigned dummies, I only checked linearity for year of release and album length.

As can be seen, linearity is not observed for these two variables; the relationship appears random. Even after applying log and square-root transformations to the independent and dependent variables, the lack of a visible linear relationship persisted. This was obviously a detriment to the model, but I still wanted to see whether the remaining variables, type of artist and genre, had statistical significance in explaining some of the variability in the score distribution.
With only categorical variables left to explain variability in album scores, I assumed no endogeneity of regressors and normality and homoscedasticity of the error term. Since this is not time-series data, I also assumed no autocorrelation. However, I wanted to check whether there was any significant correlation between each of the genre variables and the type of artist.
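A pairwise check like this takes one call in pandas; the dummy-coded columns below are a hypothetical toy example, not the real data:

```python
import pandas as pd

# Toy dummy-coded columns; .corr() returns the pairwise Pearson matrix
dummies = pd.DataFrame({
    "female": [1, 1, 0, 1, 0],
    "pop":    [1, 0, 0, 1, 0],
    "electr": [0, 1, 0, 0, 1],
})
corr = dummies.corr()
```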

We can notice from this heatmap that the genres and type of artist are weakly correlated, with absolute values below 0.5, which means it’s valid to include them in the model. Fitting the regression on these variables, we get:
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 67.1489 | 7.366 | 9.116 | 0.000 | 52.518 | 81.780 |
| female | 9.0825 | 4.683 | 1.939 | 0.056 | -0.220 | 18.385 |
| pop | -5.3310 | 6.795 | -0.785 | 0.435 | -18.829 | 8.167 |
| electr | -4.7769 | 7.309 | -0.654 | 0.515 | -19.295 | 9.741 |
| experim | -5.0291 | 7.264 | -0.692 | 0.491 | -19.459 | 9.401 |
| rock | -2.4935 | 9.223 | -0.270 | 0.788 | -20.814 | 15.827 |
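A coefficient table in this shape is standard output of an ordinary-least-squares fit. The point estimates themselves reduce to a least-squares solve, sketched here on a toy design matrix (the standard errors and p-values come from the fitting library, not this sketch):

```python
import numpy as np

def ols_coefficients(X, y):
    # Ordinary least squares: minimizes ||X @ beta - y||^2
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Toy example: intercept column plus one dummy variable
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 3.0, 1.0, 3.0])
ols_coefficients(X, y)  # [1.0, 2.0]: intercept 1, dummy effect +2
```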
What were some findings from this — admittedly simplified — analysis?
- (1) That my methodology of scoring albums leads to generally lower scores compared to eminent music publications,
- (2) That my scores are weakly correlated with scores of those music publications, and that
- (3) The linear and logistic regression models cannot yet be used to explain variability in my album scores.
Looking ahead, as part of my next project, the analysis could potentially be improved by expanding the dataset and including additional variables, the most insightful of which would be the quality of lyrical content, of production, and of musical and vocal arrangements. These variables could also be scored on a scale from 1 to 100. The biggest improvement, however, will come after many more years of listening to music.
Note: This page contains only the write-up and the images from the analysis. I recommend visiting my GitHub to access the dataset in .csv format and the full analysis, including the Python code, in .ipynb format.