Music Mood Classification
CS 229 Project Report
Jose Padial
Ashish Goel

Introduction

The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may be classified, e.g. happy, sad, angry, brooding, calm, uplifting. People listen to different kinds of music depending on their mood. The development of a framework for estimating musical mood, robust to the tremendous variability of musical content across genres, artists, world regions and time periods, is an interesting and challenging problem with wide applications in the music industry. To keep the problem simple, we considered two song moods: Happy and Sad.

Database

As with any learning project, the size and quality of the data set is key to success. We initially underestimated the difficulty of acquiring a music database labeled by mood. Building the labeled Happy/Sad database proved to be a challenging journey for a number of reasons, not the least of which was the difficulty of making the subjective decision to label songs as strictly 'Happy' or 'Sad'.

We began by analyzing songs from our personal music collections and soon realized the need for a larger and more comprehensive database. After spending some time searching for a suitable database, we found the Million Song Dataset (MSD), a freely available collection of audio features and metadata for a million contemporary popular music tracks. The MSD was compiled by LabROSA at Columbia University with the help of analysis done using the Echo Nest API (an open platform for the analysis of audio files). Each track's data file contains a wealth of tempo, mode (minor/major), key and local harmony information. This is the information we planned to extract ourselves via time-domain and spectral methods, and we were thus very excited to find it in this database.

The entire database of a million songs is 300 GB in size. Downloading and unpacking the database alone took several days, and crawling through it within the timeframe of this project turned out to be infeasible. Hence, we largely operated on a subset of the database containing 10,000 songs.

The most challenging task was generating accurate Happy/Sad labels for the songs in this database. Tags from the website last.fm were available for the songs contained in the MSD. Of the 1 million MSD songs, nearly 12,000 had a 'Happy' tag, and over 10,000 a 'Sad' tag. However, upon inspecting these songs, we discovered that the majority of the Happy/Sad tags were incorrect.

Ultimately we hand-labeled songs from the 10,000-song subset to generate our training set. The final data set comprised 137 sad songs and 86 happy songs. The drop from 10,000 to 223 is a result of most songs being unfamiliar to us, and many of those we knew not being clearly 'Happy' or 'Sad'.

Hold-out cross validation was used for testing the performance of our learning algorithm: 70% of the final data set was used for training and 30% for testing.
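For concreteness, the following Python sketch shows one way such a 70/30 hold-out split could be implemented. This is an illustration rather than our original code (we worked in Matlab), and the array names are hypothetical:

```python
# Sketch of a 70/30 hold-out split. `features` is an N x d array of
# per-song feature vectors and `labels` holds 1 (Happy) / 0 (Sad).
import numpy as np

rng = np.random.default_rng(0)

def holdout_split(features, labels, train_frac=0.7):
    """Shuffle the songs once, then cut off 70% for training."""
    order = rng.permutation(len(labels))
    cut = int(train_frac * len(labels))
    tr, te = order[:cut], order[cut:]
    return features[tr], labels[tr], features[te], labels[te]
```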

Feature Selection

The following were considered as candidate features for the classification process:

- Tempo: the speed or pace of the piece, measured in beats per minute (BPM). This is a time-domain feature which captures the rhythm of the song.
- Energy: obtained by integrating over the Power Spectral Density (PSD).
- Mode: indicates whether a piece is played in a major or minor key.
- Key: identifies which of the 12 keys the song has been played in (Fig. 1).
- Harmony: relative weighting between notes, characterized as chords or modes.

Figure 1: 12-note musical scale.

Harmony

While features such as Tempo and Energy were easy to obtain and use, a lot of time and effort was spent on sensibly extracting the harmony information from the data. The MSD provided us with the PSD of 0.3-second-long segments of the song, arranged in 12 bins corresponding to the frequencies of the 12 different notes. Hence a song of duration 300 seconds was divided into 1000 segments, yielding a pitch matrix of size 12x1000 for each song.

This local harmony information could be processed and used in several ways. If we had a large enough training set (approximately 10 times the size of the feature vector), we could have simply passed the huge 12x1000 matrix into the classifier. However, since the data set was limited, we had to capture the harmony information in a small feature vector. The need for doing this is made more evident by the learning curve analysis (Fig. 4), which shows that we were suffering from a high-variance problem. The motivation for the approach we adopted came from the concept of modern musical modes, as shown in Fig. 2.

Figure 2: Musical modes, each corresponding to a 7-note subset of the total 12 musical notes.

We hypothesized that extracting the above modes from the harmony information would contribute significantly to mood detection. Several attempts were made to associate each song with one of the 7 musical modes. We switched to the time domain and tried working over segments of different lengths, but could not succeed in assigning a mode to a majority of the songs in our database. Eventually, we picked the 7 most prominent notes for each of the 0.3-second segments, averaged over the entire song, and subtracted the key from each of the notes to obtain a 7-dimensional feature vector for each song (a sketch of one reading of this step follows below). Although there might be better ways of capturing the harmony information, the use of these 7 dominant notes as elements of our feature vector did significantly aid the classification task.
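The 7-note reduction admits more than one reading; the Python sketch below shows one plausible interpretation (an assumption on our part): average the 12 pitch bins over all segments, keep the 7 strongest notes, and express them relative to the song's key.

```python
# Sketch of the 7-D harmony feature under one reading of the text.
# `pitches` is the 12 x n_segments pitch matrix from the MSD and
# `key` is the song's key as an integer in 0..11.
import numpy as np

def harmony_feature(pitches, key):
    """Reduce a 12 x n_segments pitch matrix to a key-relative 7-D vector."""
    mean_strength = pitches.mean(axis=1)      # average over ~0.3 s segments
    top7 = np.argsort(mean_strength)[-7:]     # the 7 most prominent notes
    return np.sort((top7 - key) % 12)         # notes relative to the key
```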

Model Selection and Supervised Learning Results

At different stages of the project, as different features were being tested, the mutual information metric was used to evaluate their usefulness. The KL divergence was used to compute the mutual information. While computing the KL divergence is straightforward for discrete feature vectors, continuous features were dealt with by binning them and then using the discrete approach (a sketch follows below). The following figure (Fig. 3) lists the mutual information for each of the features considered.

Figure 3: Mutual information for the different features.

Having obtained a rough idea of the usefulness of the various features at hand, forward search was used to find the optimal set of features for classification through supervised learning with a soft-margin SVM. The following table (Table 1) shows the progress at some of the steps of the forward search. Though the table may suggest that the features beyond energy and tempo did not add much to the classification process, one must remember that marginal improvements in performance get successively harder.

Table 1: Soft-margin SVM performance for some of the candidate feature sets and SVM kernels.

Depending on the set of features used, either a linear or a Radial Basis Function (RBF) kernel gave the best performance. For simple features such as energy and tempo, where the relationship with mood is quite straightforward, a linear kernel performed best. The addition of harmony information introduced much more complexity to the feature space, and consequently the RBF kernel gave the best results.

It was crucial for us to use a soft-margin SVM because the training set was labeled manually. Since the perception of mood varies from person to person, there was a strong likelihood of some examples being labeled incorrectly. We varied the C parameter to minimize the generalization error. In fact, the SVM module of Matlab that we used for classification scales the C parameter across training examples to account for the difference in the number of training examples in each class.
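To make the binning step concrete, here is a minimal Python sketch of the mutual-information computation (our report used Matlab; the bin count and names here are illustrative):

```python
# Mutual information I(feature; label) for one continuous feature,
# computed by binning the feature and applying the discrete formula,
# i.e. the KL divergence between the joint and the product of marginals.
import numpy as np

def mutual_information(feature, labels, n_bins=10):
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])  # discretize the feature
    mi = 0.0
    for x in np.unique(binned):
        for y in np.unique(labels):
            p_xy = np.mean((binned == x) & (labels == y))
            if p_xy > 0:
                p_x = np.mean(binned == x)
                p_y = np.mean(labels == y)
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi  # in nats
```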
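The forward-search loop itself is simple; the sketch below re-expresses it with scikit-learn's soft-margin SVM. This is an assumption rather than our original setup: we used Matlab's SVM module, and `class_weight="balanced"` stands in for its per-class scaling of C.

```python
# Greedy forward feature search scored by hold-out accuracy of a
# soft-margin SVM. X_tr/X_te are N x d feature matrices.
import numpy as np
from sklearn.svm import SVC

def forward_search(X_tr, y_tr, X_te, y_te, kernel="rbf", C=1.0):
    chosen, best_acc = [], 0.0
    remaining = list(range(X_tr.shape[1]))
    while remaining:
        # Try adding each remaining feature and keep the best one.
        scores = {}
        for f in remaining:
            cols = chosen + [f]
            clf = SVC(kernel=kernel, C=C, class_weight="balanced")
            clf.fit(X_tr[:, cols], y_tr)
            scores[f] = clf.score(X_te[:, cols], y_te)
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_acc:
            break                     # no single feature helps any more
        chosen.append(f_best)
        best_acc = scores[f_best]
        remaining.remove(f_best)
    return chosen, best_acc
```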

Analysis

Having finalized the composition of our feature vector, the choice of SVM kernel, etc., we performed k-fold cross validation in order to arrive at better estimates of the generalization error. We decided not to use k-fold cross validation for model selection, since that would have been computationally expensive and cumbersome. We also varied the size of the training set and averaged over the results of the iterations to obtain the following learning curve (Fig. 4).

Figure 4: Learning curve obtained through k-fold cross validation.

The curve suggests that we are suffering from high variance. While we felt that with 157 training examples and a 10-dimensional feature vector we would be fine, it turns out that we are indeed over-fitting.

Unsupervised Learning

In order to gain more insight into our problem, we attempted unsupervised learning. If unsupervised learning worked well in clustering the dataset into Happy/Sad songs based on harmony alone, it would suggest that what we subjectively consider 'Happy' or 'Sad' correlates well with our harmony feature vector.

K-means clustering was run on the dataset with two clusters, harmony being the only feature. Based on the fact that the RBF kernel gave the best results for feature sets that included harmony data, we hypothesized that K-means would not be able to do a great job of clustering along the lines of happy and sad songs. However, we wanted to test it and see how well it could do.

As expected, when we assigned labels to the clusters, the resulting classification was poor, with an accuracy of 52.47%. In order to gain some visual understanding of why the clustering might be so difficult, we plotted the rank-2 approximation of the harmony feature data.

2-D Visualization of the Harmony-only Feature Space

For visualization purposes, and as a sanity check on the data, we projected all of our 7-D harmony feature vectors into 2-D space. To do so, we computed the Singular Value Decomposition (SVD) of the Nx7 data matrix. We then selected the two eigenvectors of A^T A corresponding to the largest singular values, taken from the first two columns of the right singular matrix, and projected each song's 7-D harmony feature vector onto the first and second principal directions to obtain its coordinates in the 2-D space. Sketches of the k-fold, clustering and projection steps follow below.
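First, the learning-curve procedure: for each training-set size, train on a truncated fold and average the held-out accuracy. As before, scikit-learn stands in for the Matlab tooling we actually used, so treat this as a sketch:

```python
# Mean k-fold test accuracy as a function of training-set size,
# the quantity plotted in the learning curve of Fig. 4.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def learning_curve_points(X, y, sizes, k=10, C=1.0):
    rng = np.random.default_rng(0)
    points = []
    for m in sizes:
        accs = []
        kf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        for tr, te in kf.split(X, y):
            tr = rng.permutation(tr)[:m]   # keep only m training songs
            clf = SVC(kernel="rbf", C=C, class_weight="balanced")
            clf.fit(X[tr], y[tr])
            accs.append(clf.score(X[te], y[te]))
        points.append(np.mean(accs))
    return points
```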
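The clustering and projection steps can likewise be sketched in a few lines (again illustrative rather than our original Matlab code; `H` is the hypothetical Nx7 harmony matrix):

```python
# Two-cluster K-means on the harmony features, scored against the
# hand labels, plus the rank-2 SVD projection used for Fig. 5.
import numpy as np
from sklearn.cluster import KMeans

def cluster_accuracy(H, y):
    """Accuracy under the better of the two cluster-to-label namings."""
    assign = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(H)
    acc = np.mean(assign == y)
    return max(acc, 1.0 - acc)

def rank2_projection(H):
    """Project each 7-D harmony vector onto the two right singular
    vectors (eigenvectors of A^T A) with the largest singular values."""
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    return H @ Vt[:2].T               # N x 2 coordinates for plotting
```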

Fig. 5 provides a good visualization of the high inseparability of the data, albeit in 2-D. This helps to explain why K-means does so poorly at separating the data. Further, it helps to explain why the RBF kernel worked best when harmony data was included in the feature vector: the RBF kernel was able to carve out a complex decision surface to best separate the data.

Figure 5: 2-D low-rank approximation of the 7-D harmony feature data. Red points correspond to songs labeled 'Happy'; blue points correspond to songs labeled 'Sad'.

Conclusion

The performance and capability of our algorithm could be significantly improved with access to a larger dataset, because a larger dataset would allow us greater freedom in exploring different ways of capturing the harmony information. Considering the subjective nature of mood classification, we believe that 70% success is a good result. The success of our algorithm is comparable to the results obtained by research groups around the world: papers in the literature quote anywhere from 65% to 75% success for their algorithms [1][2], though it should be noted that the classification results in the literature typically involve multi-class classification, as opposed to our binary classification task.

References

[1] Cyril Laurier and Perfecto Herrera (2007), Audio Music Mood Classification Using Support Vector Machine. In Proceedings of the International Conference on Music Information Retrieval, Vienna, Austria.

[2] Lie Lu, Dan Liu and Hong-Jiang Zhang (2006), Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 1, January 2006.

Acknowledgements

We thank Prof. Andrew Ng, Andrew Maas and the other members of the teaching staff for guiding us through the project. We also thank Abhishek Goel for helping us classify the list of 10,000 songs in our database. Finally, we thank Mayank Sanganeria for his valuable suggestions and help regarding feature selection.
