Classification Of Artist Genre Through Supervised Learning


Richard Ridley and Mitchell Dumovic

Abstract – The goal of this paper is to classify the genre of an artist given a set of quantitative measures for each of their associated songs. We utilized supervised learning in this task, and relied upon a dataset composed of quantitative song data that Spotify provides for the majority of the songs that they stream. We collected data for 14 different features for over 35 million songs, which spanned over 100 thousand artists and over 1,200 genres. Before applying classification methods to this data, we collapsed our set of genres by a factor of ten, and collapsed our song data by applying different statistical measures to reduce it to a set of features for every artist. Then, we applied Stochastic Gradient Descent, Gaussian SVM, and Nearest Neighbors as classifiers, with somewhat successful results.

INTRODUCTION

Genre classification is immensely important to the field of music. Accurate genre classification can aid in the effectiveness of music recommendation engines, help unearth similarities between different music genres, and reduce the need for hand labeling of genres in streaming services. This project focuses on attempting to accurately predict the genres that an artist belongs to given information about the songs that they have produced. The input to our algorithm is a dataset that we pulled from Spotify containing information about artists (with their associated lists of classified genres) and tracks (with their associated song features).
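Concretely, the task is multi-label: each artist maps to a *set* of genres rather than a single class. One common way to represent such targets (not necessarily the exact representation used in this project) is a binary indicator vector over the genre vocabulary; a minimal sketch, with hypothetical names:

```python
# Illustration of a multi-label target representation: an artist's genre
# set becomes a 0/1 indicator vector over the genre vocabulary.
# Function and variable names are hypothetical, for illustration only.

def encode_labels(artist_genres, genre_vocab):
    """Turn a set of genre strings into a binary indicator vector."""
    return [1 if g in artist_genres else 0 for g in genre_vocab]

vocab = ["classical", "country", "heavy metal", "rock"]
y = encode_labels({"rock", "heavy metal"}, vocab)  # one row of the label matrix
```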
We then use various classification algorithms (KNN, SGD, SVM) to output predicted genre classifications for artists.

RELATED WORK

Most related works we found dealt with song classification instead of artist classification, but we were still able to draw valuable ideas and information from various sources.

For feature extraction, [2] and [4] used "Spectral Centroid, Spectral Roll-Off, Spectral Flux, Time Domain Zero Crossings, Pitch Distribution and 13 different Mel-Frequency Cepstral Coefficients of individual tracks," whereas [5] and [1] deal with extracted features related to timbre, melody, rhythm, and pitch. In general, most papers we found pulled features from the raw song waveform, so we knew that we would have to extract features derived from these waveforms as well. However, we found that some of these approaches, like [4], [5], and [6], suffered from relatively small datasets or limited numbers of genre labelings.

In terms of models, the model that performed best in [4] was k-nearest neighbors applied to the multi-label data. [5] suggests using support vector machines as well as attempting one-versus-one and one-versus-all classification algorithms for the multi-label data. [6] suggests the use of an ensemble technique for multi-label classification, which involves training multiple classifiers on the data and combining their results into one single classification. In general, it quickly became apparent from these related works that we would need to use one of the multi-label methods for classifying our data, and would benefit from attempting algorithms that involve training multiple classifiers, as these seemed to perform best across the similar papers that we looked at.

DATASET

To collect our data, we relied upon a publicly available Spotify API that allowed us to capture information about specific songs, artists, and their albums.
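The recursive related-artists collection described next amounts to a breadth-first crawl of the related-artists graph. A minimal sketch, where `get_related` is a hypothetical stand-in for the API lookup (shown here with a stubbed-out graph rather than live network calls):

```python
from collections import deque

def crawl_related_artists(root_ids, get_related, limit=100_000):
    """Breadth-first crawl of the related-artists graph, starting from a
    hand-picked set of root artist IDs. `get_related` is a stand-in for
    the related-artists API lookup and returns a list of artist IDs."""
    seen = set(root_ids)
    queue = deque(root_ids)
    while queue and len(seen) < limit:
        artist = queue.popleft()
        for related in get_related(artist):
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return seen

# Usage with a stubbed related-artists graph:
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": []}
artists = crawl_related_artists(["a"], lambda aid: graph.get(aid, []))
```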
To collect data about the songs of a wide range of different artists, we first needed to assemble a set of artists for which we could collect song data. To do so, we first hand-picked a "root" set of 50 different artists that we believed to be representative of a wide variety of different genre clusters. After doing so, we used a functionality that allowed us to find all of the artists related to a specific artist. Through recursive application of this functionality, we assembled a list of over 100,000 artists that would form the basis for our training data. Each of these artists had labeled genres, which indicated which genres they were a part of. Then, for each of these artists, we collected a set of all of their associated songs. For each of these songs, we were then able to collect a variety of different quantitative measures over different qualitative aspects of the songs. The features that we collected are below.

AVAILABLE SONG FEATURES

Feature           Description
Acousticness      A confidence measure of how acoustic the track is.
Danceability      A metric assessing the degree to which the track is danceable.
Energy            Represents a perceptual measure of the song's intensity and activity.
Instrumentalness  A confidence measure dictating whether the track contains vocals.
Key               An integer indicating the key the track is in.
Liveness          A measure that details the likelihood that the track was performed live.
Loudness          The overall loudness of the track as measured in decibels.
Speechiness       The confidence in the appearance of spoken words in a track.
Tempo             The overall estimated tempo of a track in beats per minute.

Agglomerating these song features into summary statistics for every artist (described under FEATURES below) succeeded in reducing the dimensionality of our data set, and also reduced our training set to about 100,000 examples, consisting of agglomerated song data for every artist that we had collected. However, this did not come without a cost, as the use of these summary statistics reduced the precision of the data that we had collected by a significant degree. Nonetheless, artists of different genres generally differed significantly in their values for these statistics, as the example below illustrates.

FEATURES

Because songs do not have a genre labeling, we needed to build a feature set for every artist that accurately represented the set of songs associated with them.
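One way to build such a fixed-length representation from a variable number of songs is to take summary statistics of each song-level feature; a minimal sketch of this approach, using the mean, median, variance, and skew statistics adopted in this paper (the per-song dict layout is a hypothetical illustration):

```python
import statistics

def skew(values):
    """Population skewness: E[(x - mean)^3] / stdev^3 (0 if stdev is 0)."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

def artist_vector(songs, feature_names):
    """Collapse a variable number of songs into a fixed-length vector:
    mean, median, variance, and skew of each song-level feature."""
    vec = []
    for name in feature_names:
        values = [song[name] for song in songs]
        vec += [statistics.mean(values),
                statistics.median(values),
                statistics.pvariance(values),
                skew(values)]
    return vec

songs = [{"tempo": 100.0}, {"tempo": 120.0}, {"tempo": 140.0}]
v = artist_vector(songs, ["tempo"])  # 4 statistics per song feature
```

The output length depends only on the number of song features, not on how many songs an artist has.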
And since artists could have a highly variable number of songs, we needed to develop a strategy that would allow us to have a static number of features defined over a variable number of input songs. To do so, we calculated different statistical measures over each of our individual song features for each of our artists. We decided to use mean, median, variance, and skew.

GENRE COLLAPSING

In our original dataset, Spotify classified songs with 1241 unique genres. As some of our algorithms involve training classifiers for each genre labeling, it was necessary to reduce the number of genre labelings as much as possible. To do this, we relied heavily on association rule algorithms in order to find labelings that appeared frequently together. These algorithms worked roughly as follows:

First, define the support of some set of genres G, supp(G), as the number of times that an artist is classified with all of the genres in the set G. Our algorithm started by removing single-genre labelings with support below a certain threshold over the entire dataset. Next, define confidence and lift for some pair of genres G1 and G2 as follows:

    conf(G1 → G2) = supp(G1 ∪ G2) / supp(G1)

    lift(G1, G2) = supp(G1 ∪ G2) / (supp(G1) · supp(G2))

A confidence value of 1 indicates that every time that G1 appears it appears with G2,

meaning that it is a good candidate for genre collapsing. For example, in our analysis we found that the genre "college a capella" appeared together with the genre "a capella" the exact same number of times that the genre "college a capella" appeared by itself. This indicates a confidence value of 1, and resulted in us collapsing "college a capella" together with "a capella."

In general, our algorithm worked by iteratively collapsing the two genres with the highest confidence and lift values in our dataset, terminating when the confidence and lift values fell below a certain threshold. Finally, we scanned over the entire dataset once more and removed any remaining collapsed genres that had a support value lower than a second, higher threshold. This resulted in our final collapsed list of 95 unique genres. It also had the added effect of reducing the number of labelings an artist was classified with on average, further simplifying our dataset. Before this genre collapsing, each artist was classified with an average of approximately 5.27 different genre labelings. After the collapse, this number was reduced to an average of approximately 1.88 different labelings.

CLASSIFICATION MODELS

To classify the genres for different artists, we used the supervised learning approaches of Stochastic Gradient Descent, K-nearest neighbors, and Support Vector Machines. In configuring Stochastic Gradient Descent and Support Vector Machines in this multi-label classification problem, we decided to try using both "one versus one" and "one versus all" approaches.

A "one versus all" approach in the context of a multi-label classification problem involves training a binary classifier for each of the labels. This classifier discriminates between its associated label and all of the other labels in the training set. In our case, when using this approach, we trained 95 different classifiers, each of which would tell us whether an artist belonged to a specific genre, and we labeled an artist with a genre if the corresponding classifier predicted that the artist belongs to that genre.

We also employed a "one versus one" classification scheme when using SVM. "One versus one" in the context of multi-label classification involves training a classifier for every pair of labels; each classifier learns to distinguish which of its two labels an example is more likely to belong to. After training these classifiers, an example is assigned a label according to the class that received the highest number of positive predictions. Thus, in our case, we trained a classifier for every pair of musical genres, and when evaluating what genre an artist belonged to, we returned the genre that was predicted most often across the 94 pairs in which it appeared. To ensure that artists could attain multiple genre labelings, we assigned additional genre labelings if the number of times those genres were predicted was close to the number of times the most commonly predicted genre was predicted.

Both "one versus one" and "one versus all" can be considered valid, but it is generally the case that "one versus one" multi-label classification takes significantly more processing time and requires more data than a "one versus all" approach.

Multi-label K-Nearest Neighbors Model

When attempting to classify a testing example x, let A be a sorted list containing the Euclidean distances of the training examples to the test point. Let A_i,y denote the label of the training example

that is the i-th closest in terms of Euclidean distance to the test point. The single-label k-NN algorithm selects the output label as follows:

    testLabel = argmax_y Σ_{i=1..k} 1[A_i,y = y]

In the multi-label algorithm, the neighbors are first found, then a maximum a posteriori (MAP) principle is utilized to determine the test label [7]. Since we operated in a space with so many possible labelings, we set k equal to 10.

SVM Model

The SVM model seeks to minimize an objective function by creating a hyperplane that separates two classes of data inputs while maximizing the separation between both sets of points and the hyperplane. We utilized a soft margin with a hinge loss function, such that if a data point is classified on the wrong side of the hyperplane, its Euclidean distance from the hyperplane is added to the objective function.

    min_{w,b}  (1/n) Σ_{i=1..n} max(0, 1 − y_i(w·x_i − b)) + λ‖w‖²

The parameter λ determines the tradeoff between increasing the margin size and ensuring that data points lie on the correct side of the margin. A point x is classified as having label −1 if

    w·x − b ≤ −1

and having label 1 if

    w·x − b ≥ 1

where w denotes the normal vector of the separating hyperplane and b its offset from the origin.

Stochastic Gradient Descent

Stochastic gradient descent is a machine learning approach that can be used to quickly learn a classifier that distinguishes between two classes of data points. It works by updating its classification model, represented by a weight vector, on every data point that it observes, and suspends iteration after convergence to a local minimum, or after some number of passes through the provided training data. The update rule is:

    w ← w − η (α ∇R(w) + ∇L(w·x_i, y_i))

The function R is called the regularization term, and acts to minimize the size of the weight vector w.
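A single update of this rule can be sketched as follows. This is a minimal illustration assuming the logistic loss used in this paper and the regularizer R(w) = ½‖w‖² (so that ∇R(w) = w); the step size η and regularization weight α are arbitrary example values:

```python
import math

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def logistic_loss(w, x, y):
    """L(w·x, y) = log(1 + exp(-y w·x)) for a label y in {-1, +1}."""
    return math.log1p(math.exp(-y * dot(w, x)))

def sgd_step(w, x, y, eta=0.1, alpha=0.01):
    """One update w <- w - eta * (alpha * grad R(w) + grad L(w·x, y)),
    with grad R(w) = w and grad L = -y * x / (1 + exp(y w·x))."""
    coef = -y / (1.0 + math.exp(y * dot(w, x)))
    return [wi - eta * (alpha * wi + coef * xi) for wi, xi in zip(w, x)]

# Repeated steps on a single (x, y) pair drive the loss down:
w = [0.0, 0.0]
x, y = [1.0, 2.0], 1
before = logistic_loss(w, x, y)
for _ in range(20):
    w = sgd_step(w, x, y)
after = logistic_loss(w, x, y)
```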
We used a regularization term equal to the L2 norm of the weight vector.

The function L is called the loss function, and provides a measure of the difference between an example's classification and its actual class. We used a logistic regression loss function:

    L_log(w·x, y) = log(1 + exp(−y wᵀx))

The learning rate, η, scales how quickly the classifier "learns", or is updated. We used an annealing learning rate, meaning that the learning rate decreased with the number of iterations, thus ensuring that our classifier converged quickly.

RESULTS

In the end we trained using a training set consisting of 21,838 artists and a test set of 2,426 artists. Our results from our different approaches are shown below in confusion-matrix form. Since artists can have multiple genres, the term y_pred denotes whether a specific genre was predicted for a specific artist, and y_act denotes the presence of a specific genre within an artist's classifications.

K-nearest Neighbors (Test Error)
             y_pred = 1   y_pred = 0
y_act = 1    .023         .977
y_act = 0    .032         .968

SGD
             Train Error               Test Error
             y_pred = 1   y_pred = 0   y_pred = 1   y_pred = 0
y_act = 1    .097         .903         .067         .933
y_act = 0    .020         .980         .031         .969

SVM
             Train Error               Test Error
             y_pred = 1   y_pred = 0   y_pred = 1   y_pred = 0
y_act = 1    .162         .838         .115         .885
y_act = 0    .026         .974         .028         .972
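The per-genre entries in matrices like these can be computed from per-artist genre sets along the following lines (a sketch; the function and key names are hypothetical):

```python
def confusion_rates(actual, predicted, genre):
    """Row-normalized confusion entries for one genre, given parallel
    lists of per-artist actual and predicted genre sets."""
    hits = misses = false_alarms = negatives = 0
    for act, pred in zip(actual, predicted):
        if genre in act:
            if genre in pred:
                hits += 1          # genre present and predicted
            else:
                misses += 1        # genre present but not predicted
        else:
            negatives += 1
            if genre in pred:
                false_alarms += 1  # genre absent but predicted
    n_pos = hits + misses
    return {
        "P(pred=1|act=1)": hits / n_pos if n_pos else 0.0,
        "P(pred=0|act=1)": misses / n_pos if n_pos else 0.0,
        "P(pred=1|act=0)": false_alarms / negatives if negatives else 0.0,
        "P(pred=0|act=0)": 1 - false_alarms / negatives if negatives else 1.0,
    }

actual = [{"rock"}, {"rock"}, {"jazz"}, {"jazz"}]
predicted = [{"rock"}, {"rock"}, {"rock"}, {"jazz"}]
rates = confusion_rates(actual, predicted, "rock")
```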

As expected, our SVM approach performed the best. However, in general, we were disappointed with our results. Our hit rate was much smaller than we originally expected. However, this was in large part due to the high dimensionality of our data: with 95 possible genre labelings, a hit rate of 11.5% is far better than randomly choosing genres. Additionally, we found that we had a better hit rate on some genres than others: our algorithm in general had many more hits on niche genres like classical, metal, and deep electronic music than on more generic genres like rock. Below are confusion matrices for the collapsed genres roughly corresponding to the classical and heavy metal music genres:

Classical (Test Error)
             y_pred = 1   y_pred = 0
y_act = 1    .512         .488
y_act = 0    .017         .983

Heavy Metal (Test Error)
             y_pred = 1   y_pred = 0
y_act = 1    .452         .548
y_act = 0    .021         .979

As it was clear that our feature set was effective in differentiating clearly unique genres, we decided to train a classifier on a reduced training set containing only artists having the genres of classical, heavy metal, deep electronic, or country. We constructed a similar testing set, and ended up with a training set of 1686 examples and a testing set of 484 examples. We utilized a one-versus-one SVM with the same hyperparameters as before. Its confusion matrix is below.

Reduced Data Set (Test Error)
             y_pred = 1   y_pred = 0
y_act = 1    .678         .322
y_act = 0    .125         .875

As expected, it performed very well relative to the classifier trained and tested on our whole data set. Such a result was heartening, and seemed to display that the features we collected could be relevant in helping to classify the genres of artists.

CONCLUSION

Our best performing algorithm was SVM using a one-versus-one scheme, followed by SGD using a one-versus-all scheme, followed by multi-label k-nearest neighbors. In general, however, we were a bit disappointed with the hit rate of our results. The two main reasons we believe that our results over our entire training set were poorer than anticipated are the high dimensionality of our data and the base features we used. The need to use an algorithm to collapse genres together distorted our original data, and the 95 genres we were left with were still far too many to train an effective classifier. Additionally, while the Spotify song features may be useful for recommendation purposes, it is likely that by aggregating them using different statistical measures we lost a lot of the information that makes genres unique.

NEXT STEPS

If we were to work to improve our results, we would definitely start by improving the quality of our training set. This would involve reinventing our feature selection and extraction algorithms as well as our genre collapsing algorithms so as to both reduce the dimensionality of our problem and provide the best possible features for prediction. Additionally, we may also look at alternative methods of classification, and perhaps first build classifiers for the genre of individual songs, which could then be used to classify artists. Such a scheme would allow us to operate directly on all of the data that we collected, rather than operating on summary statistics and measures of our data.

REFERENCES

[1] N. Scaringella, G. Zoia and D. Mlynek, "Automatic genre classification of music content: a survey," in IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 133-141, March 2006.

[2] McKinney, Martin F., and Jeroen Breebaart. "Features for Audio and Music Classification." Johns Hopkins University, n.d. Web. 15 Dec. 2016.

[3] Prasanna, K., and M. Seetha. "Association Rule Mining Algorithms for High Dimensional Data – A Review." International Journal of Advances in Engineering & Technology (2012): n. pag. Web.

[4] Silva, Vitor Da, and Ana T. Winck. "Multi-Label Classification of Music into Genres." Applied Data Mining (2013): 181-203. Web.

[5] Wang, Shu. "Musical Genre Categorization Using Support Vector Machines." N.p., 2016. Web. 12 Dec. 2016.

[6] Sanden, Chris, and John Z. Zhang. "Enhancing Multi-label Music Genre Classification through Ensemble Techniques." Proceedings of the 34th International

[7] Zhang, M.L., and Z.H. Zhou. "ML-KNN: A Lazy Learning Approach to Multi-label Learning." Pattern Recognition 40.7 (2007): 2038-2048.
