Institutional Sector Classification

Workshop on “Big Data & Machine Learning Applications for Central Banks”, October 22nd 2019, Centro Carlo Azeglio Ciampi

Institutional sector classification: a Machine Learning application

Paolo Massaro, Oliver Giudice
Divisione Informazioni Anagrafiche, Dipartimento ECS
Divisione Ricerca sulle Tecnologie Avanzate, Dipartimento IT

www.bankit.art

The opinions expressed and conclusions drawn are those of the authors and do not necessarily reflect the views of the Bank of Italy.

Problem statement

Given a set of features of a company (numeric and non-numeric: name, number of employees, balance sheet data, whether it is publicly held or not, etc.), determine the appropriate SAE code to assign to it.

SAE (“Settore di Attività Economica”, i.e. sector of economic activity) is a code defined by Circ. 140/97, meant to cluster companies into one of 116 "institutional sectors" (e.g., public institution, productive company, financial holding, etc.).

Machine Learning approach: we start from existing data. A “machine learning model” is trained on companies already labeled by hand; on the basis of this "past experience", it learns to predict which SAE any new company belongs to, provided the machine is given many (tens of thousands of) prior samples of correctly labeled companies.

Why and when should ML help here?

AS-IS: who classifies companies into SAEs?
a. ISTAT: authoritative classification
b. Bank of Italy: supervised entities
c. Financial intermediaries: other companies (non-supervised companies, etc.)

The resulting classifications may be incorrect, inconsistent, stale, or missing (~30%). ML can help to fix, spot, update, and autofill them.

Outline: Data, Preprocessing, Feature extraction, Imbalanced learning, Classification, Results

Data

Original datasets
Dataset                  #      Origin
Anagrafe Soggetti        42M    Bank of Italy
Listed Companies         1K     Bank of Italy
ATECO                    3.6M   Agenzia delle Entrate
Balance Sheet et al.     2.2M   CERVED
Info Imprese             4.8M   INFOCAMERE

Data ingestion (ETL): the original datasets (Anagrafe Soggetti, Listed Companies, ATECO, Balance Sheet et al., Info Imprese) are extracted, transformed, and loaded into the Big Data Analytics Platform.

Input dataset to the ML machinery

After the ETL operation, the input is a single text file of 1.4M records (400 MB). Each record contains information about:
a. Company structure
b. Balance sheet
c. Other info
d. SAE

Inside the ML machinery, each company is described by:
a. Company structure: 15 numeric features (e.g., number of employees, PA-owned shares)
b. Balance sheet: 14 numeric features (e.g., share capital, personnel costs)
c. Other info: 3 structured features (listed y/n, ATECO code, Comune, i.e. municipality)
d. Name & notes: 2 textual features (company name, balance-sheet notes)
plus its SAE label.
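As a reading aid, the feature groups above can be written down as a record type. This is a minimal Python sketch: the class name `CompanyRecord`, its field names, the example values, and the use of `None` for missing values are hypothetical choices, not the authors' actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompanyRecord:
    """One company as seen by the ML machinery (hypothetical schema)."""

    structure: List[Optional[float]]      # 15 numeric company-structure features
    balance_sheet: List[Optional[float]]  # 14 numeric balance-sheet features
    listed: bool                          # structured: publicly listed (y/n)
    ateco: str                            # structured: ATECO activity code
    comune: str                           # structured: municipality
    name: str                             # textual: company name (always present)
    balance_notes: Optional[str]          # textual: balance-sheet notes, None if missing
    sae: Optional[str] = None             # label to predict; None for unlabeled companies

    def __post_init__(self) -> None:
        # Check the feature counts reported for the input dataset.
        if len(self.structure) != 15:
            raise ValueError("expected 15 company-structure features")
        if len(self.balance_sheet) != 14:
            raise ValueError("expected 14 balance-sheet features")


# Illustrative values only; None marks a missing numeric feature.
example = CompanyRecord(
    structure=[12.0] + [None] * 14,
    balance_sheet=[None] * 14,
    listed=False,
    ateco="70.10",
    comune="Roma",
    name="Example S.p.A.",
    balance_notes=None,
    sae="430",
)
```

Keeping the label optional lets the same type describe both the hand-labeled training companies and the new companies whose SAE must be predicted.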

[Figure: SAE (un)balanced data. Number of companies per SAE on a log scale (1 to 1,000,000), split into financial and non-financial SAEs; SAE codes on the horizontal axis: 430, 288, 476, 280, 432, ..., 257]

Preprocessing

Preprocessing is specific to each data type. Dealing with missing values:
a. Structured data: try to fix them, use a “zero” or average value, run a regression on the other variables, etc.
b. (Un)structured data: ignore. This data is textual and can be divided into the company denomination (always present) and the balance-sheet notes (missing in almost 50% of the dataset).
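The fallback strategies listed for structured data (zero, column average, regression on other variables) can be sketched with NumPy as below. Missing values are assumed to be encoded as `NaN`; the function names are hypothetical, and the regression variant is plain least squares with an intercept, which is only one possible reading of "regression on other variables".

```python
import numpy as np


def impute_zero(X: np.ndarray) -> np.ndarray:
    """Replace every missing value (NaN) with 0."""
    return np.where(np.isnan(X), 0.0, X)


def impute_mean(X: np.ndarray) -> np.ndarray:
    """Replace missing values with the mean of the observed values in the same column."""
    col_means = np.nanmean(X, axis=0)           # per-column mean, ignoring NaN
    return np.where(np.isnan(X), col_means, X)  # col_means broadcasts across rows


def impute_regression(X: np.ndarray, target: int) -> np.ndarray:
    """Fill NaNs in column `target` with a linear least-squares fit on the other columns.

    The predictor columns are mean-imputed first so that every row can be used.
    """
    X = X.astype(float).copy()
    others = [j for j in range(X.shape[1]) if j != target]
    A = impute_mean(X[:, others])
    A = np.hstack([A, np.ones((A.shape[0], 1))])  # intercept column
    missing = np.isnan(X[:, target])
    coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, target], rcond=None)
    X[missing, target] = A[missing] @ coef
    return X


# Toy data: column 1 is exactly twice column 0, with one value missing.
X = np.array([[1.0, 2.0], [2.0, 4.0], [4.0, np.nan]])
```

On this toy matrix the mean fill puts 3 in the gap, while the regression fill recovers 8 (up to floating-point rounding), because it exploits the relation between the columns.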

Feature extraction

Types of features, and how each is turned into numbers:
a. Numeric quantity: used directly
b. Categorical property: one-hot encoding
c. Textual property: tf-idf
The resulting list of company features is then reduced with PCA.
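A minimal sketch of the three encodings and the final PCA step, written with NumPy only so that each step is visible. The function names, the whitespace tokenizer, the unsmoothed idf formula log(N/df), and the toy companies are assumptions for illustration; a production pipeline would more likely rely on a library implementation and standardize the numeric columns before PCA.

```python
import math
from collections import Counter

import numpy as np


def one_hot(values):
    """Encode a categorical column as 0/1 indicator columns (one per distinct value)."""
    vocab = sorted(set(values))
    index = {v: j for j, v in enumerate(vocab)}
    out = np.zeros((len(values), len(vocab)))
    for i, v in enumerate(values):
        out[i, index[v]] = 1.0
    return out, vocab


def tf_idf(texts):
    """Weight each token by term frequency x log(N / document frequency)."""
    docs = [t.lower().split() for t in texts]          # naive whitespace tokenizer
    vocab = sorted({w for d in docs for w in d})
    col = {w: j for j, w in enumerate(vocab)}
    df = Counter(w for d in docs for w in set(d))      # documents containing each token
    out = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w, c in Counter(d).items():
            out[i, col[w]] = (c / len(d)) * math.log(len(docs) / df[w])
    return out, vocab


def pca(X, k):
    """Project the centred rows of X onto their top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


# Toy companies: numeric features used directly, Comune one-hot, name tf-idf.
numeric = np.array([[12.0, 0.0], [3.0, 1.0], [250.0, 0.5]])
comune, _ = one_hot(["Roma", "Milano", "Roma"])
names, _ = tf_idf(["Alfa Holding S.p.A.", "Beta Costruzioni S.r.l.", "Gamma Holding S.r.l."])
features = np.hstack([numeric, comune, names])
reduced = pca(features, k=2)  # one 2-dimensional vector per company
```

Note how a token shared by every name (none here) would get an idf of log(1) = 0, so generic words carry no weight, while a rarer token such as "holding" keeps a positive weight.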

Imbalanced learning

[Figure: a couple of unbalanced classes. Number of samples for class 1 (over-represented) and class 2 (under-represented)]

Under-sampling & over-sampling: SMOTE creates new artificial samples of the under-represented class (over-sampling), while samples of the over-represented class are removed (under-sampling).
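The core SMOTE idea (place each synthetic point on the segment between a minority sample and one of its nearest minority-class neighbours) and random under-sampling of the majority class can be sketched as follows. This is a simplified illustration with brute-force Euclidean neighbours and hypothetical function names, not the implementation used in the study; it needs at least two minority samples.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances among minority samples; a point is not its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)              # random seed samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                       # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def undersample(X_maj, n_keep, rng=None):
    """Randomly keep n_keep majority samples (without replacement)."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X_maj), size=n_keep, replace=False)
    return X_maj[idx]


rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(100, 2))   # over-represented class
X_minor = rng.normal(3.0, 0.5, size=(10, 2))    # under-represented class
X_minor_aug = np.vstack([X_minor, smote(X_minor, n_new=40, k=3, rng=1)])
X_major_red = undersample(X_major, n_keep=50, rng=2)
# Both classes now have 50 samples.
```

Because synthetic points are interpolations, they stay inside the region already occupied by the minority class instead of duplicating existing samples exactly.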

Classification

[Figure: SAE codes organized by sector and sub-sector]

[Figure: classifier hierarchy. A tree of classifiers working on numeric properties, whose nodes separate SAE 430 from non-430 companies, then financial from non-financial companies and holdings, down to 16 SAEs]
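The routing logic of a classifier hierarchy can be sketched as a tree whose internal nodes are classifiers and whose leaves are SAE codes. In this sketch the `Node` class is hypothetical, the threshold rules stand in for trained models, and the leaf labels `SAE_A` and `SAE_B` are placeholders rather than real SAE codes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence, Union


@dataclass
class Node:
    """A classifier in the hierarchy: it picks a branch, and the branch either is a
    final SAE code (str) or another Node that refines the decision."""

    classify: Callable[[Sequence[float]], str]
    children: Dict[str, Union["Node", str]] = field(default_factory=dict)

    def predict(self, x: Sequence[float]) -> str:
        child = self.children[self.classify(x)]
        return child if isinstance(child, str) else child.predict(x)


# Toy hierarchy: threshold rules stand in for trained models.
root = Node(
    classify=lambda x: "430" if x[0] > 0.5 else "non-430",
    children={
        "430": "430",
        "non-430": Node(
            classify=lambda x: "financial" if x[1] > 0.5 else "non-financial",
            children={"financial": "SAE_A", "non-financial": "SAE_B"},
        ),
    },
)
```

In this hard-routing design, a wrong decision at an upper node cannot be undone further down the tree.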

Ensemble classifier: separate classifiers (e.g., an SVM on the categoric properties, alongside a GBoost model) each produce 16 values, one score per SAE; an ensemble step combines them into the final SAE.
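The slide does not say how the per-classifier outputs are combined. One common option, shown here purely as an assumption, is soft voting: average the per-SAE score vectors (optionally weighted) and take the arg-max. `soft_vote` and the toy score matrices are hypothetical.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Combine per-model score matrices (n_samples x n_classes) by weighted averaging.

    Returns the arg-max class index per sample and the averaged scores.
    """
    P = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (n_models, n_samples, n_classes)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    avg = np.tensordot(w / w.sum(), P, axes=1)  # weighted mean over the model axis
    return avg.argmax(axis=1), avg


# Two models, two samples, two classes (in the slide's setting: 16 SAE columns).
p_gboost = [[0.6, 0.4], [0.2, 0.8]]
p_svm = [[0.3, 0.7], [0.4, 0.6]]
labels, scores = soft_vote([p_gboost, p_svm])               # equal weights
labels_w, _ = soft_vote([p_gboost, p_svm], weights=[3, 1])  # trust the first model more
```

With equal weights both samples go to class 1; weighting the first model three times as much flips the first sample to class 0, which shows how the weights let a more reliable pipeline dominate.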

Ensemble neural classifier

Results

Datasets and performance: 1.4 million records of SAE-labeled data [figure: share of SAE 430, SAE 288, and other SAEs]

The labeled data are divided into:
a. Training set: used to automatically learn classifier parameters
b. Validation set: used to optimise classifier hyperparameters
c. Test set: 420,000 records the classifier has never seen before, used to evaluate classifier performance
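A random three-way split can be sketched as below. The 30% test fraction reproduces 420,000 test records out of 1.4 million; the 10% validation fraction and the function name `split_indices` are hypothetical. With classes as rare as some SAEs, a stratified split (shuffling within each SAE) would be a natural refinement.

```python
import numpy as np


def split_indices(n, val_frac, test_frac, seed=0):
    """Shuffle 0..n-1 and cut it into disjoint train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]  # everything left over is used for training
    return train, val, test


train, val, test = split_indices(1_400_000, val_frac=0.1, test_frac=0.3)
# len(test) == 420_000
```

Keeping the test indices fixed (same seed) ensures every pipeline is scored on the same unseen records.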

Performance metrics
a. Absolute number of errors (from 0 to 4,000, i.e. about 1% of the samples in the test set): a raw, direct, relatable measure.
b. Accuracy: the standard performance measure; however, any classifier that gets the large SAE 430 “right” surpasses 99%.
c. Average F1 score (from 0% to 100%): insensitive to class size; to score high here, a classifier has to get most things right.
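The gap between accuracy and average F1 is easy to reproduce. The sketch below computes the three metrics, assuming "average F1" means the unweighted (macro) mean of per-class F1 scores over the classes in the ground truth; it then scores a hypothetical always-430 classifier on a toy test set in which 99% of companies belong to SAE 430.

```python
import numpy as np


def error_count(y_true, y_pred):
    """Absolute number of misclassified samples."""
    return int(np.sum(np.asarray(y_true) != np.asarray(y_pred)))


def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over the classes in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


y_true = ["430"] * 990 + ["288"] * 8 + ["476"] * 2
y_pred = ["430"] * 1000  # hypothetical baseline that always answers 430
print(error_count(y_true, y_pred), accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 3))
# 10 0.99 0.332
```

The baseline reaches 99% accuracy with only 10 errors, yet its average F1 is about 0.33, because the two minority SAEs each contribute an F1 of zero.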

Classification results

[Figure: absolute number of errors, accuracy, and average F1 score for a baseline that always predicts SAE 430, for classifiers using numerical, categorical, or name data, for all data (raw and preprocessed), and for the neural ensemble]

Conclusions
a. Dealing with hybrid data is complex, and different pipelines (with ensemble techniques) are needed.
b. Hierarchical structures provide convenient a-priori knowledge but are not well suited to “ambiguous” data.
c. A scientific paper with details on all the techniques presented is currently under review and will be published soon.

From problem to solution
a. A business necessity: improve the efficiency of DQM (data quality management) activities.
b. A Machine Learning solution could solve the problem.
c. A research activity was carried out to find the best solution.
d. A final solution is being developed as an integration into the enterprise software.

Thank you for your attention. Any questions?

Marco Benedetti, Gennaro Catapano, Francesco De Sclavis*, Roberto Favaroni, Giuseppe Galano, Andrea Gentili, Marco …
*Intern at ART
