SCOR: Source COde Retrieval With Semantics And Order


SCOR: Source COde Retrieval With Semantics and Order
Shayan Akbar, Avi Kak
Purdue University
[An MSR 2019 Presentation]

Source Code Retrieval vs. Automatic Bug Localization

- Our primary interest is in addressing problems in source code retrieval.
- Automatic bug localization is convenient for testing such algorithms.
- Nonetheless, not all our conclusions may apply to the more general problem of source code retrieval.

Contents

- What Have We Done --- in a Nutshell
- Word2vec and Contextual Semantics
- Markov Random Fields (MRF) for Imposing Order
- The SCOR Retrieval Framework
- Results
- Conclusion

What Have We Done --- In a Nutshell

- In previous research, we demonstrated how one can use MRF-based modelling to exploit term-term dependency constraints in source code retrieval.
- Speaking loosely, using term-term constraints means that you want to match queries with files not only on the basis of the frequencies of the individual terms, but also on the basis of the frequencies of pairs of terms taken together.

What Have We Done --- In a Nutshell (Contd.)

- Now that we know how to exploit term-term ordering constraints in retrieval, is it possible to further improve a retrieval framework with semantics --- at least with contextual semantics for now?
- Contextual semantics means that we consider two terms similar if their contextual neighborhoods are similar.
- Word2vec has emerged as a powerful (and popular) neural-network-based algorithm for establishing contextual similarity between terms.

What Have We Done --- In a Nutshell (Contd.)

- Our new work combines the ordering power of MRF with the contextual semantics you get from word2vec in a single unified retrieval framework we call SCOR.
- SCOR has outperformed both the BoW and the BoW+MRF based retrieval engines.
- The performance improvements we obtain with SCOR are statistically significant, but we are not setting the world on fire --- AT LEAST NOT YET.

What Exactly Does Word2vec Do?

- To get a sense of what word2vec achieves, suppose you analyzed all of Shakespeare with it: it would tell you that the words "king", "queen", "court", and "kingdom" are semantically related.
- Word2vec represents each term by a numeric vector, and the "semantic" similarity of two different words is related to the cosine distance between their two vectors.
- The numeric vector for a term is referred to as its "semantic embedding".
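The cosine comparison described above can be sketched as follows; the three-dimensional vectors are made-up toy values, not embeddings from the actual SCOR model:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dim "embeddings" (illustrative values only).
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
apple = np.array([0.10, 0.20, 0.90])

# Semantically related words end up with a higher cosine similarity.
assert cosine_similarity(king, queen) > cosine_similarity(king, apple)
```

In a real embedding table each word would map to an N-dimensional row of the trained weight matrix, but the similarity computation is exactly this.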

The word2vec Neural-Network Model

- A single-layer neural network produces a vector space that holds contextually semantic relationships between words.
- Source code files are scanned using a window to generate pairs of target terms and context terms.

Training Pairs: Target and Context Terms

A window around the target term "initialize" with four context terms: "this", "method", "model", and "parameters".
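The windowing step on this slide can be sketched as a small function; the token stream below is a hypothetical five-token snippet chosen to reproduce the slide's example:

```python
def training_pairs(tokens, window=2):
    """Slide a window over the token stream and emit
    (target, context) pairs, `window` terms on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs

tokens = ["this", "method", "initialize", "model", "parameters"]
# For target "initialize", the four context terms are exactly
# "this", "method", "model", "parameters", as on the slide.
print([c for t, c in training_pairs(tokens) if t == "initialize"])
# → ['this', 'method', 'model', 'parameters']
```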

The word2vec Neural Network

- A one-hot encoding of the target term is provided at the input.
- Softmax probabilities of the context terms are computed at the output.
- After training finishes, each row of the V x N weight matrix W represents the N-dimensional vector v_t for a word t in the vocabulary of size V.
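A minimal forward pass through this architecture might look as follows. This is a sketch with tiny, randomly initialized weights, not the trained SCOR model; it only shows how the one-hot input reduces to a row lookup in the V x N matrix and how the softmax over context terms is formed at the output:

```python
import numpy as np

V, N = 5, 3                         # toy vocabulary size and embedding dim
rng = np.random.default_rng(0)
W_in  = rng.normal(size=(V, N))     # V x N input weights: row t is v_t
W_out = rng.normal(size=(N, V))     # N x V output weights

def forward(target_index):
    """One-hot input -> hidden layer -> softmax over the vocabulary."""
    h = W_in[target_index]          # one-hot @ W_in just selects row t
    logits = h @ W_out
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()

probs = forward(2)
assert probs.shape == (V,) and abs(probs.sum() - 1.0) < 1e-9
```

After training, `W_in` would be kept as the embedding table and `W_out` discarded.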

But How to Set N --- the Size of the Middle Layer in the word2vec NN?

- In the word2vec NN, the size of the middle layer determines the size of the numeric vectors you use for representing the individual software terms.
- When N is either too small or too large, the word embeddings lose their discriminative power.
- We answer this question by creating a semantic-similarity benchmark for experimenting with different values of N.
- This is what has been done in the past for NLP and for the medical domain, but it had not yet been attempted for the world of software.

SoftwarePairs-400 --- A Semantic Similarity Benchmark for the Software World

- We believe that our SoftwarePairs-400 is the first semantic similarity benchmark for the software world.
- It contains a list of 400 software terms along with their commonly used abbreviations in programming.

Sample Pairs from SoftwarePairs-400

[table of sample term/abbreviation pairs not transcribed]

C@r: The Metric Used for Finding the Best Vector Size to Use for word2vec

Correct at Rank r: when we compare the vector for a software term against the vectors for all other terms in the benchmark list, how often does the correct abbreviation appear among the r top-most ranks?
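The C@r computation described above can be sketched as follows, assuming that row i of the term matrix is paired with row i of the abbreviation matrix and that candidates are ranked by cosine similarity; the toy vectors are invented for illustration:

```python
import numpy as np

def c_at_r(term_vecs, abbrev_vecs, r):
    """C@r: fraction of benchmark terms whose correct abbreviation
    (row i of abbrev_vecs pairs with row i of term_vecs) appears
    among the r top-most ranks by cosine similarity."""
    norm = lambda M: M / np.linalg.norm(M, axis=1, keepdims=True)
    T = norm(np.asarray(term_vecs, dtype=float))
    A = norm(np.asarray(abbrev_vecs, dtype=float))
    sims = T @ A.T                   # terms x abbreviations cosine matrix
    hits = 0
    for i in range(len(T)):
        rank = int((sims[i] > sims[i, i]).sum()) + 1  # rank of correct match
        hits += rank <= r
    return hits / len(T)

# Toy pairs (made-up numbers); each term's own abbreviation is closest.
terms   = [[1.0, 0.0], [0.0, 1.0]]
abbrevs = [[0.9, 0.1], [0.1, 0.9]]
print(c_at_r(terms, abbrevs, 1))  # → 1.0
```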

C@r Results for SoftwarePairs-400 for Different N

[plot not transcribed]

Getting Back to Our Main Business: How Did We Create Semantic Embeddings for Software Terms?

- We downloaded 35,000 Java repositories from GitHub.
- We then constructed the word vectors (the embeddings) for 0.5 million software-centric terms from 1 billion word tokens.
- The SCOR word embeddings can be found online at: https://engineering.purdue.edu/RVL/SCOR WordEmbeddings/

Some Sample Results from Our Database of Software-Centric Semantic Embeddings

Some Words Along with Their Top-3 Semantically Most Similar Words as Discovered by word2vec

[table not transcribed]

Combining MRF-Based Modelling with the Semantic Embeddings

- MRF gives us a way to model term-term relationships on the basis of order.
- Word2vec gives us a way to model term-term relationships on the basis of contextual semantics.
- Can we combine the two and create a new class of retrieval engines?

Markov Random Fields (MRF)

- An MRF is an undirected graph G whose nodes satisfy the Markov property.
- Nodes represent variables (for a file f and the bug report terms Q).
- Arcs represent probabilistic dependencies between nodes.
- MRF gives us the liberty to choose the kinds of probabilistic dependencies we want to encode in the retrieval model.

Two Dependency Assumptions of MRF

1. Full Independence Assumption
2. Sequential Dependence Assumption

The Full Independence (FI) Assumption

- f : file node
- q_i : query term nodes
- No arcs between query term nodes, so no dependency between query terms --- the same as a BoW model.
- Relevance score computed as:

    score_fi(f, Q) = Σ_{q_i ∈ Q} tf(q_i, f)

where tf(q_i, f) is the frequency of query term q_i in f.
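The FI score is just a sum of per-term frequencies, which can be sketched directly; the document and query below are invented for illustration:

```python
from collections import Counter

def score_fi(file_tokens, query_terms):
    """Full Independence (bag-of-words) relevance: the sum of the
    frequencies of the individual query terms in the file."""
    tf = Counter(file_tokens)
    return sum(tf[q] for q in query_terms)

doc = "parse xml file then parse json file".split()
print(score_fi(doc, ["parse", "file"]))  # → 4   (tf("parse")=2 + tf("file")=2)
```

Note that production BoW engines typically add IDF weighting and length normalization on top of raw term frequencies; this sketch keeps only the plain tf sum shown on the slide.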

The Sequential Dependence (SD) Assumption

- Same set of nodes, but notice the arcs between query term nodes.
- Relevance score computed as:

    score_sd(f, Q) = Σ_i tf(q_i q_{i+1}, f)

where tf(q_i q_{i+1}, f) is the frequency of the pair of consecutive query terms q_i q_{i+1} in f.
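The SD score counts consecutive query-term pairs in the file, which can be sketched as follows (again with an invented document and query):

```python
from collections import Counter

def score_sd(file_tokens, query_terms):
    """Sequential Dependence relevance: the sum of the frequencies of
    consecutive query-term pairs q_i q_{i+1} occurring as consecutive
    tokens in the file."""
    bigram_tf = Counter(zip(file_tokens, file_tokens[1:]))
    return sum(bigram_tf[(a, b)]
               for a, b in zip(query_terms, query_terms[1:]))

doc = "null pointer exception in null pointer check".split()
# ("null","pointer") occurs twice, ("pointer","exception") once.
print(score_sd(doc, ["null", "pointer", "exception"]))  # → 3
```

This is how term order enters the score: a file that merely contains "pointer" and "null" far apart contributes nothing here, while it would still score under FI.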

SCOR Retrieval Framework

SCOR: MRF + Semantic Embeddings

The SCOR retrieval engine produces two additional relevance scores computed using the word2vec-based vectors:

1. Per-Word Semantic Model
2. Ordered Semantic Model

SCOR Architecture

[architecture diagram not transcribed]

SCOR --- Computing a Composite Score for a Repository File

All measures of the relevancy of a file to a query are combined into a composite relevance score using a weighted aggregation of the scores:

    score_scor(f, Q) = α · score_fi(f, Q) + β · score_sd(f, Q) + γ · score_pwsm(f, Q) + η · score_ordsm(f, Q)
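The weighted aggregation can be sketched as a higher-order function over the four component scorers. The equal default weights below are illustrative only; the tuned values of α, β, γ, and η are not given on the slide:

```python
def score_scor(f, Q, fi, sd, pwsm, ordsm,
               alpha=0.25, beta=0.25, gamma=0.25, eta=0.25):
    """Composite SCOR relevance: a weighted sum of the FI, SD,
    per-word semantic, and ordered semantic component scores.
    Each component is passed in as a callable taking (f, Q)."""
    return (alpha * fi(f, Q) + beta * sd(f, Q)
            + gamma * pwsm(f, Q) + eta * ordsm(f, Q))

# Hypothetical constant component scorers, just to show the plumbing.
s = score_scor("File.java", ["null", "pointer"],
               fi=lambda f, Q: 1.0, sd=lambda f, Q: 2.0,
               pwsm=lambda f, Q: 3.0, ordsm=lambda f, Q: 4.0)
print(s)  # → 2.5  (0.25 * (1 + 2 + 3 + 4))
```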

Results

Highlights of the Results

Results on two popular datasets:
1. Eclipse bug report queries taken from the BUGLinks dataset
2. AspectJ bug report queries taken from the iBUGS dataset

Experiments under two settings:
1. With the title of the bug reports only
2. With the title as well as the description of the bug reports

Highlights of the Results (Contd.)

- SCOR outperforms FI BoW models, with improvements in the range of 7% to 45%.
- SCOR outperforms pure MRF, with improvements in the range of 6% to 30%.
- SCOR also outperforms BugLocator, BLUiR, and SCP-QR.
- The SCOR word embeddings are sufficiently generic to be applied to new software repositories.

Results on 300 AspectJ "title + desc" Bug Reports

[plot not transcribed]

Results on 4000 Eclipse "title + desc" Bug Reports

[plot not transcribed]

Comparison with Various BoW Models (not in the MSR presentation)

[plots for the 3000-query and 300-query settings not transcribed]

Conclusion

- SCOR is a novel retrieval framework that combines MRF and word2vec to model order and semantics together.
- SCOR gives state-of-the-art results on the source code retrieval task of bug localization.
- In the process of developing SCOR, we also generated semantic word embeddings for 0.5 million software-centric terms from 35,000 Java repositories.

Thank You --- Questions?

The SCOR word embeddings are available online for download at: https://engineering.purdue.edu/RVL/SCOR WordEmbeddings/
