Building Local Search Engines For Big Heterogeneous Data

Building Local Search Engines for Big Heterogeneous Data Cody Hansen, Feifei Li

The Motivation
Typical search interface:
– Schema-specific query forms
– Rigid schema and formats required for the underlying data
– Each form requires a corresponding program
– Not very user friendly (many inputs? domain values?)

The Objective
A search-engine-style system for integration, search, ranking, and recommendation that:
– must handle heterogeneous data sources
– should be schemaless and formatless
– offers an easy-to-use and flexible search, ranking, and recommendation interface

The Challenges
How to achieve both efficiency and effectiveness at scale?
– the big data challenge
– return useful and meaningful results, as well as effective rankings and recommendations
Must handle millions or even billions of records, in hundreds of gigabytes or even terabytes.

The Search Module
A search-engine-style approach.

Basic Idea
A keyword-centric approach:
– Regardless of data type, each attribute is parsed into a set of keywords
– Inverted lists index these keywords (keyword to record ids), using our own storage engine
– Another set of inverted lists indexes q-grams to keywords (for approximate keyword matching within an edit distance threshold)
– The storage engine: 3 binary files
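The two sets of inverted lists can be illustrated with a small in-memory sketch. It is not the authors' storage engine (the slides only say it uses 3 binary files): the names `tokenize`, `qgrams`, and `build_indexes`, the padding characters, the choice q = 2, and the sample records are all assumptions made for illustration, and Python sets stand in for on-disk sorted lists.

```python
import re
from collections import defaultdict


def tokenize(value):
    """Parse an attribute value of any type into lowercase keywords."""
    return re.findall(r"[a-z0-9]+", str(value).lower())


def qgrams(word, q=2):
    """Return the set of q-grams of a keyword, padded with '#' and '$'."""
    padded = "#" * (q - 1) + word + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_indexes(records, q=2):
    """Build keyword -> record ids and q-gram -> keywords inverted lists."""
    keyword_to_rids = defaultdict(set)
    qgram_to_keywords = defaultdict(set)
    for rid, record in enumerate(records):
        # The schema is ignored: every attribute value is just a source of keywords.
        for value in record.values():
            for keyword in tokenize(value):
                keyword_to_rids[keyword].add(rid)
    for keyword in keyword_to_rids:
        for gram in qgrams(keyword, q):
            qgram_to_keywords[gram].add(keyword)
    return keyword_to_rids, qgram_to_keywords


# Hypothetical records with different schemas.
records = [
    {"name": "Cody Hansen", "city": "Tallahassee"},
    {"title": "Student", "state": "Florida", "year": 2012},
]
keyword_to_rids, qgram_to_keywords = build_indexes(records)
```

Because both records are reduced to keywords before indexing, neither needs to share a schema or format with the other.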

System Architecture
Main modules: parser, merger (to handle big data), flamingo builder, searcher

Searcher
The searcher has the following main steps:
– Find approximate keywords
– Find RIDs
– Merge them
– Make recommendations and rankings
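The first three steps can be sketched as follows. This is a simplified illustration rather than the system's searcher: it finds approximate keywords with a linear scan and a textbook edit distance instead of a q-gram index, it assumes that a record must match every query term (the slides do not state the merge semantics), and it leaves ranking out. The names `edit_distance`, `search`, and `max_edits` and the sample index are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def search(query, keyword_to_rids, max_edits=1):
    """Return the sorted RIDs that match every query term approximately."""
    result = None
    for term in query.lower().split():
        # Step 1: find indexed keywords within the edit distance threshold.
        similar = [kw for kw in keyword_to_rids
                   if edit_distance(term, kw) <= max_edits]
        # Step 2: collect the RIDs of all similar keywords.
        rids = set()
        for kw in similar:
            rids |= keyword_to_rids[kw]
        # Step 3: merge with the RIDs found for the earlier terms.
        result = rids if result is None else result & rids
    return sorted(result or [])


# Hypothetical keyword -> RIDs index.
index = {"student": {1, 2}, "florida": {2, 3}, "cody": {0}}
matches = search("stdent florida", index)
```

With this sample index, the misspelled term "stdent" still reaches the records indexed under "student", and only the record that also contains "florida" survives the merge.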

Merger
The MergeSkip algorithm is designed for q-gram merging. The basic idea is to keep a pointer in each list. When an ID fails the count threshold, do a binary search in each of the lists for the next candidate ID.

Example of MergeSkip
[Figure: sorted ID lists merged through a min-heap, showing a jump step and the count threshold T]
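The merging idea (one pointer per list, a min-heap over the current IDs, and binary-search jumps after a failed ID) can be sketched as below. It is a minimal version under stated assumptions: each list is a sorted, duplicate-free Python list of integer IDs, and an ID qualifies when it appears in at least `t` lists. The function name `merge_skip` and the sample lists are illustrative; they are not the values from the slide's figure.

```python
import heapq
from bisect import bisect_left


def merge_skip(lists, t):
    """Return the IDs that occur in at least t of the sorted lists."""
    pos = [0] * len(lists)  # one pointer per list
    heap = [(lst[0], i) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    result = []
    while heap:
        top = heap[0][0]
        popped = []
        # Pop every list whose current ID equals the smallest ID.
        while heap and heap[0][0] == top:
            popped.append(heapq.heappop(heap)[1])
        if len(popped) >= t:
            result.append(top)
            for i in popped:  # advance each of these lists by one
                pos[i] += 1
                if pos[i] < len(lists[i]):
                    heapq.heappush(heap, (lists[i][pos[i]], i))
            continue
        # The ID failed: pop more lists until t - 1 are off the heap.
        while heap and len(popped) < t - 1:
            popped.append(heapq.heappop(heap)[1])
        if not heap:
            break  # fewer than t lists remain, so nothing else can qualify
        target = heap[0][0]
        # Any ID below target can occur only in the popped lists (at most
        # t - 1 of them), so jump each popped list ahead by binary search.
        for i in popped:
            pos[i] = bisect_left(lists[i], target, pos[i])
            if pos[i] < len(lists[i]):
                heapq.heappush(heap, (lists[i][pos[i]], i))
    return result


# Hypothetical inverted lists of IDs.
lists = [[1, 3, 5, 10], [5, 10, 13, 15], [5, 7, 10, 17], [10, 15]]
merged = merge_skip(lists, 3)  # → [5, 10]
```

The jump is what saves work: after ID 1 fails, the first list skips straight past 3 to 5 without either value being compared against the other lists.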

Other Features
Also supported:
– Column-specific search: column keyword, or column “keyword1 keyword2”
– Exact search: exact keyword (search anywhere), or column keyword (search on that column)
– These can be combined in any way, e.g., cody title “stdent florida” tallahssee education state exact hansen
   – cody, tallahssee: approximate search anywhere
   – stdent florida: approximate search on title
   – state: exact search on education
   – hansen: exact search anywhere

Other Issues
How to achieve effective ranking and recommendation?
– TF-IDF style approach
– Associations
– Ontology
How to build the indices and storage engine so that they are extremely fast and scalable?
– Use MapReduce to do this in parallel
Use a cluster of commodity machines for search as well?
How to handle streaming updates efficiently?
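The slides do not give the scoring formula behind the "TF-IDF style approach", so the following is only a textbook-style sketch of how such a ranking could work. It scores each record by summing, over the query terms, the term frequency times a smoothed inverse document frequency, log(1 + N/df); the smoothing, the name `tfidf_rank`, and the sample records are assumptions.

```python
import math
from collections import Counter


def tfidf_rank(query_terms, records):
    """Rank record ids by a simple TF-IDF score, best first."""
    n = len(records)
    df = Counter()  # number of records containing each keyword
    for keywords in records.values():
        df.update(set(keywords))
    scores = {}
    for rid, keywords in records.items():
        tf = Counter(keywords)
        score = sum(tf[term] * math.log(1 + n / df[term])
                    for term in query_terms if tf[term])
        if score > 0:
            scores[rid] = score
    # Sort by descending score; break ties by record id.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Hypothetical records: record id -> keywords parsed from its attributes.
records = {
    1: ["student", "florida", "florida"],
    2: ["student", "utah"],
    3: ["florida"],
}
ranking = tfidf_rank(["utah", "student"], records)  # → [2, 1]
```

Record 2 ranks first because it matches both terms and "utah" is rarer than "student", so it carries a higher weight.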

Associations
Goal: Find the words that appear together at least T times.

TID   Keywords
1     1 3 4
2     2 3 5
3     1 2 3 5
4     2 5
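For pairs of keywords, the goal can be met by straightforward counting, as in the sketch below. It reads each entry in the slide's table as a set of single-digit keyword ids (so "134" is taken to mean keywords 1, 3, and 4, which is an assumption), and the threshold T = 2 is chosen only for illustration. The slides do not say which association-mining algorithm the system uses, and larger keyword sets would need a scalable method rather than this brute-force pass.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, t):
    """Count keyword pairs per record and keep those seen at least t times."""
    counts = Counter()
    for keywords in transactions.values():
        # Sorting gives each pair one canonical order, e.g. (2, 5), never (5, 2).
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= t}


# TID -> keyword ids, as read from the slide's table.
transactions = {
    1: [1, 3, 4],
    2: [2, 3, 5],
    3: [1, 2, 3, 5],
    4: [2, 5],
}
pairs = frequent_pairs(transactions, 2)
# → {(1, 3): 2, (2, 3): 2, (2, 5): 3, (3, 5): 2}
```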

Results
– Craigslist data: 1.7 billion records, 300GB
– LinkedIn data: 12 million records, 10GB
– A few million unique keywords
– A single Linux machine running Ubuntu 12.9 and MySQL server 5.1, with 12GB RAM, 2TB disk, and a single Intel CPU X3470 @ 2.93GHz

Results (continued)

Results (continued)
Query efficiency in seconds:
– u: number of keywords searched
– k: number of recommendations made

A live demo
http://datagroup.cs.utah.edu/columbuscout.php
