
From Big Data to Big Knowledge
Kevin Murphy, Google Research, kpmurphy@google.com
Joint work with Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Panos Ipeirotis, Ni Lao, Wei-Lwun Lu, Thomas Strohmann, Shaohua Sun, Chun How Tan, Robert West, Wei Zhang, and others
CIKM industry talk, San Francisco, CA, October 31, 2013

Big Data is everywhere

From Big Data to Big Knowledge
“We are drowning in information and starving for knowledge.” (John Naisbitt)
What does all this data “mean”?
Words are ambiguous, e.g., “Taj Mahal”.
We need to move from “strings” to “things”.

Google’s Knowledge Graph
500M nodes (entities)
3.5B edges (facts)
1500 node types
35k edge types
Extension of Freebase.com
Source: Brian Karlak, Google Faculty Summit, China, Dec 2012

Knowledge Panels
Source: -knowledge-graph-things-not.html

Freebase is created by merging many data sources
[Figure: data sources merged into the KG]
Massive entity linkage problem!
Source: John Giannandrea, CIKM 2011 industry talk

A fragment of Freebase (in RDF format)
[Figure: graph fragment; node and edge labels not recoverable]
Source: Brian Karlak, Google Faculty Summit, China, Dec 2012

The long tail of knowledge
Freebase is large, but still very incomplete:

Relation      % unknown in Freebase
Profession    68%
Place of      94%

We need automatic knowledge base construction methods, cf. the AKBC workshop at CIKM.

Outline
From strings to things
Reading the web
Asking the web
Asking people
Open issues

Machine reading
There are many academic groups (e.g., CMU, UW, MPI) that have developed methods to extract facts from large text corpora.
At Google, we have developed a similar system, except it is 10x bigger.
In addition, we use “prior knowledge” to help reduce the error rate.

Fact extraction from text
Template matching methods
  Example: “Patrick Newport, who has been working at IHS Global Insight, noted ...”
  PER (/m/101), /people/person/employment, ORG (/m/102)
Machine learning (binary classifiers trained on text / parse tree features)
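
The template-matching idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming a single hand-written regular-expression template; the pattern, the `extract_employment` helper, and the use of raw name strings in place of resolved entity IDs such as /m/101 are all assumptions made for the example, not the system described in the talk.

```python
import re

# Hypothetical template: "<PER>, who has been working at <ORG>,"
# A name is approximated as a run of capitalized words.
NAME = r"[A-Z][\w&.]*(?: [A-Z][\w&.]*)*"
TEMPLATE = re.compile(
    rf"(?P<per>{NAME}) ?, ?who has been working at (?P<org>{NAME}),"
)

def extract_employment(sentence):
    """Return (subject, predicate, object) triples matched by the template."""
    return [
        (m.group("per"), "/people/person/employment", m.group("org"))
        for m in TEMPLATE.finditer(sentence)
    ]

triples = extract_employment(
    "Patrick Newport, who has been working at IHS Global Insight, noted."
)
```

A real extractor would also need entity linking to map the matched strings to identifiers, which this sketch leaves out.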

Wrapper induction
Source: “Automatically [...]rs for web sources”, Raposo et al 2007

Fact extraction from tables
Need to create a hidden column containing a CVT or blank node, to represent the 3-tuple
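
One way to picture the hidden column is as a fresh intermediate node per table row. The sketch below assumes a CVT (compound value type) node is modeled as a blank-node-style identifier; the `row_to_triples` helper, the property names, and the row contents are hypothetical.

```python
from itertools import count

_cvt_ids = count(1)

def row_to_triples(subject, relation, row):
    """Expand one table row into triples that hang off a fresh CVT node.

    `row` maps a property name to a cell value. The CVT node plays the
    role of the hidden column: it ties the cells of one row together.
    """
    cvt = f"_:cvt{next(_cvt_ids)}"  # blank-node style identifier
    triples = [(subject, relation, cvt)]
    triples += [(cvt, prop, value) for prop, value in row.items()]
    return triples

# Hypothetical row from an "education" table about one person.
triples = row_to_triples(
    "Barry Richter",
    "/people/person/education",
    {"institution": "University of Wisconsin", "start_year": 1989},
)
```

Each row yields one triple from the subject to the CVT node plus one triple per cell, so multi-column facts stay grouped.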

Webmaster annotation
Example taken from http://en.wikipedia.org/wiki/Microdata (HTML)
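
The Wikipedia example itself is not reproduced in this transcript, so the following is a stand-in: a minimal sketch of reading `itemprop` annotations with Python's standard `html.parser`. The HTML snippet and the `ItempropParser` class are assumptions for illustration; only flat, non-nested properties are handled.

```python
from html.parser import HTMLParser

class ItempropParser(HTMLParser):
    """Collect the text content of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.current = None   # itemprop name of the element being read
        self.props = {}

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get("itemprop")

    def handle_data(self, data):
        if self.current and data.strip():
            self.props[self.current] = data.strip()
            self.current = None

    def handle_endtag(self, tag):
        self.current = None

# Hypothetical annotated page fragment.
HTML = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Barry Richter</span> studied at '
    '<span itemprop="alumniOf">University of Wisconsin</span></div>'
)

parser = ItempropParser()
parser.feed(HTML)
props = parser.props
```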

Predicting facts given prior knowledge
Perform association rule mining* on the Freebase graph, to find noisy rules (features passed to a learned classifier).
[Figure: graph over Barack Obama, Michelle Obama, and Sasha Obama with married-to and parent-of edges]
* “Random Walk Inference and Learning in A Large Scale Knowledge Base”, Ni Lao et al, 2011
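
To make the notion of a noisy rule concrete, the sketch below counts how often a two-hop relation path co-occurs with a direct edge in a toy graph. It is a heavily simplified stand-in, not the random-walk method of Lao et al; the edge directions, the `two_hop_paths` and `rule_support` helpers, and the restriction to paths of length 2 are assumptions.

```python
from collections import defaultdict

# Toy graph; the edges and their directions are illustrative.
EDGES = [
    ("Barack Obama", "married-to", "Michelle Obama"),
    ("Michelle Obama", "parent-of", "Sasha Obama"),
    ("Barack Obama", "parent-of", "Sasha Obama"),
]

def two_hop_paths(edges):
    """Map (source, target) to the set of relation paths of length 2."""
    out = defaultdict(list)
    for s, r, o in edges:
        out[s].append((r, o))
    paths = defaultdict(set)
    for s, r1, mid in edges:
        for r2, o in out[mid]:
            paths[(s, o)].add((r1, r2))
    return paths

def rule_support(edges, body, head):
    """Count pairs where path `body` holds, and how many also have `head`."""
    facts = set(edges)
    pairs = [p for p, found in two_hop_paths(edges).items() if body in found]
    confirmed = sum((s, head, o) in facts for s, o in pairs)
    return confirmed, len(pairs)

# Rule: married-to followed by parent-of suggests parent-of.
support = rule_support(EDGES, ("married-to", "parent-of"), "parent-of")
```

The ratio of the two counts gives a rough rule confidence, which could then serve as one feature for a classifier.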

A “neural” prior model
Train a deep neural network* to predict the probability of arbitrary facts, cf. tensor factorization.
[Figure: network with inputs subject, predicate, object; a hidden layer; output P(subject, predicate, object)]
* Similar to “Learning Structured Embeddings of Knowledge Bases”, Bordes et al, 2011
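
The shape of such a model can be sketched as embeddings feeding one hidden layer and a sigmoid output. This is an untrained toy forward pass with random weights, assuming concatenated embeddings as input; the layer sizes, the `FactScorer` class, and the example names are hypothetical, and the actual architecture is not specified on the slide.

```python
import math
import random

random.seed(0)
DIM, HIDDEN = 4, 8  # hypothetical embedding and hidden-layer sizes

def rand_vec(n):
    return [random.gauss(0.0, 0.1) for _ in range(n)]

class FactScorer:
    """Untrained toy model: embeddings -> one hidden layer -> sigmoid."""

    def __init__(self, entities, predicates):
        self.emb = {name: rand_vec(DIM) for name in (*entities, *predicates)}
        self.w1 = [rand_vec(3 * DIM) for _ in range(HIDDEN)]
        self.w2 = rand_vec(HIDDEN)

    def prob(self, subject, predicate, obj):
        # List addition concatenates the three embedding vectors.
        x = self.emb[subject] + self.emb[predicate] + self.emb[obj]
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        logit = sum(w * v for w, v in zip(self.w2, h))
        return 1.0 / (1.0 + math.exp(-logit))

model = FactScorer(["Barry Richter", "Madison"], ["born in"])
p = model.prob("Barry Richter", "born in", "Madison")
```

With random weights the output carries no meaning; training on known facts versus corrupted ones is what would make it a usable prior.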

A “neural” prior model - Halloween version
Train a deep neural network* to predict the probability of arbitrary facts, cf. tensor factorization.
[Figure: the same network diagram with inputs subject, predicate, object; a hidden layer; output P(subject, predicate, object)]

Knowledge Vault* fuses all these signals together
Data from web:
  Unstructured text
  Semi-structured DOM trees
  Structured WebTables
“Prior” data from FB
Output: scored triples, e.g., (S,P,O) .99, (S,P,O) .96, (S,P,O) .76
* Details in a paper submitted to WWW’14 (Dong et al)

Benefits of information fusion

Benefits of prior knowledge
2x as many high confidence facts
AKBC workshop at CIKM 2013, San Francisco, CA, October 27, 2013

Example: Barry Richter, studied at, UW-Madison
“In the fall of 1989, Richter accepted a scholarship to the University of Wisconsin, where he played for four years and earned numerous individual accolades.”
“The Polar Caps' cause has been helped by the impact of knowledgeable coaches such as Andringa, Byce and former UW teammates Chris Tancill and Barry Richter.”
Fused extraction confidence: 0.14
Prior knowledge:
  Barry Richter, born in, Madison
  Barry Richter, lived in, Madison
Final belief (fused with prior): 0.61
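
The effect of combining a weak extraction signal with a supportive prior can be illustrated by adding the two in log-odds space. This is only a sketch under assumed weights; the `fuse` function, its default parameters, and the prior value of 0.9 are made up, and it is not meant to reproduce the 0.61 reported on the slide or the fusion method actually used.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse(extraction_conf, prior_prob, w_extract=1.0, w_prior=1.0, bias=0.0):
    """Combine two probabilities in log-odds space (weights are hypothetical)."""
    z = bias + w_extract * logit(extraction_conf) + w_prior * logit(prior_prob)
    return sigmoid(z)

# 0.14 is the fused extraction confidence from the example above; the
# prior value 0.9 is made up for illustration.
belief = fuse(0.14, 0.9)
```

In a learned system the weights and bias would be fit on labeled facts rather than set by hand.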


Knowledge base completion using Question Answering*
Even after large-scale machine reading of the web, many facts are still unknown.
We can use web-based question-answering to perform targeted completion of missing attributes (pull vs push model).
Main issue: what questions should we ask?
* Details in a paper submitted to WWW’14 (West et al)

The importance of asking the right question


Learning which questions to ask
[Figure: color = mean reciprocal rank of true answer, on a scale from GOOD to BAD]
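
Mean reciprocal rank, the metric named in the figure, averages 1/rank of the true answer over queries. A minimal sketch follows; the `mean_reciprocal_rank` helper and the ranked answer lists are hypothetical, and a query whose true answer is absent is assumed to contribute 0.

```python
def mean_reciprocal_rank(ranked_answers, true_answers):
    """Average of 1/rank of the true answer; a miss contributes 0."""
    total = 0.0
    for candidates, truth in zip(ranked_answers, true_answers):
        if truth in candidates:
            total += 1.0 / (candidates.index(truth) + 1)
    return total / len(true_answers)

# Hypothetical ranked answer lists for three queries.
mrr = mean_reciprocal_rank(
    [["Madison", "Chicago"], ["Chicago", "Madison"], ["Boston"]],
    ["Madison", "Madison", "Madison"],
)
# (1 + 1/2 + 0) / 3 = 0.5
```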

How many questions should we ask?
Performance increases, then plateaus

Asking too many questions can hurt performance
Performance gets worse

Why does performance differ?
[Figure: two panels, Open class and Closed class]

Precision-recall curves
About 25% of the high confidence facts were not discovered by the “read the web” approach.
Accuracy is higher for closed-class predicates.


Freebase is community generated / edited

Knowledge panel feedback


Knowing who to trust*
Use a binary classifier, trained on features derived from user contribution history, to predict the probability the contribution is correct.
* Details in a paper submitted to WSDM’14 (Tan et al)
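
A binary classifier of this kind can be sketched with a small logistic regression. The example assumes two made-up features of a contributor's history and four made-up training rows; the feature definitions, the `train_logistic` and `predict` helpers, and the hyperparameters are hypothetical and do not describe the model in the cited paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights (bias last) by batch gradient descent on log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            xb = x + [1.0]  # append constant input for the bias term
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            for i, xi in enumerate(xb):
                grad[i] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    """Probability that a contribution with features `x` is correct."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x + [1.0])))

# Hypothetical features: [fraction of past edits accepted, log10(edit count)]
X = [[0.95, 3.0], [0.90, 2.0], [0.40, 1.0], [0.20, 0.5]]
y = [1, 1, 0, 0]
w = train_logistic(X, y)
p_good, p_bad = predict(w, [0.92, 2.5]), predict(w, [0.30, 0.7])
```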

Asking the right people*
Place an ad asking users to take a quiz.
Use an ad optimization system to figure out which kinds of users to show the ad to.
* Details in a paper submitted to WWW’14 (Ipeirotis and Gabrilovich)


New entities
“The Polar Caps' cause has been helped by the impact of knowledgeable coaches such as Andringa, Byce and former UW teammates Chris Tancill and Barry Richter.”
[Figure: entity mentions labeled /m/? (no Freebase entity) and /m/02ql38b]
40M entities in Freebase, still missing many!

New relations
/people/person/education . /education/educational institute
“In the fall of 1989, Richter accepted a scholarship to the University of Wisconsin, where he played for four years and earned numerous individual accolades.”
/people/person/?
35k types of relations in Freebase, still missing many!

Implicitly stated information
“Joanne Schieble was just twenty-three and attending graduate school in Wisconsin when she learned she was pregnant. Her father didn't approve of her relationship with a Syrian-born graduate student, and social customs in the 1950s frowned on a woman having a child outside of marriage. To avoid the glare, Schieble moved to San Francisco and was taken in by a doctor who took care of unwed mothers and helped arrange adoptions. Originally, a lawyer and his wife agreed to adopt the new baby. But when the child was born on February 24, 1955, they changed their minds. Clara and Paul Jobs, a modest San Francisco couple with some high school education, had been waiting for a baby. When the call came in the middle of the night, they jumped at the chance to adopt the newborn, and they named him Steven Paul.”
Joanne Schieble, /people/person/parents, Steve Jobs
Steve Jobs, /people/person/date-of-birth, 2/24/55
Source: “Steve Jobs: The Man Who Thought Different”

Assessing trustworthiness of sources

Fictional contexts
/en/abraham lincoln, /people/person/profession, /en/vampire hunter ?

Summary
1. Knowledge Vault is the largest repository of automatically extracted structured knowledge on the planet.
2. We can extract more information by asking the right questions from the web and/or people.
3. We are only extracting a small fraction of the facts on the web.
