Probabilistic Computing Machines and Universal Stochastic Inference


Beyond Calculation: Probabilistic Computing Machines and Universal Stochastic Inference
Vikash K. Mansinghka
December 17, 2011
NIPS Workshop on Philosophy and Machine Learning

Acknowledgements: Eric Jonas, Keith Bonawitz, Max Gasner, Beau Cronin, Cap Petschulat, Cameron Freer, Josh Tenenbaum, Dan Roy, Tom Knight, Gerry Sussman, Noah Goodman, Tomaso Poggio

Computers were built for calculation and deduction.
compute, v.: to determine by mathematical means; to calculate

Probabilistic inference seems central to intelligence, but also cumbersome and intractable, so we simplify and approximate. Going from a generative process to data (simulation) is easy; going from data back to the generative process (inference) is hard:
P(model | data) = P(model) P(data | model) / P(data)
with an exponentially large domain of models and an exponentially expensive normalizer P(data).
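To make the asymmetry concrete, here is a minimal Python sketch (my illustration, not the talk's): a noisy-OR model over N binary causes in which forward simulation is a single pass through the program, while the exact normalizer P(data) sums over all 2^N latent states. The constants and the noisy-OR strength 0.8 are invented for the example.

import itertools, random

N = 12          # number of latent binary causes (assumed size)
P_CAUSE = 0.1   # prior probability each cause is active
P_NOISE = 0.05  # probability the effect fires with no active cause

def simulate():
    """Forward simulation: one cheap pass through the generative process."""
    causes = [random.random() < P_CAUSE for _ in range(N)]
    p_effect = 1 - (1 - P_NOISE) * (1 - 0.8) ** sum(causes)
    effect = random.random() < p_effect
    return causes, effect

def joint(causes, effect):
    """P(causes) * P(effect | causes) for one complete assignment."""
    prior = 1.0
    for c in causes:
        prior *= P_CAUSE if c else (1 - P_CAUSE)
    p_effect = 1 - (1 - P_NOISE) * (1 - 0.8) ** sum(causes)
    return prior * (p_effect if effect else 1 - p_effect)

# Simulation is easy: one draw from the prior over (causes, effect).
print("forward sample (causes, effect):", simulate())

# Exact inference needs P(effect=1), a sum over all 2^N cause vectors.
evidence = sum(joint(c, True) for c in itertools.product([False, True], repeat=N))
print("P(effect=1) =", evidence)   # the 'exponential normalizer'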

The mind and brain accomplish far more than our smartest computer systems, and they do it with far less. We need greater expressiveness and tractability, for making both inferences and decisions.
The brain: ~30 watts, ~100 Hz; sees, hears, navigates, negotiates relationships; led the team that built Watson. It maps genetic and physical constraints plus sense data to cognition, perception and action.
Watson: ~80 kilowatts, 3.55 GHz; world Jeopardy! champion via statistical calculation. Its model of computation: a function f(), as a program that calculates, run on a universal Turing machine taking input x to output f(x).

Outline
Motivation: the capability and efficiency gaps with biology
Phenomenon: examples of probabilistic programming systems
Philosophy: a new mathematical model of computation
Potential: computing machines for which induction and abduction are natural

CAPTCHAs are easy to make, hard to break.
Generating CAPTCHAs is easy, e.g. rendering the glyphs {N, A, V, I, A}; breaking CAPTCHAs is hard (cf. the Google CAPTCHA and OCR, CVPR 2010).

Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards: from an input CAPTCHA to a guessed explanation.

Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards. A generator program that outputs random CAPTCHAs, together with an observed CAPTCHA, is given to a probabilistic programming system, which reports how the generator ran: glyphs {N, A, V, I, A}, ... (a different sample each time).

Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards.
1. Randomly choose some glyphs (with font and location):
   Glyph          Font          X             Y
   uniform(A,Z)   uniform(0,2)  uniform(0,W)  uniform(0,H)
   A              1             98            19
   J              2             140           10
   Q              1             43            7
   S              0             98            3
   J              1             80            15
2. Render to an image.
3. Add some noise (spatial noise, pixelwise errors).
4. Observe that the output matches the image we're interpreting.
Inference runs this generative process backwards.

Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards: the generator as a Church-style program.

(define w 200)
(define h 70)
(define rate 0.5)
(define maxglyphs 12)
(define blank (image/make w h))

;; 1. Randomly choose some glyphs
(define maybe-sample-glyph
  (lambda ()
    (if (bernoulli rate)
        (image/glyph/sample w h)
        #f)))
(define glyphs
  (vector/remove (vector/draw maybe-sample-glyph maxglyphs) #f))

;; 2. Render to an image
(define rendered (image/scene/draw blank glyphs))

;; 3. Add some noise
(define blur-radius (continuous-uniform:double-drift 0 10))
(define blurred (image/blur rendered blur-radius))
(define constant0 (discrete-uniform 1 127))
(define constant255 (discrete-uniform 128 254))
(define blurred-with-noise
  (image/interpolate blurred constant0 constant255))

;; 4. Observe that the output matches the target image
(define observed (image/load "image.png"))
(observe (image/stochastic-binarize blurred-with-noise) observed)

Breaking simple CAPTCHAs by running a randomized CAPTCHA generator backwards. Given the generator program and an observed CAPTCHA, the probabilistic programming system performs a random walk (MCMC) over execution histories of the generator, on a landscape defined locally by P(history, data), and reports how the generator ran: glyphs {N, A, V, I, A}, ... (a different sample each time).
It converges well due to the inclusion of (otherwise overlooked) randomness, and iterations are fast due to conditional independence (asymptotics, locality, parallelism) and software systems engineering (small state, fast updates).
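The "random walk over execution histories" can be sketched outside Church. Below is a minimal Python illustration (not the Church system; the toy model, constants, and all names are invented): a generator drops a fixed number of glyphs onto a 1-D image, and single-site Metropolis-Hastings resimulates one random choice at a time from its prior, accepting by the likelihood ratio.

import math, random

WIDTH, N_GLYPHS, NOISE = 30, 3, 0.3

def prior_sample():
    """One execution history: a position for each glyph plus a blur width."""
    return {"blur": random.uniform(0.5, 3.0),
            **{("x", i): random.uniform(0, WIDTH) for i in range(N_GLYPHS)}}

def render(trace):
    """Deterministic part of the generator: turn random choices into an image."""
    return [sum(math.exp(-((p - trace[("x", i)]) / trace["blur"]) ** 2)
                for i in range(N_GLYPHS)) for p in range(WIDTH)]

def log_like(trace, data):
    return sum(-(a - b) ** 2 / (2 * NOISE ** 2) for a, b in zip(render(trace), data))

def mh_step(trace, data):
    """Resample one random choice from its prior; since the proposal equals the
    prior and the trace has fixed size, the acceptance ratio is the likelihood ratio."""
    site = random.choice(list(trace))
    proposal = dict(trace)
    proposal[site] = prior_sample()[site]
    if math.log(random.random()) < log_like(proposal, data) - log_like(trace, data):
        return proposal
    return trace

# Condition on an image generated with known positions, then run the chain.
truth = {"blur": 1.5, ("x", 0): 5.0, ("x", 1): 15.0, ("x", 2): 24.0}
data = render(truth)
trace = prior_sample()
for _ in range(20000):
    trace = mh_step(trace, data)
print(sorted(round(trace[("x", i)], 1) for i in range(N_GLYPHS)))

Because each proposal resamples a single choice from its prior, the prior terms cancel in the acceptance ratio and each update touches little state, in the spirit of the slide's "small state, fast updates".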

Computer vision as "inverse Pixar", using stochastic MATLAB: from a target image to posterior geometry, rendered (known lighting, unknown mesh, weak smoothness prior). (Wingate et al., 2011)

Applications of probabilistic programming systems, including Church:
1. Nuclear safety via CTBT monitoring (Arora, Russell, Sudderth et al., 2009-2011)
2. Tracking multiple targets from video, radar (Arora et al., 2010; Oh et al., 2009)
3. Automatic control system synthesis (Wingate et al., 2011)
4. Information extraction from unstructured text (McCallum et al., 2009)
5. Automatic statistical analyst (Mansinghka et al., 2009; 2011, in prep)
6. Clinical reasoning and pharmacokinetics (Mansinghka et al., in prep)
7. Human learning as probabilistic program induction (Stuhlmuller et al., 2011; Tenenbaum; Kemp; Griffiths; Goodman)

Probabilistic computing: computation as stochastic inference, not deterministic calculation.
Deterministic calculation: a function f() (as a program that calculates) and an input x are given to a universal Turing machine, which produces the output f(x).
Stochastic inference: a space of possibilities P(H) (as a probabilistic program that guesses possible explanations and actions) and data D are given to a universal stochastic inference machine, which produces a sampled probable explanation from P(H | D) or a satisficing decision (key idea: different each time).

Probabilistic computing: computation as stochastic inference, not deterministic calculation. The same machine covers:
Turing embedding: H = (x, f(x)); D: H's input component equals the given x.
Universality: P(H) and D can themselves contain arbitrary stochastic inference.
Probabilistic program induction: H = probabilistic program text; D: a spec checker.
Meta-reasoning: H = a model of an agent; D: the agent's behavior.
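A small Python sketch of the Turing-embedding bullet (my illustration, not the talk's): an ordinary deterministic function f becomes a degenerate inference problem whose hypothesis is the pair (x, f(x)) and whose data constrain the input component, so the "posterior" is a point mass at the ordinary result.

import random

def f(x):                      # an ordinary deterministic program
    return x * x + 1

def prior():                   # guess an input, then run f on it
    x = random.randint(0, 99)
    return (x, f(x))

def query(pred, max_tries=100000):
    """Rejection sampling from the conditional distribution (prior | pred)."""
    for _ in range(max_tries):
        h = prior()
        if pred(h):
            return h
    raise RuntimeError("no accepted sample")

# Condition on the input being 7: "inference" recovers ordinary evaluation.
print(query(lambda h: h[0] == 7))   # -> (7, 50), the same answer every time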

Probabilistic computing: computation as stochastic inference, not deterministic calculation. A stack for building it (Mansinghka 2009): AI systems and models of cognition, perception and action, expressed against universal inference machines and specialized inference modules, compiled to parallel stochastic finite state machines, running on probabilistic or commodity hardware.

Snapshot of the field: new languages, systems, architectures, theory.
Languages: 10 research prototype languages, 2 universal; Church, with 5 implementations (Goodman, Mansinghka et al., 2008); BLOG (Arora, Russell et al., 2010); Figaro.
New probabilistic architectures for universal inference (Wingate, Stuhlmuller, Goodman, 2011).
New theorems in probabilistic computability and complexity (Ackerman, Freer, Roy, 2011 and in prep; Freer, Mansinghka, Roy, 2010; Haeupler, Saha, Srinivasan, 2010; Propp & Wilson, 1996).

These machines make sampling more natural than optimization and integration.
Claim: sampling is easier than both optimization and integration. In the accompanying figure, the bulk of the distribution dominates the optimum, which has negligible mass, and many expectations (e.g. test functions for rare events) are hard to estimate.
Explaining away becomes natural (and a question of convergence), but calculating low probabilities exactly may be nearly impossible.
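A quick numerical illustration of the first annotation (my example, not the slide's figure): in high dimensions the mode of a density carries negligible mass, so an optimizer's answer is atypical while a sampler's answers come from where the mass actually is. For a standard d-dimensional Gaussian, samples concentrate near radius sqrt(d), and the probability of landing within a unit ball of the mode collapses.

import math, random

def radius(d):
    """Distance from the mode (the origin) of one standard Gaussian draw."""
    return math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(d)))

for d in (2, 20, 200):
    draws = [radius(d) for _ in range(2000)]
    near_mode = sum(r < 1.0 for r in draws) / len(draws)
    print(f"d={d:3d}  typical radius ~ {sum(draws)/len(draws):6.2f} "
          f"(sqrt(d)={math.sqrt(d):5.2f})  P(within unit ball of mode) ~ {near_mode:.4f}")

# Optimization would return the mode (radius 0); a sampler returns points from
# the region that carries the mass. Estimating the rare event P(||x|| < 1) by
# naive Monte Carlo is itself hopeless for large d, echoing the second annotation.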

What is the computational complexity of stochastic inference? (not P, NP, #P, BPP, ...)
The usual targets of Bayes reductions are in practice easy: SAT, k-coloring, ... (see e.g. phase transitions, Selman et al.; semi-random sources, Vazirani et al.; smoothed analysis, Spielman et al.).
What is usually hard, and the basis of cryptography (via generation of instances that are hard in practice): factoring, graph isomorphism.

What is the computational complexity of stochastic inference? (not P, NP, #P, BPP, ...)
The conditional distribution to be (KL-approximately) simulated can encode very different problems.

A SAT instance, as a 0-entropy condition on a uniform prior:
(define x1 (flip))
(define x2 (flip))
(define x3 (flip))
...
(observe (or (not x1) x2 x3))
(observe (or x2 (not x4) x5))
...

A linear-Gaussian condition:
(define x (multivariate-normal 0vec (* eps1 I)))
(define y (multivariate-normal (* A x) (* eps2 I)))
(observe (< (norm (- y b)) eps3))

~20 lines of code for Latent Dirichlet Allocation:
(observe (get-word "doc1" 0) "hello")
(observe (get-word "doc1" 1) "there")
(observe (get-word "doc2" 0) "church")
...

[The slide also reproduces the Complexity Zoo inclusion diagram (P, NP, #P, PSPACE, EXP, BQP, and many other classes); only this placeholder is kept here.]
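The SAT encoding can be made concrete with a hedged Python sketch (mine, not the slide's Church code): the prior flips each variable uniformly, the observations demand every clause hold, and the conditional distribution is uniform over satisfying assignments. Rejection sampling makes the encoding explicit, though it is of course not an efficient SAT solver.

import random

CLAUSES = [(("x1", False), ("x2", True), ("x3", True)),   # (!x1 or x2 or x3)
           (("x2", True), ("x4", False), ("x5", True))]   # (x2 or !x4 or x5)
VARS = sorted({v for clause in CLAUSES for v, _ in clause})

def prior():
    """Uniform prior: the analogue of (define xi (flip))."""
    return {v: random.random() < 0.5 for v in VARS}

def satisfied(assignment):
    """Each clause needs at least one literal with the required sign."""
    return all(any(assignment[v] == sign for v, sign in clause) for clause in CLAUSES)

def sample_conditional():
    while True:                 # rejection plays the role of the observes
        a = prior()
        if satisfied(a):
            return a

print(sample_conditional())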

The Complexity of Exact Stochastic Inference by Rejection
In Church, applying the query procedure to exp and pred (e.g., by evaluating the expression (query exp pred)) results in a sample from the conditional distribution p(x | y = 1) of X given that Y = 1, where X is the value of exp (an arbitrary Church value, e.g. a number, list, string, data structure, or procedure) and Y is the result of applying pred to it. One way this can be used to encode Bayesian inference is by letting X be a tuple (H, D) of some hypothesis and some data sampled from the prior, and letting the predicate check that D is equal to the actual data that was observed.

Proposition. Let N be the number of attempts before a rejection sampler for (query exp pred) succeeds when using samples from the distribution induced by exp and pred. Assume the application of pred to any input consumes no randomness, i.e., pred is deterministic. Then N is geometrically distributed with mean exp(DKL((query exp pred) || (eval exp))).

The relevant complexity measure is semantic, KL(posterior, prior), not syntactic (treewidth, dimensionality, ...); the same quantity appears in PAC-Bayes bounds.

Figure 1: An example showing that rejection sampling doesn't always scale poorly with the dimension of the problem. Let {X_i}, i >= 1, be independent Bernoulli(1/2)-distributed random variables. For each n, define S_n = X_1 + ... + X_n and consider the prior distribution P_n on X_{1:n} = (X_1, ..., X_n) and, for each s in {0, 1, ..., n}, the posterior distribution P_n^s on X_{1:n} given S_n = s. (left) The probability that a sample from P_n satisfies S_n = s (and is thus a sample from the posterior P_n^s) is simply the marginal probability P(S_n = s), which follows a Binomial(n, 1/2) distribution. The figure plots this probability mass function for n = 10, 20, 30, 40, 50; for each size n and observation s, it gives the probability that a sample from P_n is accepted as a sample from P_n^s (one marked point has Pr[accept] = 0.302). The dashed line traces the acceptance probability along n = 2s, given by 2^{-2s} (2s choose s), which evidently decays only at a polynomial rate; the same rate is achieved asymptotically at the point s = n/2 + c for a constant c. (Remaining panels omitted.)
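A hedged numerical check of the proposition for the Bernoulli example (my code, not the talk's): conditioning X_1..X_n on S_n = s makes the posterior uniform over the C(n, s) satisfying strings, so KL(posterior || prior) = log(2^n / C(n, s)) = -log P(S_n = s), and the number of rejection attempts should be geometric with mean exp(KL).

import math, random

def trials_until_accept(n, s):
    """Count attempts (including the successful one) before S_n = s holds."""
    count = 1
    while sum(random.random() < 0.5 for _ in range(n)) != s:
        count += 1
    return count

n, s = 20, 14
p_accept = math.comb(n, s) / 2 ** n                  # Binomial(n, 1/2) pmf at s
kl = -math.log(p_accept)                             # KL(posterior || prior)
empirical_mean = sum(trials_until_accept(n, s) for _ in range(3000)) / 3000

print(f"exp(KL) = {math.exp(kl):.2f}")               # equals 1 / P(accept)
print(f"empirical mean number of attempts = {empirical_mean:.2f}")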

Probabilistic Programming and Machine Learning
The big-data and ML emphasis is on scaling flat statistical models to GB/TB; these models require 1-20 PLOC (probabilistic lines of code). The bigger payoff: somewhat stochastic systems that marry structure and abstraction with statistics, at 100-1K probabilistic LOC and GB/TB. Examples:
Allocating scarce resources with learned models and real constraints
Machine perception and control: vision as inverse graphics; control as inverse dynamics
NLP/MT/IX/...: jointly model lexicon, syntax, alignment, entities, coreference, topics
Database fusion and cleanup: treat messy, incomplete rows as evidence, not reality
Modeling and simulation: infer accurate parameters, realistic structure

Engineering stochastic machines

"With our artificial automata we are moving much more in the dark than nature appears to be with its organisms. We are, and apparently, at least at present, have to be much more 'scared' by the occurrence of an isolated error and by the malfunction which must be behind it. Our behavior is clearly that of overcaution, generated by ignorance."
- John von Neumann, The General and Logical Theory of Automata

"For over two millennia, Aristotle's logic has ruled over the thinking of western intellectuals. All precise theories, all scientific models, even models of the process of thinking itself, have in principle conformed to the straight-jacket of logic. But from its shady beginnings devising gambling strategies and counting corpses in medieval London, probability theory and statistical inference now emerge as better foundations for scientific models, especially those of the process of thinking and as essential ingredients of theoretical mathematics, even the foundations of mathematics itself. We propose that this sea change in our perspective will affect virtually all of mathematics in the next century."
- David Mumford, The Dawning of the Age of Stochasticity

What if stochastic inference was as fast, cheap and ubiquitous as deterministic calculation?
[Closing figure: space of possibilities, applications, universal stochastic inference machines; Mansinghka 2009.]

