Monte Carlo Tree Search - Stanford University


Monte Carlo Tree Search
Cmput 366/609 Guest Lecture, Fall 2017
Martin Müller, mmueller@ualberta.ca

Contents
- 3+1 Pillars of Heuristic Search
- Monte Carlo Tree Search
- Learning and using Knowledge
- Deep neural nets and AlphaGo

Decision-Making
- One-shot decision making
  - Example: image classification - analyze an image, tell what's in it (source: http://cs231n.github.io/assets/classify.png)
- Sequential decision-making
  - Need to look at possible futures in order to make a good decision now

Heuristic Search
- State space (e.g. game position; location of robot and obstacles; state of Rubik's cube)
- Actions (e.g. play on C3; move 50 cm North; turn left)
- Start state and goal
- Heuristic evaluation function - estimate the distance of a state to the goal

Three plus one Pillars of Modern Heuristic Search
- Search algorithm
- Evaluation function, heuristic
- Simulation
- We have had search + evaluation for decades (alphabeta, A*, greedy best-first search, ...)
- Combining all three is relatively new
- Machine learning is key

Alphabeta Search
- Classic algorithm for games
- Search + evaluation, no simulation
- Minimax principle
  - My turn: choose the best move for me
  - Opponent's turn: they choose the move that's worst for me
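
To make the minimax principle concrete, here is a minimal fixed-depth alphabeta search in negamax form. It is only a sketch, not code from the lecture; the game interface (legal_moves, play, is_terminal, evaluate) is a hypothetical placeholder, with evaluate returning a score from the viewpoint of the side to move.

```python
# Minimal fixed-depth alphabeta in negamax form (a sketch, not the lecture's code).
# Assumes a hypothetical game interface: legal_moves(state), play(state, move),
# is_terminal(state), and evaluate(state) scoring from the side to move.

def alphabeta(state, depth, alpha, beta, game):
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)          # heuristic evaluation at the frontier
    best = float("-inf")
    for move in game.legal_moves(state):
        child = game.play(state, move)
        # Minimax principle: the opponent picks what is worst for us,
        # hence the negation and the swapped, negated window.
        value = -alphabeta(child, depth - 1, -beta, -alpha, game)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:                    # cutoff: opponent will avoid this line
            break
    return best
```

A root-level driver would call alphabeta on each child position and play the move with the highest negated value.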

αβ Successes (1)
- Solved games - proven value of the starting position
  - Checkers (Schaeffer et al 2007)
  - Nine men's morris (Gasser 1994)
  - Gomoku (5 in a row) (Allis 1990)
  - Awari, 5x5 Go, 5x5 Amazons, ...

αβ Successes (2)
- Not solved, but super-human strength:
  - Chess (Deep Blue team, 1996)
  - Othello (Buro 1996)
  - Shogi (Japanese chess, around 2013?)
  - Xiangqi (Chinese chess, around 2013?)

αβ Failures
- Go
- General Game Playing (GGP)
- Why fail? Focus on Go here

Go
- Classic Asian board game
- Simple rules, complex strategy
- Played by millions
- Hundreds of top experts - professional players
- Until recently, computers much weaker than humans

Go Rules
- Goal: surround
  - Empty points
  - The opponent (capture)
- Start: empty board
- Win: control more than half the board

End of Game
- End: both players pass
- Territory: intersections surrounded by one player
- The player with more (stones + territory) wins the game
- Komi: adjustment for the first-player advantage (e.g. 7.5 points)
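
A tiny worked example of this scoring rule; the stone and territory counts below are made up for illustration.

```python
# Area scoring as stated above: stones + territory, plus komi for White.
# The counts are invented for illustration only.
black_score = 38 + 145          # Black stones + Black territory
white_score = 35 + 140 + 7.5    # White stones + White territory + komi
winner = "Black" if black_score > white_score else "White"
print(black_score, white_score, winner)   # 183 182.5 Black
```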

Why does αβ Fail in Go?
- Huge state space: both depth and width of the game tree
  - About 250 legal moves on average
  - Average game length about 250 moves
- Until very recently: no good evaluation function

Monte Carlo Methods
- Popular in the last 10 years
- Hugely successful in many applications
  - Backgammon (Tesauro) - early example
  - Go (many)
  - Amazons, Havannah, Lines of Action, ...
  - Planning, energy management, mathematical optimization, solving MDPs, ...

Monte Carlo Simulation
- No evaluation function? No problem!
- Simulate the rest of the game using random moves (easy)
- Score the game at the end (easy)
- Use that as the evaluation (hmm, but...)

The GIGO Principle
- Garbage in, garbage out
- Even the best algorithms do not work if the input data is bad
- Making random moves sounds pretty bad. How can we gain any information from playing them?

Well, it Works!
- For some games, anyway
- Even random moves often preserve some difference between a good position and a bad one
- The rest is (mostly) statistics

Basic “Flat” Monte Carlo Search Algorithm
1. Play lots of random games starting with each possible move
2. Keep winning statistics for each move
3. Play the move with the best winning percentage
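
A minimal sketch of this flat Monte Carlo algorithm. It is not code from the lecture; the game interface (legal_moves, play, to_play, is_terminal, winner) is a hypothetical placeholder.

```python
import random

# Flat Monte Carlo search: play many random games starting with each candidate
# move, keep winning statistics, and pick the move with the best winning percentage.

def simulate(state, player, game):
    """Finish the game with random moves; return 1 if `player` wins, else 0."""
    while not game.is_terminal(state):
        state = game.play(state, random.choice(game.legal_moves(state)))
    return 1 if game.winner(state) == player else 0

def flat_monte_carlo(state, game, sims_per_move=100):
    player = game.to_play(state)                # the player choosing the move
    best_move, best_rate = None, -1.0
    for move in game.legal_moves(state):
        child = game.play(state, move)
        wins = sum(simulate(child, player, game) for _ in range(sims_per_move))
        rate = wins / sims_per_move             # winning percentage for this move
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```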

Example: from the current position s, run four simulations; the outcomes are 1, 1, 0, 0, so the estimated value is V(s) = 2/4 = 0.5.

How to Improve?
1. Better-than-random simulations
2. Add a game tree (as in αβ)
3. Add knowledge as bias in the game tree
4. AlphaGo

1. Better Simulations
- Goal: strong correlation between the initial position and the result of the simulation
- Try to preserve wins and losses
- How?

Use Knowledge in Simulations
- MoGo-style patterns
- Tactical rules
- Machine learning using features and feature weights

MoGo-Style Patterns
- 3x3 or 2x3 patterns
- Apply as a response near the last move

Building a better Randomized Policy
- Use rules and patterns to set probabilities for each legal move
- Learn the probabilities
  - From human games
  - From self-play
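
One way such a policy can be realized is sketched below: each legal move gets a weight from a pattern_weight function (a hypothetical stand-in for learned pattern or rule scores), and the playout samples moves in proportion to those weights.

```python
import random

# Better-than-uniform playout policy sketch: rules/patterns assign each legal
# move a weight, and moves are sampled in proportion to those weights.
# pattern_weight(state, move) is a hypothetical placeholder, e.g. backed by
# learned 3x3 pattern statistics.

def policy_move(state, game, pattern_weight):
    moves = game.legal_moves(state)
    weights = [pattern_weight(state, m) for m in moves]
    return random.choices(moves, weights=weights, k=1)[0]
```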

2. Add Game Tree
- First idea: use αβ
  - Use simulations directly as an evaluation function for αβ
- This fails:
  - Too much noise
  - Too slow

Monte Carlo Tree Search
- Idea: use the results of simulations to guide the growth of the game tree
- Exploitation: focus on promising moves
- Exploration: focus on moves where uncertainty about the evaluation is high
- Two contradictory goals?

UCB Formula
- Multi-armed bandits (slot machines in a casino)
- Which bandit has the best payoff?
- Explore all arms, but:
  - Play promising arms more often
  - Minimize regret from playing poor arms

Some Statistics
- Take random samples from a fixed probability distribution
- With many trials, the average outcome will converge to the expected outcome
- Confidence bounds: the true value is probably within these bounds

UCB Idea
- UCB = Upper Confidence Bound
- Take the next sample for the arm for which the UCB is highest
- Principle: optimism in the face of uncertainty
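
A minimal UCB1-style arm selection sketch for the bandit setting: running win and visit counts per arm are assumed, and the exploration constant C and helper name are illustrative, not from the lecture.

```python
import math

# UCB1 arm selection: always pull the arm with the highest upper confidence
# bound ("optimism in the face of uncertainty").
# wins[i] and visits[i] are running statistics per arm; total_visits is the
# total number of pulls so far; C is the exploration constant.

def select_arm(wins, visits, total_visits, C=1.4):
    best_arm, best_ucb = None, float("-inf")
    for i in range(len(visits)):
        if visits[i] == 0:
            return i                              # try every arm at least once
        ucb = wins[i] / visits[i] + C * math.sqrt(math.log(total_visits) / visits[i])
        if ucb > best_ucb:
            best_arm, best_ucb = i, ucb
    return best_arm
```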

UCT Algorithm
- Kocsis and Szepesvari (2006)
- Apply UCB in each node of a game tree
- Which node to expand next?
  - Start at the root (current state)
  - While in the tree, choose the child n that maximizes:
    UCTValue(parent, n) = winrate(n) + C * sqrt(ln(parent.visits) / n.visits)

UCTValue(parent, n) = winrate(n) + C * sqrt(ln(parent.visits) / n.visits)
- winrate(n): exploitation term - average success of n so far
- 1/n.visits: part of the exploration term - explore nodes with very few visits to reduce uncertainty
- ln(parent.visits): part of the exploration term - explore all nodes at least a little bit
- C: exploration constant - how important is exploration relative to exploitation?
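
Putting the pieces together, below is a bare-bones MCTS sketch that uses this UCT formula for in-tree selection. The game interface (legal_moves, play, to_play, is_terminal, winner) is a hypothetical placeholder; real programs add better playout policies, knowledge-based priors and many engineering refinements.

```python
import math
import random

# Bare-bones MCTS/UCT sketch: selection with the UCT formula, expansion of one
# child, a random playout, and backpropagation of the result.

class Node:
    def __init__(self, state, parent=None, move=None, player=None):
        self.state, self.parent, self.move = state, parent, move
        self.player = player                  # the player who made `move`
        self.children, self.wins, self.visits = [], 0.0, 0

    def uct_child(self, C=1.4):
        # winrate(n) + C * sqrt(ln(parent.visits) / n.visits)
        return max(self.children,
                   key=lambda c: c.wins / c.visits
                   + C * math.sqrt(math.log(self.visits) / c.visits))

def mcts(root_state, game, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the current node is fully expanded
        while node.children and len(node.children) == len(game.legal_moves(node.state)):
            node = node.uct_child()
        # 2. Expansion: add one untried move as a new child
        untried = [m for m in game.legal_moves(node.state)
                   if m not in {c.move for c in node.children}]
        if untried and not game.is_terminal(node.state):
            move = random.choice(untried)
            child = Node(game.play(node.state, move), node, move, game.to_play(node.state))
            node.children.append(child)
            node = child
        # 3. Simulation: random playout to the end of the game
        state = node.state
        while not game.is_terminal(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        winner = game.winner(state)
        # 4. Backpropagation: credit each node from its own player's perspective
        while node is not None:
            node.visits += 1
            if node.player is not None and winner == node.player:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move
```

Calling mcts(position, game) returns the most-visited move at the root, a standard choice for the move to play.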

Slides adapted from David Silver's

Summary - Monte Carlo Tree Search
- Amazingly successful in games and in probabilistic planning (PROST system)
- Top in Backgammon, Go, General Game Playing, Hex, Amazons, Lines of Action, Havannah, ...
- Similar methods work in multiplayer games (e.g. card games), planning, puzzles, energy resource allocation, ...

MCTS Comments
- Very successful in practice
- Scales OK to parallel machines
- Why and how does it work? Still poorly understood
- Some limitations (see next slide)

Adding Machine-Learned Knowledge to MCTS
- Game-specific knowledge can overcome limitations
- Two case studies
  - Learning with simple features
  - Deep convolutional neural nets and AlphaGo

Why Learn Knowledge?
- In Go, there are usually only a small number of good moves
- Human masters strongly prune almost all other moves - and it works!
- It takes time for noisy simulations to re-discover, every time, that these moves are bad
- So - let's learn it.

Example of Knowledge
- Learned move values (figure: blue = good, green = bad)
- Use as an initial bias in the MCTS tree (in-tree, not in playouts)
- Search will initially focus on probably good moves
- Search can still discover other moves later

Simple Knowledge
- Fast machine-learned evaluation function
- Supervised learning from master games
- Simple features express the quality of moves
- Algorithms learn weights for individual features, and for combinations of features
- Training goal: move prediction - what did the master play?

Simple Knowledge Examples
- Properties of a candidate move
- Help to predict whether that move is good
- Examples:
  - Location on the board
  - Local context, e.g. 3x3 pattern
  - Capture/escape with stones, “ladder”
  - Liberties, cut/connect, eye, ...

How to Learn Features?
- Standard approach in MCTS (Coulom):
  - Each feature has a weight
  - If a move has several features, then: the move value is the product (or sum) of the feature weights
- Improvement: take interactions of features into account (Wistuba, Xiao)
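
A minimal sketch of the product-of-weights evaluation, assuming a hypothetical feature_weights table mapping feature IDs to learned weights; the normalization into move probabilities is one common way to use such values.

```python
# Coulom-style move evaluation sketch: a move's value is the product of the
# weights of its features, and values are normalized into selection probabilities.
# feature_weights is a hypothetical dict: feature ID -> learned weight.

def move_value(feature_ids, feature_weights):
    value = 1.0
    for f in feature_ids:
        value *= feature_weights[f]
    return value

def move_probabilities(candidates, feature_weights):
    """candidates: list of feature-ID lists, one per legal move."""
    values = [move_value(fs, feature_weights) for fs in candidates]
    total = sum(values)
    return [v / total for v in values]
```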

Learning Example
- Professional game records
  - About 40,000 games from badukmovies.com
  - About 10 million positions, 2.5 billion move candidates
- Label all moves in all positions in all games with their features
- Each feature has a unique ID number

Example of Labeled Candidate Moves for One Position
0 16 21 80 85 117 122 136 1122
0 21 41 81 85 117 122 124 1127
0 21 40 82 85 117 122 1125
0 21 39 81 85 117 122 1134
0 21 38 80 85 117 122 1134
0 21 37 79 85 117 122 1134
0 21 36 78 85 117 122 1134
0 21 41 73 85 117 122 123 142 0
0 1 10 18 22 77 85 117 122 128 1883
0 - move not played
1 - move played
16, 21, ... - feature IDs

Training
- Total data: about 65 GB
- Learn the model (values for all features) using stochastic gradient descent
- Use a validation set to check progress
  - 5-10% of the data, kept separate
- Iterate over the data until 3x no improvement
- Keep the model that does best on the validation set
- Best result: about 39% move prediction
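
A sketch of this training loop under stated assumptions: train_step (one SGD update) and evaluate (move-prediction accuracy on the validation set) are hypothetical placeholders passed in by the caller, not functions from the lecture.

```python
import random

# Training loop sketch: SGD over the labeled move data, checking progress on a
# held-out validation set, stopping after 3 passes without improvement, and
# keeping the model that did best on validation.

def train(model, train_data, valid_data, train_step, evaluate,
          max_epochs=100, patience=3):
    best_model, best_acc, bad_epochs = model, 0.0, 0
    for _ in range(max_epochs):
        random.shuffle(train_data)
        for example in train_data:
            model = train_step(model, example)     # one stochastic gradient step
        acc = evaluate(model, valid_data)          # move-prediction accuracy
        if acc > best_acc:
            best_model, best_acc, bad_epochs = model, acc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:             # 3x no improvement: stop
                break
    return best_model                              # best model on validation set
```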

Examples

Computer Go Before AlphaGo
- Summary of the state of the art before AlphaGo:
  - Search - quite strong
  - Simulations - OK, but hard to improve
  - Knowledge
    - Good for move selection
    - Considered hopeless for position evaluation
- (Figure: Who is better here?)

Neural Networks (1)
- Deep convolutional neural networks (DCNN)
- Large, multilayer networks
- None of the limitations of simple features
- Learn complex relations on the board
- Originally trained by supervised learning
- 2015: human-level move prediction (57%)

Neural Networks (2)
- AlphaGo (2016)
- Start with supervised learning for the DCNN
- Improve move selection by self-play and reinforcement learning (RL)
- Learned value network for evaluation
- Integrate the networks in MCTS
- Beat a top human Go player 4-1 in a match

Value Network (2016)
- Given a Go position, computes the probability of winning
- Static evaluation function
- Trained from millions of Go positions labeled with the self-play game result (win, loss)
- Trains a deep neural network
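
A toy illustration of such a value network, written here in PyTorch. This is not AlphaGo's actual architecture; it is just a small convolutional net mapping board feature planes to a win probability, with the layer sizes chosen arbitrarily.

```python
import torch.nn as nn

# Toy convolutional value network: input is a stack of board feature planes,
# output is the estimated probability that the player to move wins.

class ValueNet(nn.Module):
    def __init__(self, planes=4, board_size=19):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(planes, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * board_size * board_size, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),      # win probability in [0, 1]
        )

    def forward(self, x):
        return self.head(self.conv(x))

# Training would minimize binary cross-entropy against the win/loss labels,
# e.g. loss = nn.BCELoss()(net(positions), outcomes).
```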

AlphaGo Zero (2017)
- Learn Go without human knowledge
- Train by RL, only from self-play
- Start with random play, continuously update the neural net
- Train a single net for both policy and value

AlphaGo Zero Details
- The policy net is trained by running MCTS (!)
  - Move selection frequency is mapped to a probability
- MCTS: no more simulations!!! Only the in-tree phase
  - Evaluate leaf nodes by the value net
  - Update the value net from the result at the end of the game
- Becomes stronger than the previous AlphaGo
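
A sketch of how MCTS move-selection frequencies can define the policy training target, in the AlphaGo Zero style; the temperature parameter and the dictionary format are illustrative assumptions, not details from the lecture.

```python
# After running MCTS at a position, turn the visit counts of the root's children
# into a probability distribution; this distribution is the training target for
# the policy net (visit counts raised to 1/temperature, then normalized).

def policy_target(visit_counts, temperature=1.0):
    """visit_counts: dict move -> number of visits at the root after search."""
    scaled = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(scaled.values())
    return {m: v / total for m, v in scaled.items()}

# Example: visits {A: 80, B: 15, C: 5} -> probabilities {A: 0.8, B: 0.15, C: 0.05}
```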

AlphaGo Zero Comments
- The architecture is a lot more elegant
- Strong integration of learning and MCTS
  - MCTS is used to define the learning target for the policy
  - MCTS uses the learned net at every step
- Requires massive, Google-scale resources to train

Alpha Zero
- Just published on arXiv, Dec 5, 2017
- Apply the AlphaGo Zero approach to chess and shogi (Japanese chess)
- Remove Go-specific training details
- Simplify the training procedure for the network
- Learns to beat top chess and shogi programs
- Requires massive, Google-scale resources to train

Alpha Zero Results

Where do we Go from Here?
- Which problems can we use this for?
- The methods are quite general, not game-specific
- We need an internal model of the problem in order to learn from self-play
- Can we use similar approaches when we have lots of data to define an approximate model?

Is the Game of Go Solved Now?
- No! AlphaGo is incredibly strong
- But it is all heuristics; AlphaGo still makes mistakes
- 5x5 and 5x6 Go are solved
- Some full-board 19x19 puzzles can be played perfectly using combinatorial game theory

Solving Go Endgame Puzzles

Game of Hex
- Connect two sides of your own color
- No draws
- Some similarities to Go, some differences
- Very hard game of pure strategy
(Image: https://ilk.uvt.nl/icga/games/hex/hex0m.gif)

MoHex (1)
- MoHex: the world's strongest Hex program
- Developed by Ryan Hayward's group in Alberta
- Open source
- Won the last four Computer Olympiads

MoHex (2)
Game-specific enhancements:
- Hard pruning - provably bad or inferior moves
- Very strong exact endgame solver - uses a search algorithm called depth-first proof-number search
- See https://webdocs.cs.ualberta.ca/~hayward/hex/

Learn more about modern heuristic search, MCTS and AlphaGo
- Course Cmput 496: Search, Knowledge and Simulations
- From the basics to AlphaGo
- Second run starting Winter 2018
- Low math content, focus on concepts and code examples

Summary (1)
- Monte Carlo methods revolutionized heuristic search in games and planning
- Modern algorithms use all three: search, knowledge and simulation
  - Except Alpha Zero
- Machine learning to improve knowledge, e.g. feature learning, deep neural nets

Summary (2)
- Alpha Zero combines all these methods effectively - superhuman strength in Go, chess, shogi
- MCTS: many very successful applications, still not well understood in general
- Newest development: tightly integrate search and deep learning
- Future challenge: extend to exact solutions?

