Decision - UNSW


Decision

Unpacking the Exploration–Exploitation Tradeoff: A Synthesis of Human and Animal Literatures

Katja Mehlhorn, Ben R. Newell, Peter M. Todd, Michael D. Lee, Kate Morgan, Victoria A. Braithwaite, Daniel Hausmann, Klaus Fiedler, and Cleotilde Gonzalez

Online First Publication, April 6, 2015.

Mehlhorn, K., Newell, B. R., Todd, P. M., Lee, M. D., Morgan, K., Braithwaite, V. A., Hausmann, D., Fiedler, K., & Gonzalez, C. (2015, April 6). Unpacking the Exploration–Exploitation Tradeoff: A Synthesis of Human and Animal Literatures. Decision. Advance online publication. http://dx.doi.org/10.1037/dec0000033

Decision, 2015, Vol. 2, No. 2, 000
© 2015 American Psychological Association 2325-9965/15/$12.00 http://dx.doi.org/10.1037/dec0000033

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Unpacking the Exploration–Exploitation Tradeoff: A Synthesis of Human and Animal Literatures

Katja Mehlhorn, Carnegie Mellon University and University of Groningen
Ben R. Newell, University of New South Wales
Peter M. Todd, Indiana University
Michael D. Lee, University of California
Kate Morgan, University of St. Andrews
Victoria A. Braithwaite, The Pennsylvania State University
Daniel Hausmann, University of Zürich
Klaus Fiedler, University of Heidelberg
Cleotilde Gonzalez, Carnegie Mellon University

Many decisions in the lives of animals and humans require a fine balance between the exploration of different options and the exploitation of their rewards. Do you buy the advertised car, or do you test drive different models? Do you continue feeding from the current patch of flowers, or do you fly off to another one? Do you marry your current partner, or try your luck with someone else? The balance required in these situations is commonly referred to as the exploration–exploitation tradeoff. It features prominently in a wide range of research traditions, including learning, foraging, and decision making literatures. Here, we integrate findings from these and other often-isolated literatures in order to gain a better understanding of the possible tradeoffs between exploration and exploitation, and we propose new theoretical insights that might guide future research.
Specifically, we explore how potential tradeoffs depend on (a) the conceptualization of exploration and exploitation; (b) the influencing environmental, social, and individual factors; (c) the scale at which exploration and exploitation are considered; (d) the relationship and types of transitions between the 2 behaviors; and (e) the goals of the decision maker. We conclude that exploration and exploitation are best conceptualized as points on a continuum, and that the extent to which an agent’s behavior can be interpreted as exploratory or exploitative depends upon the level of abstraction at which it is considered.

Keywords: exploration–exploitation tradeoff, learning, foraging, decision making, decision theory

Katja Mehlhorn, Department of Social and Decision Sciences, Carnegie Mellon University, and Department of Artificial Intelligence, University of Groningen; Ben R. Newell, School of Psychology, University of New South Wales; Peter M. Todd, Cognitive Science Program and Department of Psychological and Brain Sciences, Indiana University; Michael D. Lee, Department of Cognitive Sciences, University of California; Kate Morgan, School of Biology, University of St. Andrews; Victoria A. Braithwaite, Center for Brain, Behavior and Cognition, Department of Ecosystem Science and Management, The Pennsylvania State University; Daniel Hausmann, Department of Psychology, University of Zürich; Klaus Fiedler, Institute of Psychology, University of Heidelberg; Cleotilde Gonzalez, Department of Social and Decision Sciences, Carnegie Mellon University.

Katja Mehlhorn and Cleotilde Gonzalez were supported by the National Science Foundation Award Number 1154012 to Cleotilde Gonzalez. The ideas in this article originated from discussions in a workshop entitled Predicting Choice from Exploration, organized by Cleotilde Gonzalez and Katja Mehlhorn at the 9th Invitational Choice Symposium, Huis ter Duin, Noordwijk, The Netherlands, June 12–16, 2013.

Correspondence concerning this article should be addressed to Katja Mehlhorn, University of Groningen, Department of Artificial Intelligence, Nijenborgh 9, 9747 AG Groningen, The Netherlands. E-mail: s.k.mehlhorn@rug.nl

Consider the following scenarios. (a) You work for the Widget Corporation and you are paid according to how many functional widgets you can produce. You have access to two widget machines but can only use one at a time. On the first day, you know nothing about the machines so you pick one at random and start work. After 10 functional widgets, the machine produces a faulty one. What do you do? Do you tolerate the single faulty widget and persevere, or do you try your luck on the other machine? (b) You are a hummingbird feeding in a field of flowers. You pick one patch of flowers and begin to drink the nectar. How long should you remain at that patch before seeking another? Would you leave when all the flowers have been exhausted? What if the nectar in nearby patches has already been harvested? (c) You arrive in a city for a few days and have a range of restaurants to choose from. Do you try as many different restaurants as you can, or do you look for restaurants of a specific type? Toward the end of your visit, do you stop searching for new restaurants and revisit the ones that you enjoyed most? When do you make this switch in your strategy? (d) You are a college student on the dating market. Your goal might be to find a partner for life, or you might be more interested in dating as many people as you can. How would you approach those goals?
Could you combine them into a perfect search strategy? What if the partner of your choice is not interested in you? (e) You finally decide to buy a new car. Do you search the Internet for information about different car companies, or do you trust your own experience and stick with your current company? Once you have chosen a dealership, how many cars do you look at? How long do you test drive a specific car before you decide to buy it?

Many approaches to the analysis of decision behavior would characterize these scenarios as representations of a tradeoff between exploration and exploitation (e.g., in reinforcement learning [RL] and neuroscience: Cohen, McClure, & Yu, 2007; in foraging: Cook, Franks, & Robinson, 2013; in binary risky choice: Gonzalez & Dutt, 2011; in organizational learning: Gupta, Smith, & Shalley, 2006; for a review, see Hills, Todd, Lazer, Redish, & Couzin, 2015). Remaining at an option (be it a machine, patch of flowers, restaurant, partner, or car) allows for exploitation, that is, making the most of where you are. A switch to another option, going somewhere else to see if you can get a better reward (fault-free widgets, more nectar, better food, a higher reproductive value, or a faster car), exemplifies exploration. Although these concepts seem quite simple on the surface (e.g., staying is exploitation; switching is exploration), the definitions, processes, and elements surrounding exploration and exploitation behavior are not simple at all. In fact, exploration–exploitation tradeoffs are considered one of the more fundamental challenges in our understanding of adaptive control and behavior (Cohen et al., 2007).

The theoretical analysis of exploration–exploitation tradeoffs is complicated in several ways. First, the concepts of exploration, exploitation, and a tradeoff between the two are used in a wide range of literatures, from animal behavior to human behavior and involving different terminologies, methodologies, and perspectives.
The disparity and breadth of these concepts across such large and diverse literatures makes it difficult to synthesize existent knowledge into a coherent view. Researchers working in different areas operationalize exploratory and exploitative behavior in different ways, and this diversity motivates disagreements within conclusions about the essence of exploratory and exploitative behaviors. For example, researchers have debated assumptions about the exact elements that constitute exploratory and exploitative behavior, and about what defines a tradeoff between them. Second, exploration–exploitation tradeoffs may depend on a large number of environmental, individual, and social factors. The literature documenting these different factors is extensive, and there is little or no attempt to integrate their results (see Cohen et al., 2007; and Gupta et al., 2006, for laudable exceptions). Third, exploration and exploitation behaviors, and consequently, a potential tradeoff between the two, might not always be clearly identifiable. That is because these concepts can be described and understood on different spatial and temporal scales, as well as along different continua of behavior. Consequently, behaviors that might be understood as exploratory on one level of analysis might be seen as exploitative on another level, and even within a specific level of analysis, behaviors might have explorative and exploitative components.

Our goal is to provide an up-to-date synthesis of the exploration–exploitation literature by bringing together knowledge from different disciplines including human decision making, neuroscience, organizational learning, animal foraging, mate choice, and formal modeling approaches. To achieve this goal, we discuss the various challenges mentioned above and suggest a simple and straightforward framework for the theoretical analysis of potential tradeoffs between explorative and exploitative behaviors. Our synthesis illustrates the complexities surrounding these concepts and their tradeoffs. Our analyses highlight three elements needed in a unification of exploration and exploitation research: An exploration–exploitation continuum, different types of transitions between these two states, and the role of agents’ goals in this process.
These elements illustrate that the explore–exploit distinction often may not be a direct choice that an agent makes, but rather is an explanatory framework that researchers can apply to the agent’s behavior to understand how it solves problems of the agent’s interaction with the environment.

A Synthesis of Exploration–Exploitation Literatures

In an attempt to integrate a diverse and wide range of literatures, we organize our synthesis around three main themes: (a) concepts and definitions of exploration and exploitation; (b) environmental, individual, and social factors that influence exploration and exploitation; and (c) spatial and temporal scales that may influence how exploration and exploitation are conceptualized.

Concepts and Definitions of Exploration and Exploitation

Within the current literature, definitions of exploration and exploitation differ across at least three dimensions: behavioral patterns of the agent, values and uncertainty of the choice options, and outcomes obtained from a choice (cf. Todd, Hills, & Robbins, 2012). These three dimensions can be mapped to aspects of what a searching agent does, what the agent bases its decisions on, and what the agent gets out of the search. Focusing on these dimensions shapes the analysis and understanding of exploration and exploitation behavior in a particular situation, as well as any conclusions about their tradeoffs.

The agents’ behavioral patterns are, perhaps not surprisingly, the most common dimension used to define exploration and exploitation in the animal foraging literature, where research often relies on the observation of behavior (Kramer & Weary, 1991; Nonacs, 2010). Behavioral patterns have also been considered, for example, in research on human information search (Hills, Todd, & Goldstone, 2010) and binary choice (Gonzalez & Dutt, 2011). In general, behavior is interpreted as exploration if it alternates between patches or options, is unfocused, and is variable over time.
Behavior is interpreted as exploitation if it remains within a patch or option, is focused, and is stable over time. Therefore, the hummingbird remaining at its patch of flowers would be considered as exploiting the patch, while it would be considered as exploring if it alternates between patches (Nonacs, 2010). However, as we will discuss below, the distinction between exploration and exploitation is not always clear based on behavioral patterns alone. For example, the classification of a behavior as staying (exploit) rather than switching (explore) depends on the spatial and temporal scales of observation.

The values of choice options and the uncertainty associated with knowledge of those values are most prominently used to define exploration and exploitation in the RL literature. In some classic RL models, exploitation is defined as choosing the option that has the higher subjective value and exploration is defined as choosing any other option at random (Sutton & Barto, 1998). For example, the prominent epsilon-greedy model mainly chooses the option with the greatest observed rate of reward (exploitation), but chooses an alternative at random (exploration) with some small probability of epsilon. The related epsilon-decreasing model allows the rate of exploration to change over time, but preserves the basic notions of expected value guiding exploitation and randomness guiding exploration. Other RL models have stressed the importance of uncertainty for defining exploration and exploitation. For example, in the restaurant scenario, exploration can be defined in terms of choosing an option with greater uncertainty, while exploitation is defined as opting for greater certainty (e.g., Lee, Zhang, Munro, & Steyvers, 2011). As pointed out in the neuroscience literature, the role of uncertainty might additionally be moderated by the agent’s expectations (Aston-Jones & Cohen, 2005). For example, if you know that your machine periodically produces a faulty widget, one faulty widget will not stop you from using this machine (a situation that would be termed as expected uncertainty by Aston-Jones & Cohen, 2005). However, if the machine starts producing more and more faulty widgets (unexpected uncertainty), you might decide to give the other machine a try.
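The epsilon-greedy and epsilon-decreasing rules described above can be illustrated with a minimal sketch. It is not taken from the article: the two-machine success probabilities, the treatment of untried options (observed rate 0.0), the 1/(1+t) decay schedule, and all function names are assumptions chosen for illustration only.

```python
import random


def epsilon_greedy_choice(reward_rates, epsilon, rng):
    """Return an option index under an epsilon-greedy rule."""
    # Option with the greatest observed rate of reward (first one wins ties).
    best = max(range(len(reward_rates)), key=lambda i: reward_rates[i])
    others = [i for i in range(len(reward_rates)) if i != best]
    if others and rng.random() < epsilon:
        # Exploration: pick one of the alternatives at random.
        return rng.choice(others)
    # Exploitation: stay with the best option observed so far.
    return best


def run_bandit(true_probs, n_trials, epsilon_schedule, seed=0):
    """Simulate repeated choices among options with hypothetical success rates."""
    rng = random.Random(seed)
    n = len(true_probs)
    pulls, wins, choices = [0] * n, [0] * n, []
    for t in range(n_trials):
        # Observed reward rate per option; untried options count as 0.0 (assumption).
        rates = [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(n)]
        choice = epsilon_greedy_choice(rates, epsilon_schedule(t), rng)
        reward = 1 if rng.random() < true_probs[choice] else 0
        pulls[choice] += 1
        wins[choice] += reward
        choices.append(choice)
    return choices, pulls, wins


# Two widget machines with made-up probabilities of a functional widget.
machines = [0.90, 0.95]
fixed = run_bandit(machines, 500, lambda t: 0.1)               # epsilon-greedy
decaying = run_bandit(machines, 500, lambda t: 1.0 / (1 + t))  # epsilon-decreasing
print("pulls with fixed epsilon:   ", fixed[1])
print("pulls with decaying epsilon:", decaying[1])
```

In this sketch the only difference between the two models is the schedule passed to `run_bandit`: a constant keeps the exploration rate fixed, while a decreasing function shifts behavior toward exploitation over time.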
Uncertainty and value also play a role in the animal literature, where exploration has been associated with choosing options with uncertain rewards and variable values, and exploitation has been associated with choosing options with known rewards and stable values (Krebs, Kacelnik, & Taylor, 1978).

Finally, exploration and exploitation have been discussed with respect to the outcomes that are obtained by the searching agent, which may include information, other types of resource rewards, or both. In many research areas, exploration is assumed to provide the agent with the opportunity for learning and obtaining information, while exploitation is assumed to provide explicit outcomes such as caloric or monetary rewards (neuroscience and RL: Cohen et al., 2007; foraging: Cook et al., 2013; decision making: Hills & Hertwig, 2010; organizational learning: March, 1991). For example, in “observe-or-bet” tasks (Navarro & Newell, 2014; Rakow, Newell, & Zougkou, 2010; Tversky & Edwards, 1966), participants can either obtain information (explore) or obtain monetary rewards (exploit). In each trial of these tasks, participants choose between (a) observing which of two lights comes on, and thereby gaining information about the underlying probabilities, and (b) betting on which light will come on, and thereby receiving rewards if they guess correctly. More importantly, if they choose to observe, participants receive no reward, and if they choose to bet, they receive no feedback as to which light comes on, thereby allowing the researchers to distinguish between “pure” exploration and exploitation with respect to the observed outcomes. However, the distinction between information and rewards is not always as straightforward as in this example. In most real-life situations, rewards tend to also provide the agent with information about the quality of the selected option (Gupta et al., 2006) and hence the distribution of rewards available.
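The structure of an observe-or-bet trial, as described above, can be sketched in a few lines. This is only an illustration under stated assumptions: the policy of observing for a fixed number of trials and then always betting on the more frequently seen light is hypothetical, as are the function name, the parameters, and the one-unit reward per correct bet.

```python
import random


def observe_or_bet(p_left, n_trials, n_observe, seed=0):
    """Simulate a hypothetical agent that observes first and bets afterwards."""
    rng = random.Random(seed)
    seen = {"left": 0, "right": 0}  # information gathered while observing
    total_reward = 0                # reward accrued while betting
    log = []
    for t in range(n_trials):
        light = "left" if rng.random() < p_left else "right"
        if t < n_observe:
            # Observe: the agent learns which light came on but earns nothing.
            seen[light] += 1
            log.append(("observe", light, 0))
        else:
            # Bet: reward if the guess is correct, but the outcome is not fed
            # back to the agent, so `seen` is left unchanged.
            guess = "left" if seen["left"] >= seen["right"] else "right"
            gain = 1 if guess == light else 0
            total_reward += gain
            log.append(("bet", guess, gain))
    return total_reward, seen, log


reward, seen, _ = observe_or_bet(p_left=0.7, n_trials=50, n_observe=10)
print("observations:", seen, "reward from bets:", reward)
```

Because observing never changes `total_reward` and betting never changes `seen`, the sketch keeps information and reward strictly separate, which is the property that lets the task isolate “pure” exploration from exploitation.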
In some situations, agents might receive information about foregone payoffs in nonselected alternatives (Yechiam & Busemeyer, 2006); and even when no material rewards are obtained during exploration in a given situation, information search in itself can be rewarding if it delivers positive experiences (Denrell & Le Mens, 2011; Gonzalez & Dutt, 2012; Mehlhorn, Ben-Asher, Dutt, & Gonzalez, 2014).

The three dimensions we highlight here are by no means mutually exclusive. Most conceptualizations of exploration and exploitation are based on more than one of them. However, the respective contributions of these dimensions to a given conceptualization are not always clear and this can be especially problematic if the considered dimensions lead to opposing conclusions. An example of this can be found in the recent human decision making literature, where researchers have disagreed about the interpretation of exploration and exploitation behavior in a popular “sampling paradigm” of binary choice (Gonzalez & Dutt, 2011, 2012; Hills & Hertwig, 2010, 2012). In this paradigm, participants can sample outcomes without consequences from two choice options for as long as they want, before making a final consequential choice between the two (Hertwig, Barron, Weber, & Erev, 2004). Hills and Hertwig (2012) argue that this paradigm presents “pure exploration” during the sampling phase, because participants only receive information while it presents “pure exploitation” at the choice phase because they receive only rewards. In contrast, Gonzalez and Dutt (2012) argue for a gradual transition from exploration to exploitation within the sampling phase, because they find a decrease in alternation between the options over the course of sampling. One way of accounting for these two different perspectives on the data is that the first arises from a focus on obtained outcomes (information vs. rewards) as the defining dimension, while the second comes from a focus on behavioral patterns (which options are being sampled). But the complete picture must include consideration of how a particular search fits into a sequence of searches over time, where rewards from one search may become information to guide later searches and hence influence what is exploration versus exploitation in a particular behavioral pattern (a hierarchical view we return to in the Discussion).

In addition to affecting our understanding of exploratory and exploitative behaviors, assumptions about the underlying dimensions may also influence the nature of the considered tradeoff. For example, a distinction between exploration and exploitation based on behavioral patterns places the costs and benefits of staying at a resource versus switching to another one in the spotlight (Charnov, 1976), while a distinction based on values and uncertainty of the choice options may switch the
