
Fleet Assignment Using Collective Intelligence

Nicolas E. Antoine*, Stefan R. Bieniawski†, and Ilan M. Kroo‡
Stanford University, Stanford, CA 94305

David H. Wolpert§
NASA Ames Research Center, Moffett Field, CA 94035

Product distribution theory is a new collective intelligence-based framework for analyzing and controlling distributed systems. Its usefulness in distributed stochastic optimization is illustrated here through an airline fleet assignment problem. This problem involves the allocation of aircraft to a set of flight legs in order to meet passenger demand, while satisfying a variety of linear and non-linear constraints. Over the course of the day, the routing of each aircraft is determined in order to minimize the number of required flights for a given fleet. The associated flow continuity and aircraft count constraints have led researchers to focus on obtaining quasi-optimal solutions, especially at larger scales. In this paper, the authors propose the application of this new stochastic optimization algorithm to a non-linear objective "cold start" fleet assignment problem. Results show that the optimizer can successfully solve such highly-constrained problems (130 variables, 184 constraints).

Introduction

Schedule development, a crucial aspect of profitable airline management, involves many steps, including schedule design, fleet assignment, aircraft routing, and crew pairing.[1] In this project, we assume that schedule design has been finalized; the focus is on fleet assignment, that is, the assignment of available aircraft to the scheduled flights, and on aircraft routing, the sequence of flights to be flown by each aircraft throughout the day (Figure 1). Typical fleet assignment objectives include minimizing assignment cost or maximizing the profit from each flight.
In our case, the objective is to meet the passenger demand throughout the day with the lowest total landing and takeoff (LTO) costs. Fleet assignment problems can be classified as either "warm start", in which case an existing assignment is used as a starting point, or "cold start", in which only the fleet size, aircraft types, and passenger demand are known.[2] Fleet assignment and aircraft routing problems have been solved using various optimization methods, including integer linear programming,[3,4] neighborhood search,[5] and genetic algorithms.[6]

An alternate approach pursued here is to distribute the optimization among agents that represent, for example, members of the fleet or the airports in the network. Formulating the problem as a distributed stochastic optimization allows for the application of techniques from machine learning, statistics, multi-agent systems, and game theory. The current work leverages these fields by applying a Collective Intelligence (COIN) technique, Product Distribution (PD) theory, to a sample fleet assignment problem. Typically in stochastic optimization approaches, probability distributions are used to help search for a point in the variable space which optimizes the objective function. In contrast, in the PD approach the search is for a probability distribution across the variable space that optimizes an associated Lagrangian. Since the probability distribution is a vector in a Euclidean space, the search can be done via gradient-based methods even if the variable space is categorical. Similar techniques have been successfully applied to a variety of distributed optimization problems including network routing, computing resource allocation, and data collection by autonomous agents.[7-9]

Fig. 1 Airline Schedule Development.

Fig. 2 The 9-airport, 20-flight problem.

The next section of the paper details the formulation of the optimization problem. This is followed by a description of the COIN and PD theory framework.
Finally, results from an example fleet assignment problem are presented. These results validate the predictions of this theory, and indicate its usefulness as a general-purpose technique for distributed solutions of constrained optimization problems.

Problem Statement

The objective is to determine the aircraft routing and resident fleet size at each airport that minimizes the landing and takeoff fees levied by airports while meeting demand. The 9-airport, 20-flight directed arc sample problem (Figure 2) is used to demonstrate the performance of the approach. The passenger demand on each arc is given as a function of time (determined as part of the schedule design). The day is split into six 4-hour segments.

*Doctoral Candidate, AIAA Student Member.
†Doctoral Candidate, AIAA Member.
‡Professor, AIAA Fellow.
§Senior research scientist.

It is assumed that each flight, including turnaround, occupies one time segment.

Table 1  LTO cost factor as a function of aircraft passenger capacity.

  Pax Capacity w    Cost Factor F(w)
  100               1.0
  200               1.5
  300               2.0

The problem can be summarized as:

  MINIMIZE:     Total LTO fees
  VARIABLES:    Number of aircraft on each arc
                Resident fleet at each airport
                Airplane passenger capacity
  CONSTRAINTS:  Passenger demand
                Assignment continuity
                Resident fleet conservation
                Total fleet size

The three types of variables are: u_{i,j}, the number of aircraft assigned to flight arc i at time segment j; s_{I,k}, the number of resident aircraft at airport k; and w, the passenger capacity of the airplane. The resident fleet is the number of airplanes at each airport at the start and end of the day, which must be the same to repeat the schedule the next day. There are 20 arcs and 6 time segments in this problem which, with 9 airports and 1 aircraft type, results in a total of 130 variables (120 arc assignments, 9 resident fleet sizes, and 1 capacity).

Constraints are required to ensure that passenger demand D_{i,j} is met in full by capacity C_{i,j} for each arc, at each time segment. There are 20 arcs and 6 time segments, for a total of 120 passenger demand constraints. For these non-linear constraints to be satisfied:

  -C_{i,j} + D_{i,j} ≤ 0,  with  C_{i,j} = w u_{i,j}

While the framework supports multiple aircraft models, in this example problem the fleet is composed of a single aircraft type, for which the passenger capacity is a variable.

The resident fleet size s_{I,k} at each airport k must equal the number of airplanes s_{F,k} at the end of the day so the schedule can be restarted the following day. In equation form, we require:

  s_{F,k} - s_{I,k} = 0  for all k

The airports in this sample problem contribute 9 resident fleet constraints. Finally, the total fleet size F is enforced using:

  Σ_k s_{I,k} - F ≤ 0

The total daily LTO cost is a function of F(w) (see Table 1) and the number of segments flown. The non-linear objective function can be written as:

  J = F(w) Σ_i Σ_j u_{i,j}

Assignment continuity ensures that an airplane can only be assigned to an arc if an airplane is available at the originating airport.
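For concreteness, the demand and fleet-size constraints can be sketched in code. The instance below is a hypothetical 2-arc, 2-segment toy with made-up numbers, not the paper's 20-arc problem; feasibility means every constraint value is non-positive.

```python
import numpy as np

# Hypothetical toy instance (illustrative numbers only).
demand = np.array([[150.0, 80.0],    # D[i, j]: demand on arc i, segment j
                   [60.0, 200.0]])

def demand_constraints(u, w):
    """-C_{i,j} + D_{i,j} <= 0 with capacity C_{i,j} = w * u_{i,j}."""
    return demand - w * u

def fleet_size_constraint(s_resident, F):
    """sum_k s_{I,k} - F <= 0: resident fleets cannot exceed the fleet."""
    return s_resident.sum() - F

u = np.array([[2, 1],
              [1, 2]])               # aircraft assigned per arc and segment
print(demand_constraints(u, w=100))  # all entries <= 0: demand is met
print(fleet_size_constraint(np.array([2, 1]), F=4))
```

With w = 100 the capacity on each arc is 100 u_{i,j}, so the first arc's 150-passenger demand needs two aircraft in that segment.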
With 9 airports and 6 time segments, 54 continuity constraints are included. Defining S_{k,j} as the state of the fleet at airport k at the beginning of time increment j, we require:

  (M u)_{k,j} - S_{k,j} ≤ 0  for all k, j

where the state is propagated forward in time via:

  S_{k,j+1} = S_{k,j} - (M u)_{k,j} + (N u)_{k,j},  with  S_{k,1} = s_{I,k}

The M matrix is used to tally outbound aircraft for each airport during a time segment. Likewise, N is used to determine the inbound aircraft to be added to an airport pool. This results in a total of 184 constraints.

Collective Intelligence and Product Distribution Theory

Collective Intelligence (COIN) is a framework for designing a collective, defined as a group of agents with a specified world utility or system-level objective. In the case of the fleet assignment problem, the agents match the two types of variables: the number of airplanes assigned to each route for each time segment, and the size of the resident fleet at each airport. The world utility for this problem is the objective described above: total LTO fees.

The COIN solution process consists of the agents selecting actions (a value from the variable space) and receiving rewards based upon their private utility functions. These rewards are then used by the agents to determine their next choice of action. The process reaches equilibrium when the agents can no longer improve their rewards by changing actions. Product Distribution (PD) theory formalizes and substantially extends the COIN framework.[10-12] In particular, PD theory handles constraints, a necessity for problems such as fleet assignment. The core insight of PD theory is to concentrate on how the agents update the probability distributions across their possible actions rather than specifically on the joint action generated by sampling those distributions.

PD theory can be viewed as the information-theoretic extension of conventional full-rationality game theory to the case of bounded rational agents.
Information theory shows that the equilibrium of a game played by bounded rational agents is the optimizer of a Lagrangian of the probability distribution of the agents' joint moves. In any game, bounded rational or otherwise, the agents are independent, with each agent i choosing its move x_i at any instant by sampling its probability distribution (mixed strategy) at that instant, q_i(x_i). Accordingly, the distribution of the joint moves is a product distribution, P(x) = Π_i q_i(x_i). In this representation, all coupling between the agents occurs indirectly; it is the separate distributions of the agents {q_i}

that are coupled, while the actual moves of the agents are independent. As a result, the optimization of the Lagrangian can be done in a completely distributed manner.

When constraints are included, the bounded rational equilibrium optimizes the expected value of the world utility subject to those constraints. Updating the Lagrange parameters weighting the constraints focuses the agents more and more on the optimal joint pure strategy. This approach provides a broadly applicable way to cast any constrained optimization problem as the equilibrating process of a multi-agent system, together with an efficient method for that equilibrating process.

The next section reviews the game-theoretic motivation of PD theory. This is followed by the details of the resulting distributed constrained optimization algorithm used to solve the fleet assignment problem.

Bounded Rational Game Theory

In noncooperative game theory one has a set of N players. Each player i has its own set of allowed pure strategies. A mixed strategy is a distribution q_i(x_i) over player i's possible pure strategies. Each player i also has a private utility function g_i that maps the pure strategies adopted by all N of the players into the real numbers. Given the mixed strategies of all the players, the expected utility of player i is:

  E(g_i) = Σ_x Π_j q_j(x_j) g_i(x)

In a Nash equilibrium, every player adopts the mixed strategy that maximizes its expected utility, given the mixed strategies of the other players. Nash equilibria require the assumption of full rationality, that is, every player i can calculate the strategies of the other players and its own associated optimal distribution. In the absence of full rationality, the equilibrium is determined based on the information available to the players.

Shannon realized that there is a unique real-valued quantification of the amount of syntactic information in a distribution P(y).
This amount of information is the negative of the Shannon entropy of that distribution:

  S(P) = -Σ_y P(y) ln P(y)

Hence, the distribution with minimal information is the one that does not distinguish at all between the various y, i.e., the uniform distribution. Conversely, the most informative distribution is the one that specifies a single possible y. Given some incomplete prior knowledge about a distribution P(y), this says that the estimate P(y) should contain the minimal amount of extra information beyond that already contained in the prior knowledge about P(y). This approach is called the maximum entropy (maxent) principle, and it has proven useful in domains ranging from signal processing to supervised learning.[13]

Now consider an external observer of a game attempting to determine the equilibrium, that is, the joint strategy that will be followed by real-world players of the game. Assume that the observer is provided with a set of expected utilities for the players. The best estimate of the joint distribution q that generated those expected utility values, by the maxent principle, is the distribution with maximal entropy, subject to those expectation values.

To formalize this approach, we assume a finite number of players and of possible strategies for each player. Also, to agree with convention, it is necessary to flip the sign of each g_i so that the associated player i wants to minimize that function rather than maximize it. For prior knowledge consisting of the set of expected utilities of the players {ε_i}, the maxent estimate of the associated q is given by the minimizer of the Lagrangian:

  L(q) = Σ_i β_i [E_q(g_i) - ε_i] - S(q)        (2)

where the subscript on the expectation value indicates that it is evaluated under distribution q, and the {β_i} are "inverse temperatures" β_i ≡ 1/T_i implicitly set by the constraints on the expected utilities.

The mixed strategies minimizing the Lagrangian are related to each other via

  q_i(x_i) ∝ exp(-E_{q(i)}(G | x_i))        (3)

where the overall proportionality constant for each i is set by normalization, and G ≡ Σ_i β_i g_i. The subscript q(i) on the expectation value indicates that it is evaluated according to the distribution Π_{j≠i} q_j. The expectation is conditioned on player i making move x_i. In Eq. (3), the probability of player i choosing pure strategy x_i depends on the effect of that choice on the utilities of the other players. This reflects the fact that the prior knowledge concerns all the players equally.

Focusing on the behavior of player i, consider the case of maximal prior knowledge. Here the actual joint strategy of the players, and therefore all of their expected utilities, are known. For this case, trivially, the maxent principle says the "estimate" q is that joint strategy (it being the q with maximal entropy that is consistent with our prior knowledge). The same conclusion holds if our prior knowledge also includes the expected utility of player i.

Removing player i's strategy from this maximal prior knowledge leaves the mixed strategies of all players other than i, together with player i's expected utility. Now the prior knowledge of the other players' mixed strategies can be directly incorporated into a maxent Lagrangian for each player:

  L_i(q_i) = β_i [E(g_i) - ε_i] - S_i(q_i) = β_i [Σ_x Π_j q_j(x_j) g_i(x) - ε_i] - S_i(q_i)

The solution is a set of coupled Boltzmann distributions:

  q_i(x_i) ∝ exp(-β_i E_{q(i)}(g_i | x_i))        (4)

Following Nash, Brouwer's fixed point theorem can be used to establish that for any non-negative values {β}, there must exist at least one product distribution given by the product of these Boltzmann distributions (one term in the product for each i).

The first term in L_i is minimized by a perfectly rational player. The second term is minimized by a perfectly irrational player, i.e., by a perfectly uniform mixed strategy q_i. So β_i in the maxent Lagrangian explicitly specifies the balance between the rational and irrational behavior of the player. In the limit β → ∞, the set of q that simultaneously minimize the Lagrangians is the same as the set of delta functions about the Nash equilibria of the game. The same is true for Eq. (3). In fact, Eq. (3) is just a special case of Eq. (4), where all players share the same private utility G. Such games are known as team games. This relationship reflects the fact that, for this case, the difference between the maxent Lagrangian and the one in Eq. (2) is independent of q_i. Due to this relationship, the guarantee of the existence of a solution to the set of maxent Lagrangians implies the existence of a solution of the form of Eq. (3).
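The behavior of the Boltzmann strategies of Eq. (4) can be sketched numerically; the conditional expected utilities below are illustrative, not taken from the fleet assignment problem.

```python
import numpy as np

def boltzmann_strategy(cond_expected_utility, beta):
    """Mixed strategy proportional to exp(-beta * E[g | move]),
    as in Eq. (4); normalization fixes the overall constant."""
    weights = np.exp(-beta * np.asarray(cond_expected_utility))
    return weights / weights.sum()

# Small beta (bounded rationality): near-uniform, exploratory play.
# Large beta (approaching full rationality): mass collapses onto the
# move with the lowest expected cost.
print(boltzmann_strategy([3.0, 1.0, 2.0], beta=0.01))
print(boltzmann_strategy([3.0, 1.0, 2.0], beta=10.0))
```

The same three utility values thus produce anything from near-uniform mixing to a near-delta function as beta grows, which is the rational/irrational trade-off described above.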

Optimization Approach

Given that the agents in a multi-agent system are bounded rational, if they play a team game with world utility G, their equilibrium will be the optimizer of G. Furthermore, if constraints are included, the equilibrium will be the optimizer of G subject to the constraints. The equilibrium can be found by minimizing the Lagrangian in Eq. (2) where the prior information set is empty, i.e., for all i, ε_i = 0.

Specifically, for the unconstrained optimization problem

  min_x G(x)

assume each agent sets one component of x as that agent's action. The Lagrangian L_i(q_i) for each agent, as a function of the probability distribution across its actions, is

  L_i(q_i) = E[G(x_i, x_(i))] - T S(q_i) = Σ_{x_i} q_i(x_i) E[G(x_i, x_(i)) | x_i] - T S(q_i)

where G is the world utility (system objective), which depends upon the action of agent i, x_i, and the actions of the other agents, x_(i). The expectation E[G(x_i, x_(i)) | x_i] is evaluated according to the distributions of the agents other than i:

  E[G(x_i, x_(i)) | x_i] = Σ_{x_(i)} Π_{j≠i} q_j(x_j) G(x_i, x_(i))

The entropy S is given by:

  S(q_i) = -Σ_{x_i} q_i(x_i) ln q_i(x_i)

Each agent then addresses the following local optimization problem:

  min_{q_i} L_i(q_i)

The Lagrangian is composed of two terms weighted by the temperature T: the expected reward across i's actions, and the entropy associated with the probability distribution across i's actions. During the minimization of the Lagrangian, the temperature provides the means to trade off exploitation of good actions (low temperature) with exploration of other possible actions (high temperature).

The minimization of the Lagrangian is amenable to solution using gradient descent or Newton updating, since both the gradient and the Hessian are obtained in closed form. Using Newton updating and enforcing the constraint on total probability, the following update rule is obtained:

  q_i(x_i) → q_i(x_i) - α q_i(x_i) × [ (E[G | x_i] - E[G])/T + S(q_i) + ln q_i(x_i) ]        (5)

where α plays the role of a step size.

Fig. 3 Algorithm flow chart: initialization, Monte Carlo sampling, private utility and constraint evaluation, distribution update (Eq. 5), multiplier update (Eq. 6), convergence check.
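A single probability update of this form can be sketched as follows. The gradient expression is a reconstruction of the Lagrangian derivative up to a constant (the mean-subtraction projects onto the total-probability constraint), and the utility values are illustrative.

```python
import numpy as np

def pd_update(q_i, cond_expected_G, T, alpha):
    """One descent step on agent i's maxent Lagrangian (sketch of the
    form of Eq. 5), with a guard against negative probabilities."""
    q_i = np.asarray(q_i, dtype=float)
    grad = np.asarray(cond_expected_G) / T + np.log(q_i)  # dL_i/dq_i + const
    grad -= grad.mean()              # respect sum(q_i) = 1
    q_new = q_i - alpha * q_i * grad
    q_new = np.maximum(q_new, 1e-6)  # clip would-be-negative components
    return q_new / q_new.sum()

q = np.full(3, 1.0 / 3.0)            # uniform initialization
q = pd_update(q, cond_expected_G=[5.0, 1.0, 3.0], T=2.0, alpha=0.2)
print(q)   # probability shifts toward the lowest-cost action
```

Starting from a uniform distribution, a single step moves probability mass from the highest-cost action toward the lowest-cost one while keeping the distribution normalized.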
The step size is required since the expectations result from the current probability distributions of all the agents.

Constraints are included by augmenting the world utility with Lagrange multipliers λ_j and the constraint functions c_j(x):

  G(x) → G(x) + Σ_j λ_j c_j(x)

where the constraints c_j(x) ≤ 0 and the λ_j are non-negative. The update rule for the Lagrange multipliers is found by taking the derivative of the augmented Lagrangian with respect to each Lagrange multiplier, giving:

  λ_j → λ_j + η E[c_j(x)]        (6)

where η is a separate step size.

Role of Private Utilities

Performing the update involves a separate conditional expected utility for each agent. These are estimated either exactly, if a closed-form expression is available, or with Monte Carlo sampling if no simple closed form exists. In Monte Carlo sampling, the agents repeatedly and jointly IID (identically and independently distributed) sample their probability distributions to generate joint moves, and the associated utility values are recorded. Since accurate estimates usually require extensive sampling, the G occurring in each agent i's update rule can be replaced with a private utility g_i chosen to ensure that the Monte Carlo estimation of E(g_i | x_i) has both low bias (with respect to estimating E(G | x_i)) and low variance.[14]

Intuitively, bias represents the alignment between the private utility and world utility. With zero bias, updates which reduce the private utility are guaranteed to also reduce the world utility. It is also desirable for an agent to distinguish its contribution from that of the other agents: variance measures this sensitivity. With low variance, the agents can perform the individual optimizations accurately without a large number of Monte Carlo samples.

Two private utilities were selected for use in the fleet assignment problem, Team Game (TG) and Wonderful Life Utility (WLU). These are defined as:

  TG:  g_i(x) = G(x)
  WLU: g_i(x) = G(x) - G(x_(i), CL_i)

For the team game, the local utility is simply the world utility.
For WLU, the local utility is the world utility minus the world utility with the agent's action "clamped" to the value CL_i. Here the clamping value fixes the agent action to its lowest-probability action. Both of these utilities have zero bias. However, due to the subtracted term, WLU has much lower variance than TG.
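The two private utilities can be computed on a toy world utility; the quadratic-plus-coupling function below is illustrative, not the paper's LTO cost.

```python
import numpy as np

def world_utility(x):
    """Illustrative world utility with per-agent and coupling terms."""
    x = np.asarray(x, dtype=float)
    return float((x ** 2).sum() + x.prod())

def team_game(x, i):
    return world_utility(x)          # TG: every agent uses G itself

def wlu(x, i, clamp_value):
    """WLU for agent i: G(x) minus G with x_i clamped. Terms of G that
    do not involve x_i cancel in the subtraction, which is why WLU's
    Monte Carlo estimates have lower variance than TG's."""
    x_clamped = list(x)
    x_clamped[i] = clamp_value
    return world_utility(x) - world_utility(x_clamped)

print(team_game([1, 2, 3], i=0))            # 1 + 4 + 9 + 6 = 20
print(wlu([1, 2, 3], i=0, clamp_value=0))   # 20 - (0 + 4 + 9 + 0) = 7
```

Note how the 4 + 9 contribution of the other agents cancels inside the WLU subtraction: only the terms agent 0 can influence survive.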

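Before turning to the detailed algorithm, the constraint handling of Eq. (6) can be sketched as well; the constraint values below are illustrative.

```python
import numpy as np

def augmented_utility(G_value, lam, c_values):
    """World utility augmented with multiplier-weighted constraints."""
    return G_value + float(np.dot(lam, c_values))

def update_multipliers(lam, expected_c, eta):
    """Gradient ascent on the multipliers, lambda_j += eta * E[c_j(x)],
    kept non-negative (sketch of Eq. 6)."""
    return np.maximum(0.0, lam + eta * np.asarray(expected_c))

lam = np.zeros(3)
expected_c = np.array([0.5, -0.2, 1.0])   # positive entries are violated
lam = update_multipliers(lam, expected_c, eta=0.1)
print(lam)   # violated constraints gain weight; satisfied ones stay at 0
```

As violated constraints accumulate weight, the augmented utility increasingly penalizes infeasible joint moves, which is how the agents are focused onto the feasible optimum.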
Table 2  Passenger demand for each arc as a function of time.

Detailed Algorithm

The algorithm used to solve the fleet assignment and aircraft routing problem presented here is illustrated in Figure 3. The initialization step involves setting each agent's probabilities to uniform over its possible moves. The Lagrange multipliers for all the constraints are initialized to zero. A loop is then repeated until convergence. Within the loop, the Monte Carlo sampling is performed, after which the private utilities for each agent are computed. The number of function evaluations required depends on the private utility. Team Game requires one function evaluation for each Monte Carlo sample, while the generic version of the Wonderful Life Utility requires as many function evaluations as there are variables. Often the structure of the objective function and constraints can be exploited in the evaluation of WLU to avoid unnecessary function calls.[7,9] In order to demonstrate the performance of the algorithm without any such preprocessing, the present work makes no effort to exploit the structure of the fleet assignment formulation described above. Due to the discrete nature of the agent moves, the regression step involves simply averaging the private utility received for each agent's move over the Monte Carlo samples. Data aging is used within the regression to preserve information from previous iterations: the previous private utility estimates are weighted by a factor γ compared with the new samples. The probabilities and Lagrange multipliers are then updated according to Eqs. (5) and (6), respectively. Eq. (5) automatically enforces the constraint on total probability but does not prevent negative probabilities. To prevent negative probabilities, the probability update is modified by setting all components that would be negative to a small positive value and then re-normalizing.
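The loop just described (joint Monte Carlo sampling, regression with data aging, probability update, re-normalization) can be sketched end to end on a hypothetical two-agent team game; all parameter values and the toy objective are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, n_moves = 2, 3
T, alpha, gamma = 0.5, 0.2, 0.5          # temperature, step size, data aging
q = np.full((n_agents, n_moves), 1.0 / n_moves)   # uniform initialization
est = np.zeros((n_agents, n_moves))      # E[G | x_i] estimates

def G(x):
    """Toy world utility, minimized when both agents choose move 0."""
    return float(x[0] + x[1] + (x[0] != x[1]))

for _ in range(200):
    # Monte Carlo block: joint IID samples from the product distribution.
    moves = np.array([[rng.choice(n_moves, p=q[i]) for i in range(n_agents)]
                      for _ in range(50)])
    utils = np.array([G(x) for x in moves])
    # Regression: average reward per move, aged with factor gamma.
    for i in range(n_agents):
        for m in range(n_moves):
            hit = moves[:, i] == m
            if hit.any():
                est[i, m] = gamma * est[i, m] + (1 - gamma) * utils[hit].mean()
    # Step each q_i toward the Boltzmann distribution of its estimates.
    target = np.exp(-est / T)
    target /= target.sum(axis=1, keepdims=True)
    q = (1 - alpha) * q + alpha * target
    q = np.maximum(q, 1e-6)              # no negative/zero probabilities
    q /= q.sum(axis=1, keepdims=True)    # re-normalize

print(q.argmax(axis=1))   # both agents should concentrate on move 0
```

This miniature version replaces the closed-form Newton step with a simple mixing step toward the Boltzmann target, but it exhibits the same structure: sampling, per-move regression with aging, and a temperature-controlled probability update.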
Finally, the convergence criterion is checked; in this case it is a combination of the norms of the probability and Lagrange multiplier updates, which must fall below a small tolerance. If the criterion is not met, the sampling and update process is repeated.

Table 3  Number of flights assigned to each arc at each time segment.

Fig. 4 The resident fleet at each airport required to optimally solve the problem.

Results

Linearized Constraints and Objective Function

For assessment purposes, both the PD framework and AMPL-CPLEX[15,16] were applied to a linearized version of the 9-city, 20-arc fleet assignment problem (CPLEX does not support non-linear constraints). In this case, the passenger capacity w was fixed to 100, so F(w) = 1.0 and the objective reduces to the linear form:

  J = Σ_i Σ_j u_{i,j}

with the total fleet passenger capacity C_{i,j} = 100 u_{i,j}. The problem features a time-dependent, asymmetric demand structure, as shown in Table 2. To enhance the convergence speed, the objective was squared, effectively dramatizing the topology of the problem.

Both optimization tools reached the global minimum with a fleet size of 43 aircraft: 228 flights are required, yielding an objective of 51,984. The number of flights assigned to each route is shown in Table 3, and the resident fleet size at each airport is illustrated in Figure 4. In order to capture the stochastic nature of the approach, the optimization was repeated 20 times. The figures show averages and ranges for the minimum objective in each block of Monte Carlo samples. Each iteration is an update to the probability distributions using a single block of Monte Carlo samples.

The importance of selecting the appropriate private utility is shown in Figure 5. For each utility, the best temperature

Fig. 5 Comparison of convergence with two private utilities (200 Monte Carlo samples).

Fig. 6 Increasing sampling speeds convergence (temperature = 10, WLU, 20 runs).

Fig. 7 Effects of temperature on convergence (200 Monte Carlo samples, WLU, 20 runs).

was selected, 10 for WLU and 1000 for TG. The results show that WLU performs considerably better than Team Game. This is consistent with previous applications of the approach.

As illustrated in Figure 6, the number of Monte Carlo samples between updates affects the rate of convergence. In almost all cases, 50 samples were not sufficient to find the minimum objective. With 200 samples, the minimum was found in 18 of 20 cases. Increasing the number of samples to 1000 resulted in all cases converging to the minimum.

Similarly, selecting the correct temperature influences the optimization process (Figure 7). A low temperature (T = 1) did not allow enough exploration, while a high temperature (T = 100) slowed convergence. For this example, a moderate temperature (T = 10) offered the best trade-off between exploration and exploitation. In particular, the case with the lowest temperature rapidly converged to an infeasible minimum. The objective then grew as the Lagrange multipliers increased. The optimizer, at this low temperature, is unable to explore other regions of the design space.

Non-Linear Constraints and Objective Function

Following the above linearized study, the methodology was applied to the fully non-linear problem: the PD framework's ambivalence towards constraint and objective types makes it ideal in this case. The PD optimizer obtains a solution that is very close to optimum: it selects the correct aircraft, with 200-passenger capacity (resulting in an
LTO cost factor of 1.5) and requires a total of 146 flights to meet demand. The objective is therefore 47,961.

Fig. 8 Comparison of optimum for the linear (fixed airplane capacity) and non-linear (variable airplane capacity) problems.

Figure 8 compares the convergence rate of the variable aircraft type problem with the performance of the fixed aircraft type problem: while the convergence takes slightly more iterations, no changes in the parameter settings were made. For comparison purposes, running AMPL-CPLEX with a fixed passenger capacity of 200 resulted in a solution with 142 flights required, for an objective of 45,369. When the PD framework is run with fixed aircraft capacity, this same result is obtained.

Conclusion

A collective intelligence framework was successfully applied to a sample fleet assignment problem and yielded globally optimum solutions. With the basic framework proven to handle highly-constrained design spaces with non-linear constraints and objectives, a fleet assignment problem of more realistic size can be approached. The function evaluation was carefully formulated to allow for scalability and automation, and features such as transfer passengers, environmental considerations, and maintenance visit requirements can be implemented. Exploring other types of agents (perhaps airports or arcs) and developing problem-specific local utilities may also yield faster convergence rates and

require fewer Monte Carlo samples.

References

1 Etschmaier, M., and Mathaisel, D., "Airline Scheduling: An Overview," Transportation Science, Vol. 19, 1985, pp. 127-138.
2 Rushmeier, R., and Kontogiorgis, A., "Advances in the Optimization of Airline Fleet Assignment," Transportation Science, Vol. 31, 1997, pp. 159-169.
3 Abara, J., "Applying Integer Linear Programming to the Fleet Assignment Problem," Interfaces, Vol. 19, 1989.
4 Hane, C., et al., "The Fleet Assignment Problem: Solving a Large-Scale Integer Program," Mathematical Programming, Vol. 70, 1995, pp. 211-232.
5 Ahuja, R., et al., "A Very Large Scale Neighborhood Search Algorithm for the Quadratic Assignment Problem," submitted to INFORMS Journal on Computing, 2002.
6 Chung, T., and Chung, J., "Airline Fleet Assignment Using Genetic Algorithms," GECCO 2002, Genetic and Evolutionary Computation Conference, New York, New York, July 11-13, 2002, p. 255.
7 Tumer, K., Agogino, A., and Wolpert, D. H., "Learning Sequences of Actions in Collectives of Autonomous Agents," Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems, Bologna, Italy, July 2002.
8 Wolpert, D. H., and Tumer, K., "Collective Intelligence, Data Routing, and Braess' Paradox," Journal of Artificial Intelligence Research, 2002.
9 Wolpert, D. H., Wheeler, K., and Tumer, K., "Collective Intelligence for Control of Distributed Dynamical Systems," Europhysics Letters, Vol. 49, No. 6, 2000, pp. 708-714.
10 Wolpert, D. H., "Information Theory: The Bridge Connecting Bounded Rational Game Theory and Statistical Physics," in Complex Engineered Systems, D. Braha, A. Minai, and Y. Bar-Yam (editors), Perseus Books, in press.
11 Bieniawski, S., Wolpert, D., and Kroo, I., "Discrete, Continuous, and Constrained Optimization Using Collectives," 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Albany, New York, Aug. 30 - Sep. 1, 2004.
12 Macready, W., Bieniawski, S., and Wolpert, D., "Adaptive Multi-Agent Systems for Constrained Optimization," submitted to AAAI 2004.
13 MacKay, D., Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.
14 Duda, R., Hart, P., and Stork, D., Pattern Recognition, Wiley, 2001.
15 Fourer, R., Gay, D., and Kernighan, B., AMPL: A Modeling Language for Mathematical Programming, Boyd and Fraser, 1993.
16 Anon., ILOG AMPL CPLEX System User's Guide, ILOG, 2002.
