Estimation of Space–Time Branching Process Models in Seismology Using an EM-Type Algorithm


Alejandro VEEN and Frederic P. SCHOENBERG

Maximum likelihood estimation of branching point process models via numerical optimization procedures can be unstable and computationally intensive. We explore an alternative estimation method based on the expectation-maximization algorithm. The method involves viewing the estimation of such branching processes as analogous to incomplete data problems. Using an application from seismology, we show how the epidemic-type aftershock sequence (ETAS) model can, in fact, be estimated this way, and we propose a computationally efficient procedure to maximize the expected complete data log-likelihood function. Using a space–time ETAS model, we demonstrate that this method is extremely robust and accurate and use it to estimate declustered background seismicity rates of geologically distinct regions in Southern California. All regions show similar declustered background intensity estimates except for the one covering the southern section of the San Andreas fault system to the east of San Diego, in which a substantially higher intensity is observed.

KEY WORDS: Branching process models; Earthquakes; Epidemic-type aftershock sequence model; Expectation-maximization algorithm; Maximum likelihood; Space–time point process.

1. INTRODUCTION

Point process models have long been used to describe earthquake occurrences (Vere-Jones 1970, 1975); see Ogata (1999) for a nice review. Some of the early applications fitted Neyman–Scott-type models in which main shocks are viewed as cluster centers, each of which may trigger a random number of aftershocks with magnitudes not larger than the main shock (Vere-Jones 1970; Hawkes and Adamopoulos 1973). More recent work has favored the use of branching process models in which all earthquakes can trigger aftershocks, and among these, the epidemic-type aftershock sequence (ETAS) model is considered to be one of the standard models in seismology (Ogata 1988, 1998).

ETAS and other branching process models are commonly estimated using maximum likelihood (ML). However, closed-form solutions are usually not available, and numerical maximization algorithms must be employed.
In such situations, computational difficulties can arise, especially if the models are complex, multidimensional, and nonlinear, as this often leads to multimodal or extremely flat log-likelihood functions.

The view of branching process models as incomplete data problems suggests the use of the expectation-maximization (EM) algorithm in order to attain maximum likelihood estimates (MLEs) (Dempster, Laird, and Rubin 1977). In this context, the information about which event "triggers" each other event is unobservable and can be described probabilistically. The EM algorithm involves maximizing the expected complete data log-likelihood, which in the context of branching point process models is based on the probabilistic incorporation of the branching structure and is usually easier to maximize. Using an analogous expression, conventional ML maximizes the incomplete data log-likelihood.

In this article, we show how an EM-type algorithm can be combined with a partial information approach in certain steps. This relates to partial likelihood maximization, which was introduced by Cox (1975) and which was briefly discussed by Ogata and Akaike (1982) in the context of branching processes in seismology. By coupling two well-established estimation methods (EM and a partial information approach), we are able to present a highly robust and accurate estimation procedure that can be used to estimate even very complex branching process models. To demonstrate the properties of our proposed method, we use a space–time ETAS model to simulate earthquake catalogs and then compare the results of the EM-type estimation algorithm to the conventional ML procedure using a numerical optimization approach.

Following a description in Section 2 of self-exciting point process models for earthquake occurrences, we describe some of the problems with conventional ML estimation of such models in Section 3. Section 4 describes a proposed alternative estimation method, based on the EM algorithm, and shows its robustness and accuracy using simulations. The EM-type algorithm is then used in Section 5 to estimate background seismicity rates for Southern California, and the article concludes with a discussion in Section 6.

Alejandro Veen is Researcher, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 (E-mail: aveen@us.ibm.com). Frederic P. Schoenberg is Professor, Department of Statistics, University of California, Los Angeles, CA 90095-1554 (E-mail: frederic@stat.ucla.edu). This material is based on work supported by the National Science Foundation under grant 0306526. We thank Yan Kagan, Yingnian Wu, Ilya Zaliapin, and anonymous referees for helpful comments, and the Southern California Earthquake Center for its generosity in sharing its data. All computations were performed in R.

2. SELF-EXCITING POINT PROCESSES AND THE ETAS MODEL

Consider a simple, temporal point process N on [0, ∞) adapted to a filtration H_t. Assuming it exists, the conditional intensity λ(t | H_t) is defined as the unique, nondecreasing, H-predictable process such that

    N([0, t)) − ∫_0^t λ(s | H_s) ds

is an H-martingale. In this representation, H must contain the history of the process up to time t, denoted as H_t = {t_i : t_i < t} with t_i as the time event i occurs, but H may contain additional information as well. Because the finite-dimensional distributions of such a point process are uniquely determined by its conditional intensity (Daley and Vere-Jones 2003), one way to model a point process is via its conditional intensity.

In self-exciting point processes, the conditional intensity is given by

    λ(t | H_t) = μ + Σ_{i : t_i < t} g(t − t_i),

where μ > 0, g(v) ≥ 0 for nonnegative v and g(v) = 0 otherwise, and ∫_0^∞ g(v) dv < 1 in order to ensure stationarity (Hawkes 1971a,b).
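As an illustration (ours, not the authors'), the conditional intensity above can be evaluated directly from its definition. The following minimal R sketch, R being the language the authors report using for their computations, computes λ(t | H_t) for a modified Omori-type triggering function g(v) = K_0/(v + c)^(1+ω); all parameter values here are hypothetical.

    # Conditional intensity of a temporal self-exciting (Hawkes-type) process:
    #   lambda(t | H_t) = mu + sum over {i: t_i < t} of g(t - t_i)
    # with g(v) = K0 / (v + c)^(1 + omega). Parameter values are illustrative.
    hawkes_intensity <- function(t, event_times, mu, K0, c, omega) {
      past <- event_times[event_times < t]  # only events strictly before t
      mu + sum(K0 / (t - past + c)^(1 + omega))
    }

    # Example: intensity shortly after the last of three events
    hawkes_intensity(t = 4.1, event_times = c(1.0, 2.5, 4.0),
                     mu = 0.5, K0 = 0.2, c = 0.01, omega = 0.5)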

Early applications of self-exciting point processes to earthquake occurrence models can be found in Hawkes and Adamopoulos (1973), Lomnitz (1974, chap. 7), and Kagan and Knopoff (1987).

A particularly important example of a self-exciting point process is the ETAS model, which was first introduced by Ogata (1988) and is widely used to describe earthquake occurrences. Early forms of the ETAS model (Ogata 1988) only took magnitudes and earthquake occurrence times into account:

    λ(t | H_t) = μ + Σ_{i : t_i < t} g(t − t_i, m_i),

where the history of the process H_t = {(t_i, m_i) : t_i < t} also includes earthquake magnitudes m_i, μ is the (in this case constant) background intensity of earthquake occurrences, and g(·) is the so-called "triggering function," because it describes how earthquakes trigger aftershocks. One possible triggering function suggested in Ogata (1988) is

    g(τ_i, m_i) = K_0 / (τ_i + c)^(1+ω) · e^{a(m_i − M_0)},

where τ_i = t − t_i is the time elapsed since earthquake i, K_0 > 0 is a normalizing constant governing the expected number of direct aftershocks triggered by earthquake i, the parameters c, a, ω > 0, and M_0 is the "cutoff magnitude," that is, the lowest earthquake magnitude in the dataset under consideration. The term K_0/(τ_i + c)^(1+ω), describing the temporal distribution of aftershocks, is known as the modified Omori–Utsu law. While the literature in seismology usually lets ω ≥ 0, the interpretation of the modified Omori–Utsu law as a probability density function requires strictly positive values for ω.

The ETAS model has since been extended to describe the space–time–magnitude distribution of earthquake occurrences (Ogata 1993b, 1998). A version suggested in Ogata (1998) uses circular aftershock regions where the squared distance between an aftershock and its triggering event follows a Pareto distribution:

    λ(t, x, y | H_t) = μ(x, y) + Σ_{i : t_i < t} g(t − t_i, x − x_i, y − y_i, m_i),        (1)

with triggering function

    g(t − t_i, x − x_i, y − y_i, m_i)
        = K_0 e^{a(m_i − M_0)} / [(t − t_i + c)^(1+ω) · ((x − x_i)² + (y − y_i)² + d)^(1+ρ)],        (2)

where (x_i, y_i) represents the epicenter of earthquake i, d > 0 and ρ > 0 are parameters describing the spatial distribution of triggered seismicity, and the history of the process up to time t is now defined as H_t = {(t_i, x_i, y_i, m_i) : t_i < t}. One characteristic of this model is that the aftershock zone does not scale with the magnitude of the triggering event. It has been suggested that aftershock zones increase with main shock magnitudes, and some recent research seems to support this view (Kagan 2002b). Examples of how this can be incorporated into the ETAS model can be found in Kagan and Knopoff (1987) and Ogata (1998).

Summing things up, the ETAS model can be described as a branching process with immigration (spontaneous background earthquakes). The aftershock activity is modeled through a triggering function consisting of two terms, one of which models the expected number of aftershocks for earthquake i while the other models the temporal or space–time distribution of the triggered aftershocks.
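To make the structure of (1) and (2) concrete, the following R sketch (ours, not the authors' code) evaluates the space–time ETAS conditional intensity for a catalog stored as a data frame. A constant background rate μ is assumed for simplicity, and the parameter values in the example, including d and ρ, are hypothetical.

    # Space-time ETAS conditional intensity: model (1) with triggering function (2).
    # 'catalog' is a data frame with columns t, x, y, m; a constant background
    # rate mu is assumed. Events with t_i < t form the relevant history.
    etas_intensity <- function(t, x, y, catalog, mu, K0, a, c, omega, d, rho, M0) {
      past <- catalog[catalog$t < t, ]
      if (nrow(past) == 0) return(mu)
      trig <- K0 * exp(a * (past$m - M0)) /
        ((t - past$t + c)^(1 + omega) *
           ((x - past$x)^2 + (y - past$y)^2 + d)^(1 + rho))
      mu + sum(trig)
    }

    # Example with a tiny hypothetical catalog (illustrative parameter values)
    catalog <- data.frame(t = c(10, 50), x = c(1.2, 3.4),
                          y = c(0.8, 2.1), m = c(3.1, 4.6))
    etas_intensity(t = 51, x = 3.5, y = 2.0, catalog = catalog,
                   mu = 8e-4, K0 = 3.05e-5, a = 2.3026, c = 0.01,
                   omega = 0.5, d = 0.015, rho = 0.6, M0 = 2)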
3. MAXIMUM LIKELIHOOD ESTIMATION OF THE ETAS MODEL

Conventional ML estimation attempts to maximize

    ℓ(θ) = Σ_i log λ(t_i, x_i, y_i | H_{t_i}) − ∫_0^T ∫_{y_0}^{y_1} ∫_{x_0}^{x_1} λ(t, x, y | H_t) dx dy dt,        (3)

the incomplete data log-likelihood function of model (1), where θ = (μ, K_0, a, c, ω, d, ρ) is the parameter vector and [x_0, x_1] × [y_0, y_1] × [0, T] is the space–time window in which the dataset (x_i, y_i, t_i, m_i) is observed (Ogata 1998; Daley and Vere-Jones 2003, chap. 7). The term "incomplete data" signifies that ℓ(θ) does not incorporate any information about the branching structure, and it is used to distinguish it from the (in practice unattainable) complete data log-likelihood function introduced in Section 4. Typically, (3) is maximized by using a numerical optimization routine, because no closed-form solution is generally available. Unfortunately, in cases where the log-likelihood function is extremely flat in the vicinity of its maximum, such optimization routines can have convergence problems and can be substantially influenced by arbitrary choices of starting values. To distinguish the conventional MLE computed by numerical maximization from the one based on the EM-type algorithm presented later, we will denote the former by θ̂_num and the latter by θ̂_EM. Similarly, the vector components will be denoted by ω̂_num, ω̂_EM, and so on.

An illustration may be helpful to demonstrate some of the difficulties encountered when directly maximizing (3) using numerical methods. Figure 1 shows a simulated earthquake catalog of 638 events, using model (1). The space–time window used for this simulation is similar to the Southern California dataset described in Section 5. Background earthquakes are simulated on an area of 8° of longitude by 5° of latitude over a period of 7,500 days (approximately 20 years). Parameter values, as shown in Table 1, are chosen to approximate those used in descriptions of earthquake catalogs, based on Ogata (1998) as well as discussions with UCLA seismologists Yan Y. Kagan and Ilya Zaliapin. For simplicity, a truncated exponential distribution is used to model earthquake magnitudes in accordance with the Gutenberg–Richter law (Gutenberg and Richter 1944):

    f_GR(m) = β e^{−β(m − M_0)} / (1 − e^{−β(M_GR^max − M_0)}),        (4)

where f_GR(m) is the probability density function, β = log(10), and M_0 = 2 ≤ m ≤ M_GR^max = 8, where the lower threshold is the approximate current threshold (since 2001) above which catalogs of the Southern California Seismic Network (SCSN) are believed to be complete (Kagan 2002a, 2003) and the upper threshold is the approximate magnitude of the strongest California earthquakes in historic times, the 1857 Fort Tejon earthquake and the "great" San Francisco earthquake of 1906.
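For reference, (3) can be evaluated along the following lines. The R sketch below (ours, not the authors' code) reuses the etas_intensity function from the previous sketch and makes two simplifying assumptions of our own: the background rate is a constant μ, and each event's triggering contribution is integrated over all of time and space rather than over the observation window, so boundary effects are ignored. Under (2), that full integral has the closed form K_0 e^{a(m_i − M_0)} (c^{−ω}/ω)(π d^{−ρ}/ρ).

    # Incomplete data log-likelihood (3), with the integral term approximated by
    # ignoring boundary effects (a simplification of ours, not the article's).
    # Assumes etas_intensity() from the previous sketch.
    etas_loglik <- function(catalog, mu, K0, a, c, omega, d, rho, M0,
                            x0, x1, y0, y1, Tend) {
      # Sum of log conditional intensities at the observed points;
      # etas_intensity() restricts the history to events with t_i < t
      log_sum <- sum(log(mapply(etas_intensity,
                                t = catalog$t, x = catalog$x, y = catalog$y,
                                MoreArgs = list(catalog = catalog, mu = mu, K0 = K0,
                                                a = a, c = c, omega = omega,
                                                d = d, rho = rho, M0 = M0))))
      # Integral term: background rate over the window plus each event's
      # triggering contribution integrated over all of time and space
      bg_int   <- mu * (x1 - x0) * (y1 - y0) * Tend
      trig_int <- sum(K0 * exp(a * (catalog$m - M0)) *
                        (c^(-omega) / omega) * (pi * d^(-rho) / rho))
      log_sum - (bg_int + trig_int)
    }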

Figure 1. Simulated earthquake process using a space–time–magnitude ETAS model. This figure shows a simulated earthquake catalog using model (1) with a parameterization as described in Table 1. The simulated catalog consists of 638 earthquakes (241 background events and 397 aftershocks). The spatial distribution is presented in (a), although 32 (triggered) earthquakes are not shown because they are outside the specified space–time window. The temporal distribution is shown in (b). The spike in activity starting on day 5,527 is caused by a magnitude 5.33 earthquake and its aftershocks and corresponds to the cluster on the lower right side of (a).

Table 1. Specification of the space–time–magnitude ETAS model (1) used for simulation

    Parameter                                  Value
    Space–time window for background events    [0°, 8°] × [0°, 5°] × [0, 7,500 days]
    μ(x, y)                                    .0008
    K_0                                        .0000305
    a                                          2.3026
    c                                          .01
    ω                                          .5
    Parameters of the magnitude density (4)    M_0 = 2, M_GR^max = 8

NOTE: This specification uses a homogeneous background intensity (measured in events per day per squared degree) that does not depend on the location. The time is measured in days; spatial distances are measured in degrees. A truncated exponential distribution (4) is used to simulate magnitudes.

The use of numerical methods to maximize (3) can be problematic in cases where the log-likelihood is extremely flat, unless some supervision is imposed and intelligent starting values are used. Figure 2, for instance, shows the incomplete data log-likelihood for variations of each component of the parameter vector by up to 50% around the MLE. The function is quite flat around θ̂_num, especially with regard to the parameters μ, K_0, c, and d; as a result these parameters are difficult to estimate, and they generally are associated with rather large standard errors as well as numerical challenges during the estimation procedure. The parameters a, ω, and ρ, on the other hand, show much more peaked log-likelihood functions and can, hence, be estimated more stably.

The issue of log-likelihood flatness can be aggravated in a multidimensional context. In Figure 3, two parameters are varied while the others remain constant at their MLEs. Again, (3) can stay extremely flat along certain trajectories, even for large deviations from the MLEs. The parameter c, for instance, can be increased to more than four times its MLE and ω increased to double its MLE, yet the log-likelihood function is reduced only very slightly. The problem of log-likelihood flatness becomes increasingly severe as more and more parameters are varied at once. In more realistic settings, where ETAS models are estimated for actual earthquake catalogs, none of the parameters would be known in advance.

Note that flatness in the log-likelihood function does not necessarily imply that accurate estimation of the parameters in the ETAS model is unimportant. Some of the ETAS parameters have physical interpretations that can be used by seismologists to characterize earthquake catalogs (see Ogata 1998, and the references therein). Further, the accurate estimation of ETAS parameters is important for comparisons of certain parameters for different catalogs as well as for studies of bias in ETAS parameter estimation.
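A catalog like the one in Figure 1 can be generated by simulating the branching structure directly: background events form a homogeneous Poisson process over the space–time window, and each event spawns a Poisson number of direct aftershocks whose time and squared-distance offsets are obtained by inverting the Pareto-type densities in (2). The R sketch below (ours, not the authors' code) follows this recipe with the Table 1 values; the values of d and ρ are placeholders of ours, since they are not given in Table 1 as reproduced here, and magnitudes are drawn from the truncated Gutenberg–Richter density (4) by inverse-CDF sampling.

    # Branching simulation of model (1) using the Table 1 values; d and rho
    # are illustrative placeholders, not values from the article.
    mu <- 8e-4; K0 <- 3.05e-5; a <- 2.3026; c <- 0.01; omega <- 0.5
    d <- 0.015; rho <- 0.6                 # placeholders (not from Table 1)
    M0 <- 2; Mmax <- 8; beta <- log(10)    # magnitude density (4)
    x1 <- 8; y1 <- 5; Tend <- 7500         # space-time window

    # Inverse-CDF sampling from the truncated Gutenberg-Richter density (4)
    rmag <- function(n)
      M0 - log(1 - runif(n) * (1 - exp(-beta * (Mmax - M0)))) / beta

    # Background events: homogeneous Poisson process on the window
    n_bg <- rpois(1, mu * x1 * y1 * Tend)
    events <- data.frame(t = runif(n_bg, 0, Tend), x = runif(n_bg, 0, x1),
                         y = runif(n_bg, 0, y1), m = rmag(n_bg))

    # Aftershocks, generation by generation, until no offspring are produced
    parents <- events
    while (nrow(parents) > 0) {
      # Expected number of direct aftershocks per parent: triggering
      # function (2) integrated over all of time and space
      n_off <- rpois(nrow(parents), K0 * exp(a * (parents$m - M0)) *
                       (c^(-omega) / omega) * (pi * d^(-rho) / rho))
      idx <- rep(seq_len(nrow(parents)), n_off)
      if (length(idx) == 0) break
      u1 <- runif(length(idx)); u2 <- runif(length(idx))
      tau <- c * (u1^(-1 / omega) - 1)      # modified Omori-Utsu time offsets
      r   <- sqrt(d * (u2^(-1 / rho) - 1))  # Pareto-distributed squared distance
      ang <- runif(length(idx), 0, 2 * pi)
      kids <- data.frame(t = parents$t[idx] + tau,
                         x = parents$x[idx] + r * cos(ang),
                         y = parents$y[idx] + r * sin(ang),
                         m = rmag(length(idx)))
      kids <- kids[kids$t < Tend, ]         # discard offspring beyond the horizon
      events  <- rbind(events, kids)
      parents <- kids
    }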
In cases where the log-likelihood function is extremely flat, the choice of starting values can influence the results. In Figure 4, conventional ML estimation is performed using eight different starting values and a standard Newton–Raphson optimization routine. The true values of Table 1 are used as starting values for all parameters except K_0 and a, which are varied as indicated in the figure. In two cases, the estimation results are quite close to the true θ. In four cases, the algorithm finds reasonable estimates for most parameters but fails to find an acceptable estimate for K_0: in fact, the algorithm does not change the starting value for K_0 at all, even though it is roughly 33% off its true value. In two of the cases, the algorithm fails to converge.

As shown in Figure 4, even if the starting values are close to the MLE, any departure from an optimal choice of tolerance levels and stopping criteria for the numerical maximization procedure can lead to poor convergence results. In this example, the Newton-type algorithm used to find the MLEs actually yields better results if the starting values for K_0 are not too close to the true values, as the computation of gradients may actually be more accurate in locations farther away from the true parameter value.

Figure 2. Flatness of the incomplete data log-likelihood function for varying one parameter at a time. This figure demonstrates the relative flatness of the incomplete data log-likelihood function (3) with respect to the components of the parameter estimate θ̂_num (two graphs are shown to improve the legibility). The log-likelihood stays very flat when μ, K_0, c, and d are varied around their MLEs. This indicates potentially high standard errors and/or numerical challenges for the estimation of these parameters. The parameters a, ω, and ρ have much more peaked log-likelihood functions and can, hence, be estimated more easily.

Another problem often encountered in practice is that the log-likelihood (3) can be multimodal (see, e.g., Ogata and Akaike 1982). Whenever the numerical optimization routine converges to a solution, it is quite difficult to determine whether it has converged to a local maximum or to the global maximum. Even if the log-likelihood is unimodal, in cases where the log-likelihood is flat there can be numerical multimodality due to rounding errors, the way these errors affect intermediate and final results, and the way values are stored in memory. In cases where the log-likelihood surface is extremely flat, such as those shown in Figures 3 and 4, such numerical problems can explain why θ̂_num can be very far from the true parameter value. This makes it difficult to perform simulation studies of bias and asymptotic properties, for which an automatic procedure would be desirable.

While the main focus of this article is to present how branching process models can be estimated using an EM-type algorithm, a few remarks will be added here on how to improve conventional ML estimation via a numerical maximization of the incomplete data log-likelihood. One approach is to maximize (3) with respect to log(c) and log(d), in place of c and d. This procedure is computationally more stable because c and d can be quite ill constrained. Also, a derivative-free simplex algorithm (e.g., Nelder–Mead) can be more robust to the choice of starting values than Newton-type methods, though it typically converges more slowly.

Figure 3. Flatness of the incomplete data log-likelihood in multidimensional settings. The problem of flatness of the incomplete data log-likelihood function (3) (shown in gray levels) can be aggravated in a multidimensional context. In this analysis, pairs of parameters are varied while the remaining parameters are held fixed at their MLEs.
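To illustrate the reparameterization remark above, the following R sketch (ours, not the authors' code) maximizes (3) over log-transformed parameters with the Nelder–Mead simplex method via optim. It assumes the etas_loglik function from the sketch in Section 3 and the simulated events data frame from the sketch above; logging all seven positive parameters, rather than only c and d, is our choice for convenience.

    # Numerical ML for model (1) on a log scale: every component of theta is
    # positive, so optimizing over log-parameters keeps the search unconstrained
    # and stabilizes ill-constrained parameters such as c and d.
    neg_loglik <- function(logtheta, catalog) {
      th <- exp(logtheta)  # back-transform to (mu, K0, a, c, omega, d, rho)
      -etas_loglik(catalog, mu = th[1], K0 = th[2], a = th[3], c = th[4],
                   omega = th[5], d = th[6], rho = th[7], M0 = 2,
                   x0 = 0, x1 = 8, y0 = 0, y1 = 5, Tend = 7500)
    }

    # Illustrative starting values on the log scale
    start <- log(c(8e-4, 3.05e-5, 2.3, 0.01, 0.5, 0.015, 0.6))
    fit <- optim(start, neg_loglik, catalog = events,
                 method = "Nelder-Mead", control = list(maxit = 5000))
    exp(fit$par)  # parameter estimates on the original scale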

