Multivariate Longitudinal Data Analysis For Actuarial Applications


Multivariate longitudinal data analysis for actuarial applications
Priyantha Kumara and Emiliano A. Valdez
ASTIN/AFIR/IAALS Mexico Colloquia 2012
Mexico City, Mexico, 1-4 October 2012
P. Kumara and E.A. Valdez, U of Connecticut

Outline
- Introduction
- Some literature
- The model specification
  - Notation
  - Key features of our approach
  - Multivariate joint distribution
  - Choice for the marginals: the class of GB2
- Case study
  - Global insurance demand
- Additional work intended
- Selected reference

Introduction
- In the presence of repeated observations over time, the natural approach to data analysis is a univariate longitudinal model (e.g. Shi and Frees, 2010; Frees et al., 1999).
- Repeated observations over time for many responses require a multivariate longitudinal framework, which is increasingly popular in data analysis, e.g. in biometrics.
- There is a developing interest in multivariate longitudinal analysis in the actuarial context (e.g. Shi, 2011).
- Model accuracy, and further understanding, can be improved by incorporating dependency among multiple responses.
- Very often, for simplicity, response variables are assumed to have a multivariate normal distribution.

Some literature
- Frees, E.W. (2004). Longitudinal and panel data: analysis and applications in the social sciences. Cambridge University Press, Cambridge.

The random effects approach
- Reinsel, G. (1982). Multivariate repeated-measurement or growth curve models with multivariate random-effects covariance structure. Journal of the American Statistical Association 77: 190-195.
- Shah, A., N.M. Laird, and D. Schoenfeld (1997). A random effects model with multiple characteristics with possibly missing data. Journal of the American Statistical Association 92: 775-79.
- Fieuws, S. and G. Verbeke (2006). Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics 62: 424-431.

Seemingly unrelated regressions (SUR) approach
- Rochon, J. (1996). Analyzing bivariate repeated measures for discrete and continuous outcome variable. Biometrics 52: 740-50.

Copula approach
- Lambert, P. and F. Vandenhende (2002). A copula based model for multivariate non normal longitudinal data: analysis of a dose titration safety study on a new antidepressant. Statistics in Medicine 21: 3197-3217.
- Shi, P. (2011). Multivariate longitudinal modeling of insurance company expenses. Insurance: Mathematics and Economics. In Press.

Our contribution

Methodology
- We propose the use of a random effects model to capture dynamic dependency and heterogeneity, and a copula function to incorporate dependency among the response variables.

Multivariate longitudinal analysis for actuarial applications
- We intend to explore actuarial-related problems within a multivariate longitudinal context, and apply our proposed methodology.

NOTE: Our results are very preliminary at this stage.

Notation
Suppose we have a set of $q$ covariates associated with $n$ subjects collected over $T$ time periods for a set of $m$ response variables.
Let $y_{it,k}$ denote the response from the $i$th individual in the $t$th time period on the $k$th response. By letting $y_{it} = (y_{it,1}, y_{it,2}, \ldots, y_{it,m})'$ for $t = 1, 2, \ldots, T$, we can express $Y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})$.
Covariates associated with the $i$th subject in the $t$th time period can be expressed as $x_{it} = (x_{it,1}, x_{it,2}, \ldots, x_{it,m})$, where $x_{it,k} = (x_{it1,k}, x_{it2,k}, \ldots, x_{itp,k})$ is the covariate vector for the $k$th response, $k = 1, 2, \ldots, m$.
We use $\alpha_{ik}$ to represent the random effects component corresponding to the $i$th subject for the $k$th response variable. $G(\alpha_{ik})$ represents the pre-specified distribution function of the random effect $\alpha_{ik}$.

Key features of our approach
- Obviously, the extension from univariate to multivariate longitudinal analysis.
- Types of dependencies captured:
  - the dependence structure of the responses, using copulas, which provides flexibility
  - the intertemporal dependence within subjects and unobservable subject-specific heterogeneity, captured through the random effects component, which provides tractability
- The marginal distribution models:
  - any family of flexible enough distributions can be used
  - choose the family so that covariate information can be easily incorporated
- Other key features worth noting:
  - the parametric model specification provides flexibility for inference, e.g. MLE for estimation
  - model construction can accommodate both balanced and unbalanced data, an important feature for longitudinal data

Copula function
For $m$ uniform random variables $U_1, \ldots, U_m$ on the unit interval, the copula function $C$ is defined as
$$C(u_1, \ldots, u_m) = P(U_1 \le u_1, \ldots, U_m \le u_m).$$
Joint distribution:
$$F(y_1, \ldots, y_m) = C(F_1(y_1), \ldots, F_m(y_m)),$$
where $F_k(y_k)$ are the marginal distribution functions.
Joint density:
$$f(y_1, \ldots, y_m) = c(F_1(y_1), \ldots, F_m(y_m)) \prod_{k=1}^{m} f_k(y_k),$$
where $f_k(y_k)$ are the marginal density functions and $c$ is the density associated with the copula $C$.
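The joint density formula can be illustrated with a small sketch. The slides do not fix a copula or marginals at this point, so the choices below are assumptions made only for illustration: a bivariate Farlie-Gumbel-Morgenstern (FGM) copula and exponential marginals. The function names are hypothetical.

```python
import math


def exp_pdf(y, rate):
    """Exponential marginal density f_k(y) (illustrative choice)."""
    return rate * math.exp(-rate * y) if y >= 0 else 0.0


def exp_cdf(y, rate):
    """Exponential marginal distribution function F_k(y)."""
    return 1.0 - math.exp(-rate * y) if y >= 0 else 0.0


def fgm_copula_density(u, theta):
    """Bivariate FGM copula density, valid for |theta| <= 1."""
    u1, u2 = u
    return 1.0 + theta * (1.0 - 2.0 * u1) * (1.0 - 2.0 * u2)


def joint_density(y, cdfs, pdfs, copula_density):
    """f(y_1..y_m) = c(F_1(y_1), ..., F_m(y_m)) * prod_k f_k(y_k)."""
    u = [F(v) for F, v in zip(cdfs, y)]
    marginal_product = math.prod(f(v) for f, v in zip(pdfs, y))
    return copula_density(u) * marginal_product


# Two responses with hypothetical rates 1.0 and 0.5.
rates = (1.0, 0.5)
cdfs = [lambda v, r=r: exp_cdf(v, r) for r in rates]
pdfs = [lambda v, r=r: exp_pdf(v, r) for r in rates]

dependent = joint_density((1.0, 2.0), cdfs, pdfs,
                          lambda u: fgm_copula_density(u, 0.5))
independent = joint_density((1.0, 2.0), cdfs, pdfs,
                            lambda u: fgm_copula_density(u, 0.0))
```

With `theta = 0` the copula density is identically 1, so the joint density reduces to the product of the marginal densities, which is a convenient sanity check.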

Multivariate joint distribution
Suppose we observe $m$ response variables over $T$ time periods for $n$ subjects. The observed data for subject $i$ are
$$\{(y_{i1,1}, y_{i1,2}, \ldots, y_{i1,m}), \ldots, (y_{iT,1}, y_{iT,2}, \ldots, y_{iT,m})\},$$
so that
$$Y_{it} = (y_{it,1}, y_{it,2}, \ldots, y_{it,m}) \quad \text{for } i = 1, 2, \ldots, n \text{ and } t = 1, 2, \ldots, T$$
is the $i$th observation in the $t$th time period corresponding to the $m$ responses. The joint distribution of the $m$ response variables over time can be expressed as
$$H(y_{i1}, \ldots, y_{iT}) = P(Y_{i1} \le y_{i1}, \ldots, Y_{iT} \le y_{iT}).$$
If $\{\alpha_{ik}\}$ represent the random effects with respect to the $k$th response variable, the conditional joint distribution at time $t$ is
$$H(y_{it} \mid \alpha_{i1}, \ldots, \alpha_{im}) = C(F(y_{it,1} \mid \alpha_{i1}), \ldots, F(y_{it,m} \mid \alpha_{im})).$$

- continued
Conditional joint density at time $t$:
$$h(y_{it} \mid \alpha_{i1}, \ldots, \alpha_{im}) = c(F(y_{it,1} \mid \alpha_{i1}), \ldots, F(y_{it,m} \mid \alpha_{im})) \prod_{k=1}^{m} f(y_{it,k} \mid \alpha_{ik}),$$
where $F(y_{it,k} \mid \alpha_{ik})$ denotes the distribution function of the $k$th response variable at time $t$. If $\omega$ represents the set of parameters in the model, the likelihood of the $i$th subject is given by
$$L(\omega \mid (y_{i1}, \ldots, y_{iT})) = h(y_{i1}, \ldots, y_{iT} \mid \omega).$$
We can write
$$h(y_{i1}, \ldots, y_{iT} \mid \omega) = \int_{\alpha_{i1}} \cdots \int_{\alpha_{im}} h(y_{i1}, \ldots, y_{iT} \mid \alpha_{i1}, \ldots, \alpha_{im}) \, dG(\alpha_{i1}) \cdots dG(\alpha_{im}).$$
Under independence over time for a given random effect:
$$h(y_{i1}, \ldots, y_{iT} \mid \alpha_{i1}, \ldots, \alpha_{im}) = \prod_{t=1}^{T} h(y_{it} \mid \alpha_{i1}, \ldots, \alpha_{im}).$$

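- continued
Combining the integral representation with independence over time gives
$$h(y_{i1}, \ldots, y_{iT} \mid \omega) = \int_{\alpha_{i1}} \cdots \int_{\alpha_{im}} \prod_{t=1}^{T} h(y_{it} \mid \alpha_{i1}, \ldots, \alpha_{im}) \, dG(\alpha_{i1}) \cdots dG(\alpha_{im}),$$
and from the previous slides, we have
$$h(y_{i1}, \ldots, y_{iT} \mid \omega) = \int_{\alpha_{i1}} \cdots \int_{\alpha_{im}} \prod_{t=1}^{T} \left[ c(F(y_{it,1} \mid \alpha_{i1}), \ldots, F(y_{it,m} \mid \alpha_{im})) \prod_{k=1}^{m} f(y_{it,k} \mid \alpha_{ik}) \right] dG(\alpha_{i1}) \cdots dG(\alpha_{im}).$$
Then, we can write the log likelihood function as
$$\sum_{i} \log \left\{ \int_{\alpha_{i1}} \cdots \int_{\alpha_{im}} \prod_{t=1}^{T} \left[ c(F(y_{it,1} \mid \alpha_{i1}), \ldots, F(y_{it,m} \mid \alpha_{im})) \prod_{k=1}^{m} f(y_{it,k} \mid \alpha_{ik}) \right] dG(\alpha_{i1}) \cdots dG(\alpha_{im}) \right\}.$$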
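The log likelihood above involves an $m$-dimensional integral over the random effects for every subject. The following is a minimal numerical sketch for $m = 2$, not the authors' implementation. It assumes a Gaussian copula, normal random effects, and, purely to keep the example short, hypothetical conditional marginals $y_{it,k} \mid \alpha_{ik} \sim N(\alpha_{ik}, 1)$ in place of GB2. The integral is approximated with a midpoint rule on a truncated grid; all function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi, its inverse, and its density


def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density c(u1, u2; rho), for -1 < rho < 1."""
    # Clip away from 0 and 1 so the inverse normal CDF stays finite.
    u1 = min(max(u1, 1e-12), 1.0 - 1e-12)
    u2 = min(max(u2, 1e-12), 1.0 - 1e-12)
    z1, z2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    det = 1.0 - rho * rho
    expo = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * det)
    return math.exp(expo) / math.sqrt(det)


def _grid(sigma, n_nodes, width):
    """Midpoint nodes on [-width*sigma, width*sigma] with dG(alpha) weights."""
    step = 2.0 * width * sigma / n_nodes
    dist = NormalDist(0.0, sigma)
    nodes = [-width * sigma + (j + 0.5) * step for j in range(n_nodes)]
    return [(a, dist.pdf(a) * step) for a in nodes]


def subject_likelihood(y_i, rho, sigmas, n_nodes=60, width=6.0):
    """Approximate h(y_i1, ..., y_iT | omega) for one subject, m = 2.

    y_i is a list of T pairs (y_it1, y_it2); sigmas holds the random
    effect standard deviations (sigma_1, sigma_2).
    """
    grid1 = _grid(sigmas[0], n_nodes, width)
    grid2 = _grid(sigmas[1], n_nodes, width)
    total = 0.0
    for a1, w1 in grid1:
        for a2, w2 in grid2:
            cond = 1.0  # product over t of h(y_it | alpha_i1, alpha_i2)
            for y1, y2 in y_i:
                e1, e2 = y1 - a1, y2 - a2
                cond *= (gaussian_copula_density(STD_NORMAL.cdf(e1),
                                                 STD_NORMAL.cdf(e2), rho)
                         * STD_NORMAL.pdf(e1) * STD_NORMAL.pdf(e2))
            total += w1 * w2 * cond
    return total


def log_likelihood(data, rho, sigmas):
    """Sum over subjects of the log of the integrated likelihood."""
    return sum(math.log(subject_likelihood(y_i, rho, sigmas)) for y_i in data)
```

A grid is used here only because it needs nothing beyond the standard library; Gauss-Hermite quadrature or simulation would be common alternatives for the same integral. With `rho = 0` and `T = 1` the integral has a closed form (each response is marginally normal with variance $1 + \sigma_k^2$), which makes a useful check of the approximation.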

Choice for the marginals: the class of GB2
The model specification is flexible enough to accommodate any marginals; however, for our purposes, we chose the class of GB2 distributions. For $Y \sim \mathrm{GB2}(a, b, p, q)$ with $a \neq 0$ and $b, p, q > 0$:
Density function:
$$f_Y(y) = \frac{|a| \, y^{ap-1} \, b^{aq}}{B(p, q) \, (b^a + y^a)^{p+q}},$$
where $B(\cdot, \cdot)$ is the usual Beta function.
Distribution function:
$$F_Y(y) = B\!\left( \frac{(y/b)^a}{1 + (y/b)^a}; \, p, q \right),$$
where $B(\cdot; \cdot, \cdot)$ is the incomplete Beta function.
Mean:
$$E(Y) = b \, \frac{B(p + 1/a, \, q - 1/a)}{B(p, q)}.$$
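The three GB2 formulas can be coded directly. The sketch below uses only the Python standard library, so the regularized incomplete Beta function is evaluated with a textbook continued fraction rather than a library call. It assumes $a > 0$ (the distribution function above is written for that case) and treats $B(\cdot; p, q)$ as the regularized incomplete Beta function. Function names are illustrative.

```python
import math


def log_beta(p, q):
    """log B(p, q) via log-gamma."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete Beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(x, p, q):
    """Regularized incomplete Beta function I_x(p, q)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(p * math.log(x) + q * math.log(1.0 - x) - log_beta(p, q))
    if x < (p + 1.0) / (p + q + 2.0):
        return front * _beta_cf(p, q, x) / p
    return 1.0 - front * _beta_cf(q, p, 1.0 - x) / q


def gb2_pdf(y, a, b, p, q):
    """GB2 density |a| y^(ap-1) b^(aq) / (B(p,q) (b^a + y^a)^(p+q))."""
    if y <= 0.0:
        return 0.0
    log_f = (math.log(abs(a)) + (a * p - 1.0) * math.log(y) + a * q * math.log(b)
             - log_beta(p, q) - (p + q) * math.log(b ** a + y ** a))
    return math.exp(log_f)


def gb2_cdf(y, a, b, p, q):
    """GB2 distribution function for a > 0."""
    if a <= 0.0:
        raise ValueError("this sketch assumes a > 0")
    if y <= 0.0:
        return 0.0
    z = (y / b) ** a
    return reg_inc_beta(z / (1.0 + z), p, q)


def gb2_mean(a, b, p, q):
    """E(Y) = b B(p + 1/a, q - 1/a) / B(p, q); requires q > 1/a when a > 0."""
    if q - 1.0 / a <= 0.0 or p + 1.0 / a <= 0.0:
        raise ValueError("mean does not exist for these parameters")
    return b * math.exp(log_beta(p + 1.0 / a, q - 1.0 / a) - log_beta(p, q))
```

A quick way to check the code is the special case $a = p = 1$, where GB2 reduces to a Lomax (Pareto type II) distribution with density $q b^q / (b + y)^{q+1}$, distribution function $1 - (b/(b+y))^q$ and mean $b/(q-1)$.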

GB2 regression through the scale parameter
Suppose $x$ is a vector of known covariates. We have $Y \mid x \sim \mathrm{GB2}(a, b(x), p, q)$, where the scale $b(x)$ depends on the linear predictor $\alpha + \beta' x$.
Define residuals $\varepsilon_i = Y_i \, e^{-(\alpha_i + \beta' x_i)}$, so that
$$\log Y_i = \alpha_i + \beta' x_i + \log \varepsilon_i,$$
where $\varepsilon_i \sim \mathrm{GB2}(a, 1, p, q)$. This corresponds to a log link on the scale, $\log b(x_i) = \alpha_i + \beta' x_i$.
PP plots can then be used for diagnostics.
See also McDonald (1984) and McDonald and Butler (1987).
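As a sketch of the diagnostic step, the residuals and the coordinates of a PP plot can be computed as follows. Two things here are assumptions rather than details from the slides: the plotting positions $(j - 0.5)/n$, and the use of the special case $\mathrm{GB2}(1, 1, 1, q)$, which has the closed-form distribution function $1 - (1 + \varepsilon)^{-q}$, as a stand-in for the fitted residual distribution. Function names are illustrative.

```python
import math


def gb2_residuals(y, alpha, beta, x):
    """eps_i = y_i * exp(-(alpha_i + beta' x_i)) for each observation i."""
    residuals = []
    for y_i, alpha_i, x_i in zip(y, alpha, x):
        linear_predictor = alpha_i + sum(b * v for b, v in zip(beta, x_i))
        residuals.append(y_i * math.exp(-linear_predictor))
    return residuals


def unit_scale_cdf(eps, q):
    """CDF of GB2(a=1, b=1, p=1, q): 1 - (1 + eps)^(-q) (stand-in choice)."""
    return 1.0 - (1.0 + eps) ** (-q)


def pp_points(residuals, cdf):
    """Pairs (theoretical probability, sample probability) for a PP plot."""
    n = len(residuals)
    theoretical = sorted(cdf(e) for e in residuals)
    sample = [(j + 0.5) / n for j in range(n)]  # positions (j - 0.5)/n, j = 1..n
    return list(zip(theoretical, sample))


# Hypothetical inputs: two observations, one covariate.
y = [math.exp(2.0) * 1.0, math.exp(0.5) * 3.0]
alpha = [1.0, 0.0]
beta = [0.5]
x = [[2.0], [1.0]]

eps = gb2_residuals(y, alpha, beta, x)
points = pp_points(eps, lambda e: unit_scale_cdf(e, 1.0))
```

If the marginal model fits well, the points should lie close to the 45-degree line.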

Case study - global insurance demand
Source: Swiss Re Economic Research & Consulting
Response variables that can be used for insurance demand:
- Insurance density: premiums per capita
- Insurance penetration: ratio of insurance premiums to GDP
- Insurance in force: outstanding face amount plus dividend
Some common covariates that have appeared in the literature:
- Income
- Urbanization
- GDP growth
- Dependency ratio
- Inflation
- Death ratio
- Education
- Life expectancy

About the data set
Data set
- 2 responses: life and non-life insurance
- 5 predictor variables
- 75 countries (originally; 3 countries were later removed)
- 6 years of data (from year 2004 to year 2009)
Variables in the model
Dependent variables
- Non-life density: premiums per capita in non-life insurance
- Life density: premiums per capita in life insurance
Independent variables
- GDP per capita: ratio of gross domestic product (current US dollars) to total population
- Religious: percentage of Muslim population
- Urbanization: percentage of urban population to total population
- Death rate: percentage of death
- Dependency ratio: ratio of population over 65 to working population
Sources: Swiss Re sigma reports through the Insurance Information Institute (III); World Bank

Multiple time series plot
[Figure: premiums per capita by year, in two panels (non-life insurance and life insurance). Labelled countries include Ireland, USA, UK, Switzerland and Netherlands.]

Multiple time series plot: removed 3 countries
After removing Ireland, Netherlands and the UK from the dataset:
[Figure: premiums per capita by year, in two panels (non-life insurance and life insurance).]

Some summary statistics
Summary statistics of variables in year 2004 to 2009:

Variable            Minimum           Maximum               Mean                  Correlation with   Correlation with
                                                                                  life insurance     non-life insurance
Non-life insurance  (0.74, 1.26)      (2427.61, 2857.40)    (386.28, 516.99)      (0.75, 0.80)       -
Life insurance      (0.49, 1.28)      (3058.58, 3803.76)    (503.87, 697.39)      -                  (0.75, 0.80)
GDP per capita      (375.20, 550.90)  (56311.50, 94567.90)  (13896.60, 20524.50)  (0.77, 0.82)       (0.90, 0.91)
Death rate          (1.50, 1.52)      (16.17, 17.11)        (7.87, 8.00)          (0.09, 0.11)       (0.06, 0.07)
Urbanization        (11.92, 13.56)    (100, 100)            (64.90, 66.29)        (0.37, 0.42)       (0.45, 0.46)
Religious           (0.01, 0.01)      (99.61, 99.61)        (22.12, 22.12)        (-0.30, -0.29)     (-0.30, -0.28)
Dependency ratio    (1.25, 1.39)      (29.31, 33.92)        (14.89, 15.55)        (0.57, 0.61)       (0.57, 0.60)

Correlation matrix of covariates in year 2004 to 2009:

                    GDP per capita   Death rate       Urbanization     Religious        Dependency ratio
GDP per capita      -
Death rate          (0.01, 0.03)     -
Urbanization        (0.49, 0.52)     (-0.16, -0.15)   -
Religious           (-0.29, -0.25)   (-0.38, -0.34)   (-0.14, -0.13)   -
Dependency ratio    (0.58, 0.62)     (0.53, 0.54)     (0.30, 0.32)     (-0.53, -0.52)   -

Scatter plots of the two response variables
[Figure: six scatter plots, one per year from 2004 to 2009. x-axis: non-life insurance; y-axis: life insurance. The Pearson correlations reported in the panels range from 0.74 to 0.80.]

Scatter plots of the ranked response variables
[Figure: six scatter plots of the ranked responses on the unit square, one per year from 2004 to 2009. x-axis: non-life insurance; y-axis: life insurance.]

Histograms of two responses from year 2004 to 2009
[Figure: histograms of non-life density and life density, one pair of panels per year from 2004 to 2009.]

Model calibration
- Marginals: GB2 with regression on the scale parameter
- Gaussian copula:
$$C(u_1, u_2; \rho) = \Phi_\rho\!\left( \Phi^{-1}(u_1), \Phi^{-1}(u_2) \right)$$
- Natural assumption for the random effect for the $k$th response: $\alpha_{ik} \sim N(0, \sigma_k^2)$
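One way to build intuition for the Gaussian copula is to simulate from it: draw a pair of correlated standard normals and push each through $\Phi$. The sketch below is illustrative only and uses the standard library; the function name is hypothetical, and the value of $\rho$ passed in the example is the copula estimate reported in the model estimates (0.5174).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u1, u2) from the bivariate Gaussian copula with parameter rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Second normal score has correlation rho with the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


rng = random.Random(2012)  # fixed seed so the draw is reproducible
draws = sample_gaussian_copula(0.5174, 20000, rng)
```

Each coordinate of a draw is uniform on the unit interval. To simulate responses, the uniforms would then be transformed by the inverse of the conditional marginal distribution functions, which the sketch leaves out.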

Model estimates
[Table: univariate fitted models for insurance demand, for non-life insurance density and life insurance density, with columns Estimate, Std Error and p-val. Rows cover the covariates (GDP per capita, Religious, Urbanization, Death rate, Dependency ratio (old)), the GB2 parameters and the random effect.]

Gaussian copula:

Parameter   Estimate   Std Error   p-val
ρ           0.5174     0.0315      0.0000

PP plots of the residuals for marginal diagnostics: non-life insurance
[Figure: six PP plots, one per year from 2004 to 2009, of sample probability against theoretical probability.]

PP plots of the residuals for marginal diagnostics: life insurance
[Figure: six PP plots, one per year from 2004 to 2009, of sample probability against theoretical probability.]

Additional work intended
- Implementing diagnostic tests for model validation.
- Handling unbalanced and missing data.
- Identifying more actuarial-related problems within a multivariate longitudinal framework, e.g. there is an ongoing interest in loss reserving using multiple loss triangles.
- Alternative approach: use multivariate generalized linear models for the responses in each time period and use a copula to capture the inter-temporal dependence.
- (Possibly) handling discrete response variables by incorporating jitters.

Selected reference
- Beck, T. and Webb, I. (2003). Economic, demographic and institutional determinants of life insurance consumption across countries. World Bank Economic Review 17: 51-99.
- Browne, M. and Kim, K. (1993). An international analysis of life insurance demand. The Journal of Risk and Insurance 60: 616-634.
- Browne, M., Chung, J., and Frees, E.W. (2000). International property-liability insurance consumption. The Journal of Risk and Insurance 67: 73-90.
- Outreville, J. (1996). Life insurance market in developing countries. The Journal of Risk and Insurance 63: 263-278.
- Shi, P. and Frees, E.W. (2010). Long-tail longitudinal modeling of insurance company expenses. Insurance: Mathematics and Economics 47: 303-314.

- Thank you -
