Inference On Association Measure For Bivariate Survival .


Inference on Association Measure for Bivariate Survival Data with Hybrid Censoring

By SUHONG ZHANG, YING ZHANG, KATHRYN CHALONER
Department of Biostatistics, The University of Iowa, C22 GH, 200 Hawkins Drive, Iowa City, IA 52242
kathryn-chaloner@uiowa.edu

AND JACK T. STAPLETON
Department of Internal Medicine, The University of Iowa and Iowa City VA Medical Center, SW54-15 GH, 200 Hawkins Drive, Iowa City, IA, 52242 U.S.A.
jack-stapleton@uiowa.edu

Summary

A two-stage semiparametric estimator is proposed to estimate the association measure for bivariate survival data that are subject to hybrid censoring: one event time is right censored and the other is observed as current status data, or subject to interval censoring case 1. The bivariate data are assumed to follow a copula model, in which the association parameter is of primary interest. The consistency and asymptotic normality of the proposed estimator are established based on empirical process theories. Simulation studies indicate that the estimator performs quite well with a moderate sample size. The method is applied to a motivating HIV example, which studies the effect of GB virus type C (GBV-C) co-infection on the survival of HIV infected individuals.

Some key words: Association measure; Bivariate survival model; Copula; Current status data; Empirical process; GBV-C; HIV; Kendall's τ; Right censored data

1. INTRODUCTION

Event times are often subject to various types of censoring. The most common case is right censoring, which happens when the event has not occurred by the end of the study or the subject withdraws from the study. Interval censoring occurs when the event, such as the clearance of an infection, is only known to occur within an interval. A special case of interval censored data is current status data, or interval censoring case 1 data (Groeneboom & Wellner, 1992), which arise when it is only feasible to know whether or not the event has occurred by a random monitoring time $C$. Specifically, let $T$ denote the event time; then one observes $(C, \Delta)$, where $\Delta = I(T \le C)$ and $I(\cdot)$ is the indicator function.

In this manuscript, we consider a pair of positive random event times $(T_1^0, T_2^0)$, where $T_1^0$ is right censored by a random time $C_1$, and the only observation on $T_2^0$ is its status: whether or not $T_2^0$ exceeds a random monitoring time $C_2$. For each individual, one observes

$$X = \{(T_1, T_2, \Delta_1, \Delta_2) : T_1 = \min(T_1^0, C_1),\ T_2 = C_2,\ \Delta_1 = I(T_1^0 \le C_1),\ \Delta_2 = I(T_2^0 \le C_2)\}. \tag{1}$$

This hybrid censoring data structure is observed in a motivating example, which studies the association between HIV survival, the time from HIV seroconversion to death, and infection with a harmless virus called GB virus type C (GBV-C).

Some prior studies suggest that GBV-C delays the progression of HIV disease (Xiang

et al., 2001; Tillmann et al., 2001; Williams et al., 2004), while a few other studies fail to find a beneficial effect (Birk et al., 2002; Björkman et al., 2004; Van der Bij et al., 2005). These studies compared the survival curves for HIV-infected subjects with or without GBV-C infection. However, the two-sample comparison method used in these studies does not adjust for the duration of GBV-C infection, which may vary from subject to subject because the infection can clear on its own, and this may be the source of the contradictory results. Williams et al. (2004) conducted the most comprehensive GBV-C study to date, and found that GBV-C is associated with prolonged survival when a selected cohort from the Multicenter AIDS Cohort Study (MACS) is examined at 5-6 years after HIV seroconversion, but no association was found when the cohort was examined at 12-18 months. Longitudinal GBV-C testing at more than two time points in HIV infected individuals is not readily available from any other study. Besides the evaluation at baseline (an early measurement to select co-infected individuals), GBV-C status is monitored only once during follow-up for each individual in the MACS sub-cohort. Therefore, the GBV-C status fits the current status data structure well. It is well understood that HIV survival is subject to right censoring. The lack of a proper analytical tool for this type of data motivates us to re-analyze the MACS sub-cohort data from Williams et al. (2004), by developing a new bivariate analysis method that models the association between HIV survival and the duration of GBV-C infection.

Bivariate and multivariate survival data have been studied extensively in the statistical literature. Liang et al. (1995) and Oakes (2000) reviewed some recent developments in the analysis of multivariate failure time data. Copula based survival models are considered, for example, by Hougaard (1989), Oakes (1989), Shih & Louis (1995) and Wang & Ding (2000), to

study the association between two event times. Shih & Louis (1995) examined the association of bivariate data that are both subject to right censoring, through a two-stage semiparametric estimation procedure. At the first stage, the marginal survival functions are estimated consistently by the nonparametric maximum likelihood estimator (NPMLE). At the second stage, a dependency structure is imposed by using a copula model, the NPMLEs of the two marginal survival functions are plugged into the likelihood to form a pseudolikelihood, and the association parameter is then estimated by maximizing this pseudolikelihood. Wang & Ding (2000) proposed a parallel two-stage semiparametric method for bivariate current status data. Both papers show that the proposed estimators of the dependence measure converge in distribution to normal distributions at the $n^{1/2}$ rate, without first establishing consistency. In this manuscript, we also model the association of bivariate event times using copula models and estimate the association parameter through the two-stage procedure, but we focus specifically on the data structure in which one of the paired event times is right censored and the other is observed as current status data, as stated in (1).

The classical NPMLE of the marginal survival function for right censored data is the Kaplan-Meier estimator (1958). For current status data, Turnbull (1976) derived a self-consistency equation and used the EM algorithm to compute the NPMLE of the distribution function. Groeneboom & Wellner (1992) introduced the Convex Minorant Algorithm (CMA), which also solves for the NPMLE of the distribution function. Huang & Wellner (1997) reviewed recent progress on interval censored survival data, which include current status data as a special case.
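For current status data, the NPMLE of the distribution function at the ordered monitoring times coincides with the isotonic (nondecreasing) regression of the status indicators on those times. The sketch below computes it with the pool-adjacent-violators algorithm, which is used here as a simple stand-in and is neither the EM algorithm nor the CMA cited above. The function name `current_status_npmle` is hypothetical, and the code assumes distinct monitoring times.

```python
def current_status_npmle(monitor_times, indicators):
    """NPMLE of F at the ordered monitoring times via pool-adjacent-violators.

    monitor_times : monitoring times C_i (assumed distinct)
    indicators    : 1 if the event had occurred by C_i, else 0
    Returns (sorted_times, F_hat) with F_hat nondecreasing.
    """
    pairs = sorted(zip(monitor_times, indicators))
    # Each block stores [sum of indicators, number of observations].
    blocks = []
    for _, d in pairs:
        blocks.append([float(d), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    F_hat = []
    for s, n in blocks:
        F_hat.extend([s / n] * n)
    return [c for c, _ in pairs], F_hat


# Toy data (hypothetical): five subjects, each monitored once.
times, F_hat = current_status_npmle([3, 1, 2, 5, 4], [0, 0, 1, 1, 1])
print(list(zip(times, F_hat)))
```

The corresponding survival estimate is one minus the returned values.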

Our main goal in this manuscript is to develop an inference procedure for studying the association of bivariate survival data with the hybrid censoring structure described above. A direct application of this development is to investigate the association between HIV survival and the duration of GBV-C infection, to see whether or not they are positively correlated. Knowledge of this association may lead to a potential HIV treatment.

2. SOME PRELIMINARIES

2.1 Copula models

A copula is often referred to as a multivariate distribution function whose marginal distributions are uniform over $[0, 1]$. Consider a bivariate uniform random variable $(U_1, U_2)$; a copula $C$ is defined as

$$C(u_1, u_2) = \Pr(U_1 \le u_1, U_2 \le u_2).$$

Any continuous random variable can be obtained by transforming a uniform random variable on $[0, 1]$; therefore, a copula can be used to construct a multivariate distribution with any marginal distributions.

For a bivariate distribution function $H$ with univariate marginal distribution functions $F$ and $G$, the associated copula is a function $C_\alpha : [0, 1]^2 \to [0, 1]$ that satisfies

$$H_\alpha(x, y) = C_\alpha(F(x), G(y)),$$

where $\alpha$ is called the association parameter; see Nelsen (2006). A bivariate survival function can be defined in a similar way.
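The construction $H(x, y) = C(F(x), G(y))$ can be illustrated with a few lines of code. The sketch below is only an illustration under assumed choices: the independence copula $uv$, the upper-bound copula $\min(u, v)$, and exponential margins with rates 1 and 2 are not taken from the paper, and all function names are hypothetical.

```python
import math


def product_copula(u, v):
    """Independence copula C(u, v) = u * v (illustrative choice)."""
    return u * v


def upper_bound_copula(u, v):
    """Comonotone copula C(u, v) = min(u, v) (illustrative choice)."""
    return min(u, v)


def exp_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def joint_cdf(copula, F, G):
    """Build H(x, y) = C(F(x), G(y)) from a copula and two marginal CDFs."""
    return lambda x, y: copula(F(x), G(y))


F = exp_cdf(1.0)
G = exp_cdf(2.0)
H_indep = joint_cdf(product_copula, F, G)
H_comon = joint_cdf(upper_bound_copula, F, G)

# Letting y grow large recovers the first margin, whatever the copula.
print(H_indep(0.5, 50.0), H_comon(0.5, 50.0), F(0.5))
```

Both joint distributions share the same margins and differ only through the copula, which is the separation of margins and dependence that copula models exploit.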

A copula model provides a convenient way to express the joint distribution of two or more random variables. A copula separates the joint distribution into two components: the marginal distributions of the individual variables, and the interdependency between the margins. This separation is useful when interest lies mainly in either the marginal distributions or the interdependency.

The association parameter $\alpha$ determines the strength of dependency between the two margins. Kendall's tau, denoted by $\tau$, is related to $\alpha$ as follows:

$$\tau = 4\int_0^1\!\!\int_0^1 C_\alpha(u, v)\, dC_\alpha(u, v) - 1.$$

A collection of copulas called Archimedean copulas has been studied extensively in the literature. Suppose that $\psi_\alpha : [0, \infty] \to [0, 1]$ is a strictly decreasing function such that $\psi_\alpha(0) = 1$; then an Archimedean copula can be generated as

$$C_\alpha(u, v) = \psi_\alpha\big(\psi_\alpha^{-1}(u) + \psi_\alpha^{-1}(v)\big), \quad u, v \in [0, 1].$$

Examples of Archimedean copulas include the following three popular sub-families:

1. Gumbel (Gumbel-Hougaard) copula:
$$C_\alpha(u, v) = \exp\big\{-[(-\log u)^\alpha + (-\log v)^\alpha]^{1/\alpha}\big\}, \quad \alpha \ge 1,\ 0 \le u, v \le 1$$

2. Clayton copula:
$$C_\alpha(u, v) = [\max(u^{-\alpha} + v^{-\alpha} - 1, 0)]^{-1/\alpha}, \quad \alpha \ge -1 \text{ and } \alpha \ne 0,\ 0 \le u, v \le 1$$

3. Frank copula:
$$C_\alpha(u, v) = -\frac{1}{\alpha}\log\left\{1 + \frac{(e^{-\alpha u} - 1)(e^{-\alpha v} - 1)}{e^{-\alpha} - 1}\right\}, \quad \alpha \ne 0,\ 0 \le u, v \le 1$$

Archimedean copulas are widely used in applications due to their simplicity, their flexibility of dependence structures, and their ability to extend to higher dimensional problems via the associativity property. A collection of twenty-two one-parameter families of Archimedean copulas can be found in Table 4.1 of Nelsen (2006).

2.2 Univariate survival function estimation

Let $S_j$ and $F_j$, $j = 1, 2$, denote the survival function and distribution function of $T_j^0$, respectively.

2.2.1 Right censored data

Kaplan & Meier (1958) introduced the product limit estimator of the survival function, which is a basic tool for estimating the probability of survival at time $t$ for right censored data. $S_1$ can therefore be estimated by

$$\hat S_1(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i},$$

where $n_i$ is the number at risk just prior to observation time $t_i$, and $d_i$ is the number of deaths at time $t_i$.
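The product limit formula can be sketched directly in code. This is a minimal illustration rather than the authors' implementation; the function names `kaplan_meier` and `km_at` and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times  : observed times T = min(T0, C)
    events : 1 if the event was observed (T0 <= C), 0 if right censored
    Returns a list of (t_i, S_hat(t_i)) at the distinct observed event times.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        ties = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths > 0:
            # Multiply by (n_i - d_i) / n_i at each event time.
            surv *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, surv))
        n_at_risk -= ties
    return curve


def km_at(curve, t):
    """Evaluate the right-continuous step function S_hat at time t."""
    s = 1.0
    for ti, si in curve:
        if ti <= t:
            s = si
        else:
            break
    return s


# Toy data (hypothetical): six subjects, two of them right censored.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
```

Censored observations contribute only through the risk set: they reduce $n_i$ at later event times without adding a factor to the product.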

2.2.2 Current status data

For i.i.d. current status data $(C_{2i}, \Delta_{2i})$, $i = 1, 2, \cdots, n$, the NPMLE $\hat F_2$ of the distribution function $F_2$ maximizes the following log-likelihood:

$$l(F_2) = \sum_{i=1}^{n} \{\Delta_{2i} \log F_2(C_{2i}) + (1 - \Delta_{2i}) \log(1 - F_2(C_{2i}))\},$$

under the assumption that $T_2^0$ and $C_2$ are independent. Turnbull (1974) derived a self-consistency equation for $\hat F_2$:

$$\hat F_2(c) = E_{\hat F_2}\{\tilde F_{2n}(c) \mid C_{21}, \cdots, C_{2n}, \Delta_{21}, \cdots, \Delta_{2n}\},$$

where $\tilde F_{2n}$ is the (unobservable) empirical distribution function of the random variables $T_{21}^0, \cdots, T_{2n}^0$. This equation immediately yields the iteration steps of the Expectation Maximization (EM) algorithm (Dempster et al., 1977), which can be used to solve for $\hat F_2$.

Groeneboom & Wellner (1992) introduced the Convex Minorant Algorithm (CMA), which maximizes the above log-likelihood using a different characterization of the NPMLE. Let $C_{2(i)}$ be the $i$th order statistic of $C_2$ and let $\Delta_{2(i)}$ be the corresponding indicator; then $\hat F_2$ can be obtained explicitly through the "max-min" formula:

$$\hat F_2(c_{2(i)}) = \max_{m \le i}\ \min_{k \ge i}\ \frac{\sum_{j=m}^{k} \Delta_{2(j)}}{k - m + 1}.$$

As suggested by Groeneboom & Wellner (1992), the CMA is considerably faster than the commonly used EM method, especially when the sample size is large. The NPMLE of the survival function, $\hat S_2$, can be obtained as $1 - \hat F_2$.

2.3 Bivariate survival models

First, we write the joint survival function in a copula structure

$$S_\alpha(t_1, t_2) = C_\alpha(S_1(t_1), S_2(t_2)), \quad \alpha \in R^1,$$

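where $\alpha$ is the association parameter. Let $F_\alpha(t_1, t_2)$ denote the joint distribution function corresponding to $S_\alpha(t_1, t_2)$.

We consider the special case in which $T_1^0$ is right censored by a random time $C_1$ and $T_2^0$ is subject to interval censoring case 1 by a monitoring time $C_2$. The observed data, $X$, are described in (1). Throughout the manuscript, we assume independent and noninformative censoring.

3. Copula based pseudolikelihood estimation of association parameter

Let $(T_{1i}, T_{2i}, \Delta_{1i}, \Delta_{2i})$, $i = 1, \cdots, n$, be an i.i.d. sample, each with density $h(t_1, t_2, \delta_1, \delta_2)$ given by

$$\lim_{h_1 \to 0,\, h_2 \to 0} \frac{P[t_1 \le T_1 \le t_1 + h_1,\ t_2 \le T_2 \le t_2 + h_2,\ \Delta_1 = \delta_1,\ \Delta_2 = \delta_2]}{h_1 h_2}$$
$$\propto \left[\frac{\partial F_\alpha(t_1, t_2)}{\partial t_1}\right]^{\delta_1 \delta_2} \left[-\frac{\partial S_\alpha(t_1, t_2)}{\partial t_1}\right]^{\delta_1 (1 - \delta_2)} \big[S_1(t_1) - S_\alpha(t_1, t_2)\big]^{(1 - \delta_1)\delta_2} \big[S_\alpha(t_1, t_2)\big]^{(1 - \delta_1)(1 - \delta_2)}.$$

Let $C_{1\alpha}(u, v) = \partial C_\alpha(u, v)/\partial u$. Note that $F_\alpha(t_1, t_2) = 1 - S_1(t_1) - S_2(t_2) + S_\alpha(t_1, t_2)$. Given the two marginal survival functions $S_1$, $S_2$, the likelihood of the association parameter $\alpha$ based on all observations is

$$\prod_{i=1}^{n} \big[1 - C_{1\alpha}(S_1(t_{1i}), S_2(t_{2i}))\big]^{\delta_{1i}\delta_{2i}} \big[C_{1\alpha}(S_1(t_{1i}), S_2(t_{2i}))\big]^{\delta_{1i}(1 - \delta_{2i})} \big[S_1(t_{1i}) - C_\alpha(S_1(t_{1i}), S_2(t_{2i}))\big]^{(1 - \delta_{1i})\delta_{2i}} \big[C_\alpha(S_1(t_{1i}), S_2(t_{2i}))\big]^{(1 - \delta_{1i})(1 - \delta_{2i})}, \tag{2}$$

after omitting the parts that are irrelevant for estimating $\alpha$. Our main interest is to estimate the association parameter $\alpha$. We propose to apply a two-stage pseudolikelihood approach. At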

the first stage, the marginal survival function $S_1$, which corresponds to the right censored data, is estimated by the Kaplan-Meier estimator $\hat S_1$, and $S_2$, which corresponds to the current status data, is estimated by $\hat S_2$ using the Convex Minorant Algorithm. At the second stage, the estimates $\hat S_1$ and $\hat S_2$ are plugged into the likelihood (2), and the resulting log pseudolikelihood is then maximized to obtain the estimator of $\alpha$, $\hat\alpha_n$, which is the solution to the pseudo score equation:

$$U_\alpha(\alpha, \hat S_1, \hat S_2, \delta_1, \delta_2) = \sum_{i=1}^{n} \frac{\partial}{\partial\alpha}\, l(\alpha, \hat S_1(t_{1i}), \hat S_2(t_{2i}), \delta_{1i}, \delta_{2i}) = 0, \tag{3}$$

where

$$l(\alpha, \hat S_1(t_1), \hat S_2(t_2), \delta_1, \delta_2) = \delta_1\delta_2 \log\big[1 - C_{1\alpha}(\hat S_1(t_1), \hat S_2(t_2))\big] + \delta_1(1 - \delta_2) \log C_{1\alpha}(\hat S_1(t_1), \hat S_2(t_2)) + (1 - \delta_1)\delta_2 \log\big[\hat S_1(t_1) - C_\alpha(\hat S_1(t_1), \hat S_2(t_2))\big] + (1 - \delta_1)(1 - \delta_2) \log C_\alpha(\hat S_1(t_1), \hat S_2(t_2)). \tag{4}$$

The pseudolikelihood estimation approach leaves the functional form of the marginal survival functions unspecified, to be determined by the data. It is also computationally easy, since only the association parameter is left unknown in the pseudolikelihood.

4. Asymptotic properties of the pseudo estimator $\hat\alpha_n$

We now define some notation to be used in the sequel. Let $l(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2)$ be defined as in (4) on $[0, t_1^0] \times [0, t_2^0]$, where $t_1^0 = \sup\{t : P(T_1 \ge t, C_1 \ge t) > 0\}$ and $t_2^0 = \sup\{t : P(C_2 \ge t) > 0\}$. Suppose $\alpha$ is in an open set $A$ in the real line. We let $D$ be a constant, which may represent different values at different places.
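The second stage of Section 3 can be made concrete with a short sketch. It evaluates the log pseudolikelihood (4) for the Gumbel copula, taking the plug-in values $\hat S_1(t_{1i})$, $\hat S_2(t_{2i})$ as given inputs, and maximizes it over $\alpha$ with a golden-section search. The function names, the clipping constant `eps`, the search interval $[1, 20]$ and the choice of optimizer are assumptions made for illustration; the text does not specify how the maximization is carried out, and a golden-section search is only reliable when the objective is unimodal on the interval.

```python
import math


def gumbel(u, v, a):
    """Gumbel copula C_a(u, v) for a >= 1 and 0 < u, v < 1."""
    A = (-math.log(u)) ** a + (-math.log(v)) ** a
    return math.exp(-A ** (1.0 / a))


def gumbel_du(u, v, a):
    """Partial derivative dC_a(u, v)/du, written C_{1a} in the text."""
    x = -math.log(u)
    A = x ** a + (-math.log(v)) ** a
    return math.exp(-A ** (1.0 / a)) * A ** (1.0 / a - 1.0) * x ** (a - 1.0) / u


def log_pseudolik(a, s1, s2, d1, d2, eps=1e-10):
    """Sum of the four-case contributions in (4) with plug-in margins."""
    total = 0.0
    for u, v, e1, e2 in zip(s1, s2, d1, d2):
        # Clip away from 0 and 1 (numerical safeguard, an assumption).
        u = min(max(u, eps), 1.0 - eps)
        v = min(max(v, eps), 1.0 - eps)
        if e1 and e2:
            term = 1.0 - gumbel_du(u, v, a)
        elif e1:
            term = gumbel_du(u, v, a)
        elif e2:
            term = u - gumbel(u, v, a)
        else:
            term = gumbel(u, v, a)
        total += math.log(max(term, eps))
    return total


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for a maximizer of f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Toy plug-in values (hypothetical numbers, not from the paper).
s1 = [0.9, 0.7, 0.5, 0.3]
s2 = [0.8, 0.6, 0.6, 0.2]
d1 = [1, 0, 1, 0]
d2 = [0, 1, 1, 0]
alpha_hat = golden_max(lambda a: log_pseudolik(a, s1, s2, d1, d2), 1.0, 20.0)
print(alpha_hat)
```

Because only the scalar $\alpha$ is unknown at this stage, a one-dimensional search of this kind suffices; solving the score equation (3) by a root finder would be an alternative.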

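Before we formally state the asymptotic results, we need to define the following notation:

$$V_\alpha(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial}{\partial\alpha}\, l(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2)$$
$$V_{\alpha^2}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^2}{\partial\alpha^2}\, l(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2)$$
$$V_{\alpha,1}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^2}{\partial\alpha\,\partial u}\, l(\alpha, u, S_2(t_2), \delta_1, \delta_2)\Big|_{u = S_1(t_1)}$$
$$V_{\alpha,2}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^2}{\partial\alpha\,\partial v}\, l(\alpha, S_1(t_1), v, \delta_1, \delta_2)\Big|_{v = S_2(t_2)}$$
$$V_{\alpha^2,1}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^3}{\partial\alpha^2\,\partial u}\, l(\alpha, u, S_2(t_2), \delta_1, \delta_2)\Big|_{u = S_1(t_1)}$$
$$V_{\alpha^2,2}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^3}{\partial\alpha^2\,\partial v}\, l(\alpha, S_1(t_1), v, \delta_1, \delta_2)\Big|_{v = S_2(t_2)}$$
$$V_{\alpha,1^2}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^3}{\partial\alpha\,\partial u^2}\, l(\alpha, u, S_2(t_2), \delta_1, \delta_2)\Big|_{u = S_1(t_1)}$$
$$V_{\alpha,1,2}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^3}{\partial\alpha\,\partial u\,\partial v}\, l(\alpha, u, v, \delta_1, \delta_2)\Big|_{u = S_1(t_1),\, v = S_2(t_2)}$$
$$V_{\alpha,2^2}(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) = \frac{\partial^3}{\partial\alpha\,\partial v^2}\, l(\alpha, S_1(t_1), v, \delta_1, \delta_2)\Big|_{v = S_2(t_2)}$$

Suppose the following regularity conditions hold:

(A1) $l(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2)$ is three-times differentiable with respect to $\alpha$ on $[0, t_1^0] \times [0, t_2^0]$, for each $\alpha \in A$, and all derivatives are continuous and uniformly bounded by some constant $D$.

(A2) $V_{\alpha,1}$, $V_{\alpha,2}$, $V_{\alpha^2,1}$, $V_{\alpha^2,2}$, $V_{\alpha,1^2}$, $V_{\alpha,1,2}$ and $V_{\alpha,2^2}$, each evaluated at $(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2)$, are continuous and uniformly bounded by some constant $D$ on $[0, t_1^0] \times [0, t_2^0]$, for all $\alpha \in A$.

(A3) For each $\alpha \in A$, $0 < E_\alpha[V_\alpha(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2)]^2 < \infty$.

(A4) Let $F_2$, $G_2$ be the distribution functions of $T_2^0$ and $C_2$, respectively. $G_2 \ll F_2$, $F_2 \ll G_2$,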

and $G_2$ has density $g_2$ with respect to the Lebesgue measure.

(A5) $(\psi_2/g_2) \circ S_2^{-1}$ is bounded and Lipschitz on $[0, 1]$, where $\psi_2$ is the derivative of the influence curve $IC_2(t_2)$, which is defined in the appendix.

(A6) $S_2$, $g_2$ and $\psi_2$ satisfy

$$\int_0^{t_2^0} \frac{S_2(t_2)(1 - S_2(t_2))}{g_2(t_2)}\, \psi_2(t_2)\, dt_2 < \infty.$$

The above regularity conditions hold for the bivariate copula models mentioned earlier, provided that the marginal distribution functions are smooth.

Some technical lemmas are needed to prove the asymptotic results. They are stated as follows:

Lemma 1. Let $\mathcal{F}_j = \{f : f \text{ is a survival function on } [0, t_j^0]\}$, $j = 1, 2$, and let the class $\mathcal{G}_{\mathcal{F}} = \{V_{\alpha,1}(\alpha, f_1(t_1), f_2(t_2), \delta_1, \delta_2) : f_j \in \mathcal{F}_j,\ j = 1, 2\}$. Let $P$ denote the probability measure of $(T_1, T_2, \Delta_1, \Delta_2)$; then under conditions (A1)-(A2), $\mathcal{G}_{\mathcal{F}}$ is a $P$-Glivenko-Cantelli class, for all $\alpha \in A$.

Lemma 2. Let $\mathcal{F}_j = \{f : f \text{ is a survival function on } [0, t_j^0]\}$, $j = 1, 2$, and let the class $\mathcal{H}_{\mathcal{F}} = \{V_\alpha(\alpha, f_1(t_1), f_2(t_2), \delta_1, \delta_2) - V_\alpha(\alpha, S_1(t_1), S_2(t_2), \delta_1, \delta_2) : f_j \in \mathcal{F}_j,\ j = 1, 2\}$. Let $P$ denote the probability measure of $(T_1, T_2, \Delta_1, \Delta_2)$; then under conditions (A1)-(A2), $\mathcal{H}_{\mathcal{F}}$ is a $P$-Donsker class, for all $\alpha \in A$.

Under the regularity conditions stated previously, the estimator $\hat\alpha_n$, the solution to equation (3), is consistent and asymptotically normal, as stated in the following two theorems:

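Theorem 1. Assume that the joint distribution of $(T_1^0, T_2^0)$ follows an Archimedean copula model with the true association parameter $\alpha = \alpha_0$. Let $\hat S_1(\cdot)$ be the Kaplan-Meier estimator of $S_1(\cdot)$ and $\hat S_2(\cdot)$ be the NPMLE of $S_2(\cdot)$ computed by the CMA. Under the regularity conditions (A1)-(A2),

$$\hat\alpha_n \to_p \alpha_0 \quad \text{as } n \to \infty.$$

Theorem 2. Under the regularity conditions (A1)-(A6),

$$\sqrt{n}(\hat\alpha_n - \alpha_0) \to_d N(0, \sigma^2), \quad \text{where} \quad \sigma^2 = \frac{\mathrm{Var}(Q(\alpha_0, S_1, S_2, t_1, t_2, \delta_1, \delta_2))}{W^2(\alpha_0, S_1, S_2, \delta_1, \delta_2)}$$

with

$$W(\alpha_0, S_1, S_2, \delta_1, \delta_2) = \int V_\alpha^2(\alpha_0, S_1(t_1), S_2(t_2), \delta_1, \delta_2)\, dP(t_1, t_2, \delta_1, \delta_2),$$
$$Q(\alpha_0, S_1, S_2, t_1, t_2, \delta_1, \delta_2) = V_\alpha(\alpha_0, S_1(t_1), S_2(t_2), \delta_1, \delta_2) + I_1(T_1, \Delta_1, \alpha_0) + l(t_2, \delta_2, S_2, G_2, \psi_2),$$

in which

$$I_1(T_1, \Delta_1, \alpha_0) = \int_0^{t_1^0}\!\!\int_0^{t_2^0} M_{\alpha,u}(\alpha_0, S_1(t_1), S_2(t_2))\, f(t_1, t_2)\, I_1^0(T_1, \Delta_1)(t_1)\, dt_1\, dt_2,$$
$$l(t_2, \delta_2, S_2, G_2, \psi_2) = [\delta_2 - (1 - S_2(t_2))]\, \frac{\psi_2(t_2)}{g_2(t_2)}\, I[g_2(t_2) > 0],$$

where

$$M_{\alpha,u}(\alpha_0, S_1(t_1), S_2(t_2)) = E_{\delta_1 \delta_2 \mid t_1 t_2}\, V_{\alpha,u}(\alpha_0, S_1(t_1), S_2(t_2), \delta_1, \delta_2)$$

and

$$I_1^0(T_1, \Delta_1)(t_1) = -S_1(t_1)\left\{\int_0^{t_1} \frac{dN_1(u)}{P(T_1 \ge u)} - \int_0^{t_1} \frac{I[T_1 \ge u]\, d\Lambda_1(u)}{P(T_1 \ge u)}\right\}.$$

$N_1(u)$ is defined as $I[T_1 \le u, \Delta_1 = 1]$ and $\Lambda_1$ is the cumulative hazard function of $T_1^0$.

The proofs of the two lemmas and the two theorems are given in the Appendix.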

5. Simulation studies

The preceding sections provide a two-stage pseudolikelihood estimation procedure for the association parameter between the two survival times. Although the estimator is shown to be consistent and asymptotically normal, it is crucial to ascertain its finite sample performance before applying it to real problems. Simulation studies are performed to evaluate the proposed estimator.

We consider the Gumbel copula function

$$C_\alpha(u, v) = \exp\big\{-[(-\log u)^\alpha + (-\log v)^\alpha]^{1/\alpha}\big\},$$

where $\alpha \ge 1$, and the two margins are both assumed to be exponentially distributed with unit rate.

A sample of bivariate copula random variables is generated based on the conditional distribution function. Suppose that the joint distribution of the bivariate data $(T_1^0, T_2^0)$ is $C_\alpha(F_1(t_1), F_2(t_2))$. We generate $(T_1^0, T_2^0)$ through the following steps:

1. Generate two independent uniform $(0, 1)$ random variables $u$, $w$.
2. Set $w = P(V \le v \mid U = u) = \partial C_\alpha(u, v)/\partial u$ and solve for $v$.
3. Set $T_1^0 = F_1^{-1}(u)$, $T_2^0 = F_2^{-1}(v)$.

Meanwhile, the censoring times $C_1$ and $C_2$ are each drawn independently from a uniform distribution on $[0, 2.3]$. In this setting, about 50% of $T_1^0$ is right censored by $C_1$, and about 50% of $T_2^0$ is subject to interval censoring case 1 by $C_2$ as well.
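The data-generating steps above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the use of bisection to solve $w = \partial C_\alpha(u, v)/\partial u$ for $v$, the clipping constant `1e-12`, and the seed handling are all assumptions.

```python
import math
import random


def gumbel_du(u, v, a):
    """Conditional distribution P(V <= v | U = u) = dC_a(u, v)/du (Gumbel)."""
    x = -math.log(u)
    A = x ** a + (-math.log(v)) ** a
    return math.exp(-A ** (1.0 / a)) * A ** (1.0 / a - 1.0) * x ** (a - 1.0) / u


def solve_v(u, w, a, iters=60):
    """Solve gumbel_du(u, v, a) = w for v by bisection (increasing in v)."""
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gumbel_du(u, mid, a) < w:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulate(n, a, seed=0, c_max=2.3):
    """Return n tuples (T1, T2, Delta1, Delta2) with the hybrid censoring in (1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Step 1: two independent uniforms (u clipped away from 0 and 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        w = rng.random()
        # Step 2: invert the conditional distribution.
        v = solve_v(u, w, a)
        # Step 3: unit exponential margins, F^{-1}(p) = -log(1 - p).
        t10 = -math.log(1.0 - u)
        t20 = -math.log(1.0 - v)
        # Independent uniform censoring and monitoring times on [0, c_max].
        c1 = rng.uniform(0.0, c_max)
        c2 = rng.uniform(0.0, c_max)
        rows.append((min(t10, c1), c2, int(t10 <= c1), int(t20 <= c2)))
    return rows


rows = simulate(200, a=2.0, seed=1)
print(rows[:3])
```

Only $T_1 = \min(T_1^0, C_1)$, the monitoring time $T_2 = C_2$ and the two indicators are kept, so the latent $T_2^0$ never appears in the simulated data set.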

Kendall's $\tau$ is chosen as a global association measure. For the Gumbel copula, $\tau = 1 - 1/\alpha$. Three different values of $\alpha$ are set such that the corresponding Kendall's $\tau$ is 0.25, 0.5, and 0.75. For each value of $\alpha$, we conduct Monte Carlo simulations with 1,000 replications for sample sizes $n = 50$, 100, 200 and 400, respectively.

We compute the two-stage pseudolikelihood estimat
