HST 190: Introduction To Biostatistics


HST 190: Introduction to Biostatistics
Lecture 8: Analysis of time-to-event data

Survival analysis is studied on its own because survival data have features that distinguish them from other types of data.
Nonetheless, our tour of survival analysis will visit all of the topics we have learned so far:
  - Estimation
  - One-sample inference
  - Two-sample inference
  - Regression

Survival analysis
Survival analysis is the area of statistics that deals with time-to-event data.
Despite the name, "survival" analysis isn't only for analyzing time until death.
  - It deals with any situation where the quantity of interest is the amount of time until a study subject experiences some relevant endpoint.
The terms "failure" and "event" are used interchangeably for the endpoint of interest.
For example, say researchers record time from diagnosis to death (in months) for a sample of patients diagnosed with laryngeal cancer.
  - How would you summarize the survival of laryngeal cancer patients?

A simple idea is to report the mean or median survival time.
  - However, the sample mean is not robust to outliers: a small increase in the mean does not tell you whether everyone lived a little longer or a few people lived a lot longer.
  - The median is also just a single summary measure. If considering your own prognosis, you'd want to know more!
Another simple idea: report the survival rate at $x$ years.
  - How to choose the cutoff time $x$?
  - Also just a single summary measure: the graphs on this slide show the same 5-year rate.

Summary measures such as means and proportions are highly useful for concisely describing data. However, they capture some features of the data but not all.
  - As we see, the survival curve itself is far more informative.
  - So, let's estimate that!

Patient   Months
1         1
2         2.5
3         3
4         3
5         5.5
6         6
7         7
8         7
9         9
10        13

Measuring survival time
We measure time with the variable $t$. The beginning of the period we're studying (e.g., time of diagnosis) is $t = 0$.
  - $t$ can represent days, weeks, months, years, etc.
Time is recorded relative to when a person enters the study, not according to the calendar.
  - In other words, a person reaches the point $t = 1$ after being in the study for 1 year (or month, etc.).
  - It doesn't matter how far along the others are.

Person #1 enters the study in 2003.
Person #2 enters the study in 2002.
Both die in 2006.
If $t$ is measured in years, person #1 dies at $t = 3$ (2006 - 2003) and person #2 dies at $t = 4$ (2006 - 2002).
[Figure: calendar timeline from 2002 to 2007 showing follow-up for Person #1 and Person #2, with X marking each death.]

The probability of an individual in our population surviving beyond a given time is called the survival function, written as $S(t)$.
Here we want to make inference about a function, not just a single population value.
It seems logical to estimate $S(t)$ by taking a sample and recording the proportion of the sample who are still alive after time $t$ has elapsed.
  - However, it is not as simple as that sounds.

Censored data
Analyses of survival times often include censored data (a type of missingness).
Valid inference in the presence of missing data is a topic of ongoing research in statistics.
To do valid inference when some data are missing, we must make assumptions.
Time-to-event data often have data missing in a particular way: individuals are lost to follow-up (or the study ends) before they experience the event of interest. This is called (right) censoring.
Censored data provide partial information: you don't know how long a patient lived, but you know that she/he lived at least as long as the time before being lost to follow-up.

Why would a person be lost to follow-up? The person could have
  - moved to another city
  - withdrawn from the study
  - died of a different cause
  - still been in the study without an event at the time of the analysis
To do inference in the setting of missing data, we must be willing to make a big assumption.
  - The assumption is that censoring is non-informative.
In other words, assume that being lost to follow-up is unrelated to prognosis.
If this assumption can't be made, inference becomes more complicated and requires strong assumptions.

As an important counterexample, say researchers administer a new chemotherapy drug to 10 cancer patients to estimate survival time while on the drug.
5 patients can't tolerate the side effects and drop out of the study.
If non-informative censoring were assumed, the drug would probably appear falsely impressive.
  - Those who dropped out were probably more ill; hence shorter survival times were disproportionately removed from the sample.

Estimating the survival curve
The Kaplan-Meier estimator provides an estimate of $S(t)$ at all time points, even if some data are censored.
  - Also known as the product-limit estimator.
Using the rules of probability, we'll see where this estimator comes from. Choose specific times $t_1, \ldots, t_k$; then at $t_i$,
$$
\begin{aligned}
S(t_i) &= P(\text{alive at } t_i) \\
       &= P(\text{alive at } t_i \text{ and alive at } t_{i-1}) \\
       &= P(\text{alive at } t_i \mid \text{alive at } t_{i-1})\, P(\text{alive at } t_{i-1}) \\
       &= P(\text{alive at } t_i \mid \text{alive at } t_{i-1})\, S(t_{i-1})
\end{aligned}
$$
  - Put simply, the probability of surviving to time $t_i$ is the probability of surviving to $t_{i-1}$ and then, given you made it that far, surviving to $t_i$.
  - What if we applied this trick repeatedly?

$$
\begin{aligned}
S(t_i) &= P(\text{alive at } t_i \mid \text{alive at } t_{i-1})\, S(t_{i-1}) \\
       &= P(\text{alive at } t_i \mid \text{alive at } t_{i-1})\, P(\text{alive at } t_{i-1} \mid \text{alive at } t_{i-2})\, S(t_{i-2}) \\
       &= P(\text{alive at } t_i \mid \text{alive at } t_{i-1}) \cdots P(\text{alive at } t_2 \mid \text{alive at } t_1)\, S(t_1)
\end{aligned}
$$
After writing the survival function as the product of these individual pieces, we can then estimate it by estimating each piece individually.
If there were no censoring, then we could simply estimate $P(\text{alive at } t_i \mid \text{alive at } t_{i-1})$ by the sample quantity
$$\frac{\#\,\text{alive at } t_i}{\#\,\text{alive at } t_{i-1}}$$
However, a patient who is alive but censored at time $t_{i-1}$ never really had a chance to make it to $t_i$.
  - That patient was not eligible to die during the interval from $t_{i-1}$ to $t_i$ and therefore shouldn't be counted when computing the survival rate in this interval.
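The no-censoring case can be checked numerically. The sketch below uses ten illustrative survival times (assumed here to be fully observed, which the slides do not state) and a hypothetical grid of time points, and compares the product of conditional survival proportions with the plain proportion still alive.

```python
from fractions import Fraction

# Illustrative survival times in months; ASSUMED fully observed (no censoring).
times = [1, 2.5, 3, 3, 5.5, 6, 7, 7, 9, 13]

def alive_after(t):
    """Number of subjects whose survival time exceeds t."""
    return sum(1 for x in times if x > t)

def surv_direct(t):
    """Proportion of the sample still alive after time t."""
    return Fraction(alive_after(t), len(times))

def surv_product(grid):
    """Product of conditional survival proportions along an increasing grid."""
    s = Fraction(1)
    prev_alive = len(times)  # everyone is alive at t = 0
    for t in grid:
        now_alive = alive_after(t)
        # Sample version of P(alive at t_i | alive at t_{i-1})
        s *= Fraction(now_alive, prev_alive)
        prev_alive = now_alive
    return s

grid = [2, 4, 6]  # hypothetical time points t_1 < t_2 < t_3
print(surv_direct(6), surv_product(grid))
```

Without censoring the intermediate counts cancel, so both routes give the same value; the product form only starts to matter once the denominators are adjusted for censoring.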

When there is censoring, we estimate $P(\text{alive at } t_i \mid \text{alive at } t_{i-1})$ using the sample quantity
$$\frac{\#\,\text{alive at } t_i}{\#\,\text{alive at } t_{i-1} - \#\,\text{censored at } t_{i-1}}$$
The denominator counts those at risk for the event at time $t_i$.
  - It is exactly because of independent censoring that we can estimate the conditional probability as $P(\text{alive at } t_i \mid \text{alive at } t_{i-1} \text{ and uncensored at } t_{i-1})$.

Let's define the following:
  - $d_i$ = # died at time $t_i$
  - $l_i$ = # censored at time $t_i$
  - $S_i$ = # still alive and not censored at $t_i$
Notice that $S_{i-1} = S_i + d_i + l_i$, so we can write the previous estimator as
$$
\frac{\#\,\text{alive at } t_i}{\#\,\text{alive at } t_{i-1} - \#\,\text{censored at } t_{i-1}}
= \frac{S_i + l_i}{S_{i-1} + l_{i-1} - l_{i-1}}
= \frac{S_{i-1} - d_i}{S_{i-1}}
= 1 - \frac{d_i}{S_{i-1}}
$$

So, given a set of observed time points $t_1, \ldots, t_k$, the Kaplan-Meier estimator of the survival probability at time $t_i$ is
$$\hat{S}(t_i) = \left(1 - \frac{d_1}{S_0}\right)\left(1 - \frac{d_2}{S_1}\right) \cdots \left(1 - \frac{d_i}{S_{i-1}}\right)$$
  - Two key features:
    1) The estimated curve jumps at event times only.
    2) The curve goes to zero if the last observed time is an event, not a censoring.
Consider outcomes, at 2-year intervals, for 100 patients with some disease or other.

year   fail   censored
2      7      2
4      16     5
6      19     8
8      14     7
10     11     4
12     5      2

  - Estimate the survival function.

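Consider outcomes, at 2-year intervals, for 100 patients with some disease or other.
  - $\hat{S}(2) = 1 - \frac{7}{100} = 0.930$
  - $\hat{S}(4) = \left(1 - \frac{7}{100}\right)\left(1 - \frac{16}{91}\right) = 0.766$
  - $\hat{S}(6) = \left(1 - \frac{7}{100}\right)\left(1 - \frac{16}{91}\right)\left(1 - \frac{19}{70}\right) = 0.558$

year   fail $d_i$   censored $l_i$   survive $S_i$          total $S_{i-1}$
2      7            2                100 - (7 + 2) = 91     100
4      16           5                91 - (16 + 5) = 70     91
6      19           8                70 - (19 + 8) = 43     70
8      14           7
10     11           4
12     5            2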
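The same calculation can be carried through all six intervals in a few lines. This is a minimal sketch of the product formula using the counts from the table; the function name `kaplan_meier` and its return layout are illustrative choices, not part of the lecture.

```python
# Rows from the example table: (year, failures d_i, censored l_i)
rows = [(2, 7, 2), (4, 16, 5), (6, 19, 8), (8, 14, 7), (10, 11, 4), (12, 5, 2)]

def kaplan_meier(rows, n_start):
    """Return [(year, at_risk, S_hat)] with S_hat(t_i) = prod_j (1 - d_j / S_{j-1})."""
    at_risk = n_start  # S_0: everyone alive and uncensored at t = 0
    s_hat = 1.0
    out = []
    for year, d, l in rows:
        s_hat *= 1 - d / at_risk          # multiply in this interval's factor
        out.append((year, at_risk, s_hat))
        at_risk -= d + l                  # S_i = S_{i-1} - d_i - l_i
    return out

km = kaplan_meier(rows, n_start=100)
for year, at_risk, s_hat in km:
    print(f"year {year:2d}: at risk {at_risk:3d}, S_hat = {s_hat:.3f}")
```

The first three printed estimates should match the hand calculation (0.930, 0.766, 0.558); the later rows fill in the cells the table leaves blank.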

Estimating $S(t)$ this way is also called the life-table method because it pre-specifies the time intervals.
With software, estimates of $S(t)$ are usually defined at each individual event time to get a smoother curve.
Figure source: Liu R et al. NEJM 2007 Jan 18; 356(3):217-226 (Fig. 2)

Figure source: Brahmer J et al. NEJM 2015; 373(2):123-135

Confidence intervals for KM estimator
In addition to estimating $S(t)$ at any time point, we can also form a confidence interval for it.
As with the odds ratio, we use the log transformation for this (which improves the normal approximation):
  1) Take the logarithm of $\hat{S}(t)$
  2) Form a CI for $\ln(S(t))$
  3) Convert back to a CI for $S(t)$

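At time $t_i$, the variance of $\ln \hat{S}(t_i)$ is
$$\sum_{j=1}^{i} \frac{d_j}{S_{j-1}\,(S_{j-1} - d_j)}$$
Therefore, the $100(1-\alpha)\%$ CI for $\ln S(t_i)$ is
$$\ln \hat{S}(t_i) \pm z_{1-\alpha/2} \sqrt{\sum_{j=1}^{i} \frac{d_j}{S_{j-1}\,(S_{j-1} - d_j)}} = (c_1, c_2)$$
So the $100(1-\alpha)\%$ CI for $S(t_i)$ is $(e^{c_1}, e^{c_2})$.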

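Returning to our example, $\hat{S}(6) = 0.558$, so $\ln \hat{S}(6) = -0.583$.
Then the variance of $\ln \hat{S}(6)$ is
$$
\sum_{j=1}^{3} \frac{d_j}{S_{j-1}(S_{j-1} - d_j)}
= \frac{d_1}{S_0 (S_0 - d_1)} + \frac{d_2}{S_1 (S_1 - d_2)} + \frac{d_3}{S_2 (S_2 - d_3)}
= \frac{7}{100(100 - 7)} + \frac{16}{91(91 - 16)} + \frac{19}{70(70 - 19)}
= 0.008
$$
Then the 95% CI ($\alpha = 0.05$) for $\ln S(6)$ is $-0.583 \pm 1.96\sqrt{0.008} = (-0.763, -0.403)$.
Thus, the 95% CI for $S(6)$ is $(e^{-0.763}, e^{-0.403}) = (0.466, 0.668)$.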
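The three-step recipe (log, interval, exponentiate) is short enough to script. The sketch below assumes the counts from the worked example and uses z = 1.96 for a 95% interval; `km_log_ci` is a hypothetical helper name.

```python
import math

# (d_j, S_{j-1}) for years 2, 4 and 6, taken from the worked example
steps = [(7, 100), (16, 91), (19, 70)]

def km_log_ci(steps, z=1.96):
    """Kaplan-Meier estimate with a CI built on the log scale.

    Returns (S_hat, variance of ln S_hat, (lower, upper)).
    """
    s_hat = 1.0
    var_log = 0.0
    for d, at_risk in steps:
        s_hat *= 1 - d / at_risk
        var_log += d / (at_risk * (at_risk - d))  # one term of the variance sum
    half_width = z * math.sqrt(var_log)
    log_s = math.log(s_hat)
    # Build the interval for ln S(t), then exponentiate back to the S(t) scale.
    return s_hat, var_log, (math.exp(log_s - half_width), math.exp(log_s + half_width))

s_hat, var_log, (lo, hi) = km_log_ci(steps)
print(f"S_hat = {s_hat:.3f}, var = {var_log:.4f}, CI = ({lo:.3f}, {hi:.3f})")
```

Because the script carries unrounded intermediate values, its endpoints can differ from the hand calculation in the third decimal place.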

Log-rank test
In addition to estimating survival, we may want to compare two groups' survival functions using a hypothesis test.
Instead of having to test for the difference of a summary measure (e.g., mean survival in group 1 vs. group 2), we can actually compare the entire survival curves at once.
The test for this is called the Log-Rank Test.
What does it mean to say that "two curves are the same"?
To precisely define what we're testing, we need to define a new function called the hazard function, $h(t)$.

The hazard function is defined as
$$
h(t) = \lim_{\Delta t \to 0} \frac{\dfrac{S(t) - S(t + \Delta t)}{\Delta t}}{S(t)}
= \frac{\text{instantaneous death rate per unit time}}{\text{proportion of individuals still alive}}
$$
Equivalently, this is an instantaneous conditional death (event) rate.
  - For the exponential distribution, this is constant.
In discrete time, this is the probability of an event at $t$ given no event up to $t$.
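The constant-hazard property of the exponential distribution can be seen numerically by replacing the limit with a small finite difference. This is only an illustrative sketch: the rate 0.25 and the step size are arbitrary assumptions, and the survival function used is the standard exponential form exp(-rate * t).

```python
import math

def hazard(S, t, dt=1e-6):
    """Finite-difference approximation of h(t) = [(S(t) - S(t + dt)) / dt] / S(t)."""
    return (S(t) - S(t + dt)) / dt / S(t)

rate = 0.25  # hypothetical rate parameter of an exponential survival time

def S_exp(t):
    """Survival function of an exponential distribution with the rate above."""
    return math.exp(-rate * t)

# Evaluate the approximate hazard at several time points.
hazards = [hazard(S_exp, t) for t in (0.5, 2.0, 10.0)]
print(hazards)
```

Each value comes out very close to the rate itself, regardless of the time point, which is what "constant hazard" means.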

Hazard functions are natural constructs for dealing with the censoring problem.
Using the hazard function, we can now define null and alternative hypotheses to test whether two groups have different survival distributions:
  - $H_0$: $h_1(t) = h_2(t)$ for all $t$ during the study
  - $H_1$: $h_1(t) \neq h_2(t)$ at some point during the study
The Log-Rank Test is a direct application of the Mantel-Haenszel test.
  - The study period is subdivided into $k$ intervals. For each interval (or event time), a 2x2 table is created ("death/no death" vs. "exposure/no exposure"). The test statistic is calculated from the $k$ tables just as in Mantel-Haenszel.
The only extra thing to worry about is removing the censored cases in between intervals.

Steps in the Log-Rank test
  1) Subdivide the study period into $k$ intervals
  2) Create a 2x2 table for each interval
  3) Calculate the $X^2$ test statistic
  4) Obtain the p-value and draw a conclusion
Note: we don't test for constant hazard like we do for the Mantel-Haenszel common odds ratio.
Table for interval $i$:

           event +       event -       total
group 1    $a_i$         $b_i$         $n_{i1}$
group 2    $c_i$         $d_i$         $n_{i2}$
total      $a_i + c_i$   $b_i + d_i$   $n_i$

  - $n_{i1}$ = # people (in group 1) who had neither died nor been censored at $t_{i-1}$
  - $a_i$ = # out of the $n_{i1}$ who died before $t_i$
  - $b_i$ = # out of the $n_{i1}$ who didn't die before $t_i$

  3) Calculate the test statistic
$$X^2 = \frac{(|O - E| - 0.5)^2}{V}$$
    - $O = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} a_i$
    - $E = \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(a_i + c_i)}{n_i}$
    - $V = \sum_{i=1}^{k} V_i = \sum_{i=1}^{k} \frac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$ (must be $\geq 5$)
  4) If $H_0$ is true, then $X^2 \sim \chi^2_1$
    - Rejecting a hazard ratio of 1 is equivalent to rejecting equality of the survival functions.
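The four steps can be sketched as a short function that accumulates O, E and V over the per-interval 2x2 tables. The two tables below are made-up counts for illustration only, and the p-value uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2)) instead of a statistics library.

```python
import math

def log_rank(tables):
    """Continuity-corrected log-rank statistic from per-interval 2x2 tables.

    tables: list of (a, b, c, d) with a, b = events / non-events in group 1
    and c, d = events / non-events in group 2.
    """
    O = E = V = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # variance term is undefined when fewer than 2 are at risk
        O += a
        E += (a + b) * (a + c) / n
        V += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    x2 = (abs(O - E) - 0.5) ** 2 / V
    p_value = math.erfc(math.sqrt(x2 / 2))  # upper tail of chi-square, 1 df
    return O, E, V, x2, p_value

# Hypothetical counts for two intervals (not from the lecture)
tables = [(10, 40, 4, 46), (12, 28, 6, 40)]
O, E, V, x2, p_value = log_rank(tables)
print(f"O = {O:.0f}, E = {E:.3f}, V = {V:.3f}, X2 = {x2:.3f}, p = {p_value:.4f}")
```

In a real analysis the at-risk counts in each table would already exclude subjects censored before that interval, as described above; here the counts are simply taken as given.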

Regression modeling with survival data
We have performed one-sample and two-sample comparisons using survival data. Now we will see how regression techniques can be applied as well.
The goal is to determine whether various covariates affect the hazard (risk) of the event, and if so, by how much.
  - Important to understand what assumptions we will make:

Regression    Data                                Distributional assumption
Linear        continuous                          $y \sim N(\alpha + \beta x, \sigma^2)$
Logistic      dichotomous                         $y \sim \text{Bin}\left(1, \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}\right)$
"Survival"    continuous & > 0, often censored    Any suggestions?

Unfortunately, survival curves don't necessarily follow any special distribution.
  - Sometimes a distribution will be assumed, but inference can be seriously distorted if the distribution is not correct.
In survival analysis, a special model called a proportional hazards model (or "Cox model") is often used. It's one example of a class of models called semiparametric models.
This model assumes that the hazard rate for any individual is a function of covariates $X_1, \ldots, X_k$ as follows:
$$h(t) = h_0(t)\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}$$
$$\ln\left(\frac{h(t)}{h_0(t)}\right) = \beta_1 x_1 + \cdots + \beta_k x_k$$

In this model, $h_0(t)$ is called the baseline hazard rate.
  - Given a hazard rate $h(t)$, then holding all else constant, what is the effect of a one-unit change in $X_1$?
$$h_{\text{old}}(t) = h_0(t)\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}$$
$$h_{\text{new}}(t) = h_0(t)\, e^{\beta_1 (x_1 + 1) + \beta_2 x_2 + \cdots + \beta_k x_k} = e^{\beta_1} h_0(t)\, e^{\beta_1 x_1 + \cdots + \beta_k x_k} = e^{\beta_1} h_{\text{old}}(t)$$
Thus, it is called a proportional-hazards model because a change in a covariate multiplies the hazard function by a constant proportion.
Notice: we haven't said anything about the baseline hazard rate. We make no assumptions about its shape.
  - This is why the model is called semiparametric: we don't completely specify the distribution of survival times, only that covariate changes affect the hazard rate proportionately to whatever it was.

Suppose we analyze survival times using a Cox proportional hazards model with covariates $X_1$ = gender (Female = 1) and $X_2$ = drug dosage. What is the ratio of hazards between a man and a woman on the same dose of the drug?
$$\frac{h_{\text{female}}(t)}{h_{\text{male}}(t)} = \frac{h_0(t)\, e^{\beta_1 + \beta_2 x_2}}{h_0(t)\, e^{\beta_2 x_2}} = e^{\beta_1}$$
$\beta_1$ is the logarithm of the "hazard ratio", which can be thought of as the instantaneous relative risk of death per unit time of a woman vs. a man, given that both have survived until time $t$ and holding other covariates constant.
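A quick numerical check makes the cancellation of the baseline hazard concrete. In this sketch the coefficients, the dose, and both baseline hazard shapes are invented for illustration; only the model form comes from the lecture.

```python
import math

def cox_hazard(t, x, beta, baseline):
    """h(t) = h_0(t) * exp(beta_1 x_1 + ... + beta_k x_k)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

beta = [-0.4, 0.1]  # hypothetical coefficients for (female indicator, dose)
dose = 3.0          # hypothetical common dose

# Two very different hypothetical baseline hazards: constant and increasing.
baselines = [lambda t: 0.02, lambda t: 0.01 * t ** 2 + 0.005]

ratios = []
for baseline in baselines:
    for t in (1.0, 5.0, 20.0):
        woman = cox_hazard(t, [1, dose], beta, baseline)
        man = cox_hazard(t, [0, dose], beta, baseline)
        ratios.append(woman / man)

print(ratios, math.exp(beta[0]))
```

Every ratio equals exp(beta_1) up to floating-point error, whichever baseline or time point is used, which is why the model can leave the baseline hazard unspecified.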

When performing "Cox regression", statistical software will provide parameter estimates and standard errors. These can be used to perform hypothesis tests and construct confidence intervals for the $\beta$'s as with any other type of regression.
Testing the hypothesis $H_0$: $\beta = 0$ is often used as a way of testing whether some covariate has an effect on survival time (equivalent to the log-rank test with one binary covariate!).
Finally, remember that the proportional-hazards assumption is just an assumption.
  - Before the results of a Cox regression can be taken seriously, it's usually necessary to validate this assumption using graphical checks or statistical tests.

Checking proportional hazards assumption
If we look at estimated KM survival curves, we can identify whether the hazards appear proportional or non-proportional.
We can also perform inference on the time coefficient in the model
$$h(t) = h_0(t)\, e^{\beta_1 x_1 + \cdots + \beta_k x_k + \gamma t}$$
where $\gamma$ denotes the coefficient on time.
Lastly, we could look at Schoenfeld residuals.

Summary
Estimation & one-sample inference
  - Kaplan-Meier estimator
  - C.I. for K-M estimator
Two-sample comparisons
  - Log-Rank test
Regression modeling
  - Proportional-Hazards (Cox) model
Additional topics
  - Left- and interval-censoring
  - Truncation
  - Competing risks

