Introduction to survival analysis (slides)
Introduction
- Health economic models often rely on time-to-event (TTE) analysis, also known as survival analysis, to estimate outcomes.
- TTE or survival analysis:
- used to provide estimates of survivor functions and event rates that inform these models.
- characterises event rates over an observed period of time, such as the duration of a clinical trial.
- employed to extrapolate over lifetime.
- Cost-effectiveness estimates can be sensitive to the methods applied in modelling survival data.
Outline
- Fundamental concepts:
- Time to event data
- Censoring
- Kaplan-Meier curve
- Parametric survival modelling
- Introduction to standard parametric survival models
Survival analysis
- In survival analysis, we are interested in the time to a specific event and how risk factors or treatments affect the time to that event.
- Time: years, months etc. from the beginning of follow-up of an individual until an event occurs
- Event: disease incidence, relapse or any designated experience of interest, e.g. MI, AIDS for HIV patients, tumor recurrence, death (hence "survival analysis")
Survival analysis
- Standard statistical methods are inappropriate for survival analysis:
Positive skew:
- Survival data are non-negative and hence tend to be positively skewed
- Normality assumption required for some statistical methods does not hold
Censoring
Cumulative hazard function
\(H(t)\): The accumulation of conditional hazards up until a particular point in time, i.e., area under the hazard function until \(t\)
\[H(t)=\int_{0}^{t}h(u)du\]
Relationships between functions
- Survivor function can be written in terms of the (cumulative) hazard function and vice versa
\[S(t)=e^{-H(t)}\]
\[H(t)=-\ln(S(t))\]
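The relationship can be checked numerically. The sketch below assumes a Weibull hazard \(h(t)=\lambda p t^{p-1}\) with illustrative parameter values (neither the parameterisation nor the values come from the slides). It integrates the hazard with the trapezoidal rule to approximate \(H(t)\) and compares \(e^{-H(t)}\) with the closed-form Weibull survivor function \(e^{-\lambda t^{p}}\).

```python
import math

def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * t**(p - 1) (assumed parameterisation)."""
    return lam * p * t ** (p - 1)

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t), the integral of h(u) over [0, t], by the trapezoidal rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width

lam, p = 0.05, 1.5   # illustrative values only
t = 10.0

H = cumulative_hazard(lambda u: weibull_hazard(u, lam, p), t)
S_from_H = math.exp(-H)                    # S(t) = exp(-H(t))
S_closed_form = math.exp(-lam * t ** p)    # Weibull survivor function

print(round(S_from_H, 4), round(S_closed_form, 4))
```

The two printed values should agree to within the integration error.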
Censoring
- Censoring occurs when we have some information about an individual’s survival time, but the exact survival time is not known
- Reasons for censoring:
- Individual does not experience the event before the study ends
- Individual is lost to follow-up during the study period
- Individual withdraws from the study (e.g. due to adverse drug reaction)
- Example: study ends while the patient is still alive, then that patient’s survival time is considered censored
- We know the survival time is at least as long as the period that the person has been followed up for
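In data, right censoring is usually recorded as an observed time plus an event indicator. The sketch below is one minimal way to build that pair; the function name `observe` and the three subjects are hypothetical and only illustrate the three situations listed above.

```python
def observe(event_time, dropout_time, study_end):
    """Return (observed_time, event_indicator) under right censoring.

    event_indicator is 1 if the event was observed and 0 if the
    observation was censored (study end, loss to follow-up or withdrawal).
    """
    censor_time = min(dropout_time, study_end)
    if event_time <= censor_time:
        return event_time, 1
    return censor_time, 0

# Hypothetical subjects: (true event time, dropout time), in months
subjects = [(8, 30), (40, 50), (20, 12)]
study_end = 24

observed = [observe(e, d, study_end) for e, d in subjects]
print(observed)
```

The first subject has an observed event, the second is censored at the end of the study, and the third is censored at withdrawal. In each censored case the recorded time is a lower bound on the true survival time.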
Censoring
Figure 3: Censoring
Source: Dey, T., Mukherjee, A., & Chakraborty, S. (2020). A practical overview and reporting strategies for statistical analysis of survival studies. Chest, 158(1), S39-S48.
Truncation
- Exclusion of observations based on event time or due to restrictions in the selection process.
- Truncation is due to sampling bias that only those individuals whose lifetimes lie within a certain interval can be observed.
- Left Truncation:
- Occurs when short survival times are missing from data, potentially introducing bias.
- Occurs when the subjects have been at risk before entering the study (e.g., life insurance policy holders where the study starts on a fixed date, event of interest is age at death).
- Right Truncation:
- Occurs when the entire study population has already experienced the event of interest (e.g., a historical survey of patients on a cancer registry)
Truncation
Figure 4: Truncation
- Different types of censoring and truncation in calendar time (left panel) and analysis time (right panel).
- Dots are events and arrows indicate censoring.
Source: https://duyngocnguyen.files.wordpress.com/2022/06/image-24.png?w=656
Kaplan-Meier Method
Method to estimate the survivor function:
- Order the event (survival) times from earliest to latest: \(t_{1}<t_{2}<\dots\)
- Number of subjects at risk of experiencing the event just before time \(t_j\): \(n_j\)
- Depends on the number of patients who had an event or were censored before \(t_j\)
- Number of failures (events) at time \(t_j\): \(d_j\)
- Conditional probability of surviving beyond time \(t_j\), given survival up to \(t_j\): \(\frac{n_{j}-d_{j}}{n_{j}}\)
- Between \(t_{j}\) and just before \(t_{j+1}\) there are zero events
- Censored observations are deemed to occur just after \(t_{j}\)
\[\hat{S}(t)=\prod_{j:\,t_{j}\le t}\left(1-\frac{d_{j}}{n_{j}}\right)\]
Kaplan-Meier Survivor Function
- Step function:
- Survivor function constant between event times, decreasing at each event time
- Survivor function is undefined for \(t>t_{max}\) when the largest observed time \(t_{max}\) is a censored observation
Figure 5: Kaplan-Meier Curve
KM Life Table Example: Time to treatment discontinuation
Table 1: Life table
| Row # | Time (days) | Number at risk | Treatment discontinuation | Censor | Proportion event free in interval | Cumulative survival \(S(t)\) | Formula |
|---|---|---|---|---|---|---|---|
| 1 | 0 | | | | | 1 | |
| 2 | 2 | 100 | 0 | 1 | 1 | 1 | =(B2-C2)/B2 |
| 3 | 6 | 99 | 1 | 0 | 0.98989899 | 0.98989899 | =(B3-C3)/B3*F2 |
| 4 | 8 | 98 | 1 | 0 | 0.989795918 | 0.97979798 | |
| 5 | 14 | 97 | 2 | 1 | 0.979381443 | 0.95959596 | |
| 6 | 21 | 94 | 1 | 0 | 0.989361702 | 0.949387492 | |
| 7 | 22 | 93 | 3 | 1 | 0.967741935 | 0.918762089 | |
| 8 | 27 | 89 | 2 | 0 | 0.97752809 | 0.89811575 | |
| 9 | 30 | 87 | 0 | 1 | 1 | 0.89811575 | |
| 10 | 35 | 86 | 1 | 1 | 0.988372093 | 0.887672544 | |
| 11 | 40 | 84 | 2 | 0 | 0.976190476 | 0.866537483 | |
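The life table can be reproduced in a few lines. The sketch below takes the time, number at risk, discontinuation and censoring columns of Table 1 as given and recomputes the last two numeric columns as a running product of \((n_j-d_j)/n_j\). The variable names are illustrative.

```python
# (time in days, number at risk, discontinuations, censored): rows 2-11 of Table 1
rows = [
    (2, 100, 0, 1), (6, 99, 1, 0), (8, 98, 1, 0), (14, 97, 2, 1), (21, 94, 1, 0),
    (22, 93, 3, 1), (27, 89, 2, 0), (30, 87, 0, 1), (35, 86, 1, 1), (40, 84, 2, 0),
]

survival = 1.0   # S(0) = 1
table = []
for time, at_risk, events, censored in rows:
    interval = (at_risk - events) / at_risk   # proportion event free in interval
    survival *= interval                      # cumulative survival S(t)
    table.append((time, interval, survival))
    print(f"{time:>3} {at_risk:>4} {events:>2} {censored:>2} {interval:.9f} {survival:.9f}")
```

Each number at risk equals the previous number at risk minus the previous row's events and censored observations, which is why censoring lowers \(n_j\) without lowering \(\hat{S}(t)\) directly.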
KM example
Example data: time survived without looking at a phone

| ID | time | Event | Group |
|---|---|---|---|
| 1 | 15 | 1 | 1 |
| 2 | 60 | 1 | 1 |
| 3 | 25 | 1 | 1 |
| 4 | 40 | 1 | 1 |
| 5 | 10 | 0 | 1 |
| 6 | 26 | 1 | 2 |
| 7 | 45 | 1 | 2 |
| 8 | 30 | 0 | 2 |
| 9 | 5 | 1 | 2 |
| 10 | 55 | 1 | 2 |
Results:
Call: npsurv(formula = Surv(time, Event) ~ 1, data = km_workshop)
n events median 0.95LCL 0.95UCL
[1,] 10 8 40 25 NA
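The n, events and median in this output can be checked by hand. The sketch below is a minimal Kaplan-Meier implementation in plain Python, not the `npsurv` routine itself, applied to the ten (time, Event) pairs of the example data with Event = 1 read as an event and 0 as censored. It uses a simple median definition (the first event time at which \(\hat{S}(t)\le 0.5\)) and does not reproduce the confidence limits.

```python
# (time, Event) pairs from the example data; Event = 1 is an event, 0 is censored
data = [(15, 1), (60, 1), (25, 1), (40, 1), (10, 0),
        (26, 1), (45, 1), (30, 0), (5, 1), (55, 1)]

def kaplan_meier(data):
    """Return [(event time, S(t))] at each distinct event time."""
    curve, survival = [], 1.0
    for t in sorted({time for time, event in data if event == 1}):
        at_risk = sum(1 for time, _ in data if time >= t)             # n_j
        events = sum(1 for time, e in data if time == t and e == 1)   # d_j
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

def median_survival(curve):
    """First event time at which S(t) is 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

curve = kaplan_meier(data)
n_events = sum(e for _, e in data)
median = median_survival(curve)
print(len(data), n_events, median)   # 10 8 40
```

The estimate falls from 0.5625 at time 26 to 0.421875 at time 40, so the median is 40, matching the output above.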
Comparison of groups
- The Kaplan-Meier function depicts the survival function for a single group
- Plotting the KM for multiple groups we begin to compare across groups
- Comparison is often of treatment arms in RCTs, but could be by risk groups
- Hypothesis testing for differences between groups
- Log rank - equal weighting for all failure times
- Alternatives: Wilcoxon - weighted by risk set
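A minimal sketch of the two-group log-rank test follows. It compares observed and expected events in group 1 at each event time, with equal weight for every failure time, and refers the statistic to a chi-square distribution on 1 degree of freedom. As an assumption, the example splits the phone data into two groups using the last column of the example data. With five subjects per group the result is purely illustrative.

```python
import math

def logrank(group1, group2):
    """Two-group log-rank test on lists of (time, event) pairs.

    Returns (chi-square statistic on 1 df, p-value).
    """
    pooled = group1 + group2
    observed = expected = variance = 0.0
    for t in sorted({time for time, event in pooled if event == 1}):
        n1 = sum(1 for time, _ in group1 if time >= t)   # at risk, group 1
        n2 = sum(1 for time, _ in group2 if time >= t)   # at risk, group 2
        d1 = sum(1 for time, e in group1 if time == t and e == 1)
        d2 = sum(1 for time, e in group2 if time == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n
        if n > 1:   # hypergeometric variance; zero when one subject remains
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return chi2, p_value

# Assumed grouping: rows 1-5 and rows 6-10 of the example data
group1 = [(15, 1), (60, 1), (25, 1), (40, 1), (10, 0)]
group2 = [(26, 1), (45, 1), (30, 0), (5, 1), (55, 1)]

chi2, p_value = logrank(group1, group2)
print(round(chi2, 4), round(p_value, 3))
```

A Wilcoxon-type test would multiply each term by a weight such as the size of the risk set, giving earlier failure times more influence.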
Comparison of groups
- Cox proportional hazards
- Semi-parametric
- No assumption is made about the functional form of the baseline hazard \(h_{0}(t)\); covariates enter through the linear component \(x_{j}\beta_{x}\)
\[h(t|x_{j})=h_{0}(t)e^{x_{j}\beta_{x}}\]
Parametric survival modelling
- A parametric survival model can be fitted to the survival data
- Assumes survival time follows a distribution
- Enables prediction of survival times beyond the follow-up of a clinical trial
- Unbiased estimate of mean survival (accounts for censoring)
- Generates smooth survival curves that are closer to a theoretical survivor function than the stepped KM curve
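As a minimal sketch of extrapolation, the code below fits the simplest parametric model, an exponential distribution, to the phone example data and evaluates the fitted survivor function beyond the longest observed time (60). It assumes a constant hazard, which may not hold for these data. The maximum likelihood estimate of the rate under right censoring is the number of events divided by the total observed time.

```python
import math

# (time, Event) pairs from the phone example; Event = 1 is an event, 0 is censored
data = [(15, 1), (60, 1), (25, 1), (40, 1), (10, 0),
        (26, 1), (45, 1), (30, 0), (5, 1), (55, 1)]

events = sum(e for _, e in data)        # number of observed events
total_time = sum(t for t, _ in data)    # total time at risk, censored subjects included
rate = events / total_time              # exponential MLE of lambda
mean_survival = 1 / rate                # mean of an exponential distribution

def survivor(t):
    """Fitted exponential survivor function S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)

print(round(rate, 4), round(mean_survival, 3))
print([round(survivor(t), 3) for t in (30, 60, 90, 120)])   # 90 and 120 are extrapolated
```

Censored subjects contribute follow-up time to the denominator but no events to the numerator, which is how the estimate accounts for censoring.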
Parametric survival modelling
Figure 6: Fitted parametric survival function
Types of parametric survival models
Main model types: proportional hazards (PH) models and accelerated failure time (AFT) models
PH models
- Hazard at time \(t\) is the product of two quantities:
- Baseline hazard, \(h_{0}(t)\)
- Exponential expression of linear sum of explanatory variables (\(X\)) and coefficients \((\beta)\)
- Baseline hazard is a function of \(t\) but not \(X\)
- Linear predictor includes \(X\) but not \(t\)
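The structure above implies that the hazard ratio between two covariate patterns does not depend on \(t\). The sketch below illustrates this with a hypothetical Weibull-type baseline hazard and a hypothetical treatment coefficient (log hazard ratio of \(\ln 0.7\)). None of these values come from the slides.

```python
import math

def ph_hazard(t, x, beta, baseline):
    """h(t | x) = h0(t) * exp(x . beta) for a covariate vector x."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return baseline(t) * math.exp(linear_predictor)

def baseline(t):
    """Hypothetical baseline hazard h0(t): depends on t but not on x."""
    return 0.02 * 1.3 * t ** 0.3

beta = [math.log(0.7)]   # hypothetical log hazard ratio for treatment

# Ratio of treated (x = 1) to control (x = 0) hazards at several times
ratios = [ph_hazard(t, [1], beta, baseline) / ph_hazard(t, [0], beta, baseline)
          for t in (1, 5, 20)]
print([round(r, 3) for r in ratios])
```

The baseline hazard cancels in the ratio, so each value equals \(e^{\beta}\), whatever form \(h_{0}(t)\) takes.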
PH models
Table 2: Examples of parametric PH models
| Distribution | Hazard shape | Parameters |
|---|---|---|
| Exponential | Constant | \(\lambda\) |
| Weibull | Monotonic | \(\lambda\), \(p\) |
| Gompertz | Monotonic | \(\lambda\), \(\theta\) |
Figure 7: PH models (Stata manual)
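The hazard shapes in Table 2 can be sketched directly. The parameterisations below, \(h(t)=\lambda\), \(h(t)=\lambda p t^{p-1}\) and \(h(t)=\lambda e^{\theta t}\), are common choices and are assumptions here, since the slides do not state them. The parameter values are illustrative.

```python
import math

def exponential_hazard(t, lam):
    """Constant hazard."""
    return lam

def weibull_hazard(t, lam, p):
    """Increasing if p > 1, decreasing if p < 1, constant if p = 1."""
    return lam * p * t ** (p - 1)

def gompertz_hazard(t, lam, theta):
    """Increasing if theta > 0, decreasing if theta < 0."""
    return lam * math.exp(theta * t)

def is_monotonic(values):
    """True if the sequence never changes direction."""
    increasing = all(a <= b for a, b in zip(values, values[1:]))
    decreasing = all(a >= b for a, b in zip(values, values[1:]))
    return increasing or decreasing

times = [1, 2, 5, 10, 20]
exp_h = [exponential_hazard(t, 0.05) for t in times]
wei_h = [weibull_hazard(t, 0.05, 0.8) for t in times]      # p < 1: decreasing
gom_h = [gompertz_hazard(t, 0.05, 0.05) for t in times]    # theta > 0: increasing

for name, h in (("exponential", exp_h), ("Weibull", wei_h), ("Gompertz", gom_h)):
    print(name, [round(v, 4) for v in h], is_monotonic(h))
```

None of the three can represent a hazard that rises and then falls, which is one motivation for the non-monotonic distributions.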
AFT models
The effect of treatment is interpreted in terms of its effect on the time-to-event, relative to control
- Relative treatment effect referred to as a time ratio (TR)
- Difference in the log event times between treatment arms is the log TR
- Acceleration factor (AF) is a simple transformation of TR
Exponential and Weibull can be used as both PH and AFT models
AFT models
Table 3: Examples of parametric AFT models

| Distribution | Hazard shape | Parameters |
|---|---|---|
| Weibull | Monotonic | \(\lambda\), \(p\) |
| Lognormal | Non-monotonic | \(\sigma\), \(\mu\), \(\phi\) |
| Log-logistic | Non-monotonic | \(\gamma\), \(\lambda\) |
| Generalised gamma | Non-monotonic | \(\kappa\), \(\mu\), \(\theta\) |
Figure 8: AFT models (Stata manual)
AFT assumptions and associated test
Assumptions:
- Relative treatment effect acts multiplicatively on the time-to-event (i.e. time-to-event \(\times\) TR)
- Relative treatment effect is constant over time
Tested by plotting the survival time quantiles for treatment against those for control (quantile-quantile [QQ] plot):
- Plot of the survival times \(t_{q}\) at equally spaced quantiles \(q\)
- Straight lines indicate a multiplicative effect of treatment on time, validating TRs as an appropriate measure of relative treatment effect.
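The idea behind the QQ plot can be sketched numerically. Under the AFT assumption the treated survivor function is \(S_{1}(t)=S_{0}(t/\mathrm{TR})\), so every treated quantile time is TR times the matching control quantile time. The Weibull control curve, its parameter values and the time ratio of 1.4 below are hypothetical.

```python
import math

def control_survivor(t, lam=0.05, p=1.5):
    """Hypothetical control-arm survivor function S0(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)

def treated_survivor(t, time_ratio):
    """AFT assumption: S1(t) = S0(t / TR)."""
    return control_survivor(t / time_ratio)

def quantile_time(survivor, q, upper=1e4):
    """Time t at which survivor(t) = q, by bisection (survivor is decreasing)."""
    lower = 0.0
    for _ in range(200):
        mid = (lower + upper) / 2
        if survivor(mid) > q:
            lower = mid
        else:
            upper = mid
    return (lower + upper) / 2

time_ratio = 1.4   # hypothetical TR
points = []
for q in (0.9, 0.75, 0.5, 0.25):
    t0 = quantile_time(control_survivor, q)
    t1 = quantile_time(lambda t: treated_survivor(t, time_ratio), q)
    points.append((q, t0, t1))
    print(q, round(t0, 3), round(t1, 3), round(t1 / t0, 3))
```

Plotting the treated times against the control times gives a straight line through the origin with slope TR. With real data the quantile times would be read from each arm's KM curve, and departures from a straight line would cast doubt on a constant TR.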
Conditions of use of the methods discussed in the TSD
- Patient level data are available.
- If only summary statistics are available, methods introduced by Guyot et al. (2012) to recreate patient level data must be used.
- If evidence synthesis is required to include all relevant comparators within a TA, the methods discussed in the TSD should not be utilized.
Survival analysis modelling methods
- Two main approaches to survival analysis modelling: parametric and non-parametric.
- Parametric models assume that the survival distribution follows a specific parametric form, such as the Weibull or exponential distribution.
- Non-parametric models do not make any assumptions about the form of the survival distribution and estimate the survival function directly from the data.
- The choice of model depends on the nature of the data and the research question.
Parametric Survival Analysis Modelling
- Parametric models are often used when there is a clear theoretical basis for the assumed distribution.
- They can be more efficient than non-parametric models when the data are well fitted by the assumed distribution.
- However, parametric models can be sensitive to violations of their assumptions.
Non-Parametric Survival Analysis Modelling
- Non-parametric models are more flexible than parametric models and can be used with any type of data.
- They are less sensitive to violations of assumptions than parametric models.
- However, non-parametric models can be less efficient than parametric models.
Common Parametric Survival Analysis Models
- The exponential distribution is the simplest parametric model as it incorporates a hazard function that is constant over time, and therefore it has only one parameter, \(\lambda\).
- The Weibull distribution can be parameterised either as a PH model or an AFT model.
- Similar to the Weibull distribution the Gompertz has two parameters – a shape parameter and a scale parameter. Also similar to the Weibull distribution the hazard in the Gompertz distribution increases or decreases monotonically.
- The log-normal distribution is another flexible parametric model that can be used to fit survival curves that are skewed.
Common Non-Parametric Survival Analysis Models
- The Kaplan-Meier estimator is a non-parametric estimator of the survival function that is based on the observed survival times.
- The Nelson-Aalen estimator is a non-parametric estimator of the cumulative hazard function.
- The Cox proportional hazards model is a semi-parametric model that can be used to adjust for covariates.
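The two non-parametric estimators can be computed side by side. The sketch below applies the Nelson-Aalen estimator, \(\hat{H}(t)=\sum_{t_j\le t} d_j/n_j\), and the Kaplan-Meier estimator to the phone example data, then converts \(\hat{H}(t)\) to a survival estimate through \(S(t)=e^{-H(t)}\). The function name is illustrative.

```python
import math

# (time, Event) pairs from the phone example; Event = 1 is an event, 0 is censored
data = [(15, 1), (60, 1), (25, 1), (40, 1), (10, 0),
        (26, 1), (45, 1), (30, 0), (5, 1), (55, 1)]

def nelson_aalen_and_km(data):
    """Return [(event time, H_hat(t), S_km(t))] at each distinct event time."""
    rows, cumulative_hazard, km = [], 0.0, 1.0
    for t in sorted({time for time, event in data if event == 1}):
        at_risk = sum(1 for time, _ in data if time >= t)             # n_j
        events = sum(1 for time, e in data if time == t and e == 1)   # d_j
        cumulative_hazard += events / at_risk   # Nelson-Aalen increment
        km *= 1 - events / at_risk              # Kaplan-Meier factor
        rows.append((t, cumulative_hazard, km))
    return rows

rows = nelson_aalen_and_km(data)
for t, H, km in rows:
    print(t, round(H, 3), round(math.exp(-H), 3), round(km, 3))
```

Because \(e^{-x}\ge 1-x\), the survival estimate derived from Nelson-Aalen is never below the Kaplan-Meier estimate. The two are close while the risk set is large and diverge as it shrinks.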
Choosing a Survival Analysis Modelling Method
- The choice of survival analysis modelling method depends on the nature of the data, the research question, and the assumptions that can be made about the survival distribution.
- It is important to consider the strengths and weaknesses of each method before making a decision.
Research Question
- Comparison of survival curves: Parametric or semi-parametric models like Weibull or Cox proportional hazards might be suitable.
- Estimating cumulative hazard (risk over time): Non-parametric models like Kaplan-Meier or Nelson-Aalen could be better choices.
- Predicting individual survival times: Specialized models like frailty models might be appropriate.
Model Assumptions
- Parametric models: Check if the data fits the assumed distribution (e.g., Weibull, exponential). Misfits can lead to unreliable results.
- Non-parametric models: No strict assumptions about the distribution, but may be less efficient than parametric models if the data fits a known form.
Goodness-of-Fit:
- Graphical methods: Plot observed vs. predicted survival curves, check for deviations and trends.
- Statistical tests: Chi-square tests, Kolmogorov-Smirnov tests, etc., check for statistically significant differences between observed and predicted survival.
- Information criteria: Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) compare different models based on their complexity and fit.
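As a sketch of comparison by information criteria, the code below fits an exponential and a Weibull model to the phone example data by maximum likelihood and computes AIC and BIC for each. It assumes the parameterisation \(S(t)=e^{-\lambda t^{p}}\), with the exponential as the special case \(p=1\). It uses the number of observations as \(n\) in BIC (some analysts use the number of events) and finds the Weibull shape by a simple ternary search over the profile log-likelihood, which is adequate only as an illustration.

```python
import math

# (time, Event) pairs from the phone example; Event = 1 is an event, 0 is censored
data = [(15, 1), (60, 1), (25, 1), (40, 1), (10, 0),
        (26, 1), (45, 1), (30, 0), (5, 1), (55, 1)]

def weibull_loglik(data, lam, p):
    """Right-censored log-likelihood for S(t) = exp(-lam * t**p)."""
    ll = 0.0
    for t, event in data:
        if event:   # events contribute the log hazard
            ll += math.log(lam) + math.log(p) + (p - 1) * math.log(t)
        ll -= lam * t ** p   # every subject contributes the log survivor function
    return ll

def fit_weibull(data, fixed_shape=None):
    """Return (lam, p, log-likelihood); for a given p the MLE of lam is d / sum(t**p)."""
    d = sum(e for _, e in data)

    def profile(p):
        return weibull_loglik(data, d / sum(t ** p for t, _ in data), p)

    if fixed_shape is not None:
        p = fixed_shape
    else:
        lo, hi = 0.05, 20.0
        for _ in range(200):   # ternary search; assumes a single peak
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if profile(m1) < profile(m2):
                lo = m1
            else:
                hi = m2
        p = (lo + hi) / 2
    lam = d / sum(t ** p for t, _ in data)
    return lam, p, weibull_loglik(data, lam, p)

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

n = len(data)
_, _, ll_exp = fit_weibull(data, fixed_shape=1.0)   # exponential: 1 parameter
_, shape, ll_wei = fit_weibull(data)                # Weibull: 2 parameters

print("exponential", round(aic(ll_exp, 1), 2), round(bic(ll_exp, 1, n), 2))
print("Weibull", round(shape, 3), round(aic(ll_wei, 2), 2), round(bic(ll_wei, 2, n), 2))
```

Lower values indicate a better trade-off between fit and complexity. The Weibull log-likelihood can never be below the exponential one because the models are nested, so the criteria decide whether the extra parameter is worth its penalty.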
Interpreting Results:
- Survival curves: Visualize and compare the probability of event-free survival over time for different groups or treatments.
- Hazard rates: Analyze the instantaneous risk of an event occurring at any given time point.
- Confidence intervals: Understand the range of uncertainty surrounding the estimated model parameters.
- Sensitivity analysis: Explain how robust the results are to potential changes in assumptions or data.