---
title: "Guide to Using parmsurvfit"
author: "Ashley Jacobson, Victor Wilson, and Shannon Pileggi"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Guide to Using parmsurvfit}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The `parmsurvfit` package executes basic parametric survival analysis techniques similar to those in Minitab. These include fitting right-censored data, assessing fit, plotting survival functions, and computing summary statistics and probabilities.

# Fitting right-censored survival data

The `fit_data` function produces maximum likelihood estimates (MLE) for right-censored data based on a specified distribution. The `time` and `censor` arguments name columns of `data`:

* `time`: time-to-event variable
* `censor`: censoring status variable (0 = right-censored; 1 = complete)

Common survival distributions include Weibull (`weibull`), log-normal (`lnorm`), exponential (`exp`), and logistic (`logis`).

## Example

```{r}
library(parmsurvfit)
fit_data(data = firstdrink, dist = "weibull", time = "age", censor = "censor")
```

# Assessing fit

Fit can be assessed graphically, with histograms and overlaid density curves or with PP-plots, or numerically, with the Anderson-Darling adjusted test statistic.

## Histograms with density curves

All time-to-event values are plotted, regardless of censoring status.

```{r}
plot_density(data = firstdrink, dist = "weibull", time = "age", censor = "censor", by = "gender")
```

## PP-plots

The `plot_ppsurv` function creates a percent-percent (PP) plot of right-censored data, assuming the data follow a specified distribution. Points are plotted according to the [median rank method](https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/reliability/how-to/parametric-distribution-analysis-right-censoring/methods-and-formulas/probability-plot/) to accommodate the right-censored values.
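The median rank idea can be sketched in a few lines of base R. This is a minimal illustration, assuming the linked Minitab method amounts to Johnson's adjusted ranks for the complete (uncensored) observations combined with Benard's approximation, $(\text{adjusted rank} - 0.3)/(n + 0.4)$. The helper `median_ranks()` and the toy data are hypothetical, are not part of `parmsurvfit`, and do not reproduce `plot_ppsurv` internals; tied times get no special treatment.

```{r}
# Hypothetical helper (not part of parmsurvfit): a sketch of median ranks for
# right-censored data using Johnson's adjusted ranks and Benard's approximation.
median_ranks <- function(time, censor) {
  ord <- order(time)
  time <- time[ord]
  censor <- censor[ord]
  n <- length(time)
  reverse_rank <- n - seq_len(n) + 1
  adj_rank <- rep(NA_real_, n)
  prev <- 0  # adjusted rank of the previous complete observation
  for (i in seq_len(n)) {
    if (censor[i] == 1) {  # only complete observations receive a rank
      # Johnson's update: prev + (n + 1 - prev) / (reverse_rank + 1)
      prev <- (reverse_rank[i] * prev + (n + 1)) / (reverse_rank[i] + 1)
      adj_rank[i] <- prev
    }
  }
  failed <- censor == 1
  data.frame(
    time = time[failed],
    adj_rank = adj_rank[failed],
    median_rank = (adj_rank[failed] - 0.3) / (n + 0.4)  # Benard's approximation
  )
}

# Toy data: the observation at time 2 is right-censored (censor = 0)
median_ranks(time = c(1, 2, 3, 4), censor = c(1, 0, 1, 1))
```

With the toy data, the censored observation at time 2 receives no rank of its own but raises the later adjusted ranks to 7/3 and 11/3 instead of 2 and 3. In a PP-plot, each `median_rank` would then be paired with the fitted CDF at the same time, for example `pweibull(time, shape, scale)` for a Weibull fit.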

```{r}
plot_ppsurv(data = firstdrink, dist = "weibull", time = "age", censor = "censor")
```

## Anderson-Darling test statistic

The Anderson-Darling (AD) test statistic provides a numerical measure of fit; lower values indicate a better fit. The test statistic is computed following [Minitab's documentation](https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/reliability/how-to/parametric-distribution-analysis-right-censoring/methods-and-formulas/goodness-of-fit-measures/), using the median rank plotting method.

```{r}
compute_AD(data = firstdrink, dist = "weibull", time = "age", censor = "censor")
```

# Survival, hazard, and cumulative hazard functions

The survival function, denoted $S(t)$, estimates the proportion of subjects that survive beyond a specified time $t$.

```{r}
plot_surv(data = firstdrink, dist = "weibull", time = "age", censor = "censor", by = "gender")
```

The hazard function, denoted $h(t)$, estimates the conditional risk that a subject will experience the event of interest in the next instant of time, given that the subject has survived up to time $t$.

```{r}
plot_haz(data = firstdrink, dist = "weibull", time = "age", censor = "censor", by = "gender")
```

The cumulative hazard function, denoted $H(t)$, is the total accumulated risk of experiencing the event up to time $t$, that is, $H(t) = \int_0^t h(u)\,du$.

```{r}
plot_cumhaz(data = firstdrink, dist = "weibull", time = "age", censor = "censor", by = "gender")
```

# Probabilities and statistics

A survival probability estimates the probability that a subject survives (does not experience the event of interest) beyond a specified time $t$. In the example, `x = 30` with `lower.tail = FALSE` requests the probability of surviving beyond $t = 30$.

```{r}
surv_prob(data = firstdrink, dist = "weibull", x = 30, lower.tail = FALSE, time = "age", censor = "censor", by = "gender")
```

The `surv_summary` function reports various summary statistics of survival time, including the mean, median, standard deviation, and percentiles. All summary statistics from the `fitdistcens` class are also provided.
If the distribution is normal, log-normal, exponential, Weibull, or logistic, the reported standard deviation is computed exactly from the parameter estimates; for any other user-specified distribution, the standard deviation is estimated from 1,000 values randomly generated from the fitted distribution.

```{r stats}
surv_summary(data = firstdrink, dist = "weibull", time = "age", censor = "censor", by = "gender")
```
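To make the difference between the exact and the simulation-based standard deviation concrete, here is a minimal base-R sketch for the Weibull case. It uses the textbook closed forms under R's `dweibull()`/`rweibull()` shape-scale parameterization, next to a 1,000-draw estimate in the spirit of the fallback described above. The helper names (`weibull_mean`, `weibull_sd`, `weibull_median`) and the parameter values are hypothetical, chosen only for illustration; this is not the `surv_summary` implementation.

```{r}
# Hypothetical helpers (not part of parmsurvfit): closed-form summaries of a
# Weibull(shape, scale) distribution, matching R's dweibull() parameterization.
weibull_mean <- function(shape, scale) scale * gamma(1 + 1 / shape)
weibull_sd <- function(shape, scale) {
  scale * sqrt(gamma(1 + 2 / shape) - gamma(1 + 1 / shape)^2)
}
weibull_median <- function(shape, scale) scale * log(2)^(1 / shape)

# Illustrative parameter values only (not estimates from firstdrink)
shape <- 1.5
scale <- 20

exact_sd <- weibull_sd(shape, scale)

# Simulation-based alternative, as used when no closed form is available:
# the standard deviation of 1,000 random draws from the same distribution
set.seed(2024)
simulated_sd <- sd(rweibull(1000, shape = shape, scale = scale))

c(mean = weibull_mean(shape, scale),
  median = weibull_median(shape, scale),
  exact_sd = exact_sd,
  simulated_sd = simulated_sd)
```

The simulated value varies with the random seed and typically lands within a few percent of the exact value at 1,000 draws, which is why an exact computation is preferable whenever the distribution has one.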