---
title: "Guide to using parmsurvfit"
author: "Ashley Jacobson, Victor Wilson, and Shannon Pileggi"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Guide to Using parmsurvfit}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction
 
The `parmsurvfit` package executes basic parametric survival analysis techniques similar to those in 'Minitab'. Among these are fitting right-censored data, assessing fit, plotting survival functions, and summary statistics and probabilities.


# Fitting right censored survival data

The `fit_data` function produces maximum likelihood estimates (MLE) for right 
censored data based on a specified distribution.  Here,

* `time`: time-to-event variable
* `censor`: censoring status variable (0 = right-censored; 1 = complete)

Common survival distributions include: Weibull (`weibull`), 
log-normal (`lnorm`), exponential (`exp`), and logistic (`logis`). 


## Example

```{r}
library(parmsurvfit)

fit_data(data = firstdrink, 
         dist = "weibull", 
         time = "age", 
         censor = "censor")
```

# Assessing fit

Assess fit graphically with histograms and overlaid density curves or numerically with the Anderson Darling adjusted test statistic.

## Histograms with density curves

All time to event data are plotted regardless of censoring status.

```{r}
plot_density(data = firstdrink, 
             dist = "weibull", 
             time = "age", 
             censor = "censor", 
             by = "gender")
```


## PP-plots

creates a percent-percent plot of right-censored data given that it follows a specified distribution. Points are plotted according to the [median rank method](https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/reliability/how-to/parametric-distribution-analysis-right-censoring/methods-and-formulas/probability-plot/) to accommodate the right-censored values.

```{r}
plot_ppsurv(data = firstdrink, 
            dist = "weibull", 
            time = "age", 
            censor = "censor")
```


## Anderson-Darling test statistic

The Anderson-Darling (AD) test statistic provides a numerical measure of fit such that lower values indicate a better fit. Computation of the test statistic adhered to [Minitab's documentation](https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/reliability/how-to/parametric-distribution-analysis-right-censoring/methods-and-formulas/goodness-of-fit-measures/), utilizing the median rank plotting method.

```{r}
compute_AD(data = firstdrink, 
           dist = "weibull", 
           time = "age", 
           censor = "censor")
```

# Survival, hazard, and cumulative hazard functions

The survival function $S(t)$ estimates the proportion of subjects that survive beyond a specified time $t$.

```{r}
plot_surv(data = firstdrink, 
          dist = "weibull", 
          time = "age", 
          censor = "censor", 
          by = "gender")
```

The hazard function, denoted $h(t)$, estimates the conditional risk that a subject will experience the event of interest in the next instant of time, given that the subject has survived beyond a certain time $t$.

```{r}
plot_haz(data = firstdrink, 
         dist = "weibull", 
         time = "age", 
         censor = "censor",
         by = "gender")
```

The cumulative hazard function, denoted $H(t)$, is the total accumulated risk of experiencing an event up to time $t$.  

```{r}
plot_cumhaz(data = firstdrink, 
            dist = "weibull", 
            time = "age", 
            censor = "censor",  
            by = "gender")
```


# Probabilities and statistics

A survival probability estimates the probability that a subject survives (does not 
experience the event of interest) beyond a specified time $t$.

```{r}
surv_prob(data = firstdrink, 
          dist = "weibull", 
          x = 30, 
          lower.tail = F, 
          time = "age", 
          censor = "censor", 
          by = "gender")
```

Various summary statistics, including mean, median, standard deviation, and percentiles of survival time.  All summary statistics from the class `fitdistcens` are provided.  If the distribution supplied is one of normal, lognormal, exponential, weibull, or logistic then the standard deviation reported is an exact computation from parameter estimates; however, if a user specifies a distribution other than that from this list, then the standard deviation is estimated from 1,000 randomly generated values from the 
distribution.   

```{r stats}
surv_summary(data = firstdrink, 
             dist = "weibull", 
             time = "age", 
             censor = "censor", 
             by = "gender")
```