Bayesian Survival Analysis

class: center, middle, inverse, title-slide

# Bayesian Survival Analysis
## test on the JACC study data
### Chaochen Wang <br> Aichi Medical University
### 2020-01-14

---

class: middle
# Introduction

- Survival analysis is at the core of epidemiological data analysis.

- The Cox Proportional Hazard (PH) regression model is (still the most) popular model.

-  because of its familiarity and convenience

- Recently, there has been increased interest on applications of survival analysis based on Bayesian methodology.

---
class: middle

# Objectives

- To try and use a parametric Bayesian Survival Model approach and compare with the traditional PH model.

- The null hypothesis is: the risk of mortality from stroke is the same for people with different milk intake habits.

---
class: middle

# Data for analysis

- The JACC study data:

- 110585 participants (46395 men and 64190 women) aged between 40-79 at baseline
    
---
class: middle

# Exclusion criteria

.med[
1. With stroke history: n = 1496 (915 men and 581 wmen);
2. With cancer history: n = 1461 (411 men and 1050 women);
3. With myocardial infarction history: n = 2994 (1310 men and 1684 women);
4. With ischemic heart diseases history: n = 186 (91 men, 95 women);
5. With other heart diseases history (ICD9 codes: 420-429):  n = 518 (204 men and 314 women);
6. Did not answer the question about milk intake frequency: n = 9545 (3593 men and 5952 women); 
7. Finally, n = 94385 (39386 men and 54999 women) are left in the data
]

---
class: middle

# N of events:

- Total Stroke mortality confirmed: 
  - 2675 (1352 men and 1323 women)
  
- Subtypes of stroke: 
  - Ischemic stroke: <br>957 (520 men and 437 women)
  - Hemorrhagic stroke: <br>952 (432 men and 520 women)

---
class: right
background-image: url("fig/KMfigMen.png")
background-position: 60% 50%
background-size: contain
## Men

---
class: right
background-image: url("fig/KMfigwomen.png")
background-position: 60% 50%
background-size: contain
## Women

---
class: middle, center
# Baseline characteristics

(please see the handout Table 1 & 2)

(skipped in the talk)

---
class: middle,

# Traditional PH model results

(please see the handout Table 3)

.med[
1. Total stroke: <br> only daily drinkers in men had somewhat lower hazard ratio (0.87, 95%CI: 0.76, 1.01)

2. Hemorrhagic stroke: <br> no apparent association for men/women

3. Ischemic stroke: <br> men had milk intake more than 1-2 t/w had **"significantly"** lower hazard compared with non-drinkers
]

---
class: middle, center
# Do the results really answer what we are interested in?

---
class: middle
## Interpretation of the P value:

- The earth is round  (p < 0.05).

.pull-left[
<img src="fig/pvalueyay.jpg" width="85%" />
]

.pull-right[
<img src="fig/pvalue.png" width="95%" />
]

???
The p-value is the probability of the observed outcome plus all “more extreme” outcomes under the assumption that the experiment can be repeated for infinite times.

---
class: middle, inverse
background-image: url("fig/CI.gif")
background-position: 50% 50%
background-size: contain

---
class: middle

## Assumptions:

1. The experiment can be repeated.

2. There is a true value of interest (HR) that does not move.

Both are **unrealistic** for a huge cohort study like the JACC study, and may not help in answering the question that we are really interested in.

---
class: middle

## The real research questions are:

- What is **the probability** that people who drink milk everyday (or other frequency) had lower hazard of dying from stroke compared with people who don't ?

- If someone does not understand the concept of "hazard", then, the question should be changed into: <br> do people drink milk everyday (or other frequency) have **longer survival time/slower rate of dying** that is free from stroke mortality risk?

---
class: middle

## Bayesian parametric survival model can help us:

- Weibull model is useful because it can be expressed as both
    - accelerated failure time (AFT) model 
    - proportional hazard model

(details skipped, please refer to the handout.)

---
class: middle

### Codes for the crude Weibull survival model

We use the [**BUGS**](https://www.mrc-bsu.cam.ac.uk/software/bugs/) (**B**ayesian inference **U**sing **G**ibbs **S**ampling) language to describe the model:

.small[
```r
jacc.weibull.model <- function() {
  for(i in 1:39386){# loop through the male subjects
*    is.censored[i] ~ dinterval(t[i], c[i])
    t[i] ~ dweib(shape, lambda[i])
    lambda[i] <- exp(-mu[i] * shape)
*    mu[i] <- beta[1] + beta[2] * equals(Mlkfre[i], 2) +
*      beta[3] * equals(Mlkfre[i], 3) + 
*      beta[4] * equals(Mlkfre[i], 4) +
*      beta[5] * equals(Mlkfre[i], 5) 
  }
```
] 
    
---
class: middle

### Specify the priors, and generate values

.small[
```r
  ## priors for betas
  for(j in 1:5){
    beta[j] ~ dnorm(0, 0.001)
  } # increase the number here when adding covariates
  
  ### prior for shape
  shape ~ dgamma(.001, .001)
  
  ### Generated values
*  AFT[2] <- exp(beta[2])
*  HR[2] <- exp(shape * beta[2])
*  p.crit[2] <- step(1 - HR[2]) # = 1 if HR[2] < 1
  
  AFT[3] <- exp(beta[3])
  HR[3] <- exp(shape * beta[3])
  p.crit[3] <- step(1 - HR[3])
  
  AFT[4] <- exp(beta[4])
  HR[4] <- exp(shape * beta[4])
  p.crit[4] <- step(1 - HR[4])
  
  AFT[5] <- exp(beta[5])
  HR[5] <- exp(shape * beta[5])
  p.crit[5] <- step(1 - HR[5])
```
]

---
class: middle

### Run the model (the crude one took me about 1 hour...)

.small[
```r
library(R2jags) #conect to jags sampling engine in R
jagsfit <- jags.parallel(
                   data = JACCdata,
                   parameters.to.save = 
                              c("beta[5]", "HR[5]", "AFT[5]"),
                   #add more variables if you want to check more
*                   n.iter = 10000, 
*                   n.burnin = 2500, 
*                   n.chains = 3,
                   model.file = jacc.weibull.model)
```
]

---
class: middle, center, inverse

# Output and model checking

---
class: middle

### Direct output:(1)

.small[
```r
# Inference for Bugs model at "jacc.weibull.model", fit using jags,
#  3 chains, each with 10000 iterations (first 2500 discarded), n.thin = 7
#  n.sims = 3213 iterations saved. Sample size per chain = 1071 
# 1. Empirical mean and standard deviation for each variable,
#    plus standard error of the mean:
#                   Mean        SD   Naive SE Time-series SE
# AFT[2]        0.922303  0.064441 0.00113685     0.00420403
# AFT[3]        0.834723  0.051485 0.00090830     0.00362857
# AFT[4]        0.848329  0.052439 0.00092513     0.00401102
# AFT[5]        0.928095  0.041834 0.00073803     0.00336613
# HR[2]         0.890533  0.090586 0.00159811     0.00604246
# HR[3]         0.769913  0.069116 0.00121934     0.00486858
# HR[4]         0.788205  0.070624 0.00124594     0.00531505
*# HR[5]         0.897882  0.058477 0.00103164     0.00468654
*# p.crit[2]     0.871460  0.334743 0.00590548     0.01814848
*# p.crit[3]     0.996265  0.061008 0.00107630     0.00214585
*# p.crit[4]     0.998133  0.043180 0.00076177     0.00097572
*# p.crit[5]     0.953626  0.210327 0.00371056     0.02050164
```
]

---
class: middle

### Direct output (2)

.small[
```r
# 2. Quantiles for each variable:
#                   2.5%          25%          50%         75%       97.5%
# AFT[2]        0.807308     0.876649     0.918049     0.96495     1.05225
# AFT[3]        0.738423     0.799925     0.832078     0.86719     0.94214
# AFT[4]        0.749645     0.810952     0.847050     0.88448     0.95291
# AFT[5]        0.847796     0.899999     0.927850     0.95594     1.01147
# HR[2]         0.733186     0.825541     0.883106     0.94896     1.07725
# HR[3]         0.644053     0.721892     0.765778     0.81289     0.91643
# HR[4]         0.658787     0.737562     0.785538     0.83614     0.93282
# HR[5]         0.788623     0.857453     0.896668     0.93613     1.01665
```
]

---
class: middle, inverse
background-image: url("fig/traceplot.png")
background-position: 50% 50%
background-size: contain

---
class: middle, inverse
background-image: url("fig/autocor.png")
background-position: 50% 50%
background-size: contain

---
class: middle

### (Crude) hazard ratios and acceleration factors

<table class="table" style="font-size: 17px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;"></caption>
 <thead>
<tr>
<th style="border-bottom:hidden" colspan="1"></th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="5"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Hazard ratio (HR)</div></th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Acceleration factor (AF)</div></th>
</tr>
  <tr>
   <th style="text-align:left;"> Milk intake </th>
   <th style="text-align:left;"> Median </th>
   <th style="text-align:left;"> Mean (SD) </th>
   <th style="text-align:left;"> 95% CrI </th>
   <th style="text-align:left;"> MCSE </th>
   <th style="text-align:left;"> Probability </th>
   <th style="text-align:left;"> Median </th>
   <th style="text-align:left;"> Mean (SD) </th>
   <th style="text-align:left;"> 95% CrI </th>
   <th style="text-align:left;"> MCSE </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Never </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1-2 t/Mon </td>
   <td style="text-align:left;"> 0.88 </td>
   <td style="text-align:left;"> 0.89 (0.09) </td>
   <td style="text-align:left;"> (0.73, 1.08) </td>
   <td style="text-align:left;"> 0.0016 </td>
   <td style="text-align:left;"> 87.15% </td>
   <td style="text-align:left;"> 0.92 </td>
   <td style="text-align:left;"> 0.92 (0.06) </td>
   <td style="text-align:left;"> (0.81, 1.06) </td>
   <td style="text-align:left;"> 0.0011 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1-2 t/Week </td>
   <td style="text-align:left;"> 0.77 </td>
   <td style="text-align:left;"> 0.77 (0.07) </td>
   <td style="text-align:left;"> (0.64, 0.92) </td>
   <td style="text-align:left;"> 0.0012 </td>
   <td style="text-align:left;"> 99.63% </td>
   <td style="text-align:left;"> 0.83 </td>
   <td style="text-align:left;"> 0.83 (0.05) </td>
   <td style="text-align:left;"> (0.74, 0.94) </td>
   <td style="text-align:left;"> 0.0009 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3-4 t/Week </td>
   <td style="text-align:left;"> 0.79 </td>
   <td style="text-align:left;"> 0.79 (0.07) </td>
   <td style="text-align:left;"> (0.66, 0.93) </td>
   <td style="text-align:left;"> 0.0012 </td>
   <td style="text-align:left;"> 99.81% </td>
   <td style="text-align:left;"> 0.85 </td>
   <td style="text-align:left;"> 0.85 (0.05) </td>
   <td style="text-align:left;"> (0.75, 0.95) </td>
   <td style="text-align:left;"> 0.0009 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Daily </td>
   <td style="text-align:left;"> 0.89 </td>
   <td style="text-align:left;"> 0.90 (0.06) </td>
   <td style="text-align:left;"> (0.79, 1.02) </td>
   <td style="text-align:left;"> 0.001 </td>
   <td style="text-align:left;"> 95.36% </td>
   <td style="text-align:left;"> 0.93 </td>
   <td style="text-align:left;"> 0.93 (0.04) </td>
   <td style="text-align:left;"> (0.85, 1.01) </td>
   <td style="text-align:left;"> 0.0007 </td>
  </tr>
</tbody>
<tfoot>
<tr><td style="padding: 0; border: 0;" colspan="100%"><span style="font-style: italic;">Note: </span></td></tr>
<tr><td style="padding: 0; border: 0;" colspan="100%">
<sup></sup> Abbreviations: SD, standard deviation; CrI, credible interval; MCSE, Monte Carlo Standard</td></tr>
<tr><td style="padding: 0; border: 0;" colspan="100%">
<sup></sup>  Error; Probability indicates that the p for HR smaller than 1.</td></tr>
</tfoot>
</table>

---
class: middle, inverse
background-image: url("fig/CrudeHRdist.png")
background-position: 50% 50%
background-size: cover

---
class: middle

### (Age-adjusted) hazard ratios and acceleration factors

<table class="table" style="font-size: 17px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;"></caption>
 <thead>
<tr>
<th style="border-bottom:hidden" colspan="1"></th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="5"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Hazard ratio (HR)</div></th>
<th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Acceleration factor (AF)</div></th>
</tr>
  <tr>
   <th style="text-align:left;"> Milk intake </th>
   <th style="text-align:left;"> Median </th>
   <th style="text-align:left;"> Mean (SD) </th>
   <th style="text-align:left;"> 95% CrI </th>
   <th style="text-align:left;"> MCSE </th>
   <th style="text-align:left;"> Probability </th>
   <th style="text-align:left;"> Median </th>
   <th style="text-align:left;"> Mean (SD) </th>
   <th style="text-align:left;"> 95% CrI </th>
   <th style="text-align:left;"> MCSE </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Never </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> - </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1-2 t/Mon </td>
   <td style="text-align:left;"> 0.98 </td>
   <td style="text-align:left;"> 0.98 (0.11) </td>
   <td style="text-align:left;"> (0.78, 1.19) </td>
   <td style="text-align:left;"> 0.0018 </td>
   <td style="text-align:left;"> 57.86% </td>
   <td style="text-align:left;"> 0.98 </td>
   <td style="text-align:left;"> 0.99 (0.06) </td>
   <td style="text-align:left;"> (0.87, 1.11) </td>
   <td style="text-align:left;"> 0.0011 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1-2 t/Week </td>
   <td style="text-align:left;"> 0.84 </td>
   <td style="text-align:left;"> 0.84 (0.08) </td>
   <td style="text-align:left;"> (0.69, 0.99) </td>
   <td style="text-align:left;"> 0.0014 </td>
   <td style="text-align:left;"> 97.79% </td>
   <td style="text-align:left;"> 0.9 </td>
   <td style="text-align:left;"> 0.90 (0.05) </td>
   <td style="text-align:left;"> (0.80, 0.99) </td>
   <td style="text-align:left;"> 0.0008 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 3-4 t/Week </td>
   <td style="text-align:left;"> 0.86 </td>
   <td style="text-align:left;"> 0.86 (0.08) </td>
   <td style="text-align:left;"> (0.71, 1.02) </td>
   <td style="text-align:left;"> 0.0014 </td>
   <td style="text-align:left;"> 95.92% </td>
   <td style="text-align:left;"> 0.91 </td>
   <td style="text-align:left;"> 0.91 (0.05) </td>
   <td style="text-align:left;"> (0.82, 1.01) </td>
   <td style="text-align:left;"> 0.0009 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Daily </td>
   <td style="text-align:left;"> 0.75 </td>
   <td style="text-align:left;"> 0.75 (0.05) </td>
   <td style="text-align:left;"> (0.67, 0.85) </td>
   <td style="text-align:left;"> 0.0009 </td>
   <td style="text-align:left;"> 100.00% </td>
   <td style="text-align:left;"> 0.85 </td>
   <td style="text-align:left;"> 0.85 (0.03) </td>
   <td style="text-align:left;"> (0.79, 0.91) </td>
   <td style="text-align:left;"> 0.006 </td>
  </tr>
</tbody>
<tfoot>
<tr><td style="padding: 0; border: 0;" colspan="100%"><span style="font-style: italic;">Note: </span></td></tr>
<tr><td style="padding: 0; border: 0;" colspan="100%">
<sup></sup> Abbreviations: SD, standard deviation; CrI, credible interval; MCSE, Monte Carlo Standard</td></tr>
<tr><td style="padding: 0; border: 0;" colspan="100%">
<sup></sup>  Error; Probability indicates that the p for HR smaller than 1.</td></tr>
</tfoot>
</table>

---
class: middle, inverse
background-image: url("fig/AgeHRdist.png")
background-position: 50% 50%
background-size: cover

---
class: middle, center

# Unfortunately, the other results are still under calculation....

The age adjusted model took about 3 hours.

The multivariable adjusted model has been runing for about 10 hours, and still running.

--
I really need a faster computer.

---
class: middle, center

# The estimated HRs are mostly similar to the results from Cox models.

---
class: middle, center

# And we found a way to answer the real research questions in our mind.

---
class: middle, center

# Thanks and enjoy