class: center, middle, inverse, title-slide # Bayesian Survival Analysis ## test on the JACC study data ### Chaochen Wang
Aichi Medical University ### 2020-01-14 --- class: middle # Introduction - Survival analysis is at the core of epidemiological data analysis. - The Cox Proportional Hazard (PH) regression model is (still the most) popular model. - because of its familiarity and convenience - Recently, there has been increased interest on applications of survival analysis based on Bayesian methodology. --- class: middle # Objectives - To try and use a parametric Bayesian Survival Model approach and compare with the traditional PH model. - The null hypothesis is: the risk of mortality from stroke is the same for people with different milk intake habits. --- class: middle # Data for analysis - The JACC study data: - 110585 participants (46395 men and 64190 women) aged between 40-79 at baseline --- class: middle # Exclusion criteria .med[ 1. With stroke history: n = 1496 (915 men and 581 wmen); 2. With cancer history: n = 1461 (411 men and 1050 women); 3. With myocardial infarction history: n = 2994 (1310 men and 1684 women); 4. With ischemic heart diseases history: n = 186 (91 men, 95 women); 5. With other heart diseases history (ICD9 codes: 420-429): n = 518 (204 men and 314 women); 6. Did not answer the question about milk intake frequency: n = 9545 (3593 men and 5952 women); 7. Finally, n = 94385 (39386 men and 54999 women) are left in the data ] --- class: middle # N of events: - Total Stroke mortality confirmed: - 2675 (1352 men and 1323 women) - Subtypes of stroke: - Ischemic stroke: <br>957 (520 men and 437 women) - Hemorrhagic stroke: <br>952 (432 men and 520 women) --- class: right background-image: url("fig/KMfigMen.png") background-position: 60% 50% background-size: contain ## Men --- class: right background-image: url("fig/KMfigwomen.png") background-position: 60% 50% background-size: contain ## Women --- class: middle, center # Baseline characteristics (please see the handout Table 1 & 2) (skipped in the talk) --- class: middle, # Traditional PH model results (please see the handout Table 3) .med[ 1. Total stroke: <br> only daily drinkers in men had somewhat lower hazard ratio (0.87, 95%CI: 0.76, 1.01) 2. Hemorrhagic stroke: <br> no apparent association for men/women 3. Ischemic stroke: <br> men had milk intake more than 1-2 t/w had **"significantly"** lower hazard compared with non-drinkers ] --- class: middle, center # Do the results really answer what we are interested in? --- class: middle ## Interpretation of the P value: - The earth is round (p < 0.05). .pull-left[ <img src="fig/pvalueyay.jpg" width="85%" /> ] .pull-right[ <img src="fig/pvalue.png" width="95%" /> ] ??? The p-value is the probability of the observed outcome plus all “more extreme” outcomes under the assumption that the experiment can be repeated for infinite times. --- class: middle, inverse background-image: url("fig/CI.gif") background-position: 50% 50% background-size: contain --- class: middle ## Assumptions: 1. The experiment can be repeated. 2. There is a true value of interest (HR) that does not move. Both are **unrealistic** for a huge cohort study like the JACC study, and may not help in answering the question that we are really interested in. --- class: middle ## The real research questions are: - What is **the probability** that people who drink milk everyday (or other frequency) had lower hazard of dying from stroke compared with people who don't ? - If someone does not understand the concept of "hazard", then, the question should be changed into: <br> do people drink milk everyday (or other frequency) have **longer survival time/slower rate of dying** that is free from stroke mortality risk? --- class: middle ## Bayesian parametric survival model can help us: - Weibull model is useful because it can be expressed as both - accelerated failure time (AFT) model - proportional hazard model (details skipped, please refer to the handout.) --- class: middle ### Codes for the crude Weibull survival model We use the [**BUGS**](https://www.mrc-bsu.cam.ac.uk/software/bugs/) (**B**ayesian inference **U**sing **G**ibbs **S**ampling) language to describe the model: .small[ ```r jacc.weibull.model <- function() { for(i in 1:39386){# loop through the male subjects * is.censored[i] ~ dinterval(t[i], c[i]) t[i] ~ dweib(shape, lambda[i]) lambda[i] <- exp(-mu[i] * shape) * mu[i] <- beta[1] + beta[2] * equals(Mlkfre[i], 2) + * beta[3] * equals(Mlkfre[i], 3) + * beta[4] * equals(Mlkfre[i], 4) + * beta[5] * equals(Mlkfre[i], 5) } ``` ] --- class: middle ### Specify the priors, and generate values .small[ ```r ## priors for betas for(j in 1:5){ beta[j] ~ dnorm(0, 0.001) } # increase the number here when adding covariates ### prior for shape shape ~ dgamma(.001, .001) ### Generated values * AFT[2] <- exp(beta[2]) * HR[2] <- exp(shape * beta[2]) * p.crit[2] <- step(1 - HR[2]) # = 1 if HR[2] < 1 AFT[3] <- exp(beta[3]) HR[3] <- exp(shape * beta[3]) p.crit[3] <- step(1 - HR[3]) AFT[4] <- exp(beta[4]) HR[4] <- exp(shape * beta[4]) p.crit[4] <- step(1 - HR[4]) AFT[5] <- exp(beta[5]) HR[5] <- exp(shape * beta[5]) p.crit[5] <- step(1 - HR[5]) ``` ] --- class: middle ### Run the model (the crude one took me about 1 hour...) .small[ ```r library(R2jags) #conect to jags sampling engine in R jagsfit <- jags.parallel( data = JACCdata, parameters.to.save = c("beta[5]", "HR[5]", "AFT[5]"), #add more variables if you want to check more * n.iter = 10000, * n.burnin = 2500, * n.chains = 3, model.file = jacc.weibull.model) ``` ] --- class: middle, center, inverse # Output and model checking --- class: middle ### Direct output:(1) .small[ ```r # Inference for Bugs model at "jacc.weibull.model", fit using jags, # 3 chains, each with 10000 iterations (first 2500 discarded), n.thin = 7 # n.sims = 3213 iterations saved. Sample size per chain = 1071 # 1. Empirical mean and standard deviation for each variable, # plus standard error of the mean: # Mean SD Naive SE Time-series SE # AFT[2] 0.922303 0.064441 0.00113685 0.00420403 # AFT[3] 0.834723 0.051485 0.00090830 0.00362857 # AFT[4] 0.848329 0.052439 0.00092513 0.00401102 # AFT[5] 0.928095 0.041834 0.00073803 0.00336613 # HR[2] 0.890533 0.090586 0.00159811 0.00604246 # HR[3] 0.769913 0.069116 0.00121934 0.00486858 # HR[4] 0.788205 0.070624 0.00124594 0.00531505 *# HR[5] 0.897882 0.058477 0.00103164 0.00468654 *# p.crit[2] 0.871460 0.334743 0.00590548 0.01814848 *# p.crit[3] 0.996265 0.061008 0.00107630 0.00214585 *# p.crit[4] 0.998133 0.043180 0.00076177 0.00097572 *# p.crit[5] 0.953626 0.210327 0.00371056 0.02050164 ``` ] --- class: middle ### Direct output (2) .small[ ```r # 2. Quantiles for each variable: # 2.5% 25% 50% 75% 97.5% # AFT[2] 0.807308 0.876649 0.918049 0.96495 1.05225 # AFT[3] 0.738423 0.799925 0.832078 0.86719 0.94214 # AFT[4] 0.749645 0.810952 0.847050 0.88448 0.95291 # AFT[5] 0.847796 0.899999 0.927850 0.95594 1.01147 # HR[2] 0.733186 0.825541 0.883106 0.94896 1.07725 # HR[3] 0.644053 0.721892 0.765778 0.81289 0.91643 # HR[4] 0.658787 0.737562 0.785538 0.83614 0.93282 # HR[5] 0.788623 0.857453 0.896668 0.93613 1.01665 ``` ] --- class: middle, inverse background-image: url("fig/traceplot.png") background-position: 50% 50% background-size: contain --- class: middle, inverse background-image: url("fig/autocor.png") background-position: 50% 50% background-size: contain --- class: middle ### (Crude) hazard ratios and acceleration factors <table class="table" style="font-size: 17px; margin-left: auto; margin-right: auto;"> <caption style="font-size: initial !important;"></caption> <thead> <tr> <th style="border-bottom:hidden" colspan="1"></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="5"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Hazard ratio (HR)</div></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Acceleration factor (AF)</div></th> </tr> <tr> <th style="text-align:left;"> Milk intake </th> <th style="text-align:left;"> Median </th> <th style="text-align:left;"> Mean (SD) </th> <th style="text-align:left;"> 95% CrI </th> <th style="text-align:left;"> MCSE </th> <th style="text-align:left;"> Probability </th> <th style="text-align:left;"> Median </th> <th style="text-align:left;"> Mean (SD) </th> <th style="text-align:left;"> 95% CrI </th> <th style="text-align:left;"> MCSE </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Never </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> </tr> <tr> <td style="text-align:left;"> 1-2 t/Mon </td> <td style="text-align:left;"> 0.88 </td> <td style="text-align:left;"> 0.89 (0.09) </td> <td style="text-align:left;"> (0.73, 1.08) </td> <td style="text-align:left;"> 0.0016 </td> <td style="text-align:left;"> 87.15% </td> <td style="text-align:left;"> 0.92 </td> <td style="text-align:left;"> 0.92 (0.06) </td> <td style="text-align:left;"> (0.81, 1.06) </td> <td style="text-align:left;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;"> 1-2 t/Week </td> <td style="text-align:left;"> 0.77 </td> <td style="text-align:left;"> 0.77 (0.07) </td> <td style="text-align:left;"> (0.64, 0.92) </td> <td style="text-align:left;"> 0.0012 </td> <td style="text-align:left;"> 99.63% </td> <td style="text-align:left;"> 0.83 </td> <td style="text-align:left;"> 0.83 (0.05) </td> <td style="text-align:left;"> (0.74, 0.94) </td> <td style="text-align:left;"> 0.0009 </td> </tr> <tr> <td style="text-align:left;"> 3-4 t/Week </td> <td style="text-align:left;"> 0.79 </td> <td style="text-align:left;"> 0.79 (0.07) </td> <td style="text-align:left;"> (0.66, 0.93) </td> <td style="text-align:left;"> 0.0012 </td> <td style="text-align:left;"> 99.81% </td> <td style="text-align:left;"> 0.85 </td> <td style="text-align:left;"> 0.85 (0.05) </td> <td style="text-align:left;"> (0.75, 0.95) </td> <td style="text-align:left;"> 0.0009 </td> </tr> <tr> <td style="text-align:left;"> Daily </td> <td style="text-align:left;"> 0.89 </td> <td style="text-align:left;"> 0.90 (0.06) </td> <td style="text-align:left;"> (0.79, 1.02) </td> <td style="text-align:left;"> 0.001 </td> <td style="text-align:left;"> 95.36% </td> <td style="text-align:left;"> 0.93 </td> <td style="text-align:left;"> 0.93 (0.04) </td> <td style="text-align:left;"> (0.85, 1.01) </td> <td style="text-align:left;"> 0.0007 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; border: 0;" colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; border: 0;" colspan="100%"> <sup></sup> Abbreviations: SD, standard deviation; CrI, credible interval; MCSE, Monte Carlo Standard</td></tr> <tr><td style="padding: 0; border: 0;" colspan="100%"> <sup></sup> Error; Probability indicates that the p for HR smaller than 1.</td></tr> </tfoot> </table> --- class: middle, inverse background-image: url("fig/CrudeHRdist.png") background-position: 50% 50% background-size: cover --- class: middle ### (Age-adjusted) hazard ratios and acceleration factors <table class="table" style="font-size: 17px; margin-left: auto; margin-right: auto;"> <caption style="font-size: initial !important;"></caption> <thead> <tr> <th style="border-bottom:hidden" colspan="1"></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="5"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Hazard ratio (HR)</div></th> <th style="border-bottom:hidden; padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="4"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">Acceleration factor (AF)</div></th> </tr> <tr> <th style="text-align:left;"> Milk intake </th> <th style="text-align:left;"> Median </th> <th style="text-align:left;"> Mean (SD) </th> <th style="text-align:left;"> 95% CrI </th> <th style="text-align:left;"> MCSE </th> <th style="text-align:left;"> Probability </th> <th style="text-align:left;"> Median </th> <th style="text-align:left;"> Mean (SD) </th> <th style="text-align:left;"> 95% CrI </th> <th style="text-align:left;"> MCSE </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Never </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> <td style="text-align:left;"> - </td> </tr> <tr> <td style="text-align:left;"> 1-2 t/Mon </td> <td style="text-align:left;"> 0.98 </td> <td style="text-align:left;"> 0.98 (0.11) </td> <td style="text-align:left;"> (0.78, 1.19) </td> <td style="text-align:left;"> 0.0018 </td> <td style="text-align:left;"> 57.86% </td> <td style="text-align:left;"> 0.98 </td> <td style="text-align:left;"> 0.99 (0.06) </td> <td style="text-align:left;"> (0.87, 1.11) </td> <td style="text-align:left;"> 0.0011 </td> </tr> <tr> <td style="text-align:left;"> 1-2 t/Week </td> <td style="text-align:left;"> 0.84 </td> <td style="text-align:left;"> 0.84 (0.08) </td> <td style="text-align:left;"> (0.69, 0.99) </td> <td style="text-align:left;"> 0.0014 </td> <td style="text-align:left;"> 97.79% </td> <td style="text-align:left;"> 0.9 </td> <td style="text-align:left;"> 0.90 (0.05) </td> <td style="text-align:left;"> (0.80, 0.99) </td> <td style="text-align:left;"> 0.0008 </td> </tr> <tr> <td style="text-align:left;"> 3-4 t/Week </td> <td style="text-align:left;"> 0.86 </td> <td style="text-align:left;"> 0.86 (0.08) </td> <td style="text-align:left;"> (0.71, 1.02) </td> <td style="text-align:left;"> 0.0014 </td> <td style="text-align:left;"> 95.92% </td> <td style="text-align:left;"> 0.91 </td> <td style="text-align:left;"> 0.91 (0.05) </td> <td style="text-align:left;"> (0.82, 1.01) </td> <td style="text-align:left;"> 0.0009 </td> </tr> <tr> <td style="text-align:left;"> Daily </td> <td style="text-align:left;"> 0.75 </td> <td style="text-align:left;"> 0.75 (0.05) </td> <td style="text-align:left;"> (0.67, 0.85) </td> <td style="text-align:left;"> 0.0009 </td> <td style="text-align:left;"> 100.00% </td> <td style="text-align:left;"> 0.85 </td> <td style="text-align:left;"> 0.85 (0.03) </td> <td style="text-align:left;"> (0.79, 0.91) </td> <td style="text-align:left;"> 0.006 </td> </tr> </tbody> <tfoot> <tr><td style="padding: 0; border: 0;" colspan="100%"><span style="font-style: italic;">Note: </span></td></tr> <tr><td style="padding: 0; border: 0;" colspan="100%"> <sup></sup> Abbreviations: SD, standard deviation; CrI, credible interval; MCSE, Monte Carlo Standard</td></tr> <tr><td style="padding: 0; border: 0;" colspan="100%"> <sup></sup> Error; Probability indicates that the p for HR smaller than 1.</td></tr> </tfoot> </table> --- class: middle, inverse background-image: url("fig/AgeHRdist.png") background-position: 50% 50% background-size: cover --- class: middle, center # Unfortunately, the other results are still under calculation.... The age adjusted model took about 3 hours. The multivariable adjusted model has been runing for about 10 hours, and still running. -- I really need a faster computer. --- class: middle, center # The estimated HRs are mostly similar to the results from Cox models. --- class: middle, center # And we found a way to answer the real research questions in our mind. --- class: middle, center # Thanks and enjoy