class: center, middle, inverse, title-slide

.title[
# Introduction and examples of Bayesian Statistical Modelling
]
.subtitle[
## for RWD Research Group
]
.author[
### 王 超辰
]
.institute[
### Chaochen Wang
]
.date[
### 2021-8-30
]

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Everybody is a Bayesian. It's just that some know it, and some don't.
.right[.font140[[.black[Trivellore Raghunathan]](http://www-personal.umich.edu/~teraghu/)]]
]

.bold[.font130[
1. Introduction to the Bayesian way of thinking
1. Bayes' Rule
1. Binomial Bayesian models
1. Simple linear regression models
<!-- 1. Poisson regression models -->
1. Logistic regression models
1. Multilevel (mixed-effect) models
1. Survival analysis with Cox proportional hazards models
1. Further issues
]
]

---
class:

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> If I’m doing an experiment to save the world, I better use my prior.
.right[.font140[[.black[Andrew Gelman (1965-)]](https://stat.columbia.edu/~gelman/)]]
]

.bold[.font130[
1. Introduction to the Bayesian way of thinking
1. .white[Bayes' Rule]
1. .white[Binomial Bayesian models]
1. .white[Simple linear regression models]
<!-- 1. .white[Poisson regression models] -->
1. .white[Logistic regression models]
1. .white[Multilevel (mixed-effect) models]
1. .white[Survival analysis with Cox proportional hazards models]
1. .white[Further issues]
]
]

---
class:
background-image: url("fig/restaurant_diagram.png")
background-position: 50% 80%
background-size: contain

# Our evolving knowledge

---
class:
background-image: url("fig/bayes_diagram.png")
background-position: 50% 80%
background-size: contain

# Bayesian knowledge building

---
class:
background-image: url("fig/our_bayes_diagram.png")
background-position: 50% 50%
background-size: contain

# Two-person (two-study) Bayesian knowledge building

---
# Bayesian vs.
Frequentist - probability .blockquote[ ### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Bayesian and frequentist analysis share a common goal: **to learn from data about the world around us**. .right[.font120[[.black[Mine Dogucu]](https://mdogucu.ics.uci.edu/index.html)]] ] ## Interpreting probability: -- .font130[.bold[ - In the Bayesian philosophy, a probability measures the **relative plausibility/degree of belief** of an event. ]] -- .font130[.bold[ - The frequentist philosophy is so named for its interpretation of probability as **the long-run relative frequency of a repeatable event**. ]] --- # The earth is round (p < 0.05). .pull-left[ <img src="fig/pvalueyay.jpg" width="85%" /> ] -- .pull-right[ <img src="fig/pvalue.png" width="95%" /> ] --- class: background-image: url("fig/pvalue.jpg") background-position: 50% 80% background-size: contain # Frequentist interpretation of P-value --- # Bayesian vs. Frequentist - Questions .blockquote[ ### <svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"></path></svg> Asking questions when conducting a (hypothesis) test: ] <style type="text/css"> .tg {border-collapse:collapse;border-color:#ccc;border-spacing:0;} .tg td{background-color:#fff;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#f0f0f0;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-1wig{font-weight:bold;text-align:left;vertical-align:top} .tg .tg-s2ls{background-color:#efefef;border-color:#efefef;color:#000000;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-kub8{background-color:#eb811b;border-color:#efefef;color:#ffffff;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-gn8w{background-color:#eb811b;border-color:#c0c0c0;color:#ffffff;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-amwm{font-weight:bold;text-align:center;vertical-align:top} .tg .tg-z0ua{background-color:#efefef;border-color:#efefef;color:#000000;font-weight:bold;text-align:left;vertical-align:top} </style> <table class="tg"> <thead> <tr> <th class="tg-1wig"></th> <th class="tg-gn8w">Test positve</th> <th 
class="tg-amwm">Test negative</th> <th class="tg-amwm">Total</th> </tr> </thead> <tbody> <tr> <td class="tg-1wig">Disease</td> <td class="tg-gn8w">3</td> <td class="tg-amwm">1</td> <td class="tg-amwm">4</td> </tr> <tr> <td class="tg-z0ua">No Disease</td> <td class="tg-kub8">9</td> <td class="tg-s2ls">87</td> <td class="tg-s2ls">96</td> </tr> <tr> <td class="tg-1wig">Total</td> <td class="tg-gn8w">12</td> <td class="tg-amwm">88</td> <td class="tg-amwm">100</td> </tr> </tbody> </table> -- .font110[.bold[ - A Bayesian analysis wants to answer: with the observed data, <br>**what is the probability/chance that the hypothesis is correct?** - Given my positive test result, what is the chance that I actually have the disease? `\((3/12 = 25\%)\)` ]] --- # Bayesian vs. Frequentist - Questions .blockquote[ ### <svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"></path></svg> Asking questions when conducting a (hypothesis) test: ] <style type="text/css"> .tg {border-collapse:collapse;border-color:#ccc;border-spacing:0;} .tg td{background-color:#fff;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#f0f0f0;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-v0zy{background-color:#efefef;color:#000000;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-6jtc{background-color:#ffffff;color:#330001;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-ldzw{background-color:#efefef;color:#000000;font-weight:bold;text-align:left;vertical-align:top} .tg .tg-txl9{background-color:#efefef;color:#330001;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-o42t{background-color:#efefef;color:#330001;font-weight:bold;text-align:left;vertical-align:top} .tg .tg-gn8w{background-color:#eb811b;border-color:#c0c0c0;color:#ffffff;font-weight:bold;text-align:center;vertical-align:top} .tg .tg-qyxu{background-color:#ffffff;color:#330001;font-weight:bold;text-align:left;vertical-align:top} .tg .tg-ifxt{background-color:#23373b;border-color:#efefef;color:#ffffff;font-weight:bold;text-align:left;vertical-align:top} .tg .tg-ztbz{background-color:#23373b;border-color:#efefef;color:#ffffff;font-weight:bold;text-align:center;vertical-align:top} </style> <table class="tg"> <thead> <tr> <th class="tg-o42t"></th> <th class="tg-gn8w">Test positve</th> <th class="tg-txl9">Test negative</th> <th class="tg-txl9">Total</th> </tr> </thead> <tbody> <tr> <td class="tg-qyxu">Disease</td> <td class="tg-gn8w">3</td> <td class="tg-6jtc">1</td> <td class="tg-6jtc">4</td> </tr> <tr> <td class="tg-ifxt">No Disease</td> <td class="tg-ztbz">9</td> <td class="tg-ztbz">87</td> <td 
class="tg-ztbz">96</td> </tr> <tr> <td class="tg-ldzw">Total</td> <td class="tg-gn8w">12</td> <td class="tg-v0zy">88</td> <td class="tg-v0zy">100</td> </tr> </tbody> </table> .font110[.bold[ - A Frequentist analysis wants to answer: if the hypothesis is incorrect, <br>**what is the probability/chance that I observe this data, or even more extreme data?** <br> - If I don't have the disease, what is the chance that I would've tested positive? `\((9/96 \approx 10\%)\)` ]] --- # Bayesian vs. Frequentist - Questions .blockquote[ ### <svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"></path></svg> Asking questions when conducting a (hypothesis) test: ] <style type="text/css"> .tg {border-collapse:collapse;border-color:#93a1a1;border-spacing:0;} .tg td{background-color:#fdf6e3;border-color:#93a1a1;border-style:solid;border-width:1px;color:#002b36; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#657b83;border-color:#93a1a1;border-style:solid;border-width:1px;color:#fdf6e3; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-2mf4{background-color:#23373b;color:#fdf6e3;text-align:left;vertical-align:top} .tg .tg-baqh{text-align:center;vertical-align:top} .tg .tg-wj0u{background-color:#23373b;color:#fdf6e3;text-align:center;vertical-align:top} .tg .tg-0lax{text-align:left;vertical-align:top} </style> <table class="tg"> <thead> <tr> <th class="tg-2mf4"></th> <th class="tg-wj0u"><span style="font-weight:bold">Test positve</span></th> <th class="tg-wj0u"><span style="font-weight:bold">Test negative</span></th> <th class="tg-wj0u"><span style="font-weight:bold">Total</span></th> </tr> </thead> <tbody> <tr> <td class="tg-0lax"><span style="font-weight:bold;color:#333">Disease</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">3</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">1</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">4</span></td> </tr> <tr> <td class="tg-0lax"><span style="font-weight:bold;color:#333">No Disease</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">9</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">87</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">96</span></td> </tr> <tr> <td class="tg-0lax"><span style="font-weight:bold;color:#333">Total</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">12</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">88</span></td> <td class="tg-baqh"><span style="font-weight:bold;color:#333">100</span></td> </tr> </tbody> </table> .font100[.bold[ - A Bayesian analysis wants to answer: with the observed data, <br>**what is the probability/chance that the 
hypothesis is correct?** - Given my positive test result, what is the chance that I actually have the disease? `\((3/12 = 25\%)\)` ]] .font100[.bold[ - A Frequentist analysis wants to answer: if the hypothesis is incorrect, <br>**what is the probability/chance that I observe this data, or even more extreme data?** <br> - If I don't have the disease, what is the chance that I would've tested positive? `\((9/96 \approx 10\%)\)` ]] --- class: # Bayesian vs. **Frequentist** - xx% intervals .blockquote[ ### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> If you, **hypothetically**, repeat the procedure/experiment a large number of times, the confidence interval should contain the true mean with a certain probability. But only God knows the true parameter, it is unknowable to human. ] <img src="fig/confidence-interval-simulation.gif" width="95%" style="display: block; margin: auto;" /> ??? If you, hypothetically, repeat the procedure a large number of times, the confidence interval should contain the true mean with a certain probability. --- class: background-image: url("fig/CI.gif") background-position: 50% 80% background-size: contain # Bayesian vs. **Frequentist** - xx% intervals --- class: clear, inverse <blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">Hey, <a href="https://twitter.com/hashtag/epitwitter?src=hash&ref_src=twsrc%5Etfw">#epitwitter</a> allow me to introduce the <a href="https://twitter.com/hashtag/epigif?src=hash&ref_src=twsrc%5Etfw">#epigif</a>! A whole new level of <a href="https://twitter.com/hashtag/epicartoon?src=hash&ref_src=twsrc%5Etfw">#epicartoon</a> to inform and delight 😆 <a href="https://t.co/gl2gGPlXa9">pic.twitter.com/gl2gGPlXa9</a></p>— Dr Ellie Murray, ScD (@EpiEllie) <a href="https://twitter.com/EpiEllie/status/1160395672706789376?ref_src=twsrc%5Etfw">August 11, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> --- class: # **Bayesian** vs. Frequentist - xx% intervals .blockquote[ #### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Based on (Given) the data, we are 90% certain that the interval contains the parameter value. - Either case is a valid 90% credible interval, because each enclosed 90% probability mass (plausibility). Which do you think is more useful? 
]

.pull-left[
<img src="fig/credibleInterval0.png" width="100%" />
]

--

.pull-right[
<img src="fig/credibleInterval2.png" width="100%" />
]

---
class:

# **Bayesian** vs. Frequentist - xx% intervals

.blockquote[
#### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Based on (Given) the data, we are 95% certain that the interval contains the parameter value.

**HPD (highest posterior density) interval**, also known as an HDI: the HDI contains those values of `\(x\)` that have the highest believability (posterior density).
]

<img src="index_files/figure-html/unnamed-chunk-3-1.png" width="85%" style="display: block; margin: auto;" />

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> The most newsworthy scientific studies are the least trustworthy. Maybe popular topics attract more and worse researchers, like flies drawn to the smell of honey?
.right[.font120[[.black[Richard McElreath (1973-)]](https://xcelab.net/rm/)]]
]

.bold[.font140[
1. .white[Introduction to the Bayesian way of thinking]
1. Bayes' Rule
1. .white[Binomial Bayesian models]
1. .white[Simple linear regression models]
<!-- 1. .white[Poisson regression models] -->
1. .white[Logistic regression models]
1. .white[Multilevel (mixed-effect) models]
1. .white[Survival analysis with Cox proportional hazards models]
1. 
.white[Further issues] ] ] --- # Bayes' Rule .blockquote[ .font120[ $$ `\begin{aligned} p(A,B) = p(A|B)\color{red}{p(B)} & = p(B|A)p(A) \\ p(A|B) & = \frac{p(B|A)p(A)}{\color{red}{p(B)}} \\ \Rightarrow p(\text{parameter} | \text{data}) & = \frac{p(\text{data|parameter}) \times p(\text{parameter}) }{\color{blue}{p(\text{data})}} \\ \Rightarrow \text{Posterior} & = \frac{\text{likelihood} \times \text{Prior}}{\color{blue}{\text{Average probability of the data}}} \end{aligned}` $$ - `\(\color{blue}{\text{Average probability of the data}}\)` also called `\(\color{blue}{\text{marginal likelihood}}\)` or `\(\color{blue}{\text{normalizing constant}}\)`, according to [the Law of Total Probability](https://www.probabilitycourse.com/chapter1/1_4_2_total_probability.phps): $$ `\begin{aligned} \color{blue}{p(\text{data})} & = p(\text{data|parameter}_1)p(\text{parameter}_1) + \cdots \\ & \;\; + p(\text{data|parameter}_n)p(\text{parameter}_n)\\ & = \int p(\text{data|parameter}) p(\text{parameter}) \;dp(\text{parameter}) \\ \end{aligned}` $$ .right[Thomas Bayes (1701–1761)] ] ] --- # Bayes' Rule Example: Pop vs soda vs coke .blockquote[.font120[ <svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M368 96h-48V56c0-13.255-10.745-24-24-24H24C10.745 32 0 42.745 0 56v400c0 13.255 10.745 24 24 24h272c13.255 0 24-10.745 24-24v-42.11l80.606-35.977C429.396 365.063 448 336.388 448 304.86V176c0-44.112-35.888-80-80-80zm16 208.86a16.018 16.018 0 0 1-9.479 14.611L320 343.805V160h48c8.822 0 16 7.178 16 16v128.86zM208 384c-8.836 0-16-7.164-16-16V144c0-8.836 7.164-16 16-16s16 7.164 16 16v224c0 8.836-7.164 16-16 16zm-96 0c-8.836 0-16-7.164-16-16V144c0-8.836 7.164-16 16-16s16 7.164 16 16v224c0 8.836-7.164 16-16 16z"></path></svg> The U.S. Census has published the population proportions for different regions as<sup>1</sup>: (these are our prior probabilities) <style type="text/css"> .tg {border-collapse:collapse;border-color:#ccc;border-spacing:0;} .tg td{background-color:#fff;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;} .tg th{background-color:#f0f0f0;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;} .tg .tg-0lax{text-align:left;vertical-align:top} </style> <table class="tg"> <thead> <tr> <th class="tg-0lax"><span style="font-weight:normal">Region</span></th> <th class="tg-0lax"><span style="font-weight:normal">Population</span></th> <th class="tg-0lax"><span style="font-weight:normal">Percentage</span></th> </tr> </thead> <tbody> <tr> <td class="tg-0lax">(N)ortheast</td> <td class="tg-0lax">55,982,803</td> <td class="tg-0lax">17.1%</td> </tr> <tr> <td class="tg-0lax">(M)idwest</td> <td class="tg-0lax">68,329,004</td> <td class="tg-0lax">20.8%</td> </tr> <tr> <td class="tg-0lax">(W)est</td> <td class="tg-0lax">78,347,268</td> <td class="tg-0lax">23.9%</td> </tr> <tr> <td class="tg-0lax">(S)outh</td> <td class="tg-0lax">125,580,448</td> <td class="tg-0lax">38.3%</td> </tr> </tbody> </table> The dataset of 374250 people's use of "pop", "soda" or "coke" from a survey<sup>2</sup> is included in [`bayesrules`](https://github.com/bayes-rules/bayesrules) package. ] ```r install.packages("bayesrules") library(bayesrules); data(pop_vs_soda) ?pop_vs_soda ``` ] .small[ 1. 
[https://www.census.gov/popclock/data_tables.php?component=growth](https://www.census.gov/popclock/data_tables.php?component=growth)
2. [https://popvssoda.com/](https://popvssoda.com/)
]

---

# Bayes' Rule Example: Pop vs soda vs coke

```r
library(tidyverse)
library(janitor)
library(bayesrules); data(pop_vs_soda)

# Summarize pop use by region
pop_vs_soda %>% 
  tabyl(pop, region) %>% 
  adorn_percentages("col")
```

```
## pop midwest northeast south west
## FALSE 0.3552958 0.7266026 0.92077769 0.7057215
## TRUE 0.6447042 0.2733974 0.07922231 0.2942785
```

.blockquote[.font120[
Let `\(A\)` denote the event that a person uses "pop"; then we have the following regional likelihoods:

`\(p(A|M) = 0.6447;\;\;\; p(A|N) = 0.2734;\;\;\; p(A|S) = 0.0792;\;\;\; p(A|W) = 0.2943\)`
]]

???

For example, 64.47% of people in the Midwest but only 7.92% of people in the South use the term “pop.”

---

# Bayes' Rule Example: Pop vs soda vs coke

.blockquote[.font100[
<svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"></path></svg> Considering that `\(p(S) = 38\%\)` of the population lives in the South, while "pop" is rarely used there `\((p(A|S) = 7.92\%)\)`: what is the **posterior probability** that the interviewee lives in the South?
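This update can be checked numerically in R; the short block below applies Bayes' rule to all four regions at once (using the rounded priors and the likelihoods above) and reproduces the posterior table that follows the derivation:

```r
prior <- c(M = 0.21, N = 0.17, S = 0.38, W = 0.24)          # prior p(region)
lik   <- c(M = 0.6447, N = 0.2734, S = 0.0792, W = 0.2943)  # p(A|region)
round(lik * prior / sum(lik * prior), 4)                    # posterior p(region|A)
```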
$$
`\begin{aligned}
p(S | A) & = \frac{p(A|S) \times p(S)}{p(A)} \\
& = \frac{p(A|S) \times p(S)}{p(A|M)p(M) + p(A|N)p(N) + p(A|S)p(S) + p(A|W)p(W)} \\
& = \frac{0.0792 \times 0.38}{0.6447 \times 0.21 + 0.2734 \times 0.17 + 0.0792 \times 0.38 + 0.2943 \times 0.24} \\
& \approx 0.1065
\end{aligned}`
$$

Similarly, the posterior probabilities of the interviewee living in the Midwest, Northeast, or West can be updated:
]
]

<style type="text/css">
.tg {border-collapse:collapse;border-color:#ccc;border-spacing:0;}
.tg td{background-color:#fff;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{background-color:#f0f0f0;border-color:#ccc;border-style:solid;border-width:1px;color:#333; font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
<thead> <tr> <th class="tg-0lax"><span style="font-weight:700">region</span></th> <th class="tg-0lax"><span style="font-weight:700">M</span></th> <th class="tg-0lax"><span style="font-weight:700">N</span></th> <th class="tg-0lax"><span style="font-weight:700">S</span></th> <th class="tg-0lax"><span style="font-weight:700">W</span></th> <th class="tg-0lax"><span style="font-weight:700">Total</span></th> </tr> </thead>
<tbody>
<tr> <td class="tg-0lax">prior probability</td> <td class="tg-0lax">0.21</td> <td class="tg-0lax">0.17</td> <td class="tg-0lax">0.38</td> <td class="tg-0lax">0.24</td> <td class="tg-0lax">1</td> </tr>
<tr> <td class="tg-0lax">posterior probability</td> <td class="tg-0lax">0.4791</td> <td class="tg-0lax">0.1645</td> <td class="tg-0lax">0.1065</td> <td class="tg-0lax">0.2499</td> <td class="tg-0lax">1</td> </tr>
</tbody>
</table>

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> How can we live if we don’t change?
.right[.font120[[.black[Beyoncé. Lyric from Satellites.]](https://www.youtube.com/watch?v=VJCmq_nMZrw)]
]
]

.bold[.font140[
1. .white[Introduction to the Bayesian way of thinking]
1. .white[Bayes' Rule]
1. Binomial Bayesian models
1. .white[Simple linear regression models]
<!-- 1. .white[Poisson regression models] -->
1. .white[Logistic regression models]
1. .white[Multilevel (mixed-effect) models]
1. .white[Survival analysis with Cox proportional hazards models]
1. 
.white[Further issues]
]
]

---

# Binomial Bayesian models

.blockquote[.font110[
<svg version="1.0" viewBox="0 0 574.526 300.548" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <defs> <clipPath clipPathUnits="userSpaceOnUse" id="a"> <path d="M.16 13.992h573.892v300.777H.16z"></path> </clipPath> </defs> <path clip-path="url(#a)" d="M528.36 298.619s-81.75 1.639-118.528-68.918C369.096 151.868 353.026 16.59 292.902 16.59h-1.958c-60.124 0-76.194 135.278-116.93 213.11-37.097 70.558-118.568 68.919-118.568 68.919" transform="matrix(.86256 0 0 .86256 41.29 15.194)" fill="none" stroke="#000" stroke-width="4.15748px" stroke-linecap="round" stroke-linejoin="miter" stroke-miterlimit="8" stroke-dasharray="none" stroke-opacity="1"></path> <path d="m293.487 29.814.138 254.335M227.352 117.88l.138 165.648M165.423 245.599l.138 37.93M103.977 271.598v12.275M359.14 118.155l.138 165.718M421.206 245.943l.138 37.93M482.997 272.08v12.069" fill="none" stroke="#6cf" stroke-width="1.17237px" stroke-linecap="round" stroke-linejoin="miter" stroke-miterlimit="8" stroke-dasharray="none" stroke-opacity="1"></path> <path clip-path="url(#a)" d="m3.318 311.451 567.496.16" transform="matrix(.86256 0 0 .86256 41.29 15.194)" fill="none" stroke="#6cf" stroke-width="3.35796px" stroke-linecap="square" stroke-linejoin="miter" stroke-miterlimit="8" stroke-dasharray="none" stroke-opacity="1"></path></svg> The basic binomial model follows the form:

$$
y \sim \text{Binomial}(n, \pi)
$$

Where,

- `\(y\)` is the observed count of successes (an integer between `\(0\)` and `\(n\)`);
- `\(n\)` is the number of trials;
- `\(\pi\)` is the probability that a given trial is a success.

The prior model for `\(\pi\)` is a [Beta distribution](https://en.wikipedia.org/wiki/Beta_distribution):

$$
\pi \sim \text{Beta}(a, b)
$$

Where,

- `\(a, b\)` are hyperparameters that determine the shape of the probability density function (pdf) of the Beta distribution.
]
]

---

# Beta-distributions - various values of a, b

<img src="index_files/figure-html/beta-distr-1.png" width="90%" style="display: block; margin: auto;" />

---

# Binomial Bayesian models - Example

.blockquote[.font120[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Polling results from different sources showed that Mr. Trump's support was between 0.2 (CNN) and 0.6 (Fox).
- This information can be translated into a Beta distribution by the following steps:
]]

.font120[.scroll-output[

```r
quantile1 <- list(p = 0.5, x = 0.4)      # we believe the median of the prior is 0.4
quantile2 <- list(p = 0.99999, x = 0.6)  # the 99.999th percentile of the prior is 0.6
quantile3 <- list(p = 0.00001, x = 0.2)  # the 0.001st percentile of the prior is 0.2

source("findbeta.R")   # the function to find the best hyperparameters a, b
library("LearnBayes")  # install by install.packages("LearnBayes")
findBeta(quantile1, quantile2, quantile3)
```

```
## [1] "The best beta prior has a= 38.4510810810811 b= 60.7559159159159"
```
]]

---

# `findbeta.R`

.scroll-output[

```r
findBeta <- function(quantile1,quantile2,quantile3) {
  # find the quantiles specified by quantile1 and quantile2 and quantile3
  quantile1_p <- quantile1[[1]]; quantile1_q <- quantile1[[2]]
  quantile2_p <- quantile2[[1]]; quantile2_q <- quantile2[[2]]
  quantile3_p <- quantile3[[1]]; quantile3_q <- quantile3[[2]]

  # find the beta prior using quantile1 and quantile2
  priorA <- beta.select(quantile1,quantile2)
  priorA_a <- priorA[1]; priorA_b <- priorA[2]

  # find the beta prior using quantile1 and quantile3
  priorB <- beta.select(quantile1,quantile3)
  priorB_a <- priorB[1]; priorB_b <- priorB[2]

  # find the best possible beta prior
  diff_a <- abs(priorA_a - priorB_a); diff_b <- abs(priorA_b - priorB_b)
  step_a <- diff_a / 100; step_b <- diff_b / 100
  if (priorA_a < priorB_a) { start_a <- priorA_a; end_a <- priorB_a } else { start_a <- priorB_a; end_a <- priorA_a }
  if (priorA_b < priorB_b) { start_b <- priorA_b; end_b <- priorB_b } else { start_b <- priorB_b; end_b <- priorA_b }
  steps_a <- seq(from=start_a, to=end_a, length.out=1000)
  steps_b <- seq(from=start_b, to=end_b, length.out=1000)
  max_error <- 10000000000000000000
  best_a <- 0; best_b <- 0
  for (a in steps_a) {
    for (b in steps_b) {
      # priorC is beta(a,b)
      # find the quantile1_q, quantile2_q, quantile3_q quantiles of priorC:
      priorC_q1 <- qbeta(c(quantile1_p), a, b)
      priorC_q2 <- qbeta(c(quantile2_p), a, b)
      priorC_q3 <- qbeta(c(quantile3_p), a, b)
      priorC_error <- abs(priorC_q1-quantile1_q) +
        abs(priorC_q2-quantile2_q) + abs(priorC_q3-quantile3_q)
      if (priorC_error < max_error) { max_error <- priorC_error; best_a <- a; best_b <- b }
    }
  }
  print(paste("The best beta prior has a=",best_a,"b=",best_b))
}
```
]

---

# Plot beta(38.5, 60.8)

```r
pi <- Vectorize(function(theta) dbeta(theta, 38.5,60.8))
curve(pi, xlab =~ pi, ylab = "Density",
      main = "Beta prior: a = 38.5, b = 60.8", frame = FALSE, lwd = 2)
```

<img src="index_files/figure-html/betaTrump-1.png" width="90%" style="display: block; margin: auto;" />

---

# Different `\(\pi \Rightarrow\)` different n of supporters

<img src="index_files/figure-html/betaTrumppred-1.png" width="100%" style="display: block; margin: auto;" />

???

- If Trump’s support π were low (top row), the polling result Y is also likely to be low. If his support were high (bottom row), Y is also likely to be high.

---

# Binomial models - likelihood function

.blockquote[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> We conducted a new poll of `\(n = 100\)` Americans, and we found `\(Y = 60\)`, the number of Trump supporters.
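This likelihood can be evaluated numerically with `dbinom()`; a short grid evaluation traces the curve shown in the figure below the formula:

```r
# likelihood L(pi | y = 60, n = 100) over a grid of candidate pi values
pi_grid <- seq(0, 1, by = 0.01)
lik     <- dbinom(60, size = 100, prob = pi_grid)
plot(pi_grid, lik, type = "l", xlab = expression(pi), ylab = "Likelihood")
```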
The likelihood function for this model will be:

$$
`\begin{aligned}
Y|\pi & \sim \text{Binomial}(100, \pi) \\
L(\pi | y = 60) & = {100 \choose 60}\pi^{60} (1-\pi)^{40} \; \text{for } \pi \in [0, 1]
\end{aligned}`
$$
]

<img src="index_files/figure-html/betaTrumplikelihood-1.png" width="90%" style="display: block; margin: auto;" />

---

# Binomial models - posterior model

.blockquote[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> We now have two pieces of information:

$$
`\begin{aligned}
Y = 60 | \pi & \sim \text{Binomial}(100, \pi) & \text{[likelihood]} \\
\pi & \sim \text{Beta}(38.5, 60.8) & \text{[Prior]} \\
\end{aligned}`
$$

A Beta prior combined with a binomial likelihood yields a Beta posterior, because the Beta prior is [conjugate](https://www.bayesrulesbook.com/chapter-3.html#ch3-bbmodel) to the binomial likelihood:

$$
`\begin{aligned}
\pi | Y = 60 & \sim \text{Beta}(60 + 38.5 = 98.5, 100+60.8-60 = 100.8) & \text{[Posterior]}\\
\end{aligned}`
$$
]

.scroll-box-8[

```r
# a, b: shape / hyperparameters
MeanBeta <- function(a,b) a/(a+b)
ModeBeta <- function(a,b) {
  m <- ifelse(a>1 & b>1, (a-1)/(a+b-2), NA)
  return(m)
}
VarianceBeta <- function(a,b) (a*b)/((a+b)^2*(a+b+1))

# binbayes function in R
#------------------------------
# a, b: shape / hyperparameters
# r : number of successes
# n : number of trials
binbayes <- function(a, b, r, n) {
  prior <- c(a, b, NA, MeanBeta(a,b), sqrt(VarianceBeta(a, b)),
             qbeta(0.025, a, b), qbeta(0.5, a, b), qbeta(0.975, a, b))
  posterior <- c(a+r, b+n-r, r/n, MeanBeta(a+r, b+n-r),
                 sqrt(VarianceBeta(a+r, b+n-r)),
                 qbeta(0.025, a+r, b+n-r), qbeta(0.5, a+r, b+n-r),
                 qbeta(0.975, a+r, b+n-r))
  out <- rbind(prior, posterior)
  out <- round(out, 4)
  colnames(out) <- c("a","b","r/n","Mean", "SD", "2.5%", "50%", "97.5%")
  return(out)
}
```
]

```r
# a=38.5, b=60.8, r=60, n=100
binbayes(38.5, 60.8, 60, 100)
```

```
## a b r/n Mean SD 2.5% 50% 97.5%
## prior 38.5 60.8 NA 0.3877 0.0486 0.2947 0.3870 0.4850
## posterior 98.5 100.8 0.6 0.4942 0.0353 0.4251 0.4942 0.5634
```

---

# Beta-binomial with `brms`

.scroll-box-8[

```r
library(brms)
b01 <- brm(data = list(w = 60),
           family = binomial(link = "identity"),
           w | trials(100) ~ 0 + Intercept,
           # this is the prior distribution
           prior(beta(38.5, 60.8), class = b, lb = 0, ub = 1),
           iter = 5000, warmup = 1000, seed = 3
           )
saveRDS(b01, "Stanfiles/b01.rds")
```
]

```r
b01 <- readRDS("Stanfiles/b01.rds")
summary(b01)
```

```
## Family: binomial
## Links: mu = identity
## Formula: w | trials(100) ~ 0 + Intercept
## Data: list(w = 60) (Number of observations: 1)
## Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup draws = 16000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.49 0.04 0.42 0.56 1.00 6174 6151
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

---

# Prior, likelihood, and posterior all in one

```r
bayesrules::plot_beta_binomial(alpha = 38.5, beta = 60.8,
                               y = 60, n = 100,
                               prior = TRUE,        #the default
                               likelihood = TRUE,   #the default
                               posterior = TRUE) +  #the default
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(colour = "black"))
```

<img src="index_files/figure-html/betaTrumpbinomial-1.png" width="90%" style="display: block; margin: auto;" />

---
class:
background-image: url("fig/Fraud.jpg")
background-position: 50% 80%
background-size: contain

# But we all know what happened later <svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <circle cx="12" cy="12" r="10"></circle> <path d="M8 14s1.5 2 4 2 4-2 4-2"></path> <line x1="9" y1="9" x2="9.01" y2="9"></line> <line x1="15" y1="9" x2="15.01" y2="9"></line></svg>

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> All models are wrong, but some are useful.
.right[.font120[[.black[George E. P. Box (1919-2013)]](https://en.wikipedia.org/wiki/George_E._P._Box)]
]
]

.bold[.font140[
1. .white[Introduction to the Bayesian way of thinking]
1. .white[Bayes' Rule]
1. .white[Binomial Bayesian models]
1. Simple linear regression models
<!-- 1. .white[Poisson regression models] -->
1. .white[Logistic regression models]
1. .white[Multilevel (mixed-effect) models]
1. .white[Survival analysis with Cox proportional hazards models]
1. .white[Further issues]
]
]

---

# Simple linear regression models

.blockquote[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Partial census data for an area, recording people's heights, are included in the `rethinking` package:

```r
library(rethinking)
```

```r
data("Howell1")
d <- Howell1
precis(d)
```

```
## mean sd 5.5% 94.5% histogram
## height 138.2635963 27.6024476 81.108550 165.73500 ▁▁▁▁▁▁▁▂▁▇▇▅▁
## weight 35.6106176 14.7191782 9.360721 54.50289 ▁▂▃▂▂▂▂▅▇▇▃▂▁
## age 29.3443934 20.7468882 1.000000 66.13500 ▇▅▅▃▅▂▂▁▁
## male 0.4724265 0.4996986 0.000000 1.00000 ▇▁▁▁▁▁▁▁▁▇
```

We only keep individuals aged 18 or older:

```r
d2 <- d %>% 
  filter(age >= 18)
```
]

---

# Simple linear regression models

.blockquote[.font120[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Let's write down the model for height `\((h_i)\)`:

$$
`\begin{aligned}
\color{red}{h_i} & \color{red}{\sim \text{Normal}(\mu, \sigma)} & \color{red}{\text{[Likelihood]}} \\
\mu & \sim \text{Normal}(178, 20) & [\text{Prior for } \mu] \\
\sigma & \sim \text{Uniform}(0, 50) & [\text{Prior for } \sigma]
\end{aligned}`
$$
]]

<img src="index_files/figure-html/dnorm178-1.png" width="90%" style="display: block; margin: auto;" />

---

# Simple linear regression models with `brms`

```r
b_lm1 <- brm(
  data = d2, family = gaussian,
  height ~ 1,
  prior = c(prior(normal(178, 20), class = Intercept),
            prior(uniform(0, 50), class = sigma)),
  iter = 31000, warmup = 30000, chains = 4, cores = 4, seed = 4)
saveRDS(b_lm1, "Stanfiles/b_lm1.rds")
```

.scroll-box-12[

```r
b_lm1 <- readRDS("Stanfiles/b_lm1.rds")
summary(b_lm1)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: height ~ 1
## Data: d2 (Number of observations: 352)
## Draws: 4 chains, each with iter = 31000; warmup = 30000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 154.62 0.41 153.84 155.44 1.00 3795 2957
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 7.75 0.29 7.19 8.36 1.00 3694 2807
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
``` ] --- # Posterior distribution, inspect the chains <img src="index_files/figure-html/plotblm1-1.png" width="120%" style="display: block; margin: auto;" /> --- # Linear regression with predictor <img src="index_files/figure-html/plotb2-1.png" width="120%" style="display: block; margin: auto;" /> --- # Linear regression with predictor .blockquote[.font110[ <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Let's write down the model for weight `\(w_i\)` predicting height `\((h_i)\)`: $$ `\begin{aligned} \color{red}{h_i} & \color{red}{\sim \text{Normal}(\mu_i, \sigma)} & \color{red}{\text{[Likelihood]}} \\ \color{blue}{\mu_i} & \color{blue}{= \alpha + \beta(x_i - \bar{x})} & \color{blue}{\text{[Linear model]}} \\ \alpha & \sim \text{Normal}(178, 20) & [\text{Prior for } \alpha] \\ \color{green}{\beta} & \color{green}{\sim \text{Normal}(0, 10}) & \color{green}{[\text{Prior for } \beta]}\\ \sigma & \sim \text{Uniform}(0, 50) & [\text{Prior for } \sigma] \end{aligned}` $$ _It’s often advantageous to mean center our predictors._ But prior for `\(\color{green}{\beta \sim \text{Normal}(0, 10)}\)` needs explanation. Let's simulate to see what it means. ]] ```r set.seed(2971) # how many lines would you like? n_lines <- 100 lines <- tibble(n = 1:n_lines, a = rnorm(n_lines, mean = 178, sd = 20), b = rnorm(n_lines, mean = 0, sd = 10)) %>% tidyr::expand(nesting(n, a, b), weight = range(d2$weight)) %>% mutate(height = a + b * (weight - mean(d2$weight))) ``` --- # Prior predictive simulation for the height <img src="index_files/figure-html/priorbeta-1.png" width="120%" style="display: block; margin: auto;" /> --- # Better prior choice for `\(\beta\sim\text{Log-Normal(0, 1)}\)` <img src="index_files/figure-html/priorbetaetter-1.png" width="120%" style="display: block; margin: auto;" /> ??? it is the distribution whose logarithm is normally distributed. 
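
---

# Prior predictive simulation - Log-Normal prior

A minimal sketch of how the Log-Normal lines above can be simulated: the same code as for the `\(\beta \sim \text{Normal}(0, 10)\)` lines, with `rlnorm()` swapped in (assumes `d2` and the tidyverse from earlier):

```r
set.seed(2971)
n_lines <- 100

lines_ln <- tibble(n = 1:n_lines,
                   a = rnorm(n_lines, mean = 178, sd = 20),
                   b = rlnorm(n_lines, meanlog = 0, sdlog = 1)) %>% 
  tidyr::expand(nesting(n, a, b),
                weight = range(d2$weight)) %>% 
  mutate(height = a + b * (weight - mean(d2$weight)))

# every simulated line now has a positive slope
lines_ln %>% 
  ggplot(aes(x = weight, y = height, group = n)) +
  geom_line(alpha = 1/2) +
  theme_classic()
```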
--- # Linear regression with predictor .blockquote[.font110[ <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Let's re-write the model for weight `\(w_i\)` predicting height `\((h_i)\)`: $$ `\begin{aligned} \color{red}{h_i} & \color{red}{\sim \text{Normal}(\mu_i, \sigma)} & \color{red}{\text{[Likelihood]}} \\ \color{blue}{\mu_i} & \color{blue}{= \alpha + \beta(x_i - \bar{x})} & \color{blue}{\text{[Linear model]}} \\ \alpha & \sim \text{Normal}(178, 20) & [\text{Prior for } \alpha] \\ \color{green}{\beta} & \color{green}{\sim \text{Log-Normal}(0, 1)} & \color{green}{[\text{Prior for } \beta]}\\ \sigma & \sim \text{Uniform}(0, 50) & [\text{Prior for } \sigma] \end{aligned}` $$ Let's build this model with Stan: ]] ```r d2 <- d2 %>% mutate(weight_c = weight - mean(weight)) # center weight ``` ```r b_lm2 <- brm(data = d2, family = gaussian, height ~ 1 + weight_c, prior = c(prior(normal(178, 20), class = Intercept), prior(lognormal(0, 1), class = b), prior(uniform(0, 50), class = sigma)), iter = 28000, warmup = 27000, chains = 4, cores = 4, seed = 4); saveRDS(b_lm2, "Stanfiles/b_lm2.rds") ``` --- # Summary of the fitting results .scroll-box-12[ ```r b_lm2 <- readRDS("Stanfiles/b_lm2.rds") summary(b_lm2) ``` ``` ## Family: gaussian ## Links: mu = identity; sigma = identity ## Formula: height ~ 1 + weight_c ## Data: d2 (Number of observations: 352) ## Draws: 4 chains, each with iter = 28000; warmup = 27000; thin = 1; ## total post-warmup draws = 4000 ## ## Population-Level Effects: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## Intercept 154.60 0.27 154.08 155.13 1.00 4421 2990 ## weight_c 0.90 0.04 0.82 0.99 1.01 513 605 ## ## Family Specific Parameters: ## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ## sigma 5.10 0.19 4.74 5.48 1.00 2405 1453 ## ## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS ## and Tail_ESS are effective sample size measures, and Rhat is the potential ## scale reduction factor on split chains (at convergence, Rhat = 1). ``` ] -- ## Compare with frequentist linear regression ```r lm2 <- lm(height ~ weight_c, data = d2) epiDisplay::regress.display(lm2) ``` ``` ## Linear regression predicting height ## ## Coeff.(95%CI) P(t-test) P(F-test) ## weight_c (cont. var.) 0.91 (0.82,0.99) < 0.001 < 0.001 ## ## No. 
of observations = 352
```
]

---

# Posterior distribution, inspect the chains

<img src="index_files/figure-html/plotblm2-1.png" width="120%" style="display: block; margin: auto;" />

---

# Plot posteriors (95% CrI) on the data

<img src="index_files/figure-html/plotb_lm2-1.png" width="120%" style="display: block; margin: auto;" />

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Reject a specific null, and then argue for an arbitrary alternative. It’s pretty remarkable that so few people see how absurd this procedure is.
.right[.font120[.black[JP de Ruiter]]
]
]

.bold[.font140[
1. .white[Introduction to the Bayesian way of thinking]
1. .white[Bayes' Rule]
1. .white[Binomial Bayesian models]
1. .white[Simple linear regression models]
<!-- 1. .white[Poisson regression models] -->
1. Logistic regression models
1. .white[Multilevel (mixed-effect) models]
1. .white[Survival analysis with Cox proportional hazards models]
1. .white[Further issues]
]
]

---

# Logistic regression model - `lbw` data

.blockquote[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Low birth weight data:

<img src="fig/lbw.png" width="60%" style="display: block; margin: auto;" />

We want to see which variables might be associated with low birth weight.
]

.scroll-box-12[

```r
lbw <- haven::read_dta("data/lbw.dta")
head(lbw)
```

```
## # A tibble: 6 × 11
## id low age lwt race smoke ptl ht ui ftv bwt
## <dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 85 0 19 182 2 [black] 0 [nonsmoker] 0 0 1 0 2523
## 2 86 0 33 155 3 [other] 0 [nonsmoker] 0 0 0 3 2551
## 3 87 0 20 105 1 [white] 1 [smoker] 0 0 0 1 2557
## 4 88 0 21 108 1 [white] 1 [smoker] 0 0 1 2 2594
## 5 89 0 18 107 1 [white] 1 [smoker] 0 0 1 0 2600
## 6 91 0 21 124 3 [other] 0 [nonsmoker] 0 0 0 0 2622
```
]

---

# Logistic regression model - model

.blockquote[.font110[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Let's write down the model for mother's smoking status `\((\text{SM}_i)\)` (yes, no) predicting low birth weight of the babies `\((\text{low}_i)\)` (yes, no):

$$
`\begin{aligned}
\color{red}{\text{low}_i} & \color{red}{\sim \text{Binomial}(1, p_i)} & \color{red}{\text{[Likelihood]}} \\
\color{blue}{\text{logit}{(p_i)}} & \color{blue}{= \alpha + \beta \, \text{SM}_i } & \color{blue}{\text{[Linear model]}} \\
\alpha & \sim \text{Normal}(0, 2) & [\text{Prior for } \alpha] \\
\color{green}{\beta} & \color{green}{\sim \text{Normal}(0, 2)} & \color{green}{[\text{Prior for } \beta]}\\
\end{aligned}`
$$
]]

<img src="index_files/figure-html/priorjust-1.png" width="120%" style="display: block; margin: auto;" />

---

# Logistic regression model - fit in Stan

```r
lbw$smoke <- as.factor(lbw$smoke)
b_lbw_sm <- brm(data = lbw,
                family = bernoulli(link = "logit"),
                formula = low ~ 1 + smoke,
                prior = c(prior(normal(0, 2), class = Intercept),
                          prior(normal(0, 2), class = b)),
                iter = 2000, warmup = 1000, chains = 4, cores = 8,
                seed = 11, sample_prior = T)
saveRDS(b_lbw_sm, "Stanfiles/b_lbw_sm.rds")
```

.scroll-box-10[

```r
b_lbw_sm <- readRDS("Stanfiles/b_lbw_sm.rds")
summary(b_lbw_sm)
```

```
## Family: bernoulli
## Links: mu = logit
## Formula: low ~ 1 + smoke
## Data: lbw (Number of observations: 189)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.08 0.21 -1.49 -0.68 1.00 2928 2580
## smoke 0.69 0.31 0.08 1.33 1.00 3497 2275
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

```r
exp(fixef(b_lbw_sm)) # generate OR and 95% CrI
```

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.3388723 1.233606 0.2250548 0.5066391
## smoke 1.9962295 1.369582 1.0858879 3.7635748
```
]

---

# Logistic regression model - frequentist

.scroll-output[

```r
Model_Binary <- glm(low ~ smoke,
                    family = binomial(link = "logit"),
                    data = lbw)
summary(Model_Binary)
```

```
##
## Call:
## glm(formula = low ~ smoke, family = binomial(link = "logit"),
## data = lbw)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0197 -0.7623 -0.7623 1.3438 1.6599
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0871 0.2147 -5.062 4.14e-07 ***
## smoke 0.7041 0.3196 2.203 0.0276 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 229.80 on 187 degrees of freedom
## AIC: 233.8
##
## Number of Fisher Scoring iterations: 4
```

```r
epiDisplay::logistic.display(Model_Binary)
```

```
##
## Logistic regression predicting low
##
## OR(95%CI) P(Wald's test) P(LR-test)
## smoke: 1 vs 0 2.02 (1.08,3.78) 0.028 0.027
##
## Log-likelihood = -114.9023
## No. of observations = 189
## AIC value = 233.8046
```
]

---

# Bayesian posterior OR

.scroll-box-12[

```r
b_lbw_sm <- readRDS("Stanfiles/b_lbw_sm.rds")
post <- as_draws_df(b_lbw_sm)

post %>% 
  transmute(`odds ratio` = exp(b_smoke)) %>% 
  summarise(median = median(`odds ratio`),
            mean = mean(`odds ratio`),
            sd = sd(`odds ratio`),
            lowCrI = quantile(`odds ratio`, probs = .025),
            upperCrI = quantile(`odds ratio`, probs = .975)) %>% 
  mutate_all(round, digits = 4)
```

```
## # A tibble: 1 × 5
## median mean sd lowCrI upperCrI
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.99 2.10 0.680 1.09 3.76
```
]

.pull-left[
<img src="index_files/figure-html/postOR1-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="index_files/figure-html/postOR2-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Posterior distribution, inspect the chains

<img src="index_files/figure-html/b_lbw_sm-1.png" width="120%" style="display: block; margin: auto;" />

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> “How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?” (Doyle 1890, Ch. 6)
.right[.font120[.black[Sherlock Holmes said to Dr. Watson]]
]
]

.bold[.font140[
1. .white[Introduction to the Bayesian way of thinking]
1. .white[Bayes' Rule]
1. .white[Binomial Bayesian models]
1. .white[Simple linear regression models]
<!-- 1. .white[Poisson regression models] -->
1. .white[Logistic regression models]
1. Multilevel (mixed-effect) models
1. .white[Survival analysis with Cox proportional hazards models]
1. .white[Further issues]
]
]

---

# Multilevel models - random intercept

.blockquote[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> `siblings` data: we have information on 8604 children born to 3978 mothers. Variables available: birth weight (g), gestational age (weeks), sex, maternal smoking status, and the mother's age when the baby was born.
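A minimal sketch of reading and checking the data (assuming a Stata file `data/siblings.dta`, a hypothetical path mirroring the `lbw` example above):

```r
# hypothetical path; point this at wherever the siblings data live
siblings <- haven::read_dta("data/siblings.dta")

# one row per child; tally how many births each mother contributes
siblings %>% 
  count(momid) %>% 
  summarise(mothers = n(), children = sum(n))
```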
]

<img src="index_files/figure-html/Hier01-5-1.png" width="120%" style="display: block; margin: auto;" />

---

# Multilevel models - random intercept

.blockquote[.font110[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> Let's write down a random intercept model with no predictors for babies' birth weight `\(bwt\)` nested within mother IDs.

$$
`\begin{aligned}
\color{red}{\text{bwt}_i} & \color{red}{\sim \text{Normal}(\mu_i, \sigma_i)} & \color{red}{\text{[Likelihood]}} \\
\color{blue}{\mu_i} & \color{blue}{= \alpha_{\text{Mother}[i]}} & \color{blue}{\text{[Random intercept model]}} \\
\color{green}{\alpha_j} & \color{green}{\sim \text{Normal}(\bar{\alpha}, \sigma_{\text{Mother}[i]})} & \color{green}{[\text{Prior for } \alpha]}\\
\color{green}{\bar{\alpha}} & \color{green}{\sim \text{To be determined}} & \color{green}{\text{[Prior for }\bar{\alpha}]} \\
\color{green}{\sigma_{\text{Mother}[i]}} & \color{green}{\sim \text{To be determined}} & \color{green}{\text{[Prior for }\sigma_{\text{Mother}[i]}]} \\
\color{red}{\sigma_i} & \color{red}{\sim \text{To be determined}} & \color{red}{\text{[Prior for } \sigma_i]}
\end{aligned}`
$$

The priors marked "To be determined" are left unspecified here, so `brms` falls back on its defaults.
]]

```r
b_sib_2 <- brm(data = siblings,
               family = gaussian,
               formula = birwt ~ 1 + (1 | momid),
               iter = 5000, warmup = 1000, chains = 4, cores = 8,
               seed = 13)
```

---

# Multilevel models - random intercept

.scroll-output[

```r
b_sib_2 <- readRDS("Stanfiles/b_sib_2.rds")
summary(b_sib_2)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: birwt ~ 1 + (1 | momid)
## Data: siblings (Number of observations: 8604)
## Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
## total post-warmup draws = 16000
##
## Group-Level Effects:
## ~momid (Number of levels: 3978)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 368.29 6.39 355.90 380.96 1.00 4864 8784
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3468.00 7.18 3454.06 3482.08 1.00 10719 12323
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 377.79 3.91 370.18 385.46 1.00 7907 10731
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

---

# Multilevel models - frequentist

.scroll-output[

```r
library(lmerTest)
library(lme4)
M1 <- lmer(birwt ~ (1|momid), data = siblings, REML = TRUE)
summary(M1)
```

```
## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: birwt ~ (1 | momid)
## Data: siblings
##
## REML criterion at convergence: 130945.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -6.2743 -0.4860 0.0035 0.5054 4.0504
##
## Random effects:
## Groups Name Variance Std.Dev.
## momid (Intercept) 135686 368.4
## Residual 142625 377.7
## Number of obs: 8604, groups: momid, 3978
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3467.969 7.139 3958.319 485.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
]

---

# Intraclass correlation (ICC)

.blockquote[.font110[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> We can calculate the point estimate from the frequentist model results:

`$$\text{ICC} = \frac{\sigma^2_{\text{Mother}[i]}}{\sigma^2_{\text{Mother}[i]} + \sigma^2_i} = \frac{368.4^2}{368.4^2 + 377.7^2} = 48.8 \%$$`

But with Bayesian inference we can do better: computing the ICC from every posterior draw gives us its full posterior distribution, not just a point estimate (a sketch follows on the next slide):
]]

.pull-left[
<img src="index_files/figure-html/ICC0-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="index_files/figure-html/ICC-1.png" width="100%" style="display: block; margin: auto;" />
]
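---

# Intraclass correlation (ICC) - posterior draws

.blockquote[.font110[
A minimal sketch of the "do better" step, added here for illustration. It assumes `brms`'s default draw names for the `b_sib_2` fit above (`sd_momid__Intercept` for the mother-level SD and `sigma` for the residual SD), and that `brms` and `dplyr` are loaded as elsewhere in this deck; `icc_draws` is a hypothetical name.
]]

```r
# Sketch: ICC computed from every posterior draw of b_sib_2,
# then summarised as a posterior median and 95% CrI.
b_sib_2 <- readRDS("Stanfiles/b_sib_2.rds")
post <- as_draws_df(b_sib_2)

icc_draws <- post %>%
  transmute(ICC = sd_momid__Intercept^2 /
                  (sd_momid__Intercept^2 + sigma^2))

icc_draws %>%
  summarise(median = median(ICC),
            lowCrI = quantile(ICC, probs = .025),
            upperCrI = quantile(ICC, probs = .975))
```

The same `icc_draws` object can be passed to a density plot to produce figures like the ones on the previous slide.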
---

# Random intercept + 1st level predictor(s)

.scroll-output[

```r
siblings <- siblings %>%
  mutate(c_gestat = gestat - 38, # centering gestational age to 38 weeks
         c_mage = mage - 30,     # centering maternal age to 30 years old
         male = factor(male, labels = c("female", "male")),
         smoke = factor(smoke, labels = c("Nonsmoker", "Smoker")))
```

```r
b_sib_3 <- brm(data = siblings,
               family = gaussian,
               formula = birwt ~ 1 + c_gestat + (1 | momid),
               seed = 13)
summary(b_sib_3)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: birwt ~ 1 + c_gestat + (1 | momid)
## Data: siblings (Number of observations: 8604)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~momid (Number of levels: 3978)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 336.38 5.95 324.90 348.14 1.00 1278 2204
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3358.77 7.32 3344.39 3373.15 1.00 2792 2700
## c_gestat 83.72 2.23 79.23 88.11 1.00 4240 2878
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 352.56 3.66 345.26 359.62 1.00 2088 2709
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

---

# Random intercept + 1st level predictor(s)

.blockquote[
Frequentist `lmer` command and results are like this:
]

.scroll-output[

```r
M2 <- lmer(birwt ~ c_gestat + (1 | momid), data = siblings, REML = TRUE)
summary(M2)
```

```
## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: birwt ~ c_gestat + (1 | momid)
## Data: siblings
##
## REML criterion at convergence: 129638.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9566 -0.5295 0.0066 0.5264 3.9872
##
## Random effects:
## Groups Name Variance Std.Dev.
## momid (Intercept) 113073 336.3
## Residual 124282 352.5
## Number of obs: 8604, groups: momid, 3978
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3358.544 7.184 4956.364 467.50 <2e-16 ***
## c_gestat 83.732 2.231 7785.252 37.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## c_gestat -0.406
```
]

---

# Random intercept + 2nd level predictor(s)

.scroll-output[

```r
b_sib_4 <- brm(data = siblings,
               family = gaussian,
               formula = birwt ~ 1 + c_gestat + smoke + (1 | momid),
               seed = 13)
summary(b_sib_4)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: birwt ~ 1 + c_gestat + smoke + (1 | momid)
## Data: siblings (Number of observations: 8604)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~momid (Number of levels: 3978)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 317.00 6.02 305.26 328.62 1.00 1303 1590
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3397.01 7.38 3382.72 3411.63 1.00 3152 2775
## c_gestat 83.88 2.21 79.44 88.15 1.00 4397 2999
## smokeSmoker -271.16 16.27 -303.01 -238.98 1.00 3571 2994
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 353.51 3.75 346.17 360.91 1.00 1898 2447
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

---

# Random intercept + 2nd level predictor(s)

.blockquote[
Frequentist `lmer` command and results are like this:
]

.scroll-output[

```r
M3 <- lmer(birwt ~ c_gestat + smoke + (1 | momid), data = siblings, REML = TRUE)
summary(M3)
```

```
## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: birwt ~ c_gestat + smoke + (1 | momid)
## Data: siblings
##
## REML criterion at convergence: 129358.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0115 -0.5368 0.0033 0.5318 3.7920
##
## Random effects:
## Groups Name Variance Std.Dev.
## momid (Intercept) 100583 317.1
## Residual 124879 353.4
## Number of obs: 8604, groups: momid, 3978
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3396.789 7.332 5194.160 463.28 <2e-16 ***
## c_gestat 83.913 2.210 7924.290 37.97 <2e-16 ***
## smokeSmoker -271.149 16.098 7367.611 -16.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) c_gstt
## c_gestat -0.398
## smokeSmoker -0.317 0.014
```
]

---

# Random slopes with predictor(s)

.scroll-output[

```r
b_sib_5 <- brm(data = siblings,
               family = gaussian,
               formula = birwt ~ 1 + c_gestat + smoke + c_mage + (1 + smoke | momid),
               seed = 13)
summary(b_sib_5)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: birwt ~ 1 + c_gestat + smoke + c_mage + (1 + smoke | momid)
## Data: siblings (Number of observations: 8604)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~momid (Number of levels: 3978)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 317.13 6.49 304.70 329.75 1.01 1246 2230
## sd(smokeSmoker) 225.59 68.47 23.85 324.01 1.11 32 NA
## cor(Intercept,smokeSmoker) -0.35 0.14 -0.54 -0.03 1.03 240 NA
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 3412.17 7.40 3397.99 3427.28 1.00 1930 2399
## c_gestat 84.41 2.19 80.01 88.57 1.00 3343 2523
## smokeSmoker -252.76 17.02 -286.22 -220.35 1.00 1443 2161
## c_mage 12.94 1.11 10.80 15.20 1.00 2044 2725
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 348.08 3.73 340.78 355.42 1.01 768 2208
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

Note the warning signs for `sd(smokeSmoker)` (Rhat = 1.11, Bulk_ESS = 32, Tail_ESS `NA`): these draws have not mixed well, and in practice this model would need more iterations or a reparameterization before its estimates can be trusted.

---

# Random slopes with predictor(s)

.blockquote[
Frequentist `lmer` command and results are like this:
]

.scroll-output[

```r
M4 <- lmer(birwt ~ c_gestat + smoke + c_mage + (smoke | momid), data = siblings, REML = TRUE)
summary(M4)
```

```
## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: birwt ~ c_gestat + smoke + c_mage + (smoke | momid)
## Data: siblings
##
## REML criterion at convergence: 129204.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9801 -0.5342 -0.0009 0.5294 3.7808
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## momid (Intercept) 100991 317.8
## smokeSmoker 64816 254.6 -0.39
## Residual 120669 347.4
## Number of obs: 8604, groups: momid, 3978
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 3411.986 7.404 4456.157 460.81 <2e-16 ***
## c_gestat 84.432 2.194 7848.005 38.48 <2e-16 ***
## smokeSmoker -253.538 16.776 936.755 -15.11 <2e-16 ***
## c_mage 12.930 1.098 5336.860 11.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) c_gstt smkSmk
## c_gestat -0.389
## smokeSmoker -0.300 0.018
## c_mage 0.158 0.019 0.153
```
]

---

# Posterior distribution, inspect the chains

<img src="index_files/figure-html/b_sib_3-1.png" width="120%" style="display: block; margin: auto;" />

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Epidemiology is like a bikini: what is revealed is interesting; what is concealed is crucial.
.right[.font120[[.black[Peter Duesberg]](https://en.wikipedia.org/wiki/Peter_Duesberg)]]
]

.bold[.font140[
1. .white[Introduction to Bayesian way of thinking]
1. .white[Bayes' Rule]
1. .white[Binomial Bayesian models]
1. .white[Simple linear regression models]
<!-- 1. .white[Poission regression models] -->
1. .white[Logistic regression models]
1. .white[Multilevel (mixed-effect) models]
1. Survival analysis with Cox proportional hazard models
1. .white[Further issues]
] ]

---

# Cox proportional hazard models

.scroll-output[

```r
gehan <- read.table("https://data.princeton.edu/wws509/datasets/gehan.raw",
                    header = FALSE, sep = "",
                    col.names = c("group", "weeks", "remission"))
library(survival)
coxmodel <- coxph(Surv(time = weeks, event = remission) ~ as.factor(group),
                  data = gehan)
summary(coxmodel)
```

```
## Call:
## coxph(formula = Surv(time = weeks, event = remission) ~ as.factor(group),
## data = gehan)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## as.factor(group)2 -1.5721 0.2076 0.4124 -3.812 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## as.factor(group)2 0.2076 4.817 0.09251 0.4659
##
## Concordance= 0.69 (se = 0.041 )
## Likelihood ratio test= 16.35 on 1 df, p=5e-05
## Wald test = 14.53 on 1 df, p=1e-04
## Score (logrank) test = 17.25 on 1 df, p=3e-05
```
]

---

# Cox model in Stan via `brms`

.scroll-output[

```r
gehan <- gehan %>%
  mutate(remission1 = 1 - remission) %>%
  mutate(Group = as.factor(group))
fitBayesCox <- brm(data = gehan,
                   family = brmsfamily("cox"),
                   weeks | cens(remission1) ~ 1 + Group,
                   iter = 4000, warmup = 1500, chains = 4, cores = 4,
                   seed = 14)
summary(fitBayesCox)
```

```
## Family: cox
## Links: mu = log
## Formula: weeks | cens(remission1) ~ 1 + Group
## Data: gehan (Number of observations: 42)
## Draws: 4 chains, each with iter = 4000; warmup = 1500; thin = 1;
## total post-warmup draws = 10000
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.59 0.34 0.95 2.28 1.00 6754 6053
## Group2 -1.66 0.42 -2.50 -0.86 1.00 8651 7038
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

---

# Cox model in Stan via `brms` - HR

.scroll-box-14[

```r
fitBayesCox <- readRDS("Stanfiles/fitBayesCox.rds")
post <- posterior_samples(fitBayesCox)
post %>%
  transmute(`hazard ratio` = exp(b_Group2)) %>%
  summarise(median = median(`hazard ratio`),
            mean = mean(`hazard ratio`),
            sd = sd(`hazard ratio`),
            lowCrI = quantile(`hazard ratio`, probs = .025),
            upperCrI = quantile(`hazard ratio`, probs = .975)) %>%
  mutate_all(round, digits = 4)
```

```
## median mean sd lowCrI upperCrI
## 1 0.1921 0.2078 0.0887 0.0819 0.423
```
]

.pull-left[
<img src="index_files/figure-html/postHR-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="index_files/figure-html/postHR2-1.png" width="100%" style="display: block; margin: auto;" />
]

---

# Table of Contents 目次

.blockquote[
### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Epidemiology is like a bikini: what is revealed is interesting; what is concealed is crucial.
.right[.font120[[.black[Peter Duesberg]](https://en.wikipedia.org/wiki/Peter_Duesberg)]]
]

.bold[.font140[
1. .white[Introduction to Bayesian way of thinking]
1. .white[Bayes' Rule]
1. .white[Binomial Bayesian models]
1. .white[Simple linear regression models]
<!-- 1. .white[Poission regression models] -->
1. .white[Logistic regression models]
1. .white[Multilevel (mixed-effect) models]
1. .white[Survival analysis with Cox proportional hazard models]
1. Further issues
] ]

---

# Further issue - build an interaction

.blockquote[.font110[
<svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"></path></svg> Why using an indicator/dummy variable is not a good idea:

- In a simple linear regression model (back to the Howell1 data in P36), if we add a dummy variable for gender (men = 1, women = 0):
`$$\mu_i = \alpha + \beta_1(x_i - \bar{x}) + \beta_2\text{Sex}_i$$`
- This means we need to assign prior information (known or vague/flat, whatever) to `\(\beta_2\)` for men only
- Which also means we tell the model that `\(\mu_i\)` (the mean height) of men is **more uncertain** than that of women, before we do any analysis on the data.
- A recommended solution to avoid this situation is to use **an index variable** instead of a dummy variable, letting men and women have different intercepts with the same prior:
`$$\mu_i = \alpha_{\text{Sex[i]}} + \beta_1(x_i - \bar{x})$$`
]]

---

# Further issue - use an index variable

.scroll-output[

```r
d2 <- d2 %>%
  mutate(Sex = if_else(male == 1, "1", "2")) # 1 for men, 2 for women
```

```r
b_lm2_sex <- brm(data = d2,
                 family = gaussian,
                 height ~ 0 + Sex + weight_c,
                 prior = c(prior(normal(178, 20), class = b, coef = Sex1),
                           prior(normal(178, 20), class = b, coef = Sex2),
                           prior(lognormal(0, 1), class = b, coef = weight_c),
                           prior(uniform(0, 50), class = sigma)),
                 iter = 28000, warmup = 27000, chains = 4, cores = 8,
                 seed = 4)
summary(b_lm2_sex)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: height ~ 0 + Sex + weight_c
## Data: d2 (Number of observations: 352)
## Draws: 4 chains, each with iter = 28000; warmup = 27000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Sex1 158.06 0.38 157.33 158.79 1.00 2124 2688
## Sex2 151.55 0.34 150.88 152.21 1.00 2108 2167
## weight_c 0.64 0.04 0.55 0.73 1.00 513 457
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 4.29 0.16 3.99 4.63 1.00 2135 1636
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

---

# Relationship between height and weight

<img src="index_files/figure-html/plotb_lm2_sex-1.png" width="120%" style="display: block; margin: auto;" />

---

# Further issue - adding interaction

.scroll-output[

```r
b_lm2_sex1 <- brm(data = d2,
                  family = gaussian,
                  bf(height ~ 0 + a + b * weight_c,
                     a ~ 0 + Sex,
                     b ~ 0 + Sex,
                     nl = TRUE),
                  prior = c(prior(normal(178, 20), class = b, coef = Sex1, nlpar = a),
                            prior(normal(178, 20), class = b, coef = Sex2, nlpar = a),
                            prior(lognormal(0, 1), class = b, coef = Sex1, nlpar = b),
                            prior(lognormal(0, 1), class = b, coef = Sex2, nlpar = b),
                            prior(exponential(1), class = sigma)),
                  iter = 2000, warmup = 1000, chains = 4, cores = 8,
                  seed = 4)
summary(b_lm2_sex1)
```

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: height ~ 0 + a + b * weight_c
## a ~ 0 + Sex
## b ~ 0 + Sex
## Data: d2 (Number of observations: 352)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a_Sex1 157.86 0.40 157.09 158.68 1.00 3358 2765
## a_Sex2 151.38 0.35 150.69 152.07 1.00 4051 3267
## b_Sex1 0.70 0.06 0.58 0.81 1.00 3865 2660
## b_Sex2 0.58 0.06 0.47 0.69 1.00 3550 3137
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 4.26 0.16 3.96 4.58 1.00 4162 2781
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```
]

---

# Plotting the interaction

<img src="index_files/figure-html/plotb_lm2_sexinter-1.png" width="120%" style="display: block; margin: auto;" />

---

# Ordered categorical predictors

.blockquote[.font110[
<svg viewBox="0 0 384 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M202.021 0C122.202 0 70.503 32.703 29.914 91.026c-7.363 10.58-5.093 25.086 5.178 32.874l43.138 32.709c10.373 7.865 25.132 6.026 33.253-4.148 25.049-31.381 43.63-49.449 82.757-49.449 30.764 0 68.816 19.799 68.816 49.631 0 22.552-18.617 34.134-48.993 51.164-35.423 19.86-82.299 44.576-82.299 106.405V320c0 13.255 10.745 24 24 24h72.471c13.255 0 24-10.745 24-24v-5.773c0-42.86 125.268-44.645 125.268-160.627C377.504 66.256 286.902 0 202.021 0zM192 373.459c-38.196 0-69.271 31.075-69.271 69.271 0 38.195 31.075 69.27 69.271 69.27s69.271-31.075 69.271-69.271-31.075-69.27-69.271-69.27z"></path></svg> How do we take care of ordered categorical predictors? For example,

- Educational level: <br> "Elementary School", "High School", "Bachelor's Degree", "Graduate Degree";
- Categorized values transformed from continuous values: <br> BMI: < 18.5, 18.5 - 25, >= 25.
- Dummy variables again? Or force them in as continuous predictors: 0, 1, 2, ...?
- **We don't want to assume that the effect distance between each ordinal value is the same.**
]]

.scroll-box-10[

```r
data("Trolley", package = "rethinking")
d <- Trolley; rm(Trolley)
d <- d %>%
  mutate(edu_new = recode(edu,
                          "Elementary School" = 1,
                          "Middle School" = 2,
                          "Some High School" = 3,
                          "High School Graduate" = 4,
                          "Some College" = 5,
                          "Bachelor's Degree" = 6,
                          "Master's Degree" = 7,
                          "Graduate Degree" = 8) %>% as.integer())

# what did we do?
d %>% distinct(edu, edu_new) %>% arrange(edu_new)
```

```
## edu edu_new
## 1 Elementary School 1
## 2 Middle School 2
## 3 Some High School 3
## 4 High School Graduate 4
## 5 Some College 5
## 6 Bachelor's Degree 6
## 7 Master's Degree 7
## 8 Graduate Degree 8
```
]

---

# Dirichlet distribution as the prior

.blockquote[.font110[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M290.74 93.24l128.02 128.02-277.99 277.99-114.14 12.6C11.35 513.54-1.56 500.62.14 485.34l12.7-114.22 277.9-277.88zm207.2-19.06l-60.11-60.11c-18.75-18.75-49.16-18.75-67.91 0l-56.55 56.55 128.02 128.02 56.55-56.55c18.75-18.76 18.75-49.16 0-67.91z"></path></svg> When there are 8 education levels, we need 7 increment parameters `\(\delta_j\)` to apportion the maximum/total effect `\(\beta_{\text{E}}\)` of "Education"

$$
`\begin{aligned}
\mu_i & = \color{red}{\beta_E}\sum_{j = 1}^{E_i - 1} \delta_j + \text{other predictors}\\
\delta & \sim \text{Dirichlet}(\alpha)
\end{aligned}`
$$

Where `\(E_i\)` is the education level of individual `\(i\)`, and `\(\sum_{j = 1}^{7} \delta_j = 1\)`. A quick prior simulation follows on the next slide.
]]

<img src="index_files/figure-html/dirichlet-1.png" width="85%" style="display: block; margin: auto;" />

???

It expects that any of the probabilities could be bigger or smaller than the others.
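---

# Dirichlet prior - a quick simulation

.blockquote[.font110[
A minimal sketch, added for illustration, of what the `\(\text{Dirichlet}(2, \dots, 2)\)` prior implies for the seven increments `\(\delta_j\)`. The names `n_draws`, `g`, and `delta` are hypothetical; the simulation uses the standard gamma construction, so only base R is needed.
]]

```r
# Hypothetical illustration: draw from Dirichlet(2, ..., 2) by
# normalizing independent Gamma(2, 1) variates.
set.seed(12)
n_draws <- 1000
alpha   <- rep(2, 7)

g <- matrix(rgamma(n_draws * length(alpha), shape = alpha, rate = 1),
            nrow = n_draws, byrow = TRUE)  # g_j ~ Gamma(alpha_j, 1)
delta <- g / rowSums(g)                    # delta_j = g_j / sum_j g_j

# every draw satisfies the model's constraint: sum_j delta_j = 1
stopifnot(all(abs(rowSums(delta) - 1) < 1e-12))

# the prior mean of each increment is alpha_j / sum(alpha) = 1/7
round(colMeans(delta), 3)
```

With `\(\alpha = (2, \dots, 2)\)` no single increment is favoured a priori, but each `\(\delta_j\)` is gently pulled away from 0 and 1, a weak regularization before we see the data.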
---

# Specify `dirichlet` prior in the Stan model

$$
`\begin{aligned}
\text{response}_i & \sim \text{Categorical}(\mathbf{p}) \\
\text{logit}(p_k) & = \alpha_k - \phi_i \\
\phi_i & = \beta_1\text{action}_i + \beta_2\text{contact}_i + \beta_3 \text{intention}_i + \beta_4 \, \text{mo}(\text{edu}_i, \delta) \\
\alpha_k & \sim \text{Normal}(0, 1.5) \\
\beta_1, \dots, \beta_3 & \sim \text{Normal}(0, 1) \\
\beta_4 & \sim \text{Normal}(0, 0.143) \\
\mathbf{\delta} & \sim \text{Dirichlet}(2, 2, 2, 2, 2, 2, 2)
\end{aligned}`
$$

```r
b12.6 <- brm(data = d,
             family = cumulative,
             response ~ 1 + action + contact + intention + mo(edu_new), # note the `mo()` syntax
             prior = c(prior(normal(0, 1.5), class = Intercept),
                       prior(normal(0, 1), class = b),
                       # note the new kinds of prior statements
                       prior(normal(0, 0.143), class = b, coef = moedu_new),
*                      prior(dirichlet(2, 2, 2, 2, 2, 2, 2), class = simo, coef = moedu_new1)),
             iter = 2000, warmup = 1000, cores = 4, chains = 4,
             seed = 12)
```

---

# Final words

.blockquote[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M256 32C114.6 32 0 125.1 0 240c0 49.6 21.4 95 57 130.7C44.5 421.1 2.7 466 2.2 466.5c-2.2 2.3-2.8 5.7-1.5 8.7S4.8 480 8 480c66.3 0 116-31.8 140.6-51.4 32.7 12.3 69 19.4 107.4 19.4 141.4 0 256-93.1 256-208S397.4 32 256 32zM128 272c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32zm128 0c-17.7 0-32-14.3-32-32s14.3-32 32-32 32 14.3 32 32-14.3 32-32 32z"></path></svg> Cited from ["My Journey From Frequentist to Bayesian Statistics"](https://www.fharrell.com/post/journey/) by [Professor Frank Harrell, https://www.fharrell.com/](https://www.fharrell.com/):

- A slightly oversimplified pair of equations to contrast frequentist and Bayesian inference:
  - Frequentist = subjectivity<sub>1</sub> + subjectivity<sub>2</sub> + objectivity + data + endless arguments about everything
  - Bayesian = subjectivity<sub>1</sub> + subjectivity<sub>3</sub> + objectivity + data + endless arguments about one thing (the prior)

Where

1. subjectivity<sub>1</sub> = choice of the data model
2. subjectivity<sub>2</sub> = sample space, how repetitions of the experiments are envisioned, choice of the stopping rule, 1-tailed vs. 2-tailed tests, multiplicity adjustments, ...
3. subjectivity<sub>3</sub> = prior distribution
]

---

# (Dis)Advantages of Bayesian Statistics

.blockquote[.font130[
<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M173.898 439.404l-166.4-166.4c-9.997-9.997-9.997-26.206 0-36.204l36.203-36.204c9.997-9.998 26.207-9.998 36.204 0L192 312.69 432.095 72.596c9.997-9.997 26.207-9.997 36.204 0l36.203 36.204c9.997 9.997 9.997 26.206 0 36.204l-294.4 294.401c-9.998 9.997-26.207 9.997-36.204-.001z"></path></svg> Advantages:

- Natural approach to express uncertainty (probability)
- Ability to (formally) incorporate prior information
- Increased model flexibility
- Full posterior distribution of every parameter

<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M173.898 439.404l-166.4-166.4c-9.997-9.997-9.997-26.206 0-36.204l36.203-36.204c9.997-9.998 26.207-9.998 36.204 0L192 312.69 432.095 72.596c9.997-9.997 26.207-9.997 36.204 0l36.203 36.204c9.997 9.997 9.997 26.206 0 36.204l-294.4 294.401c-9.998 9.997-26.207 9.997-36.204-.001z"></path></svg> (Dis)Advantage, but fun:

- Slow speed of model estimation
]]

---

# References

1. McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2018.
1. Gelman, Andrew, Jennifer Hill, and Aki Vehtari. Regression and other stories. Cambridge University Press, 2020.
1. Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. 2014.
1. Lesaffre, Emmanuel, and Andrew B. Lawson. Bayesian biostatistics. John Wiley & Sons, 2012.
1. Sober, Elliott. Evidence and evolution: The logic behind the science. Cambridge University Press, 2008.
1. Kruschke, J.K., and T.M. Liddell. "Bayesian data analysis for newcomers." Psychonomic Bulletin & Review 25 (2018): 155-177. https://doi.org/10.3758/s13423-017-1272-1
1. Gelman, Andrew, et al. Bayesian data analysis. Chapman and Hall/CRC, 1995.
1. Bürkner, Paul-Christian. "brms: An R package for Bayesian multilevel models using Stan." Journal of Statistical Software 80.1 (2017): 1-28.
1. Carpenter, Bob, et al. "Stan: A probabilistic programming language." Journal of Statistical Software 76.1 (2017): 1-32.
1. Dogucu, Mine, Alicia Johnson, and Miles Ott. Bayesrules: Datasets and Supplemental Functions from the Bayes Rules! Book. 2021. https://github.com/bayes-rules/bayesrules

---
class: inverse, center, middle, mline

# Thanks and see you again