# 第 85 章 不同實驗/研究設計時適用的貝葉斯模型

1. $$Y$$ 是要研究的疾病，$$X$$ 是要研究的暴露因素；
2. 給概率 (probabilities)，和患病率 (prevalences)提供均一先驗概率分佈 (uniform priors)。給作用效果的評價指標例如危險度比 (risk ratios, RR)，危險度差 (risk difference, RD)，或者是比值比 (odds ratios, OR)提供中性的，圍繞着無效爲中心的先驗概率分佈 (“neutral” priors centred on no effect)。

## 85.1 隊列研究設計時的貝葉斯模型

model{
Y0 ~ dbin(r0, X0) # Data model for unexposed
Y1 ~ dbin(r1, X1) # Data model for exposed

# Priors for risks of exposed and unexposed
r0 ~ dbeta(a.unexp, b.unexp)
r1 ~ dbeta(a.exp, b.exp)

# Computation of comparison statistics
rd <-  r1 - r0   # for Risk Difference
rr <- r1 / r0    # for Risk Ratio
}

list(X0 = 2000,
X1 = 1000,
Y0 = 100,
Y1 = 150,
a.unexp = 1, b.unexp = 1,
a.exp   = 1, b.exp   = 1)

model{
Y0 ~ dbin(r0, X0) # Data model for unexposed
Y1 ~ dbin(r1, X1) # Data model for exposed

# define r1 as the unexposed risk plus an effect "rd"
r1 <- r0 + rd

# Priors for risks of exposed and unexposed
r0 ~ dbeta(a.unexp, b.unexp)
rd ~ dunif(min.rd, max.rd)

# Computation of comparison statistics
rr <- r1 / r0    # for Risk Ratio
}

list(X0 = 2000,
X1 = 1000,
Y0 = 100,
Y1 = 150,
a.unexp = 1, b.unexp = 1,
min.rd = -30, max.rd = 30)

model{
Y0 ~ dbin(r0, X0)   # Data model for unexposed
logit(r0) <- lr0    # define the logit of r0
Y1 ~ dbin(r1, X1)   # Data model for exposed
logit(r1) <- lr1    # define the logit of r1

# define lr1 as the unexposed logit(risk) plus log(OR):
lr1 <- lr0 + lor

# priors for the logit of risk of unexposed and of log(OR):
lr0 ~ dnorm(mean.lr0, pr.lr0)
lor ~ dnorm(mean.lor, pr.lor)

# computations of comparison statistics:
rd <- r1 - r0      # For risk difference
rr <- r1 / r0      # For risk ratio
}

list(X0 = 2000,
X1 = 1000,
Y0 = 100,
Y1 = 150,
mean.lr0 = 0, pr.lr0 = 0.3,
mean.lor = 0, pr.lor = 0.33)

## 85.2 病例對照研究設計時的貝葉斯模型

model{
X0 ~ dbin(p0, Y0) # data model for CONTROLS
logit(p0) <- lp0  # define the logit of p0
X1 ~ dbin(p1, Y1) # data model for CASES
logit(p1) <- lp1  # define the logit of p1

# define lp1 as the unexposed logit(exposure) plus log(OR)
lp1 <- lp0 + lor

# Priors for logit of exposure and of log(OR)
lp0 ~ dnorm(mean.lp0, pr.lp0)
lor ~ dnorm(mean.lor, pr.lor)

# Computation of comparison statistics
or <- exp(lor)    # For Odds Ratio
}

list(Y0 = 2000,
Y1 = 1000,
X0 = 800,
X1 = 600,
mean.lp0 = 0, pr.lp0 = 0.3,
mean.lor = 0, pr.lor = 0.33)

## 85.3 橫斷面研究設計時的貝葉斯模型

• 25名吸煙者患有癌症；
• 15名非吸煙者患有癌症；
• 150名吸煙者無癌症；
• 剩餘210名非吸煙者無癌症。

$N \sim \text{Multinomial}([\theta_{11}, \theta_{10}, \theta_{01}, \theta_{00}], 400)$

$\mathbf{\theta} \sim \text{Dirichlet}(\alpha_1, \alpha_2, \dots. \alpha_k) \text{ where } \alpha_i >0, \text{ and we denote } \sum_{i = 1}^k \alpha_i = \alpha$

1. 狄利克雷分布中的任意一個分類的(category)邊際先驗概率分布(marginal prior distribution)是一個Beta分布：$$\theta_i \sim \text{Beta}(\alpha_i, \alpha - \alpha_i)$$
2. 狄利克雷分布中的任意一個分類的參數 $$\theta_i$$ 的均值（期望）是：$$E(\theta_i) = \frac{\alpha_i}{\alpha}$$
3. 狄利克雷分布中的任意一個分類的參數 $$\theta_i$$ 的方差是：$$V(\theta_i) = \frac{E(\theta_i)(1-E(\theta_i))}{\alpha + 1}$$

• 狄利克雷分布$$\text{Dirichlet}(1,2,1)$$，其實是樣本量爲$$\alpha = 4$$，均值是 $$\text{Means}[0.25, 0.5, 0.25]$$的分布；
• 狄利克雷分布$$\text{Dirichlet}(1,2,1)$$，其實是樣本量爲$$\alpha = 40$$，均值也是$$\text{Means}[0.25, 0.5, 0.25]$$的分布。

model{
N[1:4] ~ dmulti(p[], S)  # data model for sample
p[1:4] ~ ddirch(alpha[]) # Dirichlet prior for vector of probabilities

# Computation of comparison statistics:
px0 <- p[2] + p[4]       # proportion of non-exposed
px1 <- 1 - px0           # proportion of exposed
r0 <- p[2] / px0         # risk in the non-exposed
r1 <- p[1] / px1         # risk in the exposed
rr <- r1 / r0            # risk ratio, RR
rd <- r1 - r0            # risk difference, RD
or <- (p[1]*p[4]) / (p[2]*p[3])   # odds ratio, OR
p.crit <- step(or - 1)   # =1 if or >= 1, 0 otherwise

# calculate the total sample size
S <- sum(N[])
}

list(N = c(25, 15, 150, 210),
alpha = c(1, 1, 1, 1)) # uniform prior for p_ij

Iterations = 1001:26000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 25000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean       SD  Naive SE Time-series SE
deviance 18.81623 2.453005 1.097e-02      1.105e-02
or        2.43862 0.861082 3.851e-03      3.886e-03
p[1]      0.06436 0.012207 5.459e-05      5.459e-05
p[2]      0.03959 0.009694 4.335e-05      4.335e-05
p[3]      0.37378 0.024005 1.074e-04      1.078e-04
p[4]      0.52227 0.024829 1.110e-04      1.110e-04
p.crit    0.99372 0.078998 3.533e-04      3.630e-04
r0        0.07046 0.016980 7.594e-05      7.593e-05
r1        0.14689 0.026570 1.188e-04      1.188e-04
rd        0.07643 0.031454 1.407e-04      1.407e-04
rr        2.21362 0.703480 3.146e-03      3.172e-03

2. Quantiles for each variable:

2.5%      25%      50%      75%    97.5%
deviance 16.04000 17.02000 18.19000 19.92000 25.11000
or        1.18700  1.83300  2.29700  2.88600  4.52103
p[1]      0.04272  0.05571  0.06368  0.07215  0.09039
p[2]      0.02285  0.03268  0.03880  0.04563  0.06050
p[3]      0.32720  0.35750  0.37360  0.38980  0.42150
p[4]      0.47350  0.50560  0.52240  0.53890  0.57060
p.crit    1.00000  1.00000  1.00000  1.00000  1.00000
r0        0.04094  0.05839  0.06914  0.08114  0.10730
r1        0.09882  0.12820  0.14570  0.16410  0.20270
rd        0.01569  0.05514  0.07586  0.09718  0.13970
rr        1.16700  1.71900  2.10500  2.58700  3.90200

Iterations = 1001:26000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 25000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean       SD Naive SE Time-series SE
deviance 86.75869 10.43878 4.67e-02       4.67e-02
or        1.41233  0.48014 2.15e-03       2.15e-03
p[1]      0.01854  0.00361 1.61e-05       1.61e-05
p[2]      0.01141  0.00283 1.27e-05       1.28e-05
p[3]      0.53491  0.01330 5.95e-05       5.92e-05
p[4]      0.43514  0.01325 5.93e-05       5.96e-05
p.crit    0.81588  0.38759 1.73e-03       1.74e-03
r0        0.02554  0.00630 2.82e-05       2.86e-05
r1        0.03350  0.00646 2.89e-05       2.89e-05
rd        0.00796  0.00901 4.03e-05       4.03e-05
rr        1.39672  0.46134 2.06e-03       2.06e-03

2. Quantiles for each variable:

2.5%      25%      50%     75%    97.5%
deviance 67.64000 79.50000 86.26000 93.5700 108.5000
or        0.70919  1.07600  1.33600  1.6600   2.5680
p[1]      0.01216  0.01598  0.01833  0.0208   0.0262
p[2]      0.00654  0.00940  0.01118  0.0132   0.0176
p[3]      0.50880  0.52600  0.53480  0.5439   0.5611
p[4]      0.40930  0.42620  0.43510  0.4440   0.4613
p.crit    0.00000  1.00000  1.00000  1.0000   1.0000
r0        0.01473  0.02106  0.02504  0.0294   0.0393
r1        0.02205  0.02894  0.03313  0.0376   0.0472
rd       -0.01009  0.00205  0.00804  0.0140   0.0255
rr        0.71649  1.07400  1.32400  1.6360   2.5040

## 85.4 把不同實驗設計的數據用貝葉斯模型連接起來

### 85.4.1 Linking sub-models throug common parameters

model{
# Data model for unexposed group
Y0c ~ dbin(r0, X0c)
logit(r0) <- lr0

# Data model for exposed group
Y1c ~ dbin(r1, X1c)
logit(r1) <- lr1

# lor is log(OR)
lr1 <- lr0 + lor

# Prior for logit of unexposed risk
lr0 ~ dnorm(0, 0.3)
# Prior for log(OR)
lor ~ dnorm(0, 0.33)
}

model{
# Data model for controls
X0cc ~ dbin(p0, Y0cc)
logit(p0) <- lp0

# Data model for cases
X1cc ~ dbin(p1, Y1cc)
logit(p1) <- lp1

# lor is log(OR)
lp1 <- lp0 + lor

# Prior for logit of probability of exposure for controls
lp0 ~ dnorm(0, 0.3)
# Prior for log(OR)
lor ~ dnorm(0, 0.33)
}

model{
# Cohort sub-model
# Data model for unexposed group
Y0c ~ dbin(r0, X0c)
logit(r0) <- lr0

# Data model for exposed group
Y1c ~ dbin(r1, X1c)
logit(r1) <- lr1

# lor is log(OR)
lr1 <- lr0 + lor

# Prior for logit of unexposed risk
lr0 ~ dnorm(0, 0.3)
# Case-control sub-model
# Data model for controls
X0cc ~ dbin(p0, Y0cc)
logit(p0) <- lp0

# Data model for cases
X1cc ~ dbin(p1, Y1cc)
logit(p1) <- lp1

# lor is log(OR)
lp1 <- lp0 + lor

# Prior for logit of probability of exposure for controls
lp0 ~ dnorm(0, 0.3)

# Prior for common log(OR)
lor ~ dnorm(0, 0.33)
}

## 85.5 Practical Bayesian Statistics 06

### 85.5.1 The GREAT Trial

• 治療組：當患者被家庭醫生 (General Practitioners) 發現心肌梗塞時(myocardial infarctin, MI)，在家中立刻給予阿尼普酶藥物，等到患者被救護車帶到醫院以後給患者服用安慰劑。
• 對照組：同樣情況下，家庭醫生在家中發現患者有心肌梗塞時先不給予藥物治療，而是先讓患者服用安慰劑，等救護車帶患者抵達醫院之後再讓患者服用真正的阿尼普酶藥物。

• 治療組163人，13例死亡；
• 對照組148人，23例死亡。

model {
for (i in 1:2) {
deaths[i] ~ dbin(p[i],n[i])
logit(p[i]) <- alpha + beta*treat[i]
}
alpha ~ dunif(-100,100)
beta ~ dunif(-100,100)
OR <- exp(beta)
}

treat[i] 是一個指示变量 (indicator variable)，i = 1 時表示對照組，i = 0 時表示治療組；
death[i] 是第 i 組對象的死亡病例數；
n[i]     是第 i 組對象的總人數；
p[i]     是第 i 組試驗組患者死亡的概率。
• 解釋上面模型語句中 alpha, beta 的含義是什麼。

alpha 是治療組的對數比值 (log odds for the treatment arm)
beta 是對照組和治療組相比較的死亡比值比 (log odds ratio of death in the control arm compared to the treatment arm)，當 beta 大於1時，說明對照組死亡比值高，結果對治療組有利。

• 用兩組起始值文件來跑這個貝葉斯模型。
# Step 1 check model
modelCheck("backupfiles/great-model1.txt")
modelData("backupfiles/great-data.txt")
# compile the model with two separate chains
modelCompile(numChains = 2)
# generate initial values
# the choice is arbitrary
initlist <- list(alpha = 1, beta = -1)
modelInits(bugsData(initlist))
initlist1 <- list(alpha = -1, beta = 1)
modelInits(bugsData(initlist1))

# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "alpha", "beta", "p")
samplesSet(parameters)

# Generate 1000 iterations
modelUpdate(1000)

# Check trace history for the first 1000 run
samplesHistory("*", mfrow = c(5,1), ask = FALSE)
postsamples <- buildMCMC("*")
gelman.diag(postsamples)
## Potential scale reduction factors:
##
##       Point est. Upper C.I.
## OR          1.01       1.03
## alpha       1.01       1.06
## beta        1.01       1.04
## p[1]        1.02       1.08
## p[2]        1.00       1.00
##
## Multivariate psrf
##
## 1.01
gelman.plot(postsamples)

• 記錄這時候我們獲得的比值比 OR，迴歸係數，及兩組實驗組的事後死亡概率。
# Generate 50000 iterations
modelUpdate(25000)
## 25000 updates took 0 s
# Summary Statistics
sample.statistics <- samplesStats("*", beg = 1001)
print(sample.statistics)
##           mean      sd  MC_error val2.5pc   median val97.5pc start sample
## OR     2.31600 0.92180 0.0089740  1.06100  2.14400    4.5740  1001  50000
## alpha -2.48000 0.29400 0.0028080 -3.09800 -2.46500   -1.9410  1001  50000
## beta   0.76930 0.37200 0.0035940  0.05961  0.76250    1.5200  1001  50000
## p[1]   0.07982 0.02104 0.0001969  0.04321  0.07837    0.1255  1001  50000
## p[2]   0.15540 0.02958 0.0001403  0.10200  0.15370    0.2182  1001  50000
• 現在我們把先驗概率分佈修改成：

\begin{aligned} \alpha & \sim \text{Logistic}(0,1) \\ \beta & \sim \text{Normal}(0, 0.33)\; 0.33 \text{ is the precision}\\ \end{aligned}

# Step 1 check model
modelCheck("backupfiles/great-model1_alt.txt") 
## model is syntactically correct
# Load the data
modelData("backupfiles/great-data.txt")     
## data loaded
# compile the model with two separate chains
modelCompile(numChains = 2) 
## model compiled
# generate initial values
# the choice is arbitrary
initlist <- list(alpha = 1, beta = -1)
modelInits(bugsData(initlist))
## Initializing chain 1:
## initial values loaded and chain initialized but another chain contain uninitialized variables
initlist1 <- list(alpha = -1, beta = 1)
modelInits(bugsData(initlist1))
## Initializing chain 2:
## model is initialized
# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "alpha", "beta", "p")
samplesSet(parameters)
## monitor set for variable 'OR'
## monitor set for variable 'alpha'
## monitor set for variable 'beta'
## monitor set for variable 'p'
# Generate 1000 iterations
modelUpdate(1000)
## 1000 updates took 0 s
## Potential scale reduction factors:
##
##       Point est. Upper C.I.
## OR         1.002      1.005
## alpha      1.005      1.008
## beta       1.000      1.003
## p[1]       1.016      1.023
## p[2]       0.999      0.999
##
## Multivariate psrf
##
## 1.01
# Generate 50000 iterations
modelUpdate(25000)
## 25000 updates took 0 s
# Summary Statistics
sample.statistics <- samplesStats("*", beg = 1001)
print(sample.statistics)
##          mean      sd  MC_error val2.5pc   median val97.5pc start sample
## OR     2.0710 0.77560 0.0102800  0.97510  1.93600    3.9790  1001  50000
## alpha -2.3860 0.27890 0.0043500 -2.96700 -2.37600   -1.8720  1001  50000
## beta   0.6639 0.35670 0.0047540 -0.02521  0.66040    1.3810  1001  50000
## p[1]   0.0867 0.02167 0.0003371  0.04892  0.08504    0.1333  1001  50000
## p[2]   0.1539 0.02925 0.0001229  0.10130  0.15250    0.2151  1001  50000
• 對改變了先驗概率前後的貝葉斯分析結果，你有什麼看法？

• 爲了分析改變先驗概率分佈前後到底哪種給予了模型更多的信息，我們可以把兩個模型改寫成不含數據，只有預測模型的語句，從而可以看到只有先驗概率分佈時的結果是怎樣的。這時候我們把數據加載這一步省略掉，然後對 alpha, beta, p 進行軌跡檢測，繪製他們的預測分佈密度曲線。
# Step 1 check model
modelCheck("backupfiles/great-model1_forward.txt")
modelData("backupfiles/great-forward.txt")
# compile the model with two separate chains
modelCompile(numChains = 2)
# generate initial values
# the choice is arbitrary
initlist <- list(alpha = 1, beta = -1)
modelInits(bugsData(initlist))
initlist1 <- list(alpha = -1, beta = 1)
modelInits(bugsData(initlist1))

# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("alpha", "beta", "p")
samplesSet(parameters)

# Generate 10000 iterations
modelUpdate(26000)

# Density plot
samplesDensity("*", mfrow = c(2,2), ask = FALSE)

# Step 1 check model
modelCheck("backupfiles/great-model1_altforward.txt")
modelData("backupfiles/great-forward.txt")
# compile the model with two separate chains
modelCompile(numChains = 2)
# generate initial values
# the choice is arbitrary
initlist <- list(alpha = 1, beta = -1)
modelInits(bugsData(initlist))
initlist1 <- list(alpha = -1, beta = 1)
modelInits(bugsData(initlist1))

# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("alpha", "beta", "p")
samplesSet(parameters)

# Generate 10000 iterations
modelUpdate(26000)

# Density plot
samplesDensity("*", mfrow = c(2,2), ask = FALSE)

1. 作爲替代方案，我們可以把上述模型重新用對數比值比(Log odds ratio, LOR)作爲未知參數來重新建模：

$LOR = \log(\frac{p[1]/(1 - p[1])}{p[2]/(1 - p[2])}) = \text{logit}(p[1]) - \text{logit}(p[2])$

model {
for (i in 1:2) {
deaths[i] ~ dbin(p[i],n[i])
}
logit(p[1]) <- logit(p[2]) + LOR
p[2] ~ dbeta(1,1)
LOR ~ dnorm(0,0.33)
OR <- exp(LOR)
}
• 再看一遍上面寫好的模型，確定你能夠理解其含義，請確認先驗概率分布的實際意義。

# Step 1 check model
modelCheck("backupfiles/great-model2.txt") 
## model is syntactically correct
# Load the data
modelData("backupfiles/great-data-alt.txt")     
## data loaded
# compile the model with two separate chains
modelCompile(numChains = 2) 
## model compiled
# generate initial values
# the choice is arbitrary
initlist <- list(LOR = 0.5, p = c(NA, 0.2))
modelInits(bugsData(initlist))
## Initializing chain 1:
## initial values loaded and chain initialized but another chain contain uninitialized variables
initlist1 <- list(LOR = 5, p = c(NA, 0.8))
modelInits(bugsData(initlist1))
## Initializing chain 2:
## model is initialized
# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "p")
samplesSet(parameters)
## monitor set for variable 'OR'
## monitor set for variable 'p'
# Generate 10000 iterations
modelUpdate(26000)
## 26000 updates took 0 s
# Summary Statistics
sample.statistics <- samplesStats("*", beg = 1001)
print(sample.statistics)
##        mean      sd  MC_error val2.5pc  median val97.5pc start sample
## OR   2.0790 0.77960 0.0073820  0.97320 1.94300    3.9890  1001  50000
## p[1] 0.1539 0.02947 0.0001353  0.10060 0.15230    0.2162  1001  50000
## p[2] 0.0864 0.02153 0.0001985  0.04896 0.08474    0.1331  1001  50000
• 修改模型的代碼使之用於向前採集樣本用於預測：
model {
#   for (i in 1:2) {
#       deaths[i] ~ dbin(p[i],n[i])
#   }
logit(p[1]) <- logit(p[2]) + LOR
p[2] ~ dbeta(1,1)
LOR ~ dnorm(0,0.33)
OR <- exp(LOR)
}
# Step 1 check model
modelCheck("backupfiles/great-model2_forward.txt") 
## model is syntactically correct
# Load the data
# modelData("backupfiles/great-forward.txt")
# compile the model with two separate chains
modelCompile(numChains = 2) 
## model compiled
# generate initial values
# the choice is arbitrary
initlist <- list(LOR = 0.5, p = c(NA, 0.2))
modelInits(bugsData(initlist))
## Initializing chain 1:
## initial values loaded and chain initialized but another chain contain uninitialized variables
initlist1 <- list(LOR = 5, p = c(NA, 0.8))
modelInits(bugsData(initlist1))
## Initializing chain 2:
## model is initialized
# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "p")
samplesSet(parameters)
## monitor set for variable 'OR'
## monitor set for variable 'p'
# Generate 10000 iterations
modelUpdate(26000)
## 26000 updates took 0 s
# Density plot the first 1000 run
samplesDensity("p", mfrow = c(1,2), ask = FALSE)

### 85.5.2 吸煙與癌症

#### 85.5.2.1 隊列研究設計

model{
# Data model for non-smokers
Y0c ~ dbin(r0, X0c)
logit(r0) <- lr0

# Data model for smokers
Y1c ~ dbin(r1, X1c)
logit(r1) <- lr1

lr1 <- lr0 + lor   # lor is log(OR)
OR <- exp(lor)  # comparison statistic

# Priors
lr0 ~ dnorm(0, 0.3)  # priors for logit of non-smokers
lor ~ dnorm(0, 0.33)  # prior for log(OR)
}

# Y0c number of non-smokers developed cancer
# X0c number of nonpsmokers
# Y1c number of smokers developed cancer
# X1c number of smokers 

list(X0c = 2000,
Y0c = 100,
X1c = 1000,
Y1c = 150) 

list(lr0 = -1, lor = 0)
list(lr0 = -5, lor = 5)

# Step 1 check model
modelCheck("backupfiles/cohort-model.txt")
modelData("backupfiles/dataforcohort.txt")
# compile the model with two separate chains
modelCompile(numChains = 2)
# generate initial values
# the choice is arbitrary
initlist <- list(lr0 = -1, lor = 0)
modelInits(bugsData(initlist))
initlist1 <- list(lr0 = -5, lor = 5)
modelInits(bugsData(initlist1))

# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "lor")
samplesSet(parameters)

# Generate 1000 iterations
modelUpdate(1000)

# Check trace history for the first 1000 run
samplesHistory("*", mfrow = c(2,1), ask = FALSE)
## Potential scale reduction factors:
##
##     Point est. Upper C.I.
## OR       0.999      0.999
## lor      0.999      0.999
##
## Multivariate psrf
##
## 1
# Generate 50000 iterations
modelUpdate(25000)
## 25000 updates took 0 s
# Summary Statistics
sample.statistics <- samplesStats("*", beg = 1001)
print(sample.statistics)
##      mean     sd MC_error val2.5pc median val97.5pc start sample
## OR  3.335 0.4513 0.003901   2.5360  3.304     4.302  1001  50000
## lor 1.195 0.1347 0.001163   0.9306  1.195     1.459  1001  50000

#### 85.5.2.2 病例對照研究設計下的模型

• 病例組中600人過去20年中有過吸煙史；
• 對照組中800人過去20年中有過吸煙史。

model{
# Data model for non-cancer controls
X0cc ~ dbin(p0, Y0cc)
logit(p0) <- lp0

# Data model for cancer cases
X1cc ~ dbin(p1, Y1cc)
logit(p1) <- lp1

lp1 <- lp0 + lor # lor is log(OR)
OR <- exp(lor)  # comparison statistic

# Priors
lp0 ~ dnorm(0, 0.3)  # prior for logit of probability of exposure for controls
lor ~ dnorm(0, 0.33) # prior for log(OR)
}

# X0cc indicates number of smokers among controls (without cancer)
# Y0cc indicates number of controls
# X1cc is the number of smokers among cancer cases
# Y1cc is the number of cancer cases 

• X0cc 表示對照組中有吸煙史的人數；
• Y0cc 表示對照組的總人數；
• X1cc 表示癌症患者組中有吸煙史的人數；
• Y1cc 表示癌症患者組的總人數。

list(X0cc = 800,
Y0cc = 2000,
X1cc = 600,
Y1cc = 1000)

# Step 1 check model
modelCheck("backupfiles/case-control-model.txt")
modelData("backupfiles/dataforcasecontrol.txt")
# compile the model with two separate chains
modelCompile(numChains = 2)
# generate initial values
# the choice is arbitrary
initlist <- list(lp0 = -2, lor = 0)
modelInits(bugsData(initlist))
initlist1 <- list(lp0 = 2, lor = 5)
modelInits(bugsData(initlist1))

# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "lor")
samplesSet(parameters)

# Generate 1000 iterations
modelUpdate(1000)

# Check trace history for the first 1000 run
samplesHistory("*", mfrow = c(2,1), ask = FALSE)
## Potential scale reduction factors:
##
##     Point est. Upper C.I.
## OR           1          1
## lor          1          1
##
## Multivariate psrf
##
## 1
# Generate 50000 iterations
modelUpdate(25000)
## 25000 updates took 0 s
# Summary Statistics
sample.statistics <- samplesStats("*", beg = 1001)
print(sample.statistics)
##       mean      sd  MC_error val2.5pc median val97.5pc start sample
## OR  2.2530 0.17780 0.0011660   1.9260 2.2450    2.6210  1001  50000
## lor 0.8092 0.07875 0.0005161   0.6555 0.8088    0.9636  1001  50000

#### 85.5.2.3 聯合模型 joint model

# Joint model

model{
# cohort sub-model
Y0c ~ dbin(r0, X0c) # data model for non-smokers
logit(r0) <- lr0
Y1c ~ dbin(r1, X1c) # data model for smokers
logit(r1) <- lr1
lr1 <- lr0 + lor # lor is log(OR)
# prior for cohort sub-model
lr0 ~ dnorm(0, 0.3) # prior for logOdds of nonsmokers

# case-control sub-model
X0cc ~ dbin(p0, Y0cc) # data model for non-cancer controls
logit(p0) <- lp0
X1cc ~ dbin(p1, Y1cc) # data model for cancer cases
logit(p1) <- lp1
lp1 <- lp0 + lor # lor is log(OR)
# prior for case-control sub-model
lp0 ~ dnorm(0, 0.3) # prior for logOdds of exposure for controls

# Common code
lor ~ dnorm(0, 0.33) # prior for common log(OR)
OR <- exp(lor) # comparison statistic
}

list(X0c  = 2000,
Y0c  = 100,
X1c  = 1000,
Y1c  = 150,
X0cc = 800,
Y0cc = 2000,
X1cc = 600,
Y1cc = 1000
)

# Step 1 check model
modelCheck("backupfiles/jointmodel.txt")
modelData("backupfiles/dataforjoint.txt")
# compile the model with two separate chains
modelCompile(numChains = 2)
# generate initial values
# the choice is arbitrary
initlist <- list(lr0 = -1, lp0 = -2, lor = 0)
modelInits(bugsData(initlist))
initlist1 <- list(lr0 = -5, lp0 = 2, lor = 5)
modelInits(bugsData(initlist1))

# Set monitors on nodes of interest#### SPECIFY, WHICH PARAMETERS TO TRACE:
parameters <- c("OR", "lor")
samplesSet(parameters)

# Generate 1000 iterations
modelUpdate(1000)

# Check trace history for the first 1000 run
samplesHistory("*", mfrow = c(2,1), ask = FALSE)
## Potential scale reduction factors:
##
##     Point est. Upper C.I.
## OR           1          1
## lor          1          1
##
## Multivariate psrf
##
## 1
# Generate 50000 iterations
modelUpdate(25000)
## 25000 updates took 0 s
# Summary Statistics
sample.statistics <- samplesStats("*", beg = 1001)
print(sample.statistics)
##       mean      sd  MC_error val2.5pc median val97.5pc start sample
## OR  2.4930 0.17070 0.0009080   2.1760 2.4870     2.844  1001  50000
## lor 0.9111 0.06842 0.0003648   0.7777 0.9111     1.045  1001  50000

#### 85.5.2.4 你是否能證明兩個研究的比值比是相同的？

Cancer
Y N
Smoking Y S
N NS
C NC

$\text{OR}_{\text{cohort}} = \frac{\text{Pr}(C|S)\times\text{Pr}(NC|NS)}{\text{Pr}(C|NS)\times\text{Pr}(NC|S)}$

Smoking
Y N
Cancer Y C
N NC
S NS

$\text{OR}_{\text{case-control}} = \frac{\text{Pr}(S|C)\times\text{Pr}(NS|NC)}{\text{Pr}(S|NC)\times\text{Pr}(NS|C)}$

\begin{aligned} \text{OR}_{\text{cohort}} & = \frac{\text{Pr}(C|S)\times\text{Pr}(NC|NS)}{\text{Pr}(C|NS)\times\text{Pr}(NC|S)} \\ & = \frac{\frac{\text{Pr}(S|C)\text{Pr}(C)}{\text{Pr}(S)}\times\frac{\text{Pr}(NS|NC)\text{Pr}(NC)}{\text{Pr}(NS)}}{\frac{\text{Pr}(NS|C)\text{Pr}(C)}{\text{Pr}(NS)}\times\frac{\text{Pr}(S|NC)\text{Pr}(NC)}{\text{Pr}(S)}} \\ & = \frac{\text{Pr}(S|C)\times\text{Pr}(NS|NC)}{\text{Pr}(S|NC)\times\text{Pr}(NS|C)} \\ & = \text{OR}_{\text{case-control}} \end{aligned}