# 第 45 章 多參數模型 Introduction to multiparameter models

## 45.1 把不需要的噪音參數平均出去 Averaging over ‘nuisance parameters’

$y | \mu, \sigma^2 \sim N(\mu, \sigma^2)$

$E_{p(\theta | y)}[\theta] \approx \frac{1}{S}\sum_{S = 1}^S \theta^{(S)}$

$E_{p(\theta | y)}[\theta] \approx \frac{1}{S}\sum_{S = 1}^S \theta^{(S)} = \int\theta p(\theta | y)$

$p(\theta_1, \theta_2 | y) \propto p(y | \theta_1, \theta_2) p(\theta_1, \theta_2)$

$p(\theta_1 |y) \approx \frac{1}{S} \sum_{S = 1}^S p(\theta_1, \theta_2^{(S)} | y)$ 其中，$$\theta_2$$ 可以使用蒙特卡洛 (Monte Carlo) 過程從 $$p(\theta_2 | y)$$ 中隨機採集。當這個無數小的區間的面積無限趨近於零時，$$S \rightarrow 0$$，上面的方程式就變成了一個關於 $$\theta_2$$ 的積分方程式：

$p(\theta_1 |y) \approx \frac{1}{S} \sum_{S = 1}^S p(\theta_1, \theta_2^{(S)} | y) = \int p(\theta_1, \theta_2 | y) d\theta_2$

$p(\theta_1 | y) = \int p(\theta_1 |\theta_2 , y) p(\theta_2 | y) d\theta_2$

## 45.2 未知均值也未知方差的正（常）態分佈數據 normal data with unknown mean and variance

### 45.2.1 無信息先驗概率分佈 noninformative prior distribution

$p(\mu, \sigma^2) \propto \sigma^{-2}$

\begin{aligned} p(\mu, \sigma^2 | y) & \propto \sigma^{-2} \prod_{i =1}^n \frac{1}{\sqrt{2\pi} \sigma} \exp\left( -\frac{1}{2\sigma^2}(y_i - \mu)^2 \right) \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i =1}^n (y_i - \mu)^2 \right) \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i =1}^n (y_i^2 - 2y_i\mu + \mu^2) \right) \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i =1}^n (y_i^2 - 2y_i\mu + \mu^2 -\bar{y}^2 + \bar{y}^2 - 2y_i\bar{y} + 2y_i\bar{y} ) \right) \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\left[\sum_{i =1}^n (y_i^2 - 2y_i\bar{y} + \bar{y}^2) + \sum_{i = 1}^n(\mu^2 - 2y_i\mu - \bar{y}^2 + 2y_i\bar{y}) \right]\right) \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\left[\sum_{i =1}^n (y_i^2 - 2y_i\bar{y} + \bar{y}^2) + n(\mu^2 - 2\bar{y}\mu - \bar{y}^2 + 2\bar{y}^2) \right]\right) \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\left[\sum_{i =1}^n (y - \bar{y})^2 + n(\bar{y} - \mu)^2 \right]\right) \\ \color{darkred}{\text{Where } \bar{y}} & \color{darkred}{= \frac{1}{n}\sum_{i = 1}^n y_i} \\ & = \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\left[(n - 1)s^2 + n(\bar{y} - \mu)^2 \right]\right) \\ \color{darkred}{\text{Where } s^2} & \color{darkred}{= \frac{1}{n - 1}\sum_{i = 1}^n (y_i - \bar{y})^2} \\ \end{aligned}

$p(\mu | y) = \int p(\mu, \sigma^2 | y) d\sigma^2$

$p(\sigma^2 | \mu) = \int p(\mu, \sigma^2 | y) d \mu$

\begin{aligned} p(\sigma^2 | y) & \propto \int p(\mu, \sigma^2 | y) d \mu \\ & \propto \int \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\left[(n - 1)s^2 + n(\bar{y} - \mu)^2 \right]\right) d\mu \\ \text{Move terms } & \text{does not include } \mu \text{ outside of the intergral } \\ & \propto \sigma^{-n - 2} \exp\left( -\frac{1}{2\sigma^2}(n - 1)s^2 \right) \int\exp\left( -\frac{n}{2\sigma^2}(\bar{y} - \mu)^2 \right) d\mu \\ \because \int \frac{1}{\sqrt{2\pi\sigma^2}} & \exp\left(- \frac{1}{2\sigma^2}(y - \theta) \right)d\theta = 1 \text{ integration of the pdf of a normal distribution} \\ & \propto \sigma^{-n - 2} \exp\left( -\frac{1}{2\sigma^2}(n - 1)s^2 \right) \times \sqrt{\frac{2\pi\sigma^2}{n}} \\ & \propto (\sigma^2)^{-(\frac{n -1}{2} + 1)} \exp\left( -\frac{(n -1 )s^2}{2\sigma^2} \right) \end{aligned} 你會發現這個邊際分佈的概率函數竟然成了一個逆卡方分佈：

$p(\sigma^2 | y) = \text{Inv-}\chi^2 (\sigma^2 | n - 1, s^2)$

\begin{aligned} \sigma^2 | y & \sim \text{Inv-}\chi^2(\sigma^2| n, v) \\ \text{Where } v & = \frac{1}{n}\sum_{i =1}^n(y_i - \theta)^2 \end{aligned}

\begin{aligned} \sigma^2 | y & \sim \text{Int-}\chi^2(n - 1, s^2) \\ \text{Where } s^2 & = \frac{1}{n -1}\sum_{i = 1}^n(y_i - \bar{y})^2 \end{aligned}

\begin{aligned} \text{The posterior} & \text{ joint distribution is:} \\ p(\mu, \sigma^2 | y) & = \color{darkred}{p(\mu | \sigma^2, y)} \color{darkgreen}{p(\sigma^2 | y)} \\ \color{darkgreen}{p(\sigma^2 | y)} & = \text{Inv-}\chi^2 (\sigma^2 | n-1, s^2) \\ (\sigma^2)^{(s)} & \sim \color{darkgreen}{p(\sigma^2 | y)} \\ \color{darkred}{p(\mu | \sigma^2, y)} & = N(\mu | \bar{y}, \sigma^2 /n) \color{grey}{\propto \exp\left( -\frac{n}{2\sigma^2}(\bar{y} - \mu)^2 \right)} \\ \mu^{(s)} & \sim \color{darkred}{p(\mu | \sigma^2, y)} \\ \mu^{(s)}, (\sigma^2)^{(s)} & \sim p(\mu, \sigma^2 | y) \end{aligned}

### 45.2.2 均值的事後邊際概率分佈 marginal posterior distribution of $$\mu$$

\begin{aligned} p(\mu | y) & = \int_0^\infty p(\mu, \sigma^2 |y) d\sigma^2 \\ & \propto \int_0^\infty \sigma^{-n - 2} \exp\left(-\frac{1}{2\sigma^2}\left[(n - 1)s^2 + n(\bar{y} - \mu)^2 \right]\right) d\sigma^2 \end{aligned}

$A = (n - 1)s^2 + n(\bar{y} - \mu)^2 \text{ and } z = \frac{A}{2\sigma^2}$

\begin{aligned} p(\mu | y) & \propto A^{-n/2} \int_0^\infty z^{(n - 2)/2} \exp(-z)dz \\ \color{gray}{\text{Recognize }}& \color{gray}{\text{Gamma intergral }\Gamma(u) = \int_0^\infty x^{u-1}\exp(-x)dx} \\ \color{gray}{\text{Although t}}& \color{gray}{\text{here are different power terms, }} \\ \color{gray}{\text{but we kno}}& \color{gray}{\text{w this intergral will give us a constant value.}} \\ \color{gray}{\text{therefore o}}& \color{gray}{\text{nly the } A \text{ part is left in the proporional portions}} \\ & \propto \left[ (n - 1)s^2 + n(\bar{y} - \mu)^2 \right]^{-n/2} \\ & \propto \left[ 1 + \frac{n(\bar{y} - \mu)^2}{(n - 1)s^2} \right]^{-n/2} \\ \Rightarrow p(\mu | y) & \propto t_{n-1}(\mu | \bar{y}, \frac{s^2}{n}) \color{gray}{\text{ is a Student's } t \text{ distribution}} \end{aligned} 我們成功推導出未知方差未知均值的正常態數據的事後概率分佈是一個 $$t$$ 分佈。

## 45.3 R 演示 正常態數據但均值方差均未知

• 一串可能是正常態分佈的數據
y <- c(93, 112, 122, 135, 122, 150, 118, 90, 124, 114)
• 該數據的充分統計量
n <- length(y)
s2 <- var(y)
my <- mean(y)
• 我們希望採集樣本的事後聯合分佈可以被理解為兩個條件分佈的乘積：

$p(\mu, \sigma^2 | y) = p(\sigma^2 | y) p(\mu | \sigma^2, y)$

# random sampler for scaled inverse chi-squared distribution
rsinvchisq <- function(n, nu, s2, ...) nu*s2 / rchisq(n, nu, ...)

# probabilityb density function for scaled inverse chi-squared distribution
dsinvchisq <- function(x, nu, s2){
exp(log(nu/2)*nu/2 - lgamma(nu/2) + log(s2)/2*nu - log(x)*(nu/2+1) - (nu*s2/2)/x)
}

# 上述概率密度函數其實是把縮放逆卡方分佈的概率密度函數本身先取對數，再作 exp 提高計算效率
• $$p(\sigma^s | y)$$ 中採集 1000 個樣本
ns <- 1000
sigma2  <- rsinvchisq(ns, n-1, s2)
• $$p(\mu | \sigma^2 , y)$$ 中採集樣本
mu <- rnorm(ns, my, sqrt(sigma2/n))
# or equivalently
# mu <- my + sqrt(sigma2/n)*rnorm(length(sigma2)
• 從預測分佈中採集樣本 $$p(\tilde{y} | y)$$
sigma <- sqrt(sigma2)
ynew <- rnorm(ns, mu, sigma)
• 為計算 sigma, mu, ynew 在一些範圍內的小區隔之間的密度準備
t1l <- c(90, 150)
t2l <- c(10, 60)
nl <- c(50, 185)
t1 <- seq(t1l[1], t1l[2], length.out = ns)
t2 <- seq(t2l[1], t2l[2], length.out = ns)
xynew <- seq(nl[1], nl[2], length.out = ns)
• 計算 $$\mu$$ 的精確邊際分佈密度 (the exact marginal density of $$p(\mu)$$)，記得它是一個自由度為 $$n-1$$ 的 t 分佈。
# multiplication by 1./sqrt(s2/n) is due to the transformation of
# variable z=(x-mean(y))/sqrt(s2/n)
pm <- dt((t1-my) / sqrt(s2/n), n-1) / sqrt(s2/n)
• 從採集的均值樣本推算以這些均值的高斯內核近似估計：(estimate the marginal density using samples and ad hoc Gaussian kernel approximation)
pmk <- density(mu, adjust = 2, n = ns, from = t1l[1], to = t1l[2])$y • 類似地，計算標準差本身的精確邊際分佈密度 (the marginal density of $$p(\sigma^2)$$)，它是一個逆縮放卡方分佈，自由度是 $$n-1$$，縮放尺度是 $$s^2$$ # the multiplication by 2*t2 is due to the transformation of # variable z=t2^2, see BDA3 p. 21 ps <- dsinvchisq(t2^2, n-1, s2) * 2*t2 • 計算採集得到的事後標準差樣本本身的高斯內核估計 (estimate the marginal density using samples and ad hoc Gaussian kernel approximation) psk <- density(sigma, n = ns, from = t2l[1], to = t2l[2])$y
• 計算精確的預測值本身的分佈密度 (compute the exact predictive density)
# multiplication by 1./sqrt(s2/n) is due to the transformation of variable
# see BDA3 p. 21
p_new <- dt((xynew-my) / sqrt(s2*(1+1/n)), n-1) / sqrt(s2*(1+1/n))
• 估計事後聯合分佈的密度。
# Combine grid points into another data frame
# with all pairwise combinations
dfj <- data.frame(t1 = rep(t1, each = length(t2)),
t2 = rep(t2, length(t1)))
dfj$z <- dsinvchisq(dfj$t2^2, n-1, s2) * 2*dfj$t2 * dnorm(dfj$t1, my, dfj$t2/sqrt(n)) # breaks for plotting the contours cl <- seq(1e-5, max(dfj$z), length.out = 6)

### 45.3.1 繪製聯合事後密度分佈及均值和方差各自的邊際分佈 visualise the joint and marginal densities

• 下面的代碼用於生成均值的邊際分佈
dfm <- data.frame(t1, Exact = pm, Empirical = pmk) %>% gather(grp, p, -t1)
margmu <- ggplot(dfm) +
geom_line(aes(t1, p, color = grp)) +
coord_cartesian(xlim = t1l) +
labs(title = 'Marginal of mu', x = '', y = '') +
scale_y_continuous(breaks = NULL) +
theme(legend.background = element_blank(),
legend.position = c(0.75, 0.8),
legend.title = element_blank())
• 下面的代碼用於生成標準差的邊際分佈
dfs <- data.frame(t2, Exact = ps, Empirical = psk) %>% gather(grp, p, -t2)
margsig <- ggplot(dfs) +
geom_line(aes(t2, p, color = grp)) +
coord_cartesian(xlim = t2l) +
coord_flip() +
labs(title = 'Marginal of sigma', x = '', y = '') +
scale_y_continuous(breaks = NULL) +
theme(legend.background = element_blank(),
legend.position = c(0.75, 0.8),
legend.title = element_blank())
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
• 下面的代碼用於繪製聯合事後分佈密度
joint1labs <- c('Samples','Exact contour')
joint1 <- ggplot() +
geom_point(data = data.frame(mu,sigma), aes(mu, sigma, col = '1'), size = 0.1) +
geom_contour(data = dfj, aes(t1, t2, z = z, col = '2'), breaks = cl) +
coord_cartesian(xlim = t1l,ylim = t2l) +
labs(title = 'Joint posterior', x = '', y = '') +
scale_y_continuous(labels = NULL) +
scale_x_continuous(labels = NULL) +
scale_color_manual(values=c('blue', 'black'), labels = joint1labs) +
guides(color = guide_legend(nrow  = 1, override.aes = list(
shape = c(16, NA), linetype = c(0, 1), size = c(2, 1)))) +
theme(legend.background = element_blank(),
legend.position = c(0.5, 0.9),
legend.title = element_blank())
• 合併上面三個圖
# blank plot for combining the plots
bp <- grid.rect(gp = gpar(col = 'white'))
grid.arrange(joint1, margsig, margmu, bp, nrow = 2)

### 45.3.2 單獨繪製邊際分佈的每一個部分

• 繪製另外一個聯合事後分佈橢圓等高線圖
# data frame for the conditional of mu and marginal of sigma
dfc <- data.frame(mu = t1, marg = rep(sigma[1], length(t1)),
cond = sigma[1] + dnorm(t1 ,my, sqrt(sigma2[1]/n)) * 100) %>%
gather(grp, p, marg, cond)
# legend labels for the following plot
joint2labs <- c('Exact contour plot', 'Sample from joint post.',
'Cond. distribution of mu', 'Sample from the marg. of sigma')
joint2 <- ggplot() +
geom_contour(data = dfj, aes(t1, t2, z = z, col = '1'), breaks = cl) +
geom_point(data = data.frame(m = mu[1], s = sigma[1]), aes(m , s, color = '2')) +
geom_line(data = dfc, aes(mu, p, color = grp), linetype = 'dashed') +
coord_cartesian(xlim = t1l,ylim = t2l) +
labs(title = 'Joint posterior', x = '', y = '') +
scale_x_continuous(labels = NULL) +
scale_color_manual(values=c('blue', 'darkgreen','darkgreen','black'), labels = joint2labs) +
guides(color = guide_legend(nrow  = 2, override.aes = list(
shape = c(NA, 16, NA, NA), linetype = c(1, 0, 1, 1)))) +
theme(legend.background = element_blank(),
legend.position = c(0.5, 0.85),
legend.title = element_blank())
• 繪製標準差的邊際密度函數
margsig2 <- ggplot(data = data.frame(t2, ps)) +
geom_line(aes(t2, ps), color = 'blue') +
coord_cartesian(xlim = t2l) +
coord_flip() +
labs(title = 'Marginal of sigma', x = '', y = '') +
scale_y_continuous(labels = NULL)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
• 合併兩個圖
grid.arrange(joint2, margsig2, ncol = 2)

### 45.3.3 繪製事後均值的概率密度分佈

• 計算每一個樣本的概率密度分佈
condpdfs <- sapply(t1, function(x) dnorm(x, my, sqrt(sigma2/n)))
• 繪製其中25個樣本
# data frame of 25 first samples
dfm25 <- data.frame(t1, t(condpdfs[1:25,])) %>% gather(grp, p, -t1)
condmu <- ggplot(data = dfm25) +
geom_line(aes(t1, p, group = grp), linetype = 'dashed') +
labs(title = 'Conditional distribution of mu for first 25 samples', y = '', x = '') +
scale_y_continuous(breaks = NULL)
• 繪製這些事後均值樣本的均值
dfsam <- data.frame(t1, colMeans(condpdfs), pm) %>% gather(grp,p,-t1)
# labels
mulabs <- c('avg of sampled conds', 'exact marginal of mu')
meanmu <- ggplot(data = dfsam) +
geom_line(aes(t1, p, size = grp, color = grp)) +
labs(y = '', x = 'mu') +
scale_y_continuous(breaks = NULL) +
scale_size_manual(values = c(2, 0.8), labels = mulabs) +
scale_color_manual(values = c('orange', 'black'), labels = mulabs) +
theme(legend.position = c(0.8, 0.8),
legend.background = element_blank(),
legend.title = element_blank())
• 合併
grid.arrange(condmu, meanmu, ncol = 1)

## 45.4 R 演示 分析二進制數據 (BDA3 P.74)

$(x_i, n_i, y_i); i = 1, \dots, k$ 其中，

• $$k$$ 是藥物/毒物劑量水平的分組；
• $$x_i$$ 是第 $$i$$ 劑量組的實際藥物濃度，通常會取對數；
• $$n_i$$ 是第 $$i$$ 劑量組的實驗動物數量；
• $$y_i$$ 是第 $$i$$ 劑量組的實驗動物死亡（或發現腫瘤的動物）數量。

df1 <- data.frame(
x = c(-0.86, -0.30, -0.05, 0.73),
n = c(5, 5, 5, 5),
y = c(0, 1, 3, 5)
)
df1
##       x n y
## 1 -0.86 5 0
## 2 -0.30 5 1
## 3 -0.05 5 3
## 4  0.73 5 5

$y_i|\theta_i \sim \text{Bin}(n_i, \theta_i)$

• $$\theta_i$$ 是第$$i$$劑量組實驗動物的死亡概率，該組劑量是 $$x_i$$

$$$\text{logit}(\theta_i) = \log(\frac{\theta_i}{1 - \theta_i})\\ \text{logit}(\theta_i) = \alpha + \beta x_i$$ \tag{45.1}$

• 繪製該數據的散點圖
ggplot(df1, aes(x=x, y=y)) +
geom_point(size=2, color='red') +
scale_x_continuous(breaks = df1x, minor_breaks=NULL, limits = c(-1.5, 1.5)) + scale_y_continuous(breaks = 0:5, minor_breaks=NULL) + labs(title = 'Bioassay', x = 'Dose (log g/ml)', y = 'Number of deaths') + theme(panel.grid.major = element_blank()) 根據該數據的模型函數 (Model (45.1))，我們可以寫下本數據的似然 likelihood： $$$p(y_i|\alpha, \beta) \propto \left[ \text{logit}^{-1}(\alpha + \beta x_i) \right]^{y_i}\left[ 1- \text{logit}^{-1}(\alpha + \beta x_i) \right]^{n_i - y_i}$$ \tag{45.2}$ 於是該模型的事後概率分佈為： \begin{aligned} p(\alpha, \beta | y) & \propto p(\alpha, \beta) p(y | \alpha, \beta) \\ & \propto p(\alpha, \beta) \prod_{i = 1}^k p(y | \alpha, \beta) \end{aligned} \tag{45.3} 為了便於計算，可推導似然函數 (Likelihood function (45.2)) 變成如下的對數似然 (log-likelihood)： $\ell_i \propto y_i(\alpha + \beta x_i) - n_i \log\left[ 1 + \exp(\alpha + \beta x_i) \right]$ - 在 R 裡設定好這個對數似然函數： logl <- function(df, a, b) df['y']*(a + b * df['x']) - df['n']*log1p(exp(a + b * df['x'])) • 設定參數 $$\alpha, \beta$$ 可能的範圍，然後準備好用於事後概率密度採樣使用的小格子 A = seq(-4, 8, length.out = 50) B = seq(-10, 40, length.out = 50) cA <- rep(A, each = length(B)) cB <- rep(B, length(A)) • 計算該數據 df1 的似然： p <- apply(df1, 1, logl, cA, cB) %>% rowSums() %>% exp() • 從這些小格子裡根據似然來採集樣本（with replacement） nsamp <- 1000 samp_indices <- sample(length(p), size = nsamp, replace = T, prob = p/sum(p)) samp_A <- cA[samp_indices[1:nsamp]] samp_B <- cB[samp_indices[1:nsamp]] # Add random jitter, see BDA3 p. 76 samp_A <- samp_A + runif(nsamp, (A[1] - A[2])/2, (A[2] - A[1])/2) samp_B <- samp_B + runif(nsamp, (B[1] - B[2])/2, (B[2] - B[1])/2) • 生成樣本數據 samps <- tibble(ind = 1:nsamp, alpha = samp_A, beta = samp_B) %>% mutate(ld50 = - alpha/beta) • 繪製採集的邏輯曲線 invlogit <- plogis xr <- seq(-1.5, 1.5, length.out = 100) dff <- pmap_df(samps[1:100,], ~ tibble(x = xr, id=..1, f = invlogit(..2 + ..3*x))) ppost <- ggplot(df1, aes(x=x, y=y/n)) + geom_line(data=dff, aes(x=x, y=f, group=id), linetype=1, color='blue', alpha=0.2) + geom_point(size=2, color='red') + scale_x_continuous(breaks = df1x, minor_breaks=NULL, limits = c(-1.5, 1.5)) +
scale_y_continuous(breaks = seq(0,1,length.out=3), minor_breaks=NULL) +
labs(title = 'Bioassay', x = 'Dose (log g/ml)', y = 'Proportion of deaths') +
theme(panel.grid.major = element_blank())

# add 50% death line and LD50 dots
ppost + geom_hline(yintercept = 0.5, linetype = 'dashed', color = 'gray') +
geom_point(data=samps[1:100,], aes(x=ld50, y=0.5), color='blue', alpha=0.2)
• 繪製事後概率密度圖
# limits for the plots
xl <- c(-2, 8)
yl <- c(-2, 40)
pos <- ggplot(data = data.frame(cA ,cB, p), aes(cA, cB)) +
geom_raster(aes(fill = p, alpha = p), interpolate = T) +
geom_contour(aes(z = p), colour = 'black', size = 0.2) +
coord_cartesian(xlim = xl, ylim = yl) +
labs(title = 'Posterior density evaluated in grid', x = 'alpha', y = 'beta') +
scale_fill_gradient(low = 'yellow', high = 'red', guide = F) +
scale_alpha(range = c(0, 1), guide = F)

sam <- ggplot(data = samps) +
geom_point(aes(alpha, beta), color = 'blue') +
coord_cartesian(xlim = xl, ylim = yl) +
labs(title = 'Posterior draws', x = 'alpha', y = 'beta')
grid.arrange(pos, sam, nrow=2)
• 繪製事後 LD50 的直方圖，這裡 LD50 的含義是，在多大的濃度時，動物死亡/發生腫瘤的概率會是50%。
his <- ggplot(data = samps) +
geom_histogram(aes(ld50), binwidth = 0.02,
fill = 'steelblue', color = 'black') +
coord_cartesian(xlim = c(-0.5, 0.5)) +
labs(x = 'LD50 = -alpha/beta')
his