# Chapter 18 Inference with multiple parameters I

## 18.1 Multiple parameters - LRT

### 18.1.1 Likelihood

$L(\theta_1,\cdots,\theta_k | \underline{x}) = f(\underline{x} | \theta_1,\cdots,\theta_k) = \prod^n_{i=1}f(x_i|\theta_1,\cdots,\theta_k)$

$\ell(\theta_1,\cdots,\theta_k|\underline{x}) = \sum^n_{i=1}\text{log}f(x_i|\theta_1,\cdots,\theta_k)$

$\left\{ \begin{array}{c} \frac{\partial \ell}{\partial \theta_1} = \ell^\prime(\theta_1) = 0 \\ \frac{\partial \ell}{\partial \theta_2} = \ell^\prime(\theta_2) = 0 \\ \vdots \\ \frac{\partial \ell}{\partial \theta_k} = \ell^\prime(\theta_k) = 0 \\ \end{array} \right.$

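• These simultaneous equations are sometimes called the score equations.
• The invariance property of the $$\text{MLE}$$ also holds when there are multiple parameters.

• When there is a single parameter $$\theta$$, the variance of its $$\text{MLE}$$ is $$S^2=\left.-\frac{1}{\ell^{\prime\prime}(\theta)}\right\vert_{\theta=\hat{\theta}}$$
• When there are $$k$$ parameters, the variance of the $$k$$ $$\text{MLE}$$s is a $$k\times k$$ symmetric variance-covariance matrix. It is obtained from the matrix of second derivatives (18.1), known as the Hessian matrix, evaluated at $$\hat\theta$$, negated and inverted: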

$$\underline{\ell^{\prime\prime}(\theta)} = \left( \begin{array}{cccc} \frac{\partial^2\ell}{\partial\theta^2_1} & \frac{\partial^2\ell}{\partial\theta_2\partial\theta_1} & \cdots & \frac{\partial^2\ell}{\partial\theta_k\partial\theta_1} \\ \frac{\partial^2\ell}{\partial\theta_1\partial\theta_2} & \frac{\partial^2\ell}{\partial\theta^2_2} & \cdots & \frac{\partial^2\ell}{\partial\theta_k\partial\theta_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2\ell}{\partial\theta_1\partial\theta_k} & \frac{\partial^2\ell}{\partial\theta_2\partial\theta_k} & \cdots & \frac{\partial^2\ell}{\partial\theta^2_k} \\ \end{array} \right) \tag{18.1}$$

$\Rightarrow \underline{\ell^{\prime\prime}(\theta)} |_{\color{red}{\theta=\hat\theta}} = \left( \begin{array}{cccc} \frac{\partial^2\ell}{\partial\theta^2_1} & \frac{\partial^2\ell}{\partial\theta_2\partial\theta_1} & \cdots & \frac{\partial^2\ell}{\partial\theta_k\partial\theta_1} \\ \frac{\partial^2\ell}{\partial\theta_1\partial\theta_2} & \frac{\partial^2\ell}{\partial\theta^2_2} & \cdots & \frac{\partial^2\ell}{\partial\theta_k\partial\theta_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2\ell}{\partial\theta_1\partial\theta_k} & \frac{\partial^2\ell}{\partial\theta_2\partial\theta_k} & \cdots & \frac{\partial^2\ell}{\partial\theta^2_k} \\ \end{array} \right)_{\color{red}{\theta=\hat\theta}}$

$\Rightarrow \underline{\text{Var}(\hat\theta)} = - \left( \begin{array}{cccc} \frac{\partial^2\ell}{\partial\theta^2_1} & \frac{\partial^2\ell}{\partial\theta_2\partial\theta_1} & \cdots & \frac{\partial^2\ell}{\partial\theta_k\partial\theta_1} \\ \frac{\partial^2\ell}{\partial\theta_1\partial\theta_2} & \frac{\partial^2\ell}{\partial\theta^2_2} & \cdots & \frac{\partial^2\ell}{\partial\theta_k\partial\theta_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2\ell}{\partial\theta_1\partial\theta_k} & \frac{\partial^2\ell}{\partial\theta_2\partial\theta_k} & \cdots & \frac{\partial^2\ell}{\partial\theta^2_k} \\ \end{array} \right)^{\color{red}{-1}}_{\color{red}{\theta=\hat\theta}}$
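As a numerical illustration of the score equations, the Hessian (18.1) and $$\text{Var}(\hat\theta) = (-\ell^{\prime\prime}(\hat\theta))^{-1}$$, here is a minimal Python sketch. It assumes the two-group Poisson log-likelihood in the log-rates $$\psi_0, \psi_1$$ with the data from the exercise in section 18.5. The helper names (`loglik`, `gradient`, `hessian`, `inverse`, `newton_raphson`) are hypothetical, and the derivatives are approximated by central finite differences rather than derived analytically.

```python
import math

K = (52, 25)          # event counts k_0, k_1 (exercise data, section 18.5)
P = (4862.0, 512.0)   # person-years p_0, p_1


def loglik(psi):
    """Two independent Poisson groups with log-rate parameters psi = (psi_0, psi_1)."""
    return sum(k * s - math.exp(s) * p for k, s, p in zip(K, psi, P))


def gradient(f, theta, h=1e-5):
    """Central-difference approximation to the score vector."""
    g = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


def hessian(f, theta, h=1e-4):
    """Central-difference approximation to the k x k Hessian matrix (18.1)."""
    k = len(theta)

    def shifted(i, j, di, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return f(t)

    return [[(shifted(i, j, h, h) - shifted(i, j, h, -h)
              - shifted(i, j, -h, h) + shifted(i, j, -h, -h)) / (4 * h * h)
             for j in range(k)] for i in range(k)]


def inverse(a):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    k = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [x / pv for x in m[c]]
        for r in range(k):
            if r != c:
                factor = m[r][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return [row[k:] for row in m]


def newton_raphson(f, theta, tol=1e-8, max_iter=50):
    """Solve the score equations by iterating theta <- theta - H^{-1} U."""
    theta = list(theta)
    for _ in range(max_iter):
        g = gradient(f, theta)
        h_inv = inverse(hessian(f, theta))
        step = [sum(h_inv[i][j] * g[j] for j in range(len(g))) for i in range(len(g))]
        theta = [t - s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta


psi_hat = newton_raphson(loglik, [-4.5, -3.0])
neg_h = [[-x for x in row] for row in hessian(loglik, psi_hat)]
var_hat = inverse(neg_h)   # Var(psi_hat) = (-l''(psi_hat))^{-1}
print(psi_hat)   # close to [log(52/4862), log(25/512)]
print(var_hat)   # close to [[1/52, 0], [0, 1/25]]
```

For this model the score equations have the closed-form solution $$\hat\psi_i = \text{log}(k_i/p_i)$$, so the iteration only confirms it; the same functions would apply unchanged to a log-likelihood without a closed form.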

### 18.1.2 Log-likelihood ratio test

\begin{aligned} & \text{H}_0: \underline{\theta} = \underline{\theta_0} \\ & \Rightarrow -2llr(\underline{\theta_0}) = -2(\ell(\underline{\theta_0})- \ell(\hat{\underline{\theta}})) \stackrel{\cdot}{\sim} \chi^2_r \\ & \text{Where } r \text{ is the number of parameters restricted under H}_0 \end{aligned}
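To turn $$-2llr$$ into a p-value it is compared with a $$\chi^2_r$$ distribution. Below is a minimal Python sketch using only the standard library. `chi2_sf` and `lrt` are hypothetical helpers: the tail probability comes from the series for the regularized lower incomplete gamma function, which is adequate for the moderate values met in these notes. The example plugs in the two log-likelihood values computed in the exercise of section 18.5 ($$r = 2$$).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses P(a, z) = z^a e^{-z} / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    with a = df / 2 and z = x / 2, then returns 1 - P(a, z).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower)


def lrt(loglik_null, loglik_mle, r):
    """Return (-2 llr, p-value) when H0 restricts r parameters."""
    stat = -2.0 * (loglik_null - loglik_mle)
    return stat, chi2_sf(stat, r)


stat, p_value = lrt(-388.5029, -388.4602, r=2)
print(round(stat, 4), round(p_value, 3))   # 0.0854 0.958
```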

## 18.2 Multiple parameters - Wald test

\begin{aligned} & \text{H}_0: \theta=\theta_0 \Rightarrow W_\theta = (\frac{M-\theta_0}{S})^2 \stackrel{\cdot}{\sim} \chi^2_1 \\ & \text{Where } M=\hat\theta, S^2=\left.-\frac{1}{\ell^{\prime\prime}(\theta)}\right\vert_{\theta=\hat{\theta}} \\ & \Rightarrow W=(\hat\theta-\theta_0)^2(-\ell^{\prime\prime}(\hat\theta)) \stackrel{\cdot}{\sim} \chi^2_1 \end{aligned}

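• We can first consider the parameters one at a time; if the two statistics are independent, their sum follows a $$\chi^2_2$$: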

\begin{aligned} & W_\lambda = (\hat\lambda-\lambda_0)^2(-\ell^{\prime\prime}(\hat\lambda)) \stackrel{\cdot}{\sim} \chi^2_1 \\ & W_\psi = (\hat\psi-\psi_0)^2(-\ell^{\prime\prime}(\hat\psi)) \stackrel{\cdot}{\sim} \chi^2_1 \\ & \Rightarrow W_\lambda + W_\psi \stackrel{\cdot}{\sim} \chi^2_2 \\ & \Rightarrow W = (\hat\lambda-\lambda_0)^2(-\ell^{\prime\prime}(\hat\lambda)) + (\hat\psi-\psi_0)^2(-\ell^{\prime\prime}(\hat\psi)) \stackrel{\cdot}{\sim} \chi^2_2 \end{aligned}

• Alternatively, we can consider both parameters jointly from the start:

$\underline{\ell^\prime} = \left( \begin{array}{c} \frac{\partial\ell}{\partial\lambda}\\ \frac{\partial\ell}{\partial\psi} \end{array} \right) \Rightarrow \underline{\ell^{\prime\prime}} = \left( \begin{array}{cc} \frac{\partial^2\ell}{\partial\lambda^2} & \frac{\partial^2\ell}{\partial\lambda\partial\psi} \\ \frac{\partial^2\ell}{\partial\psi\partial\lambda} & \frac{\partial^2\ell}{\partial\psi^2} \end{array} \right)$

$(\hat\lambda-\lambda_0)^2+(\hat\psi-\psi_0)^2 = (\hat\lambda-\lambda_0, \hat\psi-\psi_0)\left( \begin{array}{c} \hat\lambda-\lambda_0 \\ \hat\psi-\psi_0 \end{array} \right)$

$$\begin{aligned} W = & (\hat\lambda-\lambda_0, \hat\psi-\psi_0)(-\underline{\ell^{\prime\prime}}(\hat\lambda,\hat\psi))\left( \begin{array}{c} \hat\lambda-\lambda_0 \\ \hat\psi-\psi_0 \end{array} \right) \\ = & - (\hat\lambda-\lambda_0, \hat\psi-\psi_0)\left( \begin{array}{cc} \frac{\partial^2\ell}{\partial\lambda^2} & \frac{\partial^2\ell}{\partial\lambda\partial\psi} \\ \frac{\partial^2\ell}{\partial\psi\partial\lambda} & \frac{\partial^2\ell}{\partial\psi^2} \end{array} \right)_{\hat\lambda,\hat\psi} \left( \begin{array}{c} \hat\lambda-\lambda_0 \\ \hat\psi-\psi_0 \end{array} \right)\\ & \text{ When } \lambda \text{ and } \psi \text{ are estimated from independent data, } \ell \text{ splits into a } \lambda \text{ part plus a } \psi \text{ part,} \\ & \text{ so the off-diagonal terms } \frac{\partial^2\ell}{\partial\lambda\partial\psi} = \frac{\partial^2\ell}{\partial\psi\partial\lambda} = 0\\ = & - (\hat\lambda-\lambda_0, \hat\psi-\psi_0)\left( \begin{array}{cc} \ell^{\prime\prime}(\hat\lambda) & 0 \\ 0 & \ell^{\prime\prime}(\hat\psi) \end{array} \right) \left( \begin{array}{c} \hat\lambda-\lambda_0 \\ \hat\psi-\psi_0 \end{array} \right)\\ = & - (\hat\lambda-\lambda_0, \hat\psi-\psi_0)\left( \begin{array}{c} \ell^{\prime\prime}(\hat\lambda)(\hat\lambda-\lambda_0) \\ \ell^{\prime\prime}(\hat\psi)(\hat\psi-\psi_0) \end{array} \right) \\ = & (\hat\lambda-\lambda_0)^2(-\ell^{\prime\prime}(\hat\lambda)) + (\hat\psi-\psi_0)^2(-\ell^{\prime\prime}(\hat\psi)) \stackrel{\cdot}{\sim} \chi^2_2 \end{aligned}$$

$W = -(\hat{\underline{\theta}} - \underline{\theta_0})^T\underline{\ell^{\prime\prime}(\hat\theta)}(\underline{\hat\theta} - \underline{\theta_0}) \stackrel{\cdot}{\sim} \chi^2_k$
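The general quadratic form can be evaluated directly. The minimal Python sketch below uses a hypothetical function name `wald_statistic` and made-up numbers chosen only for illustration. It checks that with a diagonal $$-\underline{\ell^{\prime\prime}}$$ the joint statistic equals the sum of the one-parameter statistics, while non-zero cross partial derivatives change the result.

```python
def wald_statistic(theta_hat, theta_0, neg_hessian):
    """W = (theta_hat - theta_0)^T (-l''(theta_hat)) (theta_hat - theta_0)."""
    d = [a - b for a, b in zip(theta_hat, theta_0)]
    k = len(d)
    return sum(d[i] * neg_hessian[i][j] * d[j] for i in range(k) for j in range(k))


theta_hat = [0.5, -0.2]   # hypothetical estimates (lambda_hat, psi_hat)
theta_0 = [0.0, 0.0]      # hypothetical null values

diag_info = [[40.0, 0.0], [0.0, 25.0]]     # cross partial derivatives are zero
corr_info = [[40.0, 10.0], [10.0, 25.0]]   # cross partial derivatives are non-zero

w_joint = wald_statistic(theta_hat, theta_0, diag_info)
w_separate = 0.5 ** 2 * 40.0 + (-0.2) ** 2 * 25.0   # W_lambda + W_psi
w_corr = wald_statistic(theta_hat, theta_0, corr_info)
print(w_joint, w_separate, w_corr)
```

In the correlated case the cross term $$2(\hat\lambda-\lambda_0)(\hat\psi-\psi_0)\times 10$$ lowers $$W$$ from about 11 to about 9, which is why the one-at-a-time sum is only valid when the off-diagonal terms vanish.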

## 18.3 Multiple parameters - Score test

$$\begin{aligned} & \text{H}_0: \theta=\theta_0 \text{ v.s. H}_1: \theta \neq \theta_0 \\ & \frac{U^2}{V} \stackrel{\cdot}{\sim} \chi^2_1 \\ & \text{Where } U=\ell^\prime(\theta_0), V=E[-\ell^{\prime\prime}(\theta_0)] \end{aligned}$$

$$\begin{aligned} & \underline{U}^T\underline{V}^{-1}\underline{U} \stackrel{\cdot}{\sim} \chi^2_k \\ & \text{Where } \underline{U} = \left.\frac{\partial\ell}{\partial\underline{\theta}} \right\vert_{\underline{\theta}=\underline{\theta_0}}, \underline{V} = E[-\underline{\ell^{\prime\prime}(\theta)}]_{\underline{\theta}=\underline{\theta_0}} \end{aligned}$$

$(\frac{\partial\ell}{\partial\lambda}, \frac{\partial\ell}{\partial\psi})_{\lambda_0, \psi_0}\left( E\left[ -\left( \begin{array}{cc} \frac{\partial^2\ell}{\partial\lambda^2} & \frac{\partial^2\ell}{\partial\lambda\partial\psi} \\ \frac{\partial^2\ell}{\partial\psi\partial\lambda} & \frac{\partial^2\ell}{\partial\psi^2} \end{array} \right)_{\lambda_0,\psi_0} \right] \right)^{-1}\left( \begin{array}{c} \frac{\partial\ell}{\partial\lambda}\\ \frac{\partial\ell}{\partial\psi} \end{array} \right)_{\lambda_0,\psi_0} \stackrel{\cdot}{\sim} \chi^2_2$
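The notes do not apply the score test to the exercise in section 18.5, so the following is a hypothetical illustration under that exercise's two-group Poisson model with log-rates $$\psi_0, \psi_1$$. There the score vector and the expected information at the null values have closed forms, $$U_i = k_i - e^{\psi_i}p_i$$ and $$\underline{V} = \text{diag}(e^{\psi_0}p_0, e^{\psi_1}p_1)$$. Because the second derivatives do not involve the data, the expected and observed information coincide. A minimal Python sketch:

```python
import math

K = (52, 25)                # k_0, k_1
P = (4862.0, 512.0)         # p_0, p_1
PSI_NULL = (-4.5, -3.0)     # null values of psi_0, psi_1

# Score vector U evaluated at the null values
u = [k - math.exp(s) * p for k, s, p in zip(K, PSI_NULL, P)]
# Expected information V at the null values; diagonal for this model
v = [math.exp(s) * p for s, p in zip(PSI_NULL, P)]

# U^T V^{-1} U reduces to a sum of U_i^2 / V_i because V is diagonal
score_stat = sum(ui * ui / vi for ui, vi in zip(u, v))
print(round(score_stat, 4))   # compare with chi^2_{2,0.95} = 5.99
```

Unlike the Wald statistic, nothing here needs $$\hat\psi_0, \hat\psi_1$$: everything is evaluated at the null values.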

## 18.4 Conditional likelihood

$$\begin{aligned} & K_0 \sim Po(\mu_0); \; K_1 \sim Po(\mu_1), \text{ where } \mu_0 = \lambda_0 p_0, \; \mu_1 = \lambda_1 p_1 \\ & k=k_0 + k_1; \text{ since } K_0 \text{ and } K_1 \text{ are independent, } K_0+K_1 \sim Po(\mu_0 + \mu_1) \end{aligned}$$

$$\begin{aligned} & \text{Prob}(k_0 \text{ events in group 0} | k \text{ events in total}) \\ = & \frac{\text{Prob}(k_0 \text{ events in group }0 \text{ and } k-k_0 \text{ events in group } 1)} {\text{Prob}(k \text{ events in total})} \end{aligned} \tag{18.2}$$

\begin{aligned} \text{Prob}(k &\text{ events in total}) \\ = & \frac{(\lambda_0 p_0 + \lambda_1 p_1)^k e^{-(\lambda_0 p_0 + \lambda_1 p_1)}}{k!} \\ \text{Prob}(k_0 &\text{ events in group }0 \text{ and } k-k_0 \text{ events in group } 1) \\ = & \frac{(\lambda_0 p_0)^{k_0}e^{-\lambda_0 p_0}}{k_0!}\times\frac{(\lambda_1 p_1)^{k-k_0}e^{-\lambda_1 p_1}}{(k-k_0)!} \end{aligned}

\begin{aligned} & \frac{\frac{(\lambda_0 p_0)^{k_0}e^{-\lambda_0 p_0}}{k_0!}\times\frac{(\lambda_1 p_1)^{k-k_0}e^{-\lambda_1 p_1}}{(k-k_0)!}} {\frac{(\lambda_0 p_0 + \lambda_1 p_1)^k e^{-(\lambda_0 p_0 + \lambda_1 p_1)}}{k!}} \\ = & \frac{e^{-(\lambda_0 p_0 + \lambda_1 p_1)}(\lambda_0 p_0)^{k_0}(\lambda_1 p_1)^{k-k_0}\cdot k!}{e^{-(\lambda_0 p_0 + \lambda_1 p_1)}(\lambda_0p_0+\lambda_1p_1)^k\cdot k_0!\cdot (k-k_0)!}\\ = & (\frac{\lambda_0 p_0}{\lambda_0 p_0+\lambda_1 p_1})^{k_0}(\frac{\lambda_1 p_1}{\lambda_0 p_0+\lambda_1 p_1})^{k-k_0}\cdot\frac{k!}{k_0!(k-k_0)!} \\ = & (\pi)^{k_0}(1-\pi)^{k-k_0}\cdot\frac{k!}{k_0!(k-k_0)!} \\ \text{Where } & \pi = \frac{\lambda_0 p_0}{\lambda_0 p_0 + \lambda_1 p_1} = \frac{p_0}{p_0+(\lambda_1/\lambda_0)p_1} = \frac{p_0}{p_0+\theta p_1}\\ \Rightarrow &\text{ Given } K_0+K_1=K, K_0 \sim Bin(k, \pi=\frac{p_0}{p_0+\theta p_1}) \end{aligned}
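The identity can be checked numerically: the ratio of Poisson probabilities in (18.2) should equal the binomial probability with $$\pi = p_0/(p_0+\theta p_1)$$. In this minimal Python sketch the rates $$\lambda_0 = 0.01$$ and $$\lambda_1 = 0.05$$ are arbitrary hypothetical values. The counts and person-years are those of the exercise in section 18.5, and the helper names are hypothetical.

```python
import math


def log_poisson_pmf(k, mu):
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def log_binom_pmf(k, n, pi):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(pi) + (n - k) * math.log(1 - pi))


lam0, lam1 = 0.01, 0.05          # hypothetical rates
p0, p1 = 4862.0, 512.0
k0, k = 52, 77

# Left-hand side of (18.2): joint Poisson probability divided by the Poisson total
log_num = log_poisson_pmf(k0, lam0 * p0) + log_poisson_pmf(k - k0, lam1 * p1)
log_den = log_poisson_pmf(k, lam0 * p0 + lam1 * p1)
conditional = math.exp(log_num - log_den)

# Right-hand side: Bin(k, pi) with pi = p0 / (p0 + theta p1), theta = lam1 / lam0
theta = lam1 / lam0
pi = p0 / (p0 + theta * p1)
binomial = math.exp(log_binom_pmf(k0, k, pi))

print(conditional, binomial)
```

Working on the log scale with `math.lgamma` avoids overflow in $$k!$$ and in the powers of $$\mu$$.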

$$\begin{aligned} L(\pi) & = (\pi)^{k_0}(1-\pi)^{k-k_0} \\ \Rightarrow \ell(\pi) & = k_0 \text{log}\pi + (k-k_0)\text{log} (1-\pi) \\ \text{Because } \pi & = \frac{p_0}{p_0+\theta p_1} \\ \ell_c(\theta) & = k_0 \text{log}(\frac{\pi}{1-\pi}) + k\text{log}(1-\pi) \\ & = k_0 \text{log}(\frac{p_0}{\theta p_1}) + k\text{log}(\frac{\theta p_1}{p_0 + \theta p_1}) \\ \text{Ignoring} & \text{ terms not involving } \theta \\ \ell_c(\theta)& = k_1 \text{log}\theta - k\text{log}(p_0 + \theta p_1) \end{aligned} \tag{18.3}$$
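Dropping the terms that do not involve $$\theta$$ only shifts the log-likelihood by a constant. So $$\ell_c(\theta)$$ and the binomial log-likelihood $$\ell(\pi(\theta))$$ give the same likelihood ratios and the same maximum. A small Python check follows; the function names are hypothetical, and the data are from the exercise in section 18.5 with $$\theta = \lambda_1/\lambda_0$$.

```python
import math

k0, k1 = 52, 25
p0, p1 = 4862.0, 512.0
k = k0 + k1


def loglik_c(theta):
    """Equation (18.3): k1 log(theta) - k log(p0 + theta p1)."""
    return k1 * math.log(theta) - k * math.log(p0 + theta * p1)


def loglik_binom(theta):
    """Binomial log-likelihood k0 log(pi) + k1 log(1 - pi), pi = p0 / (p0 + theta p1)."""
    pi = p0 / (p0 + theta * p1)
    return k0 * math.log(pi) + k1 * math.log(1 - pi)


# The difference should be the same constant, k0 log(p0) + k1 log(p1), for every theta
diffs = [loglik_binom(t) - loglik_c(t) for t in (0.5, 1.0, 2.0, 8.0)]
constant = k0 * math.log(p0) + k1 * math.log(p1)

# Setting l_c'(theta) = 0 gives the ratio of the observed rates (k1/p1) / (k0/p0)
theta_hat = (k1 / p1) / (k0 / p0)
print(diffs, constant, theta_hat)
```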

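1. The conditional log-likelihood derived above is a genuine log-likelihood, conditional on the observed data (here, on the observed total $$k$$), and can be used for hypothesis testing;
2. The conditional likelihood approach depends on being able to find such a "conditional likelihood", one in which the log-likelihood depends only on the parameter of interest. We were lucky enough to find one for the rate ratio, but no one has yet found a conditional log-likelihood for the rate difference $$\lambda_1-\lambda_0$$;
3. By contrast, the profile likelihood, introduced in the next chapter, can be used to construct hypothesis tests for almost any multi-parameter model;
4. Nevertheless, the conditional log-likelihood is very important. It is the basic framework of the Cox proportional hazards model in survival analysis, and the theoretical basis of conditional logistic regression for matched case-control studies (to be introduced in the second-term MSc course).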

## 18.5 Exercises

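1. Suppose the null hypothesis to be tested is $$\text{H}_0:$$ the log incidence rate in men with a history of heart disease equals $$-3$$, and the log incidence rate in men without a history of heart disease equals $$-4.5$$. Derive the test statistics of the joint log-likelihood ratio test and of the Wald test for this study, and carry out both tests.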

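• Model:

$$\begin{aligned} & K_i \sim \text{Poisson}(\mu_i); \; \mu_i = \lambda_i p_i \\ & \text{Where } \lambda_i \text{ is the rate parameter in group } i, \\ & p_i \text{ is the person-years at risk in group } i \end{aligned}$$

• Data (group 0: men without a history of heart disease; group 1: men with a history of heart disease):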

$k_0 = 52, p_0 = 4862; k_1 = 25, p_1 = 512$

$\ell(\lambda | \text{data}) = -\lambda p + k \text{log} \lambda$

Letting $$\psi = \text{log} \lambda$$ gives:

$\ell(\psi) = k \psi - e^\psi p$

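Let $$\psi_0 = \text{log}\lambda_0$$ and $$\psi_1 = \text{log}\lambda_1$$, and write $${\psi_0}_0$$, $${\psi_1}_0$$ for their values under the null. The hypotheses in this exercise can then be written as: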

$\text{H}_0: \psi_0 = {\psi_0}_0 = -4.5, \; \psi_1 = {\psi_1}_0 = -3 \text{ v.s. H}_1: \psi_0 \neq -4.5 \text{ or } \psi_1 \neq -3$

1. For the log-likelihood ratio test, the test statistic is $$-2llr({\psi_0}_0,{\psi_1}_0)$$, where:

$llr({\psi_0}_0,{\psi_1}_0) = \ell({\psi_0}_0,{\psi_1}_0) - \ell(\hat\psi_0,\hat\psi_1)$

$$\ell(\psi_0, \psi_1) = k_0 \psi_0 - e^{\psi_0} p_0 + k_1 \psi_1 - e^{\psi_1} p_1 \tag{18.4}$$

$\Rightarrow \frac{\partial\ell}{\partial\psi_0} = k_0 - e^{\psi_0}p_0 \quad \text{and} \quad \frac{\partial\ell}{\partial\psi_1} = k_1 - e^{\psi_1}p_1$

\begin{aligned} \frac{\partial\ell}{\partial{\psi}_0} & = 0 \\ \Rightarrow e^{{\hat\psi}_0} & = \frac{k_0}{p_0} \\ \Rightarrow {\hat\psi}_0 & = \text{log}(\frac{k_0}{p_0}) \\ \text{And similarly } {\hat\psi}_1 & = \text{log}(\frac{k_1}{p_1}) \end{aligned}

\begin{aligned} \ell({\psi_0}_0,{\psi_1}_0) & = 52\times(-4.5) - e^{-4.5}\times4862+25\times(-3)-e^{-3}\times512 \\ & = -388.5029 \\ \ell(\hat\psi_0,\hat\psi_1) & = 52\times\text{log}\frac{52}{4862} - e^{\text{log}\frac{52}{4862}}\times4862 + 25\times\text{log}\frac{25}{512} - e^{\text{log}\frac{25}{512}}\times512 \\ & = 52\times\text{log}\frac{52}{4862} - 52 + 25\times\text{log}\frac{25}{512} - 25 \\ & = -388.4602 \\ \Rightarrow llr({\psi_0}_0,{\psi_1}_0) & = -388.5029 - (-388.4602) = - 0.0427 \\ \Rightarrow -2llr({\psi_0}_0,{\psi_1}_0) & = 0.0854 \end{aligned}
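The arithmetic above can be reproduced with a short Python sketch. The names are hypothetical; the model (18.4), the data and the null values are those stated in the exercise.

```python
import math

K = (52, 25)              # k_0, k_1
P = (4862.0, 512.0)       # p_0, p_1
PSI_NULL = (-4.5, -3.0)   # null values of psi_0, psi_1


def loglik(psi):
    """Equation (18.4): sum over groups of k_i psi_i - exp(psi_i) p_i."""
    return sum(k * s - math.exp(s) * p for k, s, p in zip(K, psi, P))


psi_hat = tuple(math.log(k / p) for k, p in zip(K, P))   # MLEs log(k_i / p_i)
minus2llr = -2 * (loglik(PSI_NULL) - loglik(psi_hat))

print(round(loglik(PSI_NULL), 4))   # -388.5029
print(round(loglik(psi_hat), 4))    # -388.4602
print(round(minus2llr, 4))          # 0.0854
print(minus2llr > 5.99)             # False: compare with chi^2_{2,0.95} = 5.99
```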

2. For the Wald test, the test statistic is:

$W = (\hat\psi_0-{\psi_0}_0, \hat\psi_1-{\psi_1}_0)(-\underline{\ell^{\prime\prime}}(\hat\psi_0,\hat\psi_1))\left( \begin{array}{c} \hat\psi_0-{\psi_0}_0 \\ \hat\psi_1-{\psi_1}_0 \end{array} \right)$

$$\begin{aligned} \underline{\ell^\prime}(\psi_0, \psi_1) & = \left( \begin{array}{c} k_0 - e^{\psi_0}p_0 \\ k_1 - e^{\psi_1}p_1 \end{array} \right) \\ \Rightarrow \underline{\ell^{\prime\prime}}(\psi_0,\psi_1) & = \left( \begin{array}{cc} \frac{\partial^2\ell}{\partial\psi^2_0} & \frac{\partial^2\ell}{\partial\psi_1\partial\psi_0} \\ \frac{\partial^2\ell}{\partial\psi_0\partial\psi_1} & \frac{\partial^2\ell}{\partial\psi^2_1} \end{array} \right) = \left( \begin{array}{cc} -e^{\psi_0}p_0 & 0\\ 0 & -e^{\psi_1}p_1 \end{array} \right) \\ \Rightarrow -\underline{\ell^{\prime\prime}}(\hat\psi_0,\hat\psi_1) & = \left( \begin{array}{cc} e^{\hat\psi_0}p_0 & 0\\ 0 & e^{\hat\psi_1}p_1 \end{array} \right) \\ & = \left( \begin{array}{cc} e^{\text{log}(\frac{52}{4862})}\times4862 & 0\\ 0 & e^{\text{log}(\frac{25}{512})}\times512 \end{array} \right) \\ & = \left( \begin{array}{cc} 52 & 0\\ 0 & 25 \end{array} \right) \end{aligned}$$

$$\hat\psi_0-{\psi_0}_0 = \text{log}(\frac{52}{4862})-(-4.5) = -0.0379, \quad \hat\psi_1-{\psi_1}_0 = \text{log}(\frac{25}{512})-(-3) = -0.0194$$

$$\begin{aligned} W & = (\hat\psi_0-{\psi_0}_0, \hat\psi_1-{\psi_1}_0)(-\underline{\ell^{\prime\prime}}(\hat\psi_0,\hat\psi_1))\left( \begin{array}{c} \hat\psi_0-{\psi_0}_0 \\ \hat\psi_1-{\psi_1}_0 \end{array} \right) \\ & = (-0.0379, -0.0194)\left( \begin{array}{cc} 52 & 0\\ 0 & 25 \end{array} \right)\left( \begin{array}{c} -0.0379 \\ -0.0194 \end{array} \right) = 0.08439208 \\ & \text{(computed from the unrounded differences)} \end{aligned}$$
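A minimal Python sketch of the same Wald computation (names hypothetical, data and null values from the exercise). It uses the unrounded differences, which is where the value $$0.08439208$$ comes from.

```python
import math

K = (52, 25)              # k_0, k_1
P = (4862.0, 512.0)       # p_0, p_1
PSI_NULL = (-4.5, -3.0)   # null values of psi_0, psi_1

psi_hat = [math.log(k / p) for k, p in zip(K, P)]
diff = [h - n for h, n in zip(psi_hat, PSI_NULL)]

# -l''(psi_hat) is diagonal with entries exp(psi_hat_i) * p_i, which equal k_i
neg_hess = [[math.exp(psi_hat[0]) * P[0], 0.0],
            [0.0, math.exp(psi_hat[1]) * P[1]]]

w = sum(diff[i] * neg_hess[i][j] * diff[j] for i in range(2) for j in range(2))
print(round(w, 4))   # 0.0844
```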

The Wald test statistic also approximately follows $$\chi^2_2$$, so its rejection region is the same as for the log-likelihood ratio test, $$\mathfrak{R} > \chi^2_{2,0.95} = 5.99$$. Since both $$-2llr = 0.0854$$ and $$W = 0.08439208$$ are below $$5.99$$, there is no evidence at the $$5\%$$ level against the null hypothesis.

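2. Using the conditional log-likelihood for the rate ratio derived in this section, carry out a log-likelihood ratio test of the hypothesis that the ratio of the heart-attack rate in men without a history of heart disease to that in men with a history of heart disease is $$0.2$$.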

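$$\begin{aligned} & \ell_c(\theta) = k_1 \text{log}\theta - k\text{log}(p_0 + \theta p_1) \\ & \text{Where } \theta = \frac{\lambda_1}{\lambda_0} \end{aligned}$$

The hypothesis concerns $$\lambda_0/\lambda_1$$, so swap the roles of the two groups and redefine $$\theta = \lambda_0/\lambda_1$$:

$$\begin{aligned} & \ell_c(\theta) = k_0\text{log}\theta - k\text{log}(p_1 + \theta p_0) \\ & \text{H}_0: \theta = \theta_0 = 0.2 \text{ v.s. H}_1: \theta \neq 0.2 \end{aligned}$$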

$llr_c(\theta_0) = \ell_c(\theta_0) - \ell_c(\hat\theta)$

$$\begin{aligned} \text{Let }\ell_c^\prime(\theta) & = \frac{k_0}{\theta} - \frac{kp_0}{p_1+\theta p_0} = 0 \\ \Rightarrow \frac{k_0}{\theta} & = \frac{kp_0}{p_1+\theta p_0} \\ \Rightarrow \hat\theta & = \frac{k_0p_1}{p_0k_1} = \frac{k_0/p_0}{k_1/p_1} \\ \Rightarrow \hat\theta & = \frac{52\times512}{4862\times25} = 0.219037 \\ \Rightarrow \ell_c(\theta_0) & = k_0\text{log}0.2 - k\text{log}(p_1 + \theta_0 p_0) \\ & = 52\times\text{log}0.2 - 77\times\text{log}(512 + 0.2\times4862) \\ & = -646.003 \\ \ell_c(\hat\theta) & = 52\times\text{log}0.219037 - 77\times\text{log}(512 + 0.219037\times4862)\\ & = -645.933 \\ \Rightarrow -2llr_c(\theta_0) & = -2\times(-646.003-(-645.933)) = 0.14 \end{aligned}$$

Under $$\text{H}_0$$, $$-2llr_c(\theta_0) \stackrel{\cdot}{\sim} \chi^2_1$$. Since $$0.14 < \chi^2_{1,0.95} = 3.84$$, there is no evidence at the $$5\%$$ level against the null hypothesis that the rate ratio is $$0.2$$.
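A minimal Python sketch reproducing this conditional likelihood ratio test. The names are hypothetical; it uses the swapped parametrisation $$\theta = \lambda_0/\lambda_1$$ and the exercise data.

```python
import math

k0, k1 = 52, 25
p0, p1 = 4862.0, 512.0
k = k0 + k1
theta_0 = 0.2   # null value of theta = lambda_0 / lambda_1


def loglik_c(theta):
    """Conditional log-likelihood with groups swapped: k0 log(theta) - k log(p1 + theta p0)."""
    return k0 * math.log(theta) - k * math.log(p1 + theta * p0)


theta_hat = (k0 / p0) / (k1 / p1)   # ratio of the observed rates
minus2llr_c = -2 * (loglik_c(theta_0) - loglik_c(theta_hat))
print(round(theta_hat, 6))     # 0.219037
print(round(minus2llr_c, 3))   # 0.141
print(minus2llr_c > 3.84)      # False: compare with chi^2_{1,0.95} = 3.84
```

Without rounding the intermediate log-likelihoods, the statistic is about $$0.141$$, in line with the $$0.14$$ above.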