第 19 章多個參數時的統計推斷 – 子集似然函數 profile log-likelihoods

本章介紹的子集似然法是處理多個參數模型的主要方法。前章介紹的條件似然法也是相當出色的方法，但是許多情況下我們無法找到合適的“條件”來輔助我們擺脫那些模型中不需要的，障礙 (或者叫噪音) 參數 nuisance parameters。

我們還是沿用上一節的例子。

兩個獨立的人羣追蹤樣本，在 \(p_0, p_1\) 人年的隨訪中發生事件 A 的次數分別是 \(k_0, k_1\)。我們只關心兩組的事件 A 發生率的比 \(\text{Rate ratio:} \theta=\frac{\lambda_1}{\lambda_0}\)。兩個人羣的聯合對數似然函數如下：

\[ \ell(\lambda_0, \lambda_1) = k_0\text{log}\lambda_0 - \lambda_0p0 + k_1\text{log}\lambda_1 - \lambda_1p1 \]

Step 1. 先用 \(\lambda_1 = \lambda_0\theta\) 取代掉上面式子中的 \(\lambda_1\)。

\[ \begin{aligned} \Rightarrow \ell(\lambda_0, \theta) & = k\text{log}\lambda_0 + k_1\text{log}\theta - \lambda_0(P_0 + \theta p_1) \\ \text{Where } k & = k_0 + k_1 \end{aligned} \tag{19.1} \]

這一步先是消滅了一個障礙參數 \(\lambda_1\)，獲得了一個我們關心的參數 \(\theta\)，和 \(\lambda_0\) 的對數似然方程。接下來，我們尋找用 \(\theta\) 表示 \(\lambda_0\) (用 \(\hat\lambda_0(\theta)\) 標記) 的似然方程，使得只包含一個參數 \(\theta\) 的對數似然方程可以在每個 \(\lambda_0\) 時取得極大值。此時我們定義 \(\theta\) 的子集對數似然方程 profile log-likelihood是：

\[ \ell_p(\theta) = \ell(\hat\lambda_0(\theta),\theta) \]

Step 2. 爲了求 \(\hat\lambda_0(\theta)\)，先視 \(\theta\) 爲不變的，對上式 (19.1) 求 \(\lambda_0\) 的微分：

\[ \frac{\partial\ell(\lambda_0,\theta)}{\partial\lambda_0}=\frac{k}{\lambda_0} - (p_0+\theta p_1) \]

把該微分方程等於0，推導出 \(\hat\lambda_0=\frac{k}{p_0+\theta p_1}\) 就是 \(\theta\) 在取值範圍內所有能使對數似然方程 (19.1) 取極大值的對應 \(\lambda_0\)。

Step 3. 將這個 \(\theta\) 表示的 \(\lambda_0\text{ MLE}\) 代替 \(\lambda_0\) 代入對數似然方程 (19.1) 中去：

\[ \begin{aligned} \ell_p(\theta) &= k\text{log}\frac{k}{p_0 + \theta p_1} + k_1 \text{log}\theta - k \\ \text{Ignoring} &\text{ items not involving } \theta\\ \Rightarrow &= k_1\text{log}\theta - k\text{log}(p_0+\theta p_1) \end{aligned} \]

這個用子集似然法推導的關於參數 \(\theta\) 的似然方程和前一章用條件似然法 (Section 18.4) 推導的結果是完全一致的 (18.3)。

19.1 子集似然法推導的過程總結

多個參數中區分出我們感興趣的參數 \(\psi\) 和其餘的障礙(噪音)參數 \(\lambda\)；
爲了從對數似然方程中消除噪音參數，把它們一一通過微分求極值的辦法表達成用 \(\psi\) 標記的表達式，用這些包含了 \(\psi\) 的 \(\text{MLE}\) 代替所有的噪音參數；
整理最終獲得的只有感興趣的參數的對數似然方程，記得把不包含參數的部分忽略掉。

19.1.1 子集對數似然方程的分佈

\[ -2pllr(\psi) = -2\{ \ell_p(\psi) - \ell(\hat\psi)\} \stackrel{\cdot}{\sim} \chi^2_r \]

其中自由度 \(r\) 是想要檢驗的零假設中受限制的參數的個數。Degree of freedom \(r\) is the number of parameters restricted under the null hypothesis. 所以，如果 \(\psi\) 是一個維度 (dimension) 爲 \(p\) 的向量，如果零假設是 \(\text{H}_0: \psi = \psi_0\)，那麼自由度就是 \(p\)。

19.1.2 假設檢驗過程舉例

兩個獨立的二項分佈樣本：\(K_0 \sim \text{Bin}(n_0, \pi_0), K_1 \sim \text{Bin}(n_1, \pi_1)\)。它們的聯合對數似然爲：

\[ \ell(\pi_0, \pi_1) = \ell(\pi_0) + \ell(\pi_1) \]

如果要檢驗的零假設和替代假設分別是 \(\text{H}_0: \pi_0 = \pi_1 \text{ v.s. H}_1: \pi_0 \neq \pi_1\)。

如果令 \(\theta=\frac{\pi_1}{\pi_0}\)，那麼要檢驗的零假設和替代假設就變成了：

\[ \text{H}_0: \theta = 1 \text{ v.s. H}_1: \theta \neq 1 \\ \Rightarrow -2 pllr \stackrel{\cdot}{\sim} \chi^2_1 \]

而且在零假設條件下，\(\text{H}_0: K_0+K_1 \sim \text{Bin}(n_0+n_1, \pi)\)，那麼自己對數似然比檢驗的統計量是：

\[ \begin{aligned} -2 pllr & = -2\{ \text{max}[\underset{\text{H}_0}{\ell(\pi_0,\theta\pi_0)}] -\text{max}[\underset{\text{H}_1}{\ell(\pi_0,\theta\pi_0)}] \} \\ \Rightarrow -2 pllr & = -2\{ \text{max}[\underset{\text{H}_0}{\ell(\pi,\theta\pi)}] -\text{max}[\underset{\text{H}_1}{\ell(\pi_0,\pi_1)}] \} \\ \Rightarrow -2 pllr & = -2\{ \ell{(\hat\pi)} - \ell{(\hat\pi_0, \hat\pi_1)} \} \end{aligned} \]

19.2 子集對數似然比的近似

假如有兩個獨立樣本數據，參數分別只有一個 \(\beta_0, \beta_1\)，我們關心他們二者之間的差是否有意義 \(\gamma = \beta_1-\beta_0\)。如果 \(\beta_0\) 的對數似然比檢驗統計量的相應的 Wald 檢驗統計量 (二次方程近似法 Section 16.4) 可以用 \(\hat\beta_0, S_0\) 定義，其中 \(\beta_0\) 是 \(\text{MLE}\)，\(S_0\) 是標準誤差。類似的，\(\beta_1\) 的 Wald 檢驗統計量可以用 \(\hat\beta_1, S_1\) 定義。那麼，我們關心的參數，\(\gamma = \beta_1 - \beta_0\) 的 Wald 檢驗統計量可以用 \(\hat\gamma = \hat\beta_1 - \hat\beta_1, S=\sqrt{S^2_1 + S^2_0}\) 定義：

\[ \begin{aligned} pllr(\gamma) & = -\frac{1}{2}(\frac{\gamma-\hat\gamma}{\sqrt{S^2_1+S^2_0}})^2 \\ & = -\frac{1}{2}(\frac{(\beta_1-\beta_0)-(\hat\beta_1-\hat\beta_0)}{\sqrt{S^2_1+S^2_0}})^2 \end{aligned} \]

19.2.1 子集對數似然比近似的一般化

如果我們關心的參數，和模型參數的關係可以用下面的表達式來表示：

\[ \gamma = W_0\beta_0 + W_1\beta_1 + \cdots \\ \text{ Where } W_i \text{ are arbitrary cosntants} \]

如果，模型中的每個參數 \(\beta_0, \beta_1, \cdots\) 的 \(\text{MLE}\) 是 \(\hat\beta_0, \hat\beta_1, \cdots\)，標準誤是 \(S=\sqrt{(W_0S_0)^2+(W_1S_2)^2+\cdots}\)

19.2.2 事件發生率之比的 Wald 檢驗統計量

事件發生率 (Possion rate ratio) \(\theta = \frac{\lambda_1}{\lambda_0}\)

令 \(\beta_1 = \text{log}\lambda_1, \beta_0 = \text{log}\lambda_0, \gamma = \text{log}\theta\)。

所以有 \(\gamma=\beta_1-\beta_0\)。

由於

\[ \begin{aligned} \hat\beta_0 & = \text{log}(\frac{k_0}{p_0}), \\ \hat\beta_1 & = \text{log}(\frac{k_1}{p_1}) \\ \end{aligned} \]

因而

\[ \begin{aligned} \hat\gamma & = \text{log}\frac{k_1}{p_1} - \text{log}\frac{k_0}{p_0} \\ & = \text{log}\frac{k_1/p_1}{k_0/p_0} \end{aligned} \]

又由於 \(S_0 = \frac{1}{\sqrt{k_0}}, S_1 = \frac{1}{\sqrt{k_1}}\) (Section 14.2.1)。

所以 \(S=\sqrt{\frac{1}{k_0}+\frac{1}{k_1}}\)。

綜上，事件發生率之比的 Wald 檢驗統計量爲

\[ \begin{aligned} pllr(\gamma) & = -\frac{1}{2}(\frac{\gamma - \hat\gamma}{\sqrt{\frac{1}{k_0}+\frac{1}{k_1}}})^2 \\ & = -\frac{1}{2}(\frac{\text{log}\theta - \text{log}\frac{k_1/p_1}{k_0/p_0}}{\sqrt{\frac{1}{k_0}+\frac{1}{k_1}}})^2 \end{aligned} \]

19.3 Inference Practical 11

\(n\) 名肺癌 I 期患者的倖存時間 \(X_1, X_2, \cdots, X_n\) 被認爲服從指數分佈 (參數 \(\lambda_x\))，概率方程爲 \(\lambda_x e^{-x\lambda_x},\text{ where } x > 0\)。

證明 \(\lambda_x\) 的 \(\text{MLE}\) 是 \(\hat\lambda_x = \frac{1}{\bar{x}}\), 對數似然方程是 \[\ell(\lambda_x | \underline{x}) = n\text{log}\lambda_x - \lambda_x n \bar{x}\]

解

\[ \begin{aligned} f(\underline{x}|\lambda_x) & = \lambda_x\cdot e^{-x\lambda_x} \\ F(\underline{x}|\lambda_x) & = \prod_{i=1}^n\lambda_{x}\cdot e^{-x_i\lambda_{x}} \\ \Rightarrow L(\lambda_x | \underline{x}) & = \prod_{i=1}^n\lambda_xe^{-x_i\lambda_{x}} \\ \Rightarrow \ell(\lambda_x|\underline{x}) & = \sum_{i=1}^n(\text{log}\lambda_x + \text{log}e^{-x_i\lambda_{x}}) \\ & = n\text{log}\lambda_x + \sum_{i=1}^n(-x_i\lambda_{x}) \\ & = n\text{log}\lambda_x - n\bar{x}\lambda_x \\ \Rightarrow \ell^\prime(\lambda_x) & = \frac{n}{\lambda_x} - n\bar{x}\lambda_x \\ \text{Let } \ell^\prime(\lambda_x) & = 0 \Rightarrow \text{ MLE of } \lambda_x \text{ is } \hat\lambda_x = \frac{1}{\bar{x}} \\ \because \ell^{\prime\prime} = -\frac{n}{\lambda^2_x} & < 0 \therefore \frac{1}{\bar{x}} \text{ is the MLE} \end{aligned} \]

另一組獨立數據是樣本量爲 \(n\) ，但是肺癌診斷爲 II 期的患者的倖存時間 \(Y_1, \cdots, Y_n\)。這組數據也被認爲服從參數爲 \(\lambda_y\) 的指數分佈。用 \(\theta=\frac{\lambda_x}{\lambda_y}\) 標記兩組患者倖存時間之比，用 \(r=\frac{\bar{x}}{\bar{y}}\) 標記樣本的倖存時間均值之比。證明使兩個樣本數據的聯合對數似然取極大值的 \(\hat\lambda_y(\theta) = \frac{2}{\bar{y}(\theta r+1)}\)。

解

\[ \begin{aligned} \ell(\lambda_x|\underline{x}) & = n\text{log}\lambda_x - n \bar{x} \lambda_x \\ \ell(\lambda_y|\underline{y}) & = n\text{log}\lambda_y - n \bar{y} \lambda_y \\ \Rightarrow \text{ Joint log-likelihood: } & \ell(\lambda_x, \lambda_y | \underline{x}, \underline{y}) = n\text{log}\lambda_x - n\bar{x}\lambda_x \\ & \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;+ n\text{log} \lambda_y - n\bar{y}\lambda_y \\ \text{Subsitute } \lambda_x & =\theta\cdot\lambda_y \\ \Rightarrow \ell(\theta, \lambda_y) &= n\text{log}\theta\lambda_y - n\bar{x}\theta\lambda_y + n\text{log} \lambda_y - n\bar{y}\lambda_y \\ \ell(\theta, \lambda_y) & = n(\text{log}\theta + \text{log}\lambda_y - \bar{x}\theta\lambda_y + \text{log}\lambda_y - \bar{y}\lambda_y) \\ & = n[\text{log}\theta + 2\text{log}\lambda_y - \lambda_y(\bar{x}\theta + \bar{y})] \\ \Rightarrow \frac{\partial\ell(\theta, \lambda_y)}{\partial \lambda_y} & = n[\frac{2}{\lambda_y} - (\bar{x}\theta + \bar{y})] \\ \text{Let } \frac{\partial\ell(\theta, \lambda_y)}{\partial \lambda_y} & = 0 \text{ and because } r = \frac{\bar{x}}{\bar{y}} \\ \hat\lambda_y(\theta) & = \frac{2}{\bar{x}\theta + \bar{y}} = \frac{2}{\bar{y}(r\cdot\theta +1)} \end{aligned} \]

證明參數 \(\theta\) 的子集對數似然是 \(\ell_p(\theta|r) = n\text{log}\theta - 2n \text{log}(\theta\cdot r + 1)\)，且 \(\text{MLE}\) 是 \(\hat\theta = \frac{1}{r}\)

解

\[ \begin{aligned} \ell_p (\theta) & = n[\text{log}\theta + 2\cdot\text{log}\frac{2}{\bar{y}(r\cdot\theta +1)} - \text{log}\frac{2}{\bar{y}(r\cdot\theta +1)}(\bar{x}\theta+\bar{y})] \\ & = n\{\text{log}\theta + 2\cdot\text{log}2 - 2\cdot\text{log}[\bar{y}(r\theta+1)] -2 \} \\ \text{Ignoring } & \text{ items not involving } \theta\\ & = n[\text{log}\theta - 2\text{log}(r\theta+1)] \\ \Rightarrow \ell_p^{\prime}(\theta) & = n(\frac{1}{\theta} - \frac{2r}{r\theta+1}) \\ \text{Let } \ell_p^{\prime}(\theta) & = 0 \Rightarrow n(\frac{1}{\theta} - \frac{2r}{r\theta+1}) = 0 , \hat\theta=\frac{1}{r}\\ \because \ell_p^{\prime\prime}(\theta) & = -\frac{1}{\theta^2} - \frac{2r^2}{(r\theta^2+1)^2} < 0 \\ \therefore \hat\theta & =\frac{1}{r} \text{ is the MLE} \end{aligned} \]

根據 \(\text{MLE}\) 的恆定性，可以直接推導出 \(\theta\) 的 \(\text{MLE}\) 嗎?

解

\[ \because \hat\lambda_x = \frac{1}{x} , \hat\lambda_y = \frac{1}{y} \\ \therefore \theta = \frac{\lambda_x}{\lambda_y} \Rightarrow \hat\theta = \frac{\hat\lambda_x}{\hat\lambda_y} = \frac{1}{r} \]

證明檢驗下列假設 \(\text{H}_0: \theta_0 = 1 \text{ v.s. H}_1: \theta_0 \neq 1\) 的子集對數似然比檢驗統計量是 \(2n\text{log}\frac{(r+1)^2}{4r}\)，並進行 \(n=16, r=2\) 的假設檢驗。

解

\[ \begin{aligned} \text{Under H}_0 & \Rightarrow \text{ test statistic is } \\ -2llr(\theta_0) & = -2[\ell(\theta_0) - \ell(\hat\theta)] \stackrel{\cdot}{\sim} \chi^2_1 \\ \Rightarrow \ell_p(\theta_0) & = n\text{log}1 - 2n \text{log}(r+1) = -2n\text{log}(r+1) \\ \ell_p(\hat\theta) & = n\text{log}\frac{1}{r} - 2n\text{log}(2) \\ & = -n\text{log}r-2n\text{log}2 = -n\text{log}4r\\ \Rightarrow \ell_p(\theta_0) - \ell_p(\hat\theta) & = -2n\text{log}(r+1) + n\text{log}4r = n\text{log}\frac{4r}{(r+1)^2} \\ \Rightarrow -2llr(\theta_0) & = -2n\text{log}\frac{4r}{(r+1)^2} = 2n\text{log}\frac{(r+1)^2}{4r} \\ \text{ When } n=16, r=2 -2llr(\theta_0) & = 2\times16\times\text{log}(\frac{2+1}{4\times2})^2 = 3.769 < \chi^2_{1,0.95} = 3.84\\ \text{ We do not reject }&\text{ the null hypothesis at the } 5\% \text{ level.} \end{aligned} \]

此時如果精確計算可以獲得 \(p=0.052\)，從檢驗統計量的計算值我們也能看出距離拒絕零假設的拒絕域十分接近。此時可以認爲是一個臨界的 \(p\) 值。所以數據提供了臨界 \(p=0.052\) 的證據證明肺癌 II 期患者的倖存時間平均要少於 I 期患者。

第 19 章 多個參數時的統計推斷 – 子集似然函數 profile log-likelihoods