# Chapter 50 Logistic regression in epidemiology

## 50.1 The most common study designs in epidemiological research

• Cohort (prospective) studies;
• Case-control (retrospective) studies.

## 50.2 Example: a single binary exposure variable

### 50.2.1 Setup and notation

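• The sample contains $$n$$ subjects, indexed by $$i = 1,\cdots,n$$;
• $$X_i$$ is a binary exposure variable (whether the subject was exposed to a certain chemical agent; $$1=$$ yes, $$0=$$ no);
• $$D_i$$ is a binary outcome variable (whether the subject has oesophageal cancer; $$1=$$ yes, $$0=$$ no).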

\begin{aligned} \pi_{xd} & = \text{Pr}(X=x, D=d) \\ \pi_{11} &+ \pi_{10} + \pi_{01} + \pi_{00} = 1 \end{aligned} \tag{50.1}

| | $$D=0$$ | $$D=1$$ |
|---|---|---|
| $$X=0$$ | $$\pi_{00}$$ | $$\pi_{01}$$ |
| $$X=1$$ | $$\pi_{10}$$ | $$\pi_{11}$$ |

$\text{Pr}(D=d|X=x) = \frac{\pi_{xd}}{\pi_{x0} + \pi_{x1}}$

| | $$D=0$$ | $$D=1$$ | $$\text{Pr}(D=1 \mid X=x)$$ |
|---|---|---|---|
| $$X=0$$ | $$\pi_{00}$$ | $$\pi_{01}$$ | $$\frac{\pi_{01}}{\pi_{01} + \pi_{00}}$$ |
| $$X=1$$ | $$\pi_{10}$$ | $$\pi_{11}$$ | $$\frac{\pi_{11}}{\pi_{10} + \pi_{11}}$$ |

$\text{Pr}(X=x|D=d) = \frac{\pi_{xd}}{\pi_{0d}+\pi_{1d}}$

| | $$D=0$$ | $$D=1$$ |
|---|---|---|
| $$X=0$$ | $$\pi_{00}$$ | $$\pi_{01}$$ |
| $$X=1$$ | $$\pi_{10}$$ | $$\pi_{11}$$ |
| $$\text{Pr}(X=1 \mid D=d)$$ | $$\frac{\pi_{10}}{\pi_{10}+\pi_{00}}$$ | $$\frac{\pi_{11}}{\pi_{11}+\pi_{01}}$$ |
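The two kinds of conditional probability above differ only in whether the joint table is normalised within a row (fixing $$X$$) or within a column (fixing $$D$$). The following is a minimal sketch of that calculation. The joint probabilities are hypothetical values chosen for illustration, and the function names `pr_d_given_x` and `pr_x_given_d` are not from the original text.

```python
from fractions import Fraction

# Hypothetical joint probabilities pi[x][d] = Pr(X=x, D=d); illustrative only.
pi = [
    [Fraction(60, 100), Fraction(10, 100)],  # X=0: pi_00, pi_01
    [Fraction(20, 100), Fraction(10, 100)],  # X=1: pi_10, pi_11
]
assert sum(sum(row) for row in pi) == 1  # the constraint in (50.1)


def pr_d_given_x(d, x):
    """Pr(D=d | X=x) = pi_xd / (pi_x0 + pi_x1): normalise within a row."""
    return pi[x][d] / (pi[x][0] + pi[x][1])


def pr_x_given_d(x, d):
    """Pr(X=x | D=d) = pi_xd / (pi_0d + pi_1d): normalise within a column."""
    return pi[x][d] / (pi[0][d] + pi[1][d])


# Risk of disease among the unexposed and the exposed (cohort view).
print(pr_d_given_x(1, 0), pr_d_given_x(1, 1))  # 1/7 1/3
# Prevalence of exposure among controls and cases (case-control view).
print(pr_x_given_d(1, 0), pr_x_given_d(1, 1))  # 1/4 1/2
```

Exact fractions are used here only so that the results can be compared with the table entries without rounding error.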

### 50.2.2 Odds ratios

$\text{Odds}_1 = \frac{\text{Pr}(D=1|X=1)}{1-\text{Pr}(D=1|X=1)} = \frac{\pi_{11}/(\pi_{10} + \pi_{11})}{1-\pi_{11}/(\pi_{10} + \pi_{11})}$

$\text{Odds}_2 = \frac{\text{Pr}(D=1|X=0)}{1-\text{Pr}(D=1|X=0)} = \frac{\pi_{01}/(\pi_{01} + \pi_{00})}{1-\pi_{01}/(\pi_{01} + \pi_{00})}$

\begin{aligned} \text{Odds Ratio}_{\text{cohort}} = \frac{\text{Odds}_1}{\text{Odds}_2} & = \frac{\frac{\text{Pr}(D=1|X=1)}{1-\text{Pr}(D=1|X=1)}}{\frac{\text{Pr}(D=1|X=0)}{1-\text{Pr}(D=1|X=0)}}\\ & = \frac{\frac{\pi_{11}/(\pi_{10} + \pi_{11})}{1-\pi_{11}/(\pi_{10} + \pi_{11})}}{\frac{\pi_{01}/(\pi_{01} + \pi_{00})}{1-\pi_{01}/(\pi_{01} + \pi_{00})}} \\ & = \frac{\frac{\pi_{11}/(\pi_{10}+\pi_{11})}{\pi_{10}/(\pi_{10}+\pi_{11})}}{\frac{\pi_{01}/(\pi_{01}+\pi_{00})}{\pi_{00}/(\pi_{01}+\pi_{00})}} \\ & = \frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}} \end{aligned}

\begin{aligned} \text{Odds Ratio}_{\text{case-control}} = \frac{\text{Odds}^\prime_1}{\text{Odds}^\prime_2} & = \frac{\frac{\text{Pr}(X=1|D=1)}{1-\text{Pr}(X=1|D=1)}}{\frac{\text{Pr}(X=1|D=0)}{1-\text{Pr}(X=1|D=0)}} \\ & = \frac{\frac{\pi_{11}/(\pi_{11} + \pi_{01})}{1-\pi_{11}/(\pi_{11} + \pi_{01})}}{\frac{\pi_{10}/(\pi_{10} + \pi_{00})}{1-\pi_{10}/(\pi_{10} + \pi_{00})}} \\ & = \frac{\frac{\pi_{11}/(\pi_{11}+\pi_{01})}{\pi_{01}/(\pi_{11}+\pi_{01})}}{\frac{\pi_{10}/(\pi_{10}+\pi_{00})}{\pi_{00}/(\pi_{10}+\pi_{00})}} \\ & = \frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}} \end{aligned}
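Both derivations end at the same cross-product ratio $$\frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}}$$. A small numerical check can make this concrete. The sketch below uses a hypothetical joint table (the values are illustrative assumptions, not data from the text) and computes the odds ratio along both routes.

```python
from fractions import Fraction

# Hypothetical joint probabilities keyed by (x, d); illustrative values only.
pi = {
    (0, 0): Fraction(3, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 5), (1, 1): Fraction(1, 10),
}


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


# Cohort route: odds of disease, exposed versus unexposed.
pr_d1_x1 = pi[1, 1] / (pi[1, 0] + pi[1, 1])
pr_d1_x0 = pi[0, 1] / (pi[0, 0] + pi[0, 1])
or_cohort = odds(pr_d1_x1) / odds(pr_d1_x0)

# Case-control route: odds of exposure, cases versus controls.
pr_x1_d1 = pi[1, 1] / (pi[1, 1] + pi[0, 1])
pr_x1_d0 = pi[1, 0] / (pi[1, 0] + pi[0, 0])
or_case_control = odds(pr_x1_d1) / odds(pr_x1_d0)

# Cross-product ratio from the joint table.
cross_product = pi[1, 1] * pi[0, 0] / (pi[1, 0] * pi[0, 1])

print(or_cohort, or_case_control, cross_product)  # 3 3 3
```

The equality holds for any joint table with non-zero cells, since it follows from the algebra above rather than from the particular numbers chosen.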

### 50.2.3 Why logistic regression is valid for case-control studies

$(D_i|X_i = x_i) \sim \text{Binomial}(1, \pi_i)$

$\text{logit}(\pi_i) = \text{log}(\frac{\pi_i}{1-\pi_i}) = \alpha + \beta x_i$

\begin{aligned} \text{Pr}(D=1|X=1) & = \frac{e^{\alpha + \beta}}{1 + e^{\alpha + \beta}} \\ \text{Pr}(D=1|X=0) & = \frac{e^\alpha}{1 + e^\alpha} \\ \text{Where, }\alpha & = \text{log}{\frac{\pi_{01}}{\pi_{00}}} \\ \beta & = \text{log}{\frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}}} \end{aligned}

$(X_i | D_i = d_i) \sim \text{Binomial}(1,\pi_i^*)$

$\text{logit}(\pi_i^*) = \text{log}(\frac{\pi_i^*}{1-\pi_i^*}) = \alpha^* + \beta d_i$

\begin{aligned} \text{Pr}(X=1|D=1) & = \frac{e^{\alpha^* + \beta}}{1 + e^{\alpha^* + \beta}} \\ \text{Pr}(X=1|D=0) & = \frac{e^{\alpha^*}}{1 + e^{\alpha^*}} \\ \text{Where, }\alpha^* & = \text{log}{\frac{\pi_{10}}{\pi_{00}}} \\ \beta & = \text{log}{\frac{\pi_{11}\pi_{00}}{\pi_{10}\pi_{01}}} \end{aligned}
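The two models share the slope $$\beta$$ and differ only in the intercept ($$\alpha$$ for the model of $$D$$ given $$X$$, $$\alpha^*$$ for the model of $$X$$ given $$D$$). The sketch below checks this numerically for one hypothetical joint table. The numbers and the helper name `expit` are assumptions made for illustration.

```python
import math

# Hypothetical joint probabilities keyed by (x, d); illustrative values only.
pi = {(0, 0): 0.60, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.10}


def expit(z):
    """Inverse logit: e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


alpha = math.log(pi[0, 1] / pi[0, 0])       # intercept, model for D given X
alpha_star = math.log(pi[1, 0] / pi[0, 0])  # intercept, model for X given D
beta = math.log(pi[1, 1] * pi[0, 0] / (pi[1, 0] * pi[0, 1]))  # shared slope

# Model for D given X reproduces Pr(D=1 | X=x).
assert math.isclose(expit(alpha), pi[0, 1] / (pi[0, 0] + pi[0, 1]))
assert math.isclose(expit(alpha + beta), pi[1, 1] / (pi[1, 0] + pi[1, 1]))

# Model for X given D reproduces Pr(X=1 | D=d) with the same beta.
assert math.isclose(expit(alpha_star), pi[1, 0] / (pi[1, 0] + pi[0, 0]))
assert math.isclose(expit(alpha_star + beta), pi[1, 1] / (pi[1, 1] + pi[0, 1]))

print(alpha, alpha_star, beta)
```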

\begin{aligned} L_{\text{cohort}} & = \prod_{i=1}^n(\frac{e^{\alpha + \beta x_i}}{1+e^{\alpha + \beta x_i}})^{d_i}(\frac{1}{1+e^{\alpha + \beta x_i}})^{1-d_i} \\ \text{Where } d_i & = \left\{ \begin{array}{ll} 0 \text{ if subjects were not observed with the outcome}\\ 1 \text{ if subjects were observed with the outcome}\\ \end{array} \right. \\ x_i & = \left\{ \begin{array}{ll} 0 \text{ if subjects were not observed with the exposure}\\ 1 \text{ if subjects were observed with the exposure}\\ \end{array} \right. \end{aligned}

\begin{aligned} L_{\text{case-control}} & = \prod_{i=1}^n(\frac{e^{\alpha^* + \beta d_i}}{1+e^{\alpha^* + \beta d_i}})^{x_i}(\frac{1}{1+e^{\alpha^* + \beta d_i}})^{1-x_i} \\ \text{Where } d_i & = \left\{ \begin{array}{ll} 0 \text{ if subjects were not observed with the outcome}\\ 1 \text{ if subjects were observed with the outcome}\\ \end{array} \right. \\ x_i & = \left\{ \begin{array}{ll} 0 \text{ if subjects were not observed with the exposure}\\ 1 \text{ if subjects were observed with the exposure}\\ \end{array} \right. \end{aligned}
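Because the two likelihoods have the same form with the roles of $$x_i$$ and $$d_i$$ swapped, maximising either one should give the same estimate of $$\beta$$, namely the log of the sample cross-product ratio. The sketch below illustrates this with a hand-written Newton-Raphson fit on a hypothetical 2 by 2 table of counts. The function `fit_logistic` and the counts are assumptions for illustration, not the author's code or data.

```python
import math


def fit_logistic(groups, n_iter=25):
    """Newton-Raphson for logit(p) = a + b*z on grouped binary data.

    groups: iterable of (z, successes, trials), with z equal to 0 or 1.
    Returns the maximum likelihood estimates (a, b).
    """
    a = b = 0.0
    for _ in range(n_iter):
        ua = ub = iaa = iab = ibb = 0.0
        for z, y, m in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * z)))
            w = m * p * (1.0 - p)      # binomial variance weight
            ua += y - m * p            # score for the intercept
            ub += z * (y - m * p)      # score for the slope
            iaa += w
            iab += z * w
            ibb += z * z * w
        det = iaa * ibb - iab * iab
        # Solve the 2x2 system: information matrix times step = score.
        a += (ibb * ua - iab * ub) / det
        b += (iaa * ub - iab * ua) / det
    return a, b


# Hypothetical counts keyed by (x, d); illustrative values only.
counts = {(0, 0): 60, (0, 1): 10, (1, 0): 20, (1, 1): 10}

# Cohort likelihood: outcome D regressed on exposure X.
cohort_groups = [(x, counts[x, 1], counts[x, 0] + counts[x, 1]) for x in (0, 1)]
alpha_hat, beta_cohort = fit_logistic(cohort_groups)

# Case-control likelihood: exposure X regressed on outcome D.
cc_groups = [(d, counts[1, d], counts[0, d] + counts[1, d]) for d in (0, 1)]
alpha_star_hat, beta_cc = fit_logistic(cc_groups)

log_cross_product = math.log(
    counts[1, 1] * counts[0, 0] / (counts[1, 0] * counts[0, 1])
)
print(beta_cohort, beta_cc, log_cross_product)
```

The intercepts differ between the two fits, as expected, while the slope estimates agree up to numerical tolerance.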

## 50.3 Extending to logistic regression models with multiple exposure variables

• $$D_i = 0 \text{ or } 1$$: the $$i$$th subject was observed with $$(=1)$$ or without $$(=0)$$ the outcome (e.g. developing pancreatic cancer);
• $$X_{i1} = 0 \text{ or } 1$$: the $$i$$th subject was exposed $$(=1)$$ or unexposed $$(=0)$$ (e.g. smoking);
• $$X_{i2} = 0 \text{ or } 1$$: the $$i$$th subject is male $$(=1)$$ or female $$(=0)$$;
• $$X_{i3}$$: the age of the $$i$$th subject (years).

### 50.3.2 Likelihoods for cohort and case-control studies

\begin{aligned} D_i | (X_{i1} & = x_{i1}, \cdots, X_{ip} = x_{ip}) \sim \text{Binomial}(1, \pi_i) \\ \text{logit} (\pi_i) & = \text{log}(\frac{\pi_i}{1-\pi_i}) = \alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \end{aligned}

$\text{Pr}(D_i = 1 | X_{i1} = x_{i1}, \cdots, X_{ip} = x_{ip}) = \frac{e^{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1+e^{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}$

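• The intercept $$\alpha$$ is the log odds $$(\text{log odds})$$ of the outcome being $$1$$ when all exposure variables equal $$0$$;
• The regression coefficient $$\beta_k$$ is the log odds ratio $$(\text{log odds-ratio})$$ for the outcome being $$1$$ per one-unit increase in $$x_k$$, holding the other exposure variables fixed (that is, the log odds ratio between $$x_k$$ and the outcome after adjusting for all the other variables).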
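To make the interpretation of $$\alpha$$ and $$\beta_k$$ concrete, the sketch below evaluates the model for a few covariate patterns. The coefficient values are entirely hypothetical (they are not estimates from any data set) and the covariate order follows the example variables: smoking, sex, age in years.

```python
import math

# Hypothetical coefficients, for illustration only.
alpha = -6.0
beta = (0.7, 0.3, 0.05)  # smoking, male sex, age (per year)


def prob(x):
    """Pr(D=1 | x) under the logistic model."""
    eta = alpha + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    return p / (1.0 - p)


# Intercept: log odds when every covariate equals 0.
assert math.isclose(math.log(odds(prob((0, 0, 0)))), alpha)

# Smoking odds ratio, holding sex and age fixed at any values.
or_smoking_a = odds(prob((1, 1, 60))) / odds(prob((0, 1, 60)))
or_smoking_b = odds(prob((1, 0, 45))) / odds(prob((0, 0, 45)))
assert math.isclose(or_smoking_a, math.exp(beta[0]))
assert math.isclose(or_smoking_b, math.exp(beta[0]))

# Age: one unit is one year, so a 10-year difference multiplies the odds
# by exp(10 * beta_age).
or_age_10 = odds(prob((0, 0, 70))) / odds(prob((0, 0, 60)))
assert math.isclose(or_age_10, math.exp(10 * beta[2]))

print(or_smoking_a, or_age_10)
```

The adjusted odds ratio for smoking is the same whatever values the other covariates take, which is a consequence of the model having no interaction terms.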

\begin{aligned} L_{\text{cohort}} & = \prod_{i=1}^n\text{Pr}(D_i = d_i | X_{i1} = x_{i1}, \cdots, X_{ip} = x_{ip}) \\ & = \prod_{i=1}^n(\frac{e^{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1+e^{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}})^{d_i}(\frac{1}{1+e^{\alpha + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}})^{1-d_i} \end{aligned}

$L_{\text{case-control}} = \prod_{i=1}^n\text{Pr}(X_{i1} = x_{i1}, \cdots, X_{ip} = x_{ip} |D_i = d_i)$

### 50.3.3 Logistic regression in case-control studies

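Let $$\text{Pr}(S_i=1 \text{ or } 0)$$ denote the probability that a member of the underlying study population is sampled (or not sampled) into the study. Ideally, when a case-control study is carried out the cases are rare, so the cases we collect are almost all of the cases in the underlying study population of interest, and it can be shown that: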

\begin{aligned} & \text{Pr}(X_{i1} = x_{i1}, \cdots, X_{ip} = x_{ip} |D_i = 1) \\ =& \text{Pr}(X_{i1} = x_{i1}, \cdots, X_{ip} = x_{ip} |D_i = 1, S_i=1) \\ =& \frac{e^{\alpha^* + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1+e^{\alpha^* + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}} \\ & \;\;\;\; \times \frac{\text{Pr}(X_{i1} = x_{i1}, \cdots, X_{ip} = x_{ip} |S_i=1)}{\text{Pr}(D_i = 1 | S_i = 1)} \\ \text{Where } \alpha^* & = \alpha + \text{log}(\frac{\text{Pr}(D_i = 0)}{\text{Pr}(D_i = 1)}) + \text{log}(\frac{\text{Pr}(D_i = 1|S_i=1)}{\text{Pr}(D_i = 0|S_i=1)}) \end{aligned} \tag{50.2}

$L_{\text{case-control}} \propto \prod_{i=1}^n(\frac{e^{\alpha^* + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}}{1+e^{\alpha^* + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}})^{d_i}(\frac{1}{1+e^{\alpha^* + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}})^{1-d_i}$
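The practical consequence is that case-control sampling changes only the intercept, from $$\alpha$$ to $$\alpha^*$$, and leaves the slopes untouched. The sketch below checks this by exact calculation for a single binary exposure. It assumes that the sampling probability depends only on disease status; the symbol `tau` for $$\text{Pr}(S=1|D=d)$$, the parameter values, and the exposure prevalence are all hypothetical choices made for illustration.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical population model and sampling scheme (illustrative values).
alpha, beta = -4.0, 1.0   # logit Pr(D=1 | X=x) = alpha + beta * x
pr_x1 = 0.3               # Pr(X=1) in the underlying population
tau = {1: 0.9, 0: 0.01}   # assumed Pr(S=1 | D=d): depends on D only


def pr_d1_given_x_sampled(x):
    """Pr(D=1 | X=x, S=1) by Bayes' theorem."""
    p = expit(alpha + beta * x)
    return tau[1] * p / (tau[1] * p + tau[0] * (1.0 - p))


# Intercept and slope of the logistic model among sampled subjects.
intercept_sampled = logit(pr_d1_given_x_sampled(0))
slope_sampled = logit(pr_d1_given_x_sampled(1)) - intercept_sampled

# alpha* as written in (50.2), from Pr(D) and Pr(D | S=1).
pr_d1 = (1.0 - pr_x1) * expit(alpha) + pr_x1 * expit(alpha + beta)
pr_d1_s = tau[1] * pr_d1 / (tau[1] * pr_d1 + tau[0] * (1.0 - pr_d1))
alpha_star = (
    alpha
    + math.log((1.0 - pr_d1) / pr_d1)
    + math.log(pr_d1_s / (1.0 - pr_d1_s))
)

print(intercept_sampled, alpha_star, slope_sampled)
```

Under this assumption the shift in the intercept equals $$\text{log}(\text{Pr}(S=1|D=1)/\text{Pr}(S=1|D=0))$$, so $$\alpha$$ itself cannot be recovered from case-control data unless the sampling fractions are known.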

## 50.4 Strategies for adjusting for variables in epidemiological studies

Figure 50.2 shows the four possible relationships among $$W (\text{potential confounder}),X (\text{exposure}),D (\text{outcome})$$ in the underlying study population.

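• Figure 50.2 (a): $$W$$ is related to neither $$X$$ nor $$D$$. When studying the relationship between $$X$$ and $$D$$ we can ignore $$W$$ entirely and need not adjust for it. If the logistic regression model nevertheless adjusts for a variable that is unrelated to both the exposure and the outcome, the odds ratio estimate barely changes, but the price is a larger standard error for the log odds ratio, that is, a less precise odds ratio estimate.
• Figure 50.2 (b): $$W$$ is associated with both $$X$$ and $$D$$ and does not lie on the causal pathway $$X\rightarrow D$$. In this case $$W$$ must be adjusted for; otherwise the odds ratio estimate is seriously biased.
• Figure 50.2 (c): $$W$$ is related only to $$X$$ and is not associated with the outcome $$D$$. When studying the relationship between $$X$$ and $$D$$ we can ignore $$W$$ and need not adjust for it. As in (a), adjusting for $$W$$ does not change the estimated odds ratio materially, but it costs precision.
• Figure 50.2 (d): $$W$$ is related only to the outcome $$D$$ and not to the exposure $$X$$. If the model adjusts for $$W$$, we obtain a quite different odds ratio estimate, because the odds ratios before and after adjustment have different meanings, and both are of practical interest. The unadjusted estimate is a population-level quantity and helps with population-level decisions; the adjusted estimate applies to subgroups sharing particular characteristics and helps to assess the relationship between $$X$$ and $$D$$ at the individual level.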
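The four scenarios can be compared by exact calculation on a hypothetical population. The sketch below assumes a binary $$W$$ and a conditional model $$\text{logit Pr}(D=1|X,W) = \alpha + \beta X + \gamma W$$. The function `marginal_or`, the symbol $$\gamma$$, and all numerical values are illustrative assumptions, not part of the original text. It computes the odds ratio between $$X$$ and $$D$$ obtained when $$W$$ is ignored and compares it with the conditional odds ratio $$e^\beta$$.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def marginal_or(pr_x1_given_w, alpha, beta, gamma, pr_w1=0.5):
    """Odds ratio between X and D after collapsing over a binary W.

    pr_x1_given_w: (Pr(X=1 | W=0), Pr(X=1 | W=1)).
    Conditional model: logit Pr(D=1 | X, W) = alpha + beta*X + gamma*W.
    """
    pi = [[0.0, 0.0], [0.0, 0.0]]  # pi[x][d], summed over W
    for w, pw in ((0, 1.0 - pr_w1), (1, pr_w1)):
        for x in (0, 1):
            px = pr_x1_given_w[w] if x == 1 else 1.0 - pr_x1_given_w[w]
            pd = expit(alpha + beta * x + gamma * w)
            pi[x][1] += pw * px * pd
            pi[x][0] += pw * px * (1.0 - pd)
    return pi[1][1] * pi[0][0] / (pi[1][0] * pi[0][1])


alpha, beta = -1.0, math.log(3.0)  # conditional odds ratio is 3

# (a) W unrelated to X and D; (c) W related to X only: gamma = 0.
or_a = marginal_or((0.5, 0.5), alpha, beta, gamma=0.0)
or_c = marginal_or((0.2, 0.8), alpha, beta, gamma=0.0)

# (b) W related to both X and D: ignoring W gives a biased odds ratio.
or_b = marginal_or((0.2, 0.8), alpha, beta, gamma=2.0)

# (d) W related to D only: the marginal odds ratio differs from the
# conditional one even though W is independent of X.
or_d = marginal_or((0.5, 0.5), alpha, beta, gamma=2.0)

print(or_a, or_c, or_b, or_d)
```

With these particular values the unadjusted odds ratio equals the conditional one in (a) and (c), is inflated in (b), and is pulled towards 1 in (d). The behaviour in (d) is a property of the odds ratio sometimes called non-collapsibility, and the direction and size of the differences depend on the assumed parameter values.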