Chapter 17 Normal error models

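1. The $$F$$ and $$t$$ distributions: try to explain how the $$t$$ distribution is applied to comparing the means of two independent samples;
2. The central role of the $$\chi^2$$ distribution among the distributions commonly used in statistics.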

17.1 Normally distributed random variables

$X_1,\cdots,X_n \stackrel{i.i.d}{\sim} N(\mu,\sigma^2) \Rightarrow \bar{X} \sim N(\mu, \frac{\sigma^2}{n})$

\begin{aligned} & Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1) \\ & 95\% \text{CI for } \mu = \bar{X} \pm Z_{0.975}\frac{\sigma}{\sqrt{n}} \\ & \text{H}_0: \mu=\mu_0 \Rightarrow \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \end{aligned}
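The known-$$\sigma$$ case above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_test_known_sigma`, the five data values and the choice $$\sigma = 1$$ are hypothetical and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_known_sigma(sample, mu0, sigma, level=0.95):
    """Return (z, (lower, upper)) for a sample whose sigma is assumed known."""
    n = len(sample)
    xbar = fmean(sample)
    se = sigma / sqrt(n)                      # standard error of the mean
    z = (xbar - mu0) / se                     # ~ N(0, 1) under H0: mu = mu0
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # Z_{0.975} when level = 0.95
    return z, (xbar - z_crit * se, xbar + z_crit * se)


# Hypothetical data: five measurements, with sigma assumed to be exactly 1.
z, ci = z_test_known_sigma([4.1, 5.3, 6.0, 4.7, 5.9], mu0=5.0, sigma=1.0)
print(round(z, 3), tuple(round(v, 3) for v in ci))
```

The critical value is taken from the standard normal quantile function, which is only valid because $$\sigma$$ is treated as known here.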

$T=\frac{\bar{X}-\mu_0}{\hat\sigma/\sqrt{n}} \sim ?????????$

17.2 The $$F$$ and $$t$$ distributions

The $$F$$ and $$t$$ distributions are both built on the $$\chi^2$$ distribution:

• $$F$$ distribution: if $$Y_1, Y_2$$ are two independent random variables with $$Y_1 \sim \chi^2_{k_1}; Y_2 \sim \chi^2_{k_2}$$, then

$F=\frac{Y_1/k_1}{Y_2/k_2} \sim F_{k_1, k_2}$

• $$t$$ distribution: a special case of the $$F$$ distribution $$(k_1=1)$$

$T\sim t_{k_2} \Rightarrow T^2 = \frac{Y_1/1}{Y_2/k_2} \sim F_{1,k_2}$
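A small simulation can make these constructions concrete. The sketch below builds $$\chi^2$$ draws as sums of squared standard normals, forms $$F_{1,k_2}$$ as the ratio of scaled $$\chi^2$$ draws, and forms $$t_{k_2}$$ as a standard normal divided by $$\sqrt{\chi^2_{k_2}/k_2}$$ (the usual construction, assumed here rather than taken from the text). The helper names, the seed and the choice $$k_2 = 10$$ are arbitrary. Both simulated means should be close to $$k_2/(k_2-2)$$, the known mean of $$F_{1,k_2}$$ when $$k_2 > 2$$.

```python
import random


def chi2_draw(rng, k):
    """One chi-square draw with k degrees of freedom: sum of k squared N(0,1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def f_draw(rng, k1, k2):
    """One F_{k1,k2} draw: ratio of two independent chi-squares, each over its df."""
    return (chi2_draw(rng, k1) / k1) / (chi2_draw(rng, k2) / k2)


def t_draw(rng, k):
    """One t_k draw: standard normal over the square root of chi-square_k / k."""
    z = rng.gauss(0.0, 1.0)
    return z / (chi2_draw(rng, k) / k) ** 0.5


rng = random.Random(2024)   # arbitrary seed, for reproducibility only
k2 = 10
n_draws = 20000

mean_f = sum(f_draw(rng, 1, k2) for _ in range(n_draws)) / n_draws
mean_t2 = sum(t_draw(rng, k2) ** 2 for _ in range(n_draws)) / n_draws

# Both averages estimate E[F_{1,k2}] = k2 / (k2 - 2).
print(mean_f, mean_t2, k2 / (k2 - 2))
```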

$Y_i \stackrel{i.i.d}{\sim} N(\mu,\sigma^2) \text{ where } i = 1, \cdots, n$

\begin{aligned} & Y_i = \mu + \varepsilon_i \\ & \text{Where } \varepsilon_i \stackrel{i.i.d}{\sim} N(0,\sigma^2) \end{aligned}

\begin{aligned} &Y_i | x \stackrel{i.i.d}{\sim} N(\mu+\beta x_i, \sigma^2)\\ & E(Y|x) = \mu+\beta x, \text{Var}(Y|x) = \sigma^2 \\ & \text{ or } Y_i|x = \mu + \beta x_i + \varepsilon_i ; \text{ where } \varepsilon_i \stackrel{i.i.d}{\sim} N(0, \sigma^2) \end{aligned}

17.3 Models with two parameters

17.3.1 One dataset, two parameters

\begin{aligned} L(\theta, \phi | \underline{x}) &= \prod_{i=1}^nf(x_i | \theta, \phi) \\ \ell(\theta, \phi | \underline{x}) &= \sum_{i=1}^n\text{log}f(x_i | \theta, \phi) \end{aligned}

$\left\{ \begin{array}{ll} \frac{\partial\ell}{\partial\theta} = 0\\ \frac{\partial\ell}{\partial\phi} = 0 \\ \end{array} \right.$

17.3.2 Two datasets, one parameter each

$X_1, \cdots, X_n \stackrel{i.i.d}\sim f(\theta_1) \\ Y_1, \cdots, Y_m \stackrel{i.i.d}\sim f(\theta_2)$

The likelihood is then the joint likelihood of observing both datasets together:

$L(\theta_1, \theta_2|\underline{x},\underline{y}) = \prod_{i=1}^nf(x_i|\theta_1) \times \prod_{i=1}^mf(y_i|\theta_2)$

$\ell(\theta_1,\theta_2|\underline{x},\underline{y}) = \sum_{i=1}^n\text{log} f(x_i|\theta_1) + \sum_{i=1}^m\text{log} f(y_i|\theta_2)$
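As a rough illustration of why the joint log-likelihood is convenient, the sketch below assumes that $$f$$ is a normal density with unit variance (an assumption made only for this example; the text leaves $$f$$ generic). Because the two sums share no parameter, each $$\theta$$ can be maximised using its own sample alone. The data values and function names are hypothetical.

```python
from math import log, pi
from statistics import fmean


def normal_loglik(data, theta):
    """Log-likelihood of N(theta, 1) for one sample (unit variance is assumed)."""
    return sum(-0.5 * log(2 * pi) - 0.5 * (v - theta) ** 2 for v in data)


def joint_loglik(x, y, theta1, theta2):
    """Joint log-likelihood of two independent samples: the two sums simply add."""
    return normal_loglik(x, theta1) + normal_loglik(y, theta2)


x = [5.1, 4.9, 6.2, 5.8]   # hypothetical sample governed by theta_1
y = [4.2, 4.8, 5.0]        # hypothetical sample governed by theta_2

# Under this assumed model each sum is maximised at its own sample mean.
theta1_hat, theta2_hat = fmean(x), fmean(y)
best = joint_loglik(x, y, theta1_hat, theta2_hat)
print(theta1_hat, theta2_hat, best)
```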

17.4 Normal density with both the population mean and variance unknown (statistical derivation of the one sample $$t$$ test)

$Y_1,\cdots,Y_n \stackrel{i.i.d}{\sim} N(\mu, \sigma^2) \\ \ell(\mu, \sigma^2 | \underline{y}) = -\frac{n}{2}\text{log}\sigma^2 - \frac{1}{2\sigma^2}\sum^n_{i=1} (y_i - \mu)^2$

\begin{aligned} & \mu: \frac{\partial \ell}{\partial \mu} = \frac{\sum^n_{i=1}(y_i-\mu)}{\sigma^2} = 0 \Rightarrow \hat\mu = \bar{y}\\ & \sigma^2: \frac{\partial \ell}{\partial (\sigma^2)} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum^n_{i=1}(y_i-\mu)^2 \\ & \text{ Substituting } \mu=\hat\mu = \bar{y} \text{ and setting equal to } 0\\ & -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum^n_{i=1}(y_i-\bar{y})^2 = 0 \\ & \Rightarrow \hat\sigma^2 = \frac{1}{n}\sum^n_{i=1}(y_i - \bar{y})^2 \end{aligned}
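The maximum likelihood estimates can be checked numerically. The sketch below computes $$\hat\mu$$ and $$\hat\sigma^2$$ for a small hypothetical sample and evaluates both partial derivatives of the normal log-likelihood at those values; both should be zero up to rounding. The function names and data are illustrative assumptions.

```python
from statistics import fmean


def normal_mle(y):
    """Maximum likelihood estimates (mu_hat, sigma2_hat) for an i.i.d. normal sample."""
    n = len(y)
    mu_hat = fmean(y)
    sigma2_hat = sum((v - mu_hat) ** 2 for v in y) / n   # divisor n, not n - 1
    return mu_hat, sigma2_hat


def score(y, mu, sigma2):
    """Partial derivatives of the normal log-likelihood with respect to mu and sigma^2."""
    n = len(y)
    d_mu = sum(v - mu for v in y) / sigma2
    d_sigma2 = -n / (2 * sigma2) + sum((v - mu) ** 2 for v in y) / (2 * sigma2 ** 2)
    return d_mu, d_sigma2


y = [4.1, 5.3, 6.0, 4.7, 5.9]          # hypothetical data
mu_hat, sigma2_hat = normal_mle(y)
d_mu, d_sigma2 = score(y, mu_hat, sigma2_hat)
print(mu_hat, sigma2_hat, d_mu, d_sigma2)
```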

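\begin{aligned} \sum^n_{i=1}(y_i-\mu)^2 & = \sum^n_{i=1}(y_i - \bar{y} + \bar{y} -\mu)^2 \\ & = \sum^n_{i=1}(y_i - \bar{y})^2 + 2(\bar{y}-\mu)\sum^n_{i=1}(y_i - \bar{y}) + \sum^n_{i=1}(\bar{y}-\mu)^2 \\ & = \sum^n_{i=1}(y_i - \bar{y})^2 + \sum^n_{i=1}(\bar{y}-\mu)^2 \;\;\; \text{because } \sum^n_{i=1}(y_i - \bar{y}) = 0 \\ \Rightarrow \sum^n_{i=1}(y_i - \bar{y})^2 & = \sum^n_{i=1}(y_i-\mu)^2 - \sum^n_{i=1}(\bar{y}-\mu)^2 \end{aligned}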

We can “partition” the probability of observing the data, conditional on unknown $$\mu$$ and $$\sigma^2$$, into

1. the probability of observing the data conditional on the observed sample mean $$\bar{y}$$ and unknown $$\sigma^2$$ ;
2. the probability of observing the sample mean $$\bar{y}$$ conditional on the two unknown parameters.

\begin{aligned} & \text{Prob}(\underline{y} | \mu, \sigma^2) = \text{Prob}(\underline{y}|\bar{y}, \sigma^2) \times \text{Prob}(\bar{y}|\mu, \sigma^2) \\ &\Rightarrow \text{Prob}(\underline{y} | \bar{y}, \sigma^2) = \frac{\text{Prob}(\underline{y} | \mu, \sigma^2)}{\text{Prob}(\bar{y}|\mu, \sigma^2)} \end{aligned}

$f(x|Y=y) = \frac{f(x,y)}{f(y)}$

\begin{aligned} f(\underline{y} | \bar{y}, \sigma^2) &= \frac{ \color{red}{f(\underline{y} | \mu, \sigma^2)} }{f(\bar{y}|\mu, \sigma^2)} \\ &= \frac{ \color{red}{(\frac{1}{\sqrt{2\pi\sigma^2}})^ne^{-\frac{1}{2\sigma^2}\sum^n_{i=1}(y_i - \mu)^2}} }{(\frac{1}{\sqrt{2\pi\sigma^2/n}})e^{-\frac{1}{2\sigma^2/n}(\bar{y}-\mu)^2}} \\ \Rightarrow \ell(\sigma^2| \underline{y}, \bar{y}) &= \color{red}{-\frac{n}{2}\text{log}\sigma^2 - \frac{1}{2\sigma^2}\sum^n_{i=1}(y_i-\mu)^2} \\ & \;\;\;+\frac{1}{2}\text{log}\frac{\sigma^2}{n} + \frac{1}{2\sigma^2/n}(\bar{y}-\mu)^2 \\ &= -\frac{n-1}{2}\text{log}\sigma^2 - \frac{1}{2\sigma^2}(\sum^n_{i=1}(y_i-\mu)^2 - n(\bar{y}-\mu)^2) \\ \text{Because } &\sum^n_{i=1}(y_i - \bar{y})^2 = \sum^n_{i=1}(y_i-\mu)^2 - \sum^n_{i=1}(\bar{y}-\mu)^2 \\ \Rightarrow \ell(\sigma^2| \underline{y}, \bar{y}) &= -\frac{n-1}{2}\text{log}\sigma^2 -\frac{1}{2\sigma^2}\sum^n_{i=1}(y_i - \bar{y})^2 \\ \text{Note that the } &\text{above conditional log-likelihood is now free of } \mu \\ \Rightarrow \ell^\prime(\sigma^2) &= -\frac{n-1}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum^n_{i=1}(y_i-\bar{y})^2 \\ \text{Set equal } & \text{to zero and rearrange} \\ \Rightarrow \hat\sigma^2 &= \frac{1}{n-1}\sum^n_{i=1}(y_i-\bar{y})^2\\ \text{This is the } &\color{red}{\text{unbiased estimate of } \sigma^2} \end{aligned}
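The difference between the two variance estimators can be seen in a quick simulation. The sketch below repeatedly draws small normal samples and averages both the divisor-$$n$$ estimate and the divisor-$$(n-1)$$ estimate. The true parameter values, the seed and the number of replications are arbitrary choices for illustration. With $$n = 5$$ and $$\sigma^2 = 4$$, the first average should settle near $$\frac{n-1}{n}\sigma^2 = 3.2$$ and the second near $$4$$.

```python
import random
from statistics import fmean


def variance_estimates(y):
    """Return (divisor-n estimate, divisor-(n-1) estimate) of the variance."""
    n = len(y)
    ybar = fmean(y)
    ss = sum((v - ybar) ** 2 for v in y)   # sum of squares about the sample mean
    return ss / n, ss / (n - 1)


rng = random.Random(17)      # arbitrary seed, for reproducibility only
n, mu, sigma = 5, 10.0, 2.0  # assumed true values; sigma^2 = 4
reps = 20000

mle_total = 0.0
unbiased_total = 0.0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    mle, unbiased = variance_estimates(sample)
    mle_total += mle
    unbiased_total += unbiased

mean_mle = mle_total / reps
mean_unbiased = unbiased_total / reps
print(mean_mle, mean_unbiased)
```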

$\text{H}_0: \mu = \mu_0 \text{ v.s H}_1: \mu > \mu_0$

When $$\sigma^2$$ is known, the test statistic under the null hypothesis is:

\begin{aligned} & \text{H}_0 \Rightarrow (\frac{\bar{Y}-\mu_0}{\sigma/\sqrt{n}}) \sim N(0,1) \\ & \text{Or equivalently, } \\ & (\frac{\bar{Y}-\mu_0}{\sigma/\sqrt{n}})^2 \sim \chi_1^2 \end{aligned} \tag{17.1}

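When $$\sigma^2$$ is unknown and has to be estimated from the sample data, we should use the unbiased variance estimate derived above from the conditional log-likelihood: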

$\hat\sigma^2 = S^2 = \frac{1}{n-1}\sum^n_{i=1}(y_i-\bar{y})^2$

$(\frac{\bar{Y}-\mu_0}{S/\sqrt{n}})^2$

$\frac{n-1}{\sigma^2}S^2 \sim \chi^2_{n-1}\\ \Rightarrow \frac{S^2}{\sigma^2} = \frac{\chi^2_{n-1}}{n-1} \tag{17.2}$

$\frac{(\bar{Y}-\mu_0)^2}{S^2/n} \sim \frac{\chi^2_1/1}{\chi^2_{n-1}/(n-1)} = F_{1,n-1}$

\begin{aligned} & T=\frac{\bar{Y}-\mu_0}{S/\sqrt{n}} \\ & \text{Then under H}_0: T^2 \sim F_{1,n-1} \text{ or equivalently } T \sim t_{n-1} \end{aligned}

$95\% \text{ CI for } \mu: \bar{Y} \pm t_{n-1,0.975}\frac{S}{\sqrt{n}}$
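A minimal sketch of the one sample $$t$$ statistic and its confidence interval follows. The Python standard library has no $$t$$ quantile function, so the critical value is passed in by hand; $$t_{4,0.975} \approx 2.776$$ is the usual tabulated value for 4 degrees of freedom. The function name, the data and $$\mu_0 = 5$$ are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def one_sample_t(y, mu0, t_crit):
    """Return (T, degrees of freedom, CI) for H0: mu = mu0 with unknown variance."""
    n = len(y)
    ybar = fmean(y)
    s = sqrt(variance(y))          # statistics.variance uses the n - 1 divisor
    se = s / sqrt(n)               # estimated standard error of the mean
    t_stat = (ybar - mu0) / se     # ~ t_{n-1} under H0
    return t_stat, n - 1, (ybar - t_crit * se, ybar + t_crit * se)


y = [4.1, 5.3, 6.0, 4.7, 5.9]      # hypothetical data, n = 5
# t_crit is the tabulated 97.5% quantile of t with 4 degrees of freedom.
t_stat, df, ci = one_sample_t(y, mu0=5.0, t_crit=2.776)
print(round(t_stat, 4), df, tuple(round(v, 4) for v in ci))
```

Compared with the known-$$\sigma$$ interval, the only changes are that $$S$$ replaces $$\sigma$$ and the $$t_{n-1}$$ quantile replaces the normal quantile, which widens the interval for small $$n$$.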

17.5 Comparing the means of two independent samples: two sample $$t$$ test with equal unknown $$\sigma^2$$

$X_1, \cdots, X_n \stackrel{i.i.d}{\sim} N(\mu_1, \sigma^2); Y_1, \cdots, Y_m \stackrel{i.i.d}{\sim} N(\mu_2, \sigma^2)$

$\text{H}_0: \mu_1 = \mu_2 \text{ v.s. } \text{H}_1: \mu_1 > \mu_2$

$\hat\sigma^2 = S^2_p = \frac{\sum^n_{i=1}(X_i-\bar{X})^2 + \sum^m_{i=1}(Y_i-\bar{Y})^2}{n+m-2} \tag{17.3}$

\begin{aligned} & \frac{1}{\sigma^2}\sum^n_{i=1}(X_i - \bar{X})^2 \sim \chi^2_{n-1} \\ & \frac{1}{\sigma^2}\sum^m_{i=1}(Y_i - \bar{Y})^2 \sim \chi^2_{m-1} \\ \Rightarrow &\frac{1}{\sigma^2}\{ \sum^n_{i=1}(X_i - \bar{X})^2 + \sum^m_{i=1}(Y_i - \bar{Y})^2 \} \sim \chi^2_{n+m-2} \end{aligned}

$(n+m-2)\frac{S^2_p}{\sigma^2} \sim \chi^2_{n+m-2} \tag{17.4}$

$\text{Under H}_0: \frac{\bar{X}-\bar{Y}}{\sqrt{\sigma^2(\frac{1}{n}+\frac{1}{m})}} \sim N(0,1) \\ \Rightarrow \frac{(\bar{X}-\bar{Y})^2}{\sigma^2(\frac{1}{n}+\frac{1}{m})} \sim \chi^2_1 \tag{17.5}$

\begin{aligned} &\frac{(\bar{X}-\bar{Y})^2}{\sigma^2(\frac{1}{n}+\frac{1}{m})} \times \frac{\sigma^2}{S^2_p(n+m-2)} = \frac{\chi^2_1/1}{\chi^2_{n+m-2}} \\ &\Rightarrow T^2 = \frac{(\bar{X}-\bar{Y})^2}{S^2_p(\frac{1}{n}+\frac{1}{m})} = \frac{\chi^2_1/1}{\chi^2_{n+m-2}/(n+m-2)} \sim F_{1,n+m-2} \\ &\Rightarrow T = \frac{\bar{X}-\bar{Y}}{S_p\sqrt{\frac{1}{n}+\frac{1}{m}}} \sim t_{n+m-2} \end{aligned}
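The pooled two sample statistic can be sketched as follows. The function name `pooled_t` and both samples are hypothetical, and the code assumes, as this section does, that the two groups share a common unknown variance.

```python
from math import sqrt
from statistics import fmean


def pooled_t(x, y):
    """Return (T, degrees of freedom, pooled variance) assuming equal variances."""
    n, m = len(x), len(y)
    xbar, ybar = fmean(x), fmean(y)
    ss_x = sum((v - xbar) ** 2 for v in x)   # within-group sum of squares, group X
    ss_y = sum((v - ybar) ** 2 for v in y)   # within-group sum of squares, group Y
    df = n + m - 2
    s2_p = (ss_x + ss_y) / df                # pooled variance estimate S_p^2
    t_stat = (xbar - ybar) / sqrt(s2_p * (1 / n + 1 / m))
    return t_stat, df, s2_p


x = [5.1, 4.9, 6.2, 5.8]   # hypothetical group X, n = 4
y = [4.2, 4.8, 5.0]        # hypothetical group Y, m = 3
t_stat, df, s2_p = pooled_t(x, y)
print(round(t_stat, 4), df, round(s2_p, 4))
```

Under the null hypothesis the returned statistic is referred to a $$t$$ distribution with the returned degrees of freedom, $$n+m-2$$.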

17.6 Relationships among the statistical distributions

$\{N(0,1)\}^2 = \chi^2_1 \\ \chi^2_k = \sum_{i=1}^k \chi^2_1 \\ F_{k,n} = \frac{\chi^2_k/k}{\chi^2_n/n}\\ t^2_n = F_{1,n} =\frac{\chi^2_1/1}{\chi^2_n/n}$