「統計解析のための線形代数」復習筆記7
分解平方和 1
樣本量均爲 \(n\) 的兩變量 \(z, \hat{z}\) 如下表,已知這兩個變量滿足條件:
\(\bar{z}=\frac{1}{n}\sum\limits_{i=1}^nz_i=\frac{1}{n}\sum\limits_{i=1}^n\hat{z}_i=\bar{\hat{z}},\)
\(\sum\limits_{i=1}^n(z_i-\hat{z_i})(\hat{z_i}-\bar{z})=0\)
個体の番号 | 変量 \(z\) | 変量 \(\hat{z}\) |
---|---|---|
\(1\) | \(z_1\) | \(\hat{z}_1\) |
\(2\) | \(z_2\) | \(\hat{z}_2\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) |
\(i\) | \(z_i\) | \(\hat{z}_i\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) |
\(n\) | \(z_n\) | \(\hat{z}_n\) |
此時我們有:
- 全平方和(全変動,總平方和,總變動, Total sum of Squares):
\(S_T=(z_i-\bar{z})^2+(z_2-\bar{z})^2+\cdots+(z_n-\bar{z})^2\\ \;\;\;\;=\sum\limits_{i=1}^n(z_i-\bar{z})^2\) - 回歸平方和(回歸變動,Regression sum of Squares)
\(S_R=(\hat{z_1}-\bar{\hat{z}})^2+(\hat{z_2}-\bar{\hat{z}})^2+\cdots+(\hat{z_n}-\bar{\hat{z}})^2\\ \;\;\;\;=\sum\limits_{i=1}^n(\hat{z_i}-\bar{\hat{z}})^2=\sum\limits_{i=1}^n(\hat{z_i}-\bar{z})^2\) - 殘差平方和(誤差平方和,殘差變動,誤差變動,residual sum of Squares)
\(S_e=(z_1-\hat{z_1})^2+(z_2-\hat{z_2})^2+\cdots+(z_n-\hat{z_n})^2\\ \;\;\;\;=\sum\limits_{i=1}^n(z_i-\hat{z_i})^2\) - 上面三個平方和之間,有如下的關系:
\[\begin{equation}
S_T=S_R+S_e
\tag{1}
\end{equation}\]
既:全平方和等於殘差平方和與回歸平方和之和。(1)式被稱爲平方和的分解(decomposition of sum of squares)
證明(1)式
解:
\[
\begin{equation}
\begin{split}
S_T & = \sum\limits_{i=1}^n(z_i-\bar{z})^2 \\
& = \sum\limits_{i=1}^n\left\{(z_i-\hat{z_i})+(\hat{z_i}-\bar{z})\right\}^2\\
& = \sum\limits_{i=1}^n\left\{(z_i-\hat{z_i})^2+(\hat{z_i}-\bar{z})^2+2(z_i-\hat{z_i})(\hat{z_i}-\bar{z})\right\}\\
& = \sum\limits_{i=1}^n(z_i-\hat{z_i})^2+\sum\limits_{i=1}^n(\hat{z_i}-\bar{z})^2 + 0\\
& = S_e + S_R
\end{split}
\end{equation}
\]
最後一步等式,利用了一開始給出的條件 \(\sum\limits_{i=1}^n(z_i-\hat{z_i})(\hat{z_i}-\bar{z})=0\)
這裏的平方和分解與回歸分析有着緊密的聯系。
分解平方和 2
有樣本量爲 \(n\) 的變量 \(z_1\) 與樣本量爲 \(m\) 的變量 \(z_2\) 的數據如下表:
変量 \(z_1\) | 変量 \(z_2\) |
---|---|
\(z_{11}\) | \(z_{12}\) |
\(z_{21}\) | \(z_{22}\) |
\(\vdots\) | \(\vdots\) |
\(z_{i1}\) | \(z_{i2}\) |
\(\vdots\) | \(\vdots\) |
\(z_{n1}\) | \(\vdots\) |
\(z_{m2}\) |
此時我們有:
- 樣本平均值:
\(\bar{z_1}=\frac{1}{n}\sum\limits_{i=1}^nz_{i1}, \;\bar{z_2}=\frac{1}{m}\sum\limits_{i=1}^mz_{i2}\) - 樣本總平均值:
\(\bar{z}=\frac{1}{n+m}(\sum\limits_{i=1}^nz_{i1}+\sum\limits_{i=1}^mz_{i2})\) - 全平方和 (全変動,總平方和,總變動, Total sum of Squares):
\(S_T=\left\{(z_{11}-\bar{z})^2+(z_{21}-\bar{z})^2+\cdots+(z_{n1}-\bar{z})^2\right\}\\ \;\;\;\;\;\;\;\;+\left\{(z_{12}-\bar{z})^2+(z_{22}-\bar{z})^2+\cdots+(z_{m2}-\bar{z})^2\right\}\\ \;\;\;\;=\sum\limits_{i=1}^n(z_{i1}-\bar{z})^2+\sum\limits_{i=1}^m(z_{i2}-\bar{z})^2\) - 羣內平方和(組內平方和,級內平方和,羣內變動,級內變動,變量內平方和,變量內變動,Within-groups sum of Squares):
\(S_W=\left\{(z_{11}-\bar{z_1})^2+(z_{21}-\bar{z_1})^2+\cdots+(z_{n1}-\bar{z_1})^2\right\}\\ \;\;\;\;\;\;\;\;+\left\{(z_{12}-\bar{z_2})^2+(z_{22}-\bar{z_2})^2+\cdots+(z_{m2}-\bar{z_2})^2\right\}\\ \;\;\;\;=\sum\limits_{i=1}^n(z_{i1}-\bar{z_1})^2+\sum\limits_{i=1}^m(z_{i2}-\bar{z_2})^2\) - 羣間平方和(組間平方和,級間平方和,羣間變動,級間變動,變量間平方和,變量間變動,Between-groups sum of Squares):
\(S_B=\left\{(\bar{z_1}-\bar{z})^2+(\bar{z_1}-\bar{z})^2+\cdots+(\bar{z_1}-\bar{z})^2\right\}\\ \;\;\;\;\;\;\;\;+\left\{(\bar{z_2}-\bar{z})^2+(\bar{z_2}-\bar{z})^2+\cdots+(\bar{z_2}-\bar{z})^2\right\}\\ \;\;\;\;=\sum\limits_{i=1}^n(\bar{z_1}-\bar{z})^2+\sum\limits_{i=1}^m(\bar{z_2}-\bar{z})^2\\ \;\;\;\;=n(\bar{z_1}-\bar{z})^2+m(\bar{z_2}-\bar{z})^2\) - 上面三個平方和之間有如下的關系: \[\begin{equation} S_T=S_W+S_B \tag{2} \end{equation}\]
證明(2)式
解:
注意利用:\(\sum\limits_{i=1}^n(z_{i1}-\bar{z_1})(\bar{z_1}-\bar{z})=(\bar{z_1}-\bar{z})\sum\limits_{i=1}^n(z_{i1}-\bar{z_1})\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\:=(\bar{z_1}-\bar{z})(\sum\limits_{i=1}^nz_{i1}-n\bar{z_1})\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\:=(\bar{z_1}-\bar{z})(n\bar{z_1}-n\bar{z_1})=0\)
因此
\[ \begin{equation} \begin{split} S_T & = \sum_{i=1}^n(z_{i1}-\bar{z})^2+\sum_{i=1}^m(z_{i2}-\bar{z})^2\\ & = \sum_{i=1}^n\left\{(z_{i1}-\bar{z_1})+(\bar{z_1}-\bar{z})\right\}^2\\ &\;\;\;\;\; + \sum_{i=1}^m\left\{(z_{i2}-\bar{z_2})+(\bar{z_2}-\bar{z})\right\}^2\\ & = \sum_{i=1}^n\left\{(z_{i1}-\bar{z_1})^2+2(z_{i1}-\bar{z_1})(\bar{z_1}-\bar{z})+(\bar{z_1}-\bar{z})^2\right\}\\ &\;\;\;\;\; +\sum_{i=1}^m\left\{(z_{i2}-\bar{z_2})^2+2(z_{i2}-\bar{z_2})(\bar{z_2}-\bar{z})+(\bar{z_2}-\bar{z})^2\right\}\\ & = \sum_{i=1}^n(z_{i1}-\bar{z_1})^2 + n(\bar{z_1}-\bar{z})^2\\ &\;\;\;\;\; + \sum_{i=1}^m(z_{i2}-\bar{z_2})^2 + m(\bar{z_2}-\bar{z})^2\\ & = S_W+S_B \end{split} \end{equation} \]
變量的合成與加權
我們說,將 \(p\) 個變量 \(x_1,x_2,\cdots,x_p\) 轉變成一次式:\(w_1x_1+w_2x_2+\cdots+w_px_p (=\hat{y})\) 的過程稱爲變量的合成 (linear combination of variables) \(\hat{y}\) 被叫做合成變量。系數 \(w_1,w_2,\cdots,w_p\) 被叫做權重 (weight)。假如 \(x_1,x_2,\cdots,x_p\) 是 \(p\) 個科目的考試得分,那麼:
- \(w_1=w_2=\cdots=w_p=1\) 時,\(\hat{y}\) 意思就是 \(p\) 個科目的總分
- \(w_1=w_2=\cdots=w_p=\frac{1}{p}\) 時,\(\hat{y}\) 意思就是 \(p\) 個科目的平均分
多元變量分析中,我們實質上做的許多事就是思考如何合理的決定這個權重。