「統計解析のための線形代数」復習筆記7

分解平方和 1

樣本量均爲 \(n\) 的兩變量 \(z, \hat{z}\) 如下表,已知這兩個變量滿足條件:
\(\bar{z}=\frac{1}{n}\sum\limits_{i=1}^nz_i=\frac{1}{n}\sum\limits_{i=1}^n\hat{z}_i=\bar{\hat{z}},\)
\(\sum\limits_{i=1}^n(z_i-\hat{z_i})(\hat{z_i}-\bar{z})=0\)

個体の番号 変量 \(z\) 変量 \(\hat{z}\)
\(1\) \(z_1\) \(\hat{z}_1\)
\(2\) \(z_2\) \(\hat{z}_2\)
\(\vdots\) \(\vdots\) \(\vdots\)
\(i\) \(z_i\) \(\hat{z}_i\)
\(\vdots\) \(\vdots\) \(\vdots\)
\(n\) \(z_n\) \(\hat{z}_n\)

此時我們有:

  • 全平方和(全変動,總平方和,總變動, Total sum of Squares):
    \(S_T=(z_i-\bar{z})^2+(z_2-\bar{z})^2+\cdots+(z_n-\bar{z})^2\\ \;\;\;\;=\sum\limits_{i=1}^n(z_i-\bar{z})^2\)
  • 回歸平方和(回歸變動,Regression sum of Squares)
    \(S_R=(\hat{z_1}-\bar{\hat{z}})^2+(\hat{z_2}-\bar{\hat{z}})^2+\cdots+(\hat{z_n}-\bar{\hat{z}})^2\\ \;\;\;\;=\sum\limits_{i=1}^n(\hat{z_i}-\bar{\hat{z}})^2=\sum\limits_{i=1}^n(\hat{z_i}-\bar{z})^2\)
  • 殘差平方和(誤差平方和,殘差變動,誤差變動,residual sum of Squares)
    \(S_e=(z_1-\hat{z_1})^2+(z_2-\hat{z_2})^2+\cdots+(z_n-\hat{z_n})^2\\ \;\;\;\;=\sum\limits_{i=1}^n(z_i-\hat{z_i})^2\)
  • 上面三個平方和之間,有如下的關系: \[\begin{equation} S_T=S_R+S_e \tag{1} \end{equation}\]
    既:全平方和等於殘差平方和與回歸平方和之和。(1)式被稱爲平方和的分解(decomposition of sum of squares)
證明(1)
解:

\[ \begin{equation} \begin{split} S_T & = \sum\limits_{i=1}^n(z_i-\bar{z})^2 \\ & = \sum\limits_{i=1}^n\left\{(z_i-\hat{z_i})+(\hat{z_i}-\bar{z})\right\}^2\\ & = \sum\limits_{i=1}^n\left\{(z_i-\hat{z_i})^2+(\hat{z_i}-\bar{z})^2+2(z_i-\hat{z_i})(\hat{z_i}-\bar{z})\right\}\\ & = \sum\limits_{i=1}^n(z_i-\hat{z_i})^2+\sum\limits_{i=1}^n(\hat{z_i}-\bar{z})^2 + 0\\ & = S_e + S_R \end{split} \end{equation} \] 最後一步等式,利用了一開始給出的條件 \(\sum\limits_{i=1}^n(z_i-\hat{z_i})(\hat{z_i}-\bar{z})=0\)
這裏的平方和分解與回歸分析有着緊密的聯系。

分解平方和 2

有樣本量爲 \(n\) 的變量 \(z_1\) 與樣本量爲 \(m\) 的變量 \(z_2\) 的數據如下表:

変量 \(z_1\) 変量 \(z_2\)
\(z_{11}\) \(z_{12}\)
\(z_{21}\) \(z_{22}\)
\(\vdots\) \(\vdots\)
\(z_{i1}\) \(z_{i2}\)
\(\vdots\) \(\vdots\)
\(z_{n1}\) \(\vdots\)
\(z_{m2}\)

此時我們有:

  • 樣本平均值
    \(\bar{z_1}=\frac{1}{n}\sum\limits_{i=1}^nz_{i1}, \;\bar{z_2}=\frac{1}{m}\sum\limits_{i=1}^mz_{i2}\)
  • 樣本總平均值
    \(\bar{z}=\frac{1}{n+m}(\sum\limits_{i=1}^nz_{i1}+\sum\limits_{i=1}^mz_{i2})\)
  • 平方和 (全変動,總平方和,總變動, Total sum of Squares):
    \(S_T=\left\{(z_{11}-\bar{z})^2+(z_{21}-\bar{z})^2+\cdots+(z_{n1}-\bar{z})^2\right\}\\ \;\;\;\;\;\;\;\;+\left\{(z_{12}-\bar{z})^2+(z_{22}-\bar{z})^2+\cdots+(z_{m2}-\bar{z})^2\right\}\\ \;\;\;\;=\sum\limits_{i=1}^n(z_{i1}-\bar{z})^2+\sum\limits_{i=1}^m(z_{i2}-\bar{z})^2\)
  • 羣內平方和(組內平方和,級內平方和,羣內變動,級內變動,變量內平方和,變量內變動,Within-groups sum of Squares):
    \(S_W=\left\{(z_{11}-\bar{z_1})^2+(z_{21}-\bar{z_1})^2+\cdots+(z_{n1}-\bar{z_1})^2\right\}\\ \;\;\;\;\;\;\;\;+\left\{(z_{12}-\bar{z_2})^2+(z_{22}-\bar{z_2})^2+\cdots+(z_{m2}-\bar{z_2})^2\right\}\\ \;\;\;\;=\sum\limits_{i=1}^n(z_{i1}-\bar{z_1})^2+\sum\limits_{i=1}^m(z_{i2}-\bar{z_2})^2\)
  • 羣間平方和(組間平方和,級間平方和,羣間變動,級間變動,變量間平方和,變量間變動,Between-groups sum of Squares):
    \(S_B=\left\{(\bar{z_1}-\bar{z})^2+(\bar{z_1}-\bar{z})^2+\cdots+(\bar{z_1}-\bar{z})^2\right\}\\ \;\;\;\;\;\;\;\;+\left\{(\bar{z_2}-\bar{z})^2+(\bar{z_2}-\bar{z})^2+\cdots+(\bar{z_2}-\bar{z})^2\right\}\\ \;\;\;\;=\sum\limits_{i=1}^n(\bar{z_1}-\bar{z})^2+\sum\limits_{i=1}^m(\bar{z_2}-\bar{z})^2\\ \;\;\;\;=n(\bar{z_1}-\bar{z})^2+m(\bar{z_2}-\bar{z})^2\)
  • 上面三個平方和之間有如下的關系: \[\begin{equation} S_T=S_W+S_B \tag{2} \end{equation}\]
證明(2)
解:

注意利用:\(\sum\limits_{i=1}^n(z_{i1}-\bar{z_1})(\bar{z_1}-\bar{z})=(\bar{z_1}-\bar{z})\sum\limits_{i=1}^n(z_{i1}-\bar{z_1})\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\:=(\bar{z_1}-\bar{z})(\sum\limits_{i=1}^nz_{i1}-n\bar{z_1})\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\:=(\bar{z_1}-\bar{z})(n\bar{z_1}-n\bar{z_1})=0\)
因此

\[ \begin{equation} \begin{split} S_T & = \sum_{i=1}^n(z_{i1}-\bar{z})^2+\sum_{i=1}^m(z_{i2}-\bar{z})^2\\ & = \sum_{i=1}^n\left\{(z_{i1}-\bar{z_1})+(\bar{z_1}-\bar{z})\right\}^2\\ &\;\;\;\;\; + \sum_{i=1}^m\left\{(z_{i2}-\bar{z_2})+(\bar{z_2}-\bar{z})\right\}^2\\ & = \sum_{i=1}^n\left\{(z_{i1}-\bar{z_1})^2+2(z_{i1}-\bar{z_1})(\bar{z_1}-\bar{z})+(\bar{z_1}-\bar{z})^2\right\}\\ &\;\;\;\;\; +\sum_{i=1}^m\left\{(z_{i2}-\bar{z_2})^2+2(z_{i2}-\bar{z_2})(\bar{z_2}-\bar{z})+(\bar{z_2}-\bar{z})^2\right\}\\ & = \sum_{i=1}^n(z_{i1}-\bar{z_1})^2 + n(\bar{z_1}-\bar{z})^2\\ &\;\;\;\;\; + \sum_{i=1}^m(z_{i2}-\bar{z_2})^2 + m(\bar{z_2}-\bar{z})^2\\ & = S_W+S_B \end{split} \end{equation} \]

變量的合成與加權

我們說,將 \(p\) 個變量 \(x_1,x_2,\cdots,x_p\) 轉變成一次式:\(w_1x_1+w_2x_2+\cdots+w_px_p (=\hat{y})\) 的過程稱爲變量的合成 (linear combination of variables) \(\hat{y}\) 被叫做合成變量。系數 \(w_1,w_2,\cdots,w_p\) 被叫做權重 (weight)。假如 \(x_1,x_2,\cdots,x_p\)\(p\) 個科目的考試得分,那麼:

  • \(w_1=w_2=\cdots=w_p=1\) 時,\(\hat{y}\) 意思就是 \(p\) 個科目的總分
  • \(w_1=w_2=\cdots=w_p=\frac{1}{p}\) 時,\(\hat{y}\) 意思就是 \(p\) 個科目的平均分

多元變量分析中,我們實質上做的許多事就是思考如何合理的決定這個權重

Avatar
王 超辰 - Chaochen Wang
Real World Evidence Scientist

All models are wrong, but some are useful.

comments powered by Disqus

Related