第 116 章粗率比和分層率比 Crude and stratified rate ratios

Epidemiology is like a bikini: what is revealed is interesting; what is concealed is crucial.

Peter Duesberg

The Statistical Methods in Epidemiology lectures were orgainised and taught by Professor Simon Cousens, Professor Katherine Fielding, Dr Susannah Woodd, also special thanks to Dr Emily Webb.

本章內容不討論任何理論的東西，着重強調用 R/Stata 進行實際數據的分析，並加強對輸出結果的理解。

此次實戰演練的目的是學會怎樣計算死亡率比 (Rate Ratios, RR)。學會用 Mantel-Haenszel 法總結 RR，並討論其意義。同時我們儘可能展示Stata的使用方法，並且儘量也給出R的解決方案。

數據是從 Whitehall Cohort Study (whitehal.dta) 隨機採集10%的樣本數據。該研究的研究對象是英國的公務員羣體，調查的是心血管疾病的發病率和一些生活習慣之間的關係，更詳細的內容可以參考原始論文 (Marmot et al. 1978)，或者維基百科主頁。

116.1 Q1-Q3 讀取數據，簡單歸納，分割年齡，計算年齡組的粗死亡率

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . stset timeout, fail(all) enter(timein) origin(timein) id(id) scale(365.25)
## 
##                 id:  id
##      failure event:  all != 0 & all < .
## obs. time interval:  (timeout[_n-1], timeout]
##  enter on or after:  time timein
##  exit on or before:  failure
##     t for analysis:  (time-origin)/365.25
##             origin:  time timein
## 
## ------------------------------------------------------------------------------
##       1,677  total observations
##           0  exclusions
## ------------------------------------------------------------------------------
##       1,677  observations remaining, representing
##       1,677  subjects
##         403  failures in single-failure-per-subject data
##  27,605.371  total analysis time at risk and under observation
##                                                 at risk from t =         0
##                                      earliest observed entry t =         0
##                                           last observed exit t =  19.38123
## 
## .

經過 Stata stset 命令的生存時間總結。我們指定了數據 whitehal 中的總死亡 all 作爲觀察結果，時間變量分別是 timein 和 timeout，單位是“日”。我們發現在這個樣本中，1677名隨訪對象一共有403名在研究結束以前死亡。時間尺度 timein 是從被錄用爲公務員的第一天，隨訪結束時間是 timeout，最長的人觀察了 19.4 年。相似的信息在R裏面實現的方法也很簡單：

whitehal <- read_dta("../backupfiles/whitehal.dta") # read in the dataset
whitehal$followupyrs <- (whitehal$timeout - whitehal$timein)/365.25 # change time as years
max(whitehal$followupyrs*365.25) # time difference in days

## Time difference of 7078.9927 days

summary(whitehal$followupyrs <- as.numeric(whitehal$followupyrs)) # time difference in years

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.15063 17.16357 17.96020 16.46116 18.56262 19.38123

epiDisplay::tab1(whitehal$all, graph = FALSE)

## whitehal$all : 
##         Frequency Percent Cum. percent
## 0            1274      76           76
## 1             403      24          100
##   Total      1677     100          100

該數據中的變量 agein 其實是研究參與對象在進入隊列時的年齡。我們打算把這個變量改成 5 歲爲階梯的分類型變量 (40-44, 45-49, 50-54, …, 65-69)。在 Stata 會用到 egen 命令。同時，我們可以使用 strate 命令來查看這個樣本中隨着年齡增加，粗死亡率的變化是怎樣的。另一個獲取相同結果的命令是 stptime 命令。

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . quietly stset timeout, fail(all) enter(timein) origin(timein) id(id) scale(36
## > 5.25)
## 
## . egen agecat = cut(agein), at(40, 45, 50, 55, 60, 65, 70) label
## 
## . strate agecat, per(1000)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Estimated failure rates
## Number of records  =       1677
## 
##   +------------------------------------------------------+
##   | agecat     D        Y      Rate     Lower      Upper |
##   |------------------------------------------------------|
##   |    40-    24   4.9186    4.8794    3.2705     7.2798 |
##   |    45-    45   7.8549    5.7289    4.2774     7.6729 |
##   |    50-    82   6.0598   13.5319   10.8983    16.8019 |
##   |    55-   118   5.2820   22.3400   18.6519    26.7573 |
##   |    60-   101   3.0948   32.6349   26.8525    39.6625 |
##   |------------------------------------------------------|
##   |    65-    33   0.3952   83.4921   59.3567   117.4412 |
##   +------------------------------------------------------+
##    Notes: Rate = D/Y = failures/person-time (per 1000).
##           Lower and Upper are bounds of 95% confidence intervals.
## 
## 
## . stptime, by(agecat) per(1000)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
##     agecat |  Person-time   Failures        Rate   [95% Conf. Interval]
## -----------+-----------------------------------------------------------
##        40- |    4918.6177         24   4.8794197   3.270524    7.279792
##        45- |    7854.8937         45   5.7289126   4.277427    7.672941
##        50- |    6059.7504         82    13.53191   10.89832    16.80191
##        55- |    5282.0143        118   22.339962    18.6519    26.75728
##        60- |    3094.8473        101    32.63489   26.85248    39.66248
##        65- |    395.24719         33   83.492055   59.35673    117.4412
## -----------+-----------------------------------------------------------
##      Total |    27605.371        403   14.598609   13.24067    16.09581
## 
## .

在 R 裏面你可以使用 epitools:pois.exact 的函數來計算並獲得相似，但不完全相同的結果（點估計完全一致，信賴區間的估計存在略微但不明顯的差別）：

# categorize agein into groups (40-44, 45-49, 50-54, ... , 65-69)
whitehal$agecat <- cut(whitehal$agein, breaks = seq(40, 70, 5), right = FALSE)
with(whitehal, table(agecat))

## agecat
## [40,45) [45,50) [50,55) [55,60) [60,65) [65,70) 
##     277     445     362     340     215      38

# examine how mortality rates change with age at entry
# 

whitehall_sum <- whitehal %>% group_by(agecat) %>%
  summarise(D = sum(all),
            Y = sum(followupyrs)) 
cbind(whitehall_sum, pois.exact(x = whitehall_sum$D, 
                                pt = whitehall_sum$Y/1000)[,-c(1, 6)])

##    agecat   D          Y         pt       rate      lower       upper
## 1 [40,45)  24 4918.61771 4.91861771  4.8794197  3.1263373   7.2601897
## 2 [45,50)  45 7854.89374 7.85489374  5.7289126  4.1787079   7.6657358
## 3 [50,55)  82 6059.75041 6.05975041 13.5319105 10.7623304  16.7966726
## 4 [55,60) 118 5282.01431 5.28201431 22.3399622 18.4913772  26.7533361
## 5 [60,65) 101 3094.84730 3.09484730 32.6348896 26.5816512  39.6543589
## 6 [65,70)  33  395.24719 0.39524719 83.4920545 57.4720887 117.2539259

116.2 Q4 計算不同年齡組相對於最年輕的年齡組的死亡率比 Rate ratio, RR

在 Stata 你需要使用 stmh 命令來計算死亡率比。

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . egen agecat = cut(agein), at(40, 45, 50, 55, 60, 65, 70) label
## 
## . quietly stset timeout, fail(all) enter(timein) origin(timein) id(id) scale(36
## > 5.25) 
## 
## . stmh agecat, c(1, 0)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing agecat==1 vs. agecat==0
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.174         0.40         0.5250              0.715      1.927
## ------------------------------------------------------------------------
## 
## . stmh agecat, c(2, 0)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing agecat==2 vs. agecat==0
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          2.773        21.05         0.0000              1.760      4.371
## ------------------------------------------------------------------------
## 
## . stmh agecat, c(3, 0)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing agecat==3 vs. agecat==0
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          4.578        55.78         0.0000              2.952      7.101
## ------------------------------------------------------------------------
## 
## . stmh agecat, c(4, 0)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing agecat==4 vs. agecat==0
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          6.688        93.81         0.0000              4.286     10.438
## ------------------------------------------------------------------------
## 
## . stmh agecat, c(5, 0)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing agecat==5 vs. agecat==0
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##         17.111       210.78         0.0000             10.114     28.949
## ------------------------------------------------------------------------
## 
## .

但是在 R 裏面，需要使用最簡單的泊松回歸模型來計算上述死亡率比。

## rate ratios and 95% CIs for each age category compare with [40,44) age group
Model0 <- glm(all ~ agecat + offset(log(followupyrs)), family = poisson(link = "log"), data = whitehal); 
ci.exp(Model0)

##                   exp(Est.)          2.5%         97.5%
## (Intercept)    0.0048794197  0.0032705278  0.0072797841
## agecat[45,50)  1.1740971320  0.7154062934  1.9268827913
## agecat[50,55)  2.7732622719  1.7597191900  4.3705743919
## agecat[55,60)  4.5784055712  2.9519678778  7.1009572062
## agecat[60,65)  6.6882727538  4.2856745766 10.4377949444
## agecat[65,70) 17.1110624279 10.1140257533 28.9487553770

## The rate ratios are increasing with age although there is no statistical evidence
## at 5% level that the rate among 45-49 year olds is different to the rate among men
## who are <40 years

我們發現，死亡率比隨着年齡的增加而顯著升高。

116.3 Q5 分析另一個關於職業等級 `grade` 和死亡之間的關係。使用 Stata 的 `stmh` 命令來計算並比較較低等級職業 (low grade) 和較高等級職業 (high grade) 相比，死亡率比是多少。你認爲結果是否提供強有力的證據證明這兩種職業等級之間的死亡率存在顯著差異？

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . egen agecat = cut(agein), at(40, 45, 50, 55, 60, 65, 70) label
## 
## . quietly stset timeout, fail(all) enter(timein) origin(timein) id(id) scale(36
## > 5.25) 
## 
## . strate grade, per(1000)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Estimated failure rates
## Number of records  =       1677
## 
##   +-----------------------------------------------------+
##   | grade     D         Y      Rate     Lower     Upper |
##   |-----------------------------------------------------|
##   |     1   221   20.3398   10.8654    9.5233   12.3966 |
##   |     2   182    7.2656   25.0496   21.6624   28.9665 |
##   +-----------------------------------------------------+
##    Notes: Rate = D/Y = failures/person-time (per 1000).
##           Lower and Upper are bounds of 95% confidence intervals.
## 
## 
## . stmh grade
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing grade==2 vs. grade==1
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          2.305        73.78         0.0000              1.895      2.805
## ------------------------------------------------------------------------
## 
## .

由於很顯然地，相比高等級的職業來說，較低等級職業的死亡率比是2.31倍之高，且95%信賴區間也不包含1，(p < 0.0001)。所以，我們認爲數據提供了很強的證據證明兩種職業等級之間的死亡率存在顯著差異。相同的結果可以在R裏面通過下面的R代碼來計算獲得。

Model1 <- glm(all ~ factor(grade) + offset(log(followupyrs)), family = poisson(link = "log"), data = whitehal); ci.exp(Model1)

##                  exp(Est.)         2.5%       97.5%
## (Intercept)    0.010865404 0.0095233128 0.012396632
## factor(grade)2 2.305446287 1.8947528380 2.805158793

## There is strong evidence that the all cause mortality rate differs between high
## and low grade workers.

## To examine whether the estimated RR for grade is confounded by age at entry
## we compare the crude RR =2.31 (1.90, 2.81) with the Mantel-Haenszel summary
## estimate.

116.4 Q6-Q8 試分析上一題中觀察到的職業等級和死亡率之間的關係，是否受到年齡的影響。先嘗試使用 `stmh, by()` 函數來分析不同年齡層中職業等級之間的死亡率比。是否有證據證明職業等級的高低和年齡之間在死亡率比的關係上存在交互作用？又或者是否有年齡的混淆因素會對職業等級和死亡之間的關係造成影響？

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . egen agecat = cut(agein), at(40, 45, 50, 55, 60, 65, 70) label
## 
## . quietly stset timeout, fail(all) enter(timein) origin(timein) id(id) scale(36
## > 5.25) 
## 
## . stmh grade , by(agecat)
## 
##          failure _d:  all
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##   enter on or after:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimates of the rate ratio
##   comparing grade==2 vs. grade==1
##   by agecat
## 
##   +-------------------------------------+
##   | agecat   Rate ratio   Lower   Upper |
##   |-------------------------------------|
##   |    40-         1.22    0.42    3.57 |
##   |    45-         1.36    0.67    2.75 |
##   |    50-         1.92    1.23    3.01 |
##   |    55-         1.43    1.00    2.06 |
##   |    60-         1.21    0.82    1.80 |
##   |-------------------------------------|
##   |    65-         1.40    0.54    3.62 |
##   +-------------------------------------+
##   Note: Lower and Upper are bounds of 95% confidence intervals.
## 
## Overall Mantel-Haenszel estimate, controlling for agecat
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.429        11.36         0.0008              1.160      1.761
## ------------------------------------------------------------------------
## Approx. test for unequal RRs (effect modification): chi2(5)   =     2.44
##                                                     Pr>chi2   =   0.7854
## 
## .

在Stata的計算過程中，我們很輕鬆地獲得了不同年齡層級內職業等級之間死亡率比，和對應的信賴區間。並且發現 Mantel-Haenszel 總結的控制了年齡因素之後的年齡調整死亡率比 (rate ratio adjusted by age categories) 是 1.43 (1.16, 1.76)，和上一題計算的粗死亡率比 2.31 (1.90, 2.81) 之間確實有些許不同。所以大概可以認爲年齡對這一關係造成一定程度的混雜 confounding。然而，最後命令行對所有年齡層級的死亡率比的變化做了交互作用的鑑定，並且比較過後發現並無顯著證據證明年齡和職業等級之間存在交互作用 (p = 0.79)。各年齡層的死亡率比並沒有太過誇張的變化，也就是 1.2 - 1.9 之間的差別。所以，我們認爲不需要報告年齡層個字的死亡率比。

這裏需要指出的是，Stata這三行的計算過程在 R 裏的實現稍微有一點點複雜：

whitehal_table <- aggregate(cbind(all, followupyrs) ~ grade + agecat, data = whitehal, sum)

stmh_array <- array(c(4, 20,   693.1284,  4225.4893,
                      10,35,   1363.821,  6491.072,
                      30,52,   1399.63,   4660.12,                                                        
                      51,67,   1832.169,  3449.846,
                      59,42,   1660.597,  1434.251,
                      28,5,    316.23840, 79.00879),
                      dim=c(2,2,6),
                      dimnames = list(
                      Grade=c("2","1"),
                      c("death", "Person_years"),
                      Agecat=names(table(whitehal$agecat))
                    ))
stmh_array

## , , Agecat = [40,45)
## 
##      
## Grade death Person_years
##     2     4     693.1284
##     1    20    4225.4893
## 
## , , Agecat = [45,50)
## 
##      
## Grade death Person_years
##     2    10     1363.821
##     1    35     6491.072
## 
## , , Agecat = [50,55)
## 
##      
## Grade death Person_years
##     2    30      1399.63
##     1    52      4660.12
## 
## , , Agecat = [55,60)
## 
##      
## Grade death Person_years
##     2    51     1832.169
##     1    67     3449.846
## 
## , , Agecat = [60,65)
## 
##      
## Grade death Person_years
##     2    59     1660.597
##     1    42     1434.251
## 
## , , Agecat = [65,70)
## 
##      
## Grade death Person_years
##     2    28    316.23840
##     1     5     79.00879

# mhgrade_age <- epi.2by2(stmh_array, method = "cohort.time", units = 1000)
mhgrade_age <- epi.2by2(stmh_array, method = "cohort.count", units = 1000)
mhgrade_age

##              Outcome +    Outcome -      Total        Inc risk *        Odds
## Exposed +          182         7266       7448              24.4      0.0250
## Exposed -          221        20340      20561              10.7      0.0109
## Total              403        27605      28008              14.4      0.0146
## 
## 
## Point estimates and 95% CIs:
## -------------------------------------------------------------------
## Inc risk ratio (crude)                         2.27 (1.87, 2.76)
## Inc risk ratio (M-H)                           1.42 (1.15, 1.74)
## Inc risk ratio (crude:M-H)                     1.60
## Odds ratio (crude)                             2.31 (1.89, 2.81)
## Odds ratio (M-H)                               1.43 (1.16, 1.77)
## Odds ratio (crude:M-H)                         1.61
## Attrib risk in the exposed (crude) *           13.69 (9.91, 17.47)
## Attrib risk in the exposed (M-H) *             6.03 (0.55, 11.52)
## Attrib risk (crude:M-H)                        2.27
## -------------------------------------------------------------------
##  M-H test of homogeneity of PRs: chi2(5) = 2.425 Pr>chi2 = 0.788
##  M-H test of homogeneity of ORs: chi2(5) = 2.447 Pr>chi2 = 0.784
##  Test that M-H adjusted OR = 1:  chi2(1) = 11.133 Pr>chi2 = <0.001
##  Wald confidence limits
##  M-H: Mantel-Haenszel; CI: confidence interval
##  * Outcomes per 1000 population units

## Overall estimate and Wald 95% confidence intervals,
## controlling for agecate
mhgrade_age$massoc.detail$RR.mh.wald

##         est     lower     upper
## 1 1.4180516 1.1538795 1.7427039

mhgrade_age$massoc.detail$chi2.mh ## p-value for age-adjusted MH rate ratio comparing with mull value 1.

##   test.statistic df   p.value.1s    p.value.2s
## 1      11.132745  1 0.0004240849 0.00084816981

## The Mantel-Haenszel summary estimate RR = 1.43 (1.16, 1.76).
## The result shows that the crude estimate of the effect of grade was
## partly confounded by age at entry.

## To assess whether there is effect modification betwee grade and
## agecat we examine the stratum specific estimates and assess
## whether there is evidence of important variation between them.
mhgrade_age$massoc.detail$RR.strata.wald

##         est      lower     upper
## 1 1.2179935 0.41756313 3.5527757
## 2 1.3572307 0.67373201 2.7341362
## 3 1.9015625 1.21802618 2.9686881
## 4 1.4215411 0.99187791 2.0373265
## 5 1.2059693 0.81689044 1.7803635
## 6 1.3666378 0.54398066 3.4333923

## The result indicates that the data are compatible with the assumption
## of no interaction/effect modification (p=0.79)

## test for unequal RRs (effect modification):
mhgrade_age$massoc.detail$wRR.homog

##   test.statistic df    p.value
## 1      2.4254139  5 0.78768357

## Hence, we do not need to present the stratum-specific estimates.

116.5 Q9 試分析職業等級和CHD，冠心病死亡之間的關係。此時在Stata需要重新修改你的觀察結果爲 `chd`。職業等級和冠心病死亡之間的關係是否受到吸菸習慣的影響，或者有交互作用呢？

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . egen agecat = cut(agein), at(40, 45, 50, 55, 60, 65, 70) label
## 
## . stset timeout, fail(chd) origin(timein) id(id) scale(365.25)
## 
##                 id:  id
##      failure event:  chd != 0 & chd < .
## obs. time interval:  (timeout[_n-1], timeout]
##  exit on or before:  failure
##     t for analysis:  (time-origin)/365.25
##             origin:  time timein
## 
## ------------------------------------------------------------------------------
##       1,677  total observations
##           0  exclusions
## ------------------------------------------------------------------------------
##       1,677  observations remaining, representing
##       1,677  subjects
##         154  failures in single-failure-per-subject data
##  27,605.371  total analysis time at risk and under observation
##                                                 at risk from t =         0
##                                      earliest observed entry t =         0
##                                           last observed exit t =  19.38123
## 
## . stmh grade 
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing grade==2 vs. grade==1
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.991        18.44         0.0000              1.445      2.743
## ------------------------------------------------------------------------
## 
## . stmh grade, by(smok)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimates of the rate ratio
##   comparing grade==2 vs. grade==1
##   by smok
## 
##   +-----------------------------------+
##   | smok   Rate ratio   Lower   Upper |
##   |-----------------------------------|
##   |    1         3.88    1.18   12.72 |
##   |    2         1.64    0.93    2.91 |
##   |    3         2.03    1.04    3.94 |
##   |    4         1.33    0.70    2.56 |
##   |    5         1.93    0.72    5.18 |
##   +-----------------------------------+
##   Note: Lower and Upper are bounds of 95% confidence intervals.
## 
## Overall Mantel-Haenszel estimate, controlling for smok
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.761        12.06         0.0005              1.274      2.435
## ------------------------------------------------------------------------
## Approx. test for unequal RRs (effect modification): chi2(4)   =     2.76
##                                                     Pr>chi2   =   0.5988
## 
## .

修改了觀察結果爲冠心病死亡之後，我們計算了職業等級高低之間相比較下的冠心病死亡率比，它未進行吸菸習慣調整時的結果是 1.99 (1.45, 2.74)，調整了吸菸習慣之後的 Mantel Haenszel 死亡率比變成了 1.76 (1.27, 2.44)。同時，檢驗結果表示並無證據證明吸菸和職業等級之間存在統計學意義上的交互作用。

116.6 Further Q1 請分析膽固醇水平和冠心病死亡之間的關係。這一關係是否受到年齡的混雜影響？

## 
## . use "../backupfiles/whitehal.dta", cl(Whitehall Study - 10% sample)
## 
## . egen agecat = cut(agein), at(40, 45, 50, 55, 60, 65, 70) label
## 
## . quietly stset timeout, fail(chd) origin(timein) id(id) scale(365.25)
## 
## . 
## . strate cholgrp, per(1000)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Estimated failure rates
## Number of records  =       1677
## 
##   +----------------------------------------------------+
##   | cholgrp    D         Y     Rate    Lower     Upper |
##   |----------------------------------------------------|
##   |       1   14    3.9523   3.5422   2.0979    5.9809 |
##   |       2   52   11.1837   4.6496   3.5430    6.1018 |
##   |       3   59    8.9152   6.6179   5.1275    8.5415 |
##   |       4   29    3.5541   8.1596   5.6703   11.7418 |
##   +----------------------------------------------------+
##    Notes: Rate = D/Y = failures/person-time (per 1000).
##           Lower and Upper are bounds of 95% confidence intervals.
## 
## 
## . 
## . stmh cholgrp, c(2,1)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing cholgrp==2 vs. cholgrp==1
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.313         0.82         0.3648              0.728      2.368
## ------------------------------------------------------------------------
## 
## . stmh cholgrp, c(3,1)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing cholgrp==3 vs. cholgrp==1
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.868         4.57         0.0326              1.043      3.346
## ------------------------------------------------------------------------
## 
## . stmh cholgrp, c(4,1)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimate of the rate ratio
##   comparing cholgrp==4 vs. cholgrp==1
## 
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          2.304         6.96         0.0083              1.217      4.359
## ------------------------------------------------------------------------
## 
## . stmh cholgrp
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Score test for trend of rates with cholgrp
## 
## Overall Mantel-Haenszel estimate
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.341        10.45         0.0012              1.122      1.601
## ------------------------------------------------------------------------
## Note: The Rate ratio estimate is an approximation to the rate ratio
##       for a one-unit increase in cholgrp.
## 
## . 
## . stmh cholgrp, c(2,1) by(agecat)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimates of the rate ratio
##   comparing cholgrp==2 vs. cholgrp==1
##   by agecat
## 
##   +-------------------------------------+
##   | agecat   Rate ratio   Lower   Upper |
##   |-------------------------------------|
##   |    40-         0.83    0.08    9.16 |
##   |    45-         1.57    0.18   14.05 |
##   |    50-         1.94    0.24   15.53 |
##   |    55-         0.90    0.39    2.09 |
##   |    60-            .       .       . |
##   |-------------------------------------|
##   |    65-         0.46    0.09    2.26 |
##   +-------------------------------------+
##   Note: Lower and Upper are bounds of 95% confidence intervals.
## 
## Overall Mantel-Haenszel estimate, controlling for agecat
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.333         0.86         0.3538              0.725      2.451
## ------------------------------------------------------------------------
## Approx. test for unequal RRs (effect modification): chi2(5)   =     6.79
##                                                     Pr>chi2   =   0.2371
## 
## . stmh cholgrp, c(3,1) by(agecat)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimates of the rate ratio
##   comparing cholgrp==3 vs. cholgrp==1
##   by agecat
## 
##   +-------------------------------------+
##   | agecat   Rate ratio   Lower   Upper |
##   |-------------------------------------|
##   |    40-         2.82    0.34   23.42 |
##   |    45-         3.23    0.39   26.86 |
##   |    50-         5.92    0.79   44.22 |
##   |    55-         0.83    0.35    1.96 |
##   |    60-            .       .       . |
##   |-------------------------------------|
##   |    65-         0.64    0.15    2.70 |
##   +-------------------------------------+
##   Note: Lower and Upper are bounds of 95% confidence intervals.
## 
## Overall Mantel-Haenszel estimate, controlling for agecat
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.740         3.51         0.0609              0.968      3.129
## ------------------------------------------------------------------------
## Approx. test for unequal RRs (effect modification): chi2(5)   =     9.23
##                                                     Pr>chi2   =   0.1001
## 
## . stmh cholgrp, c(4,1) by(agecat)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Mantel-Haenszel estimates of the rate ratio
##   comparing cholgrp==4 vs. cholgrp==1
##   by agecat
## 
##   +-------------------------------------+
##   | agecat   Rate ratio   Lower   Upper |
##   |-------------------------------------|
##   |    40-         2.91    0.26   32.06 |
##   |    45-         4.57    0.53   39.08 |
##   |    50-         4.22    0.51   35.02 |
##   |    55-         2.05    0.81    5.18 |
##   |    60-            .       .       . |
##   |-------------------------------------|
##   |    65-         0.00       .       . |
##   +-------------------------------------+
##   Note: Lower and Upper are bounds of 95% confidence intervals.
## 
## Overall Mantel-Haenszel estimate, controlling for agecat
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          3.103        11.57         0.0007              1.560      6.175
## ------------------------------------------------------------------------
## Approx. test for unequal RRs (effect modification): chi2(5)   =     6.21
##                                                     Pr>chi2   =   0.2862
## 
## . stmh cholgrp, by(agecat)
## 
##          failure _d:  chd
##    analysis time _t:  (timeout-origin)/365.25
##              origin:  time timein
##                  id:  id
## 
## Score test for trend of rates with cholgrp
## 
## Mantel-Haenszel estimates of the rate ratio
##   by agecat
## 
##   +-------------------------------------+
##   | agecat   Rate ratio   Lower   Upper |
##   |-------------------------------------|
##   |    40-         1.70    0.87    3.32 |
##   |    45-         1.69    1.01    2.84 |
##   |    50-         1.63    1.10    2.40 |
##   |    55-         1.21    0.88    1.66 |
##   |    60-         1.55    1.01    2.37 |
##   |-------------------------------------|
##   |    65-         0.78    0.37    1.63 |
##   +-------------------------------------+
##   Note: Lower and Upper are bounds of 95% confidence intervals.
## 
## Overall Mantel-Haenszel estimate, controlling for agecat
## ------------------------------------------------------------------------
##     Rate ratio         chi2         P>chi2          [95% Conf. Interval]
## ------------------------------------------------------------------------
##          1.408        13.38         0.0003              1.172      1.691
## ------------------------------------------------------------------------
## Approx. test for unequal RRs (effect modification): chi2(5)   =     4.90
##                                                     Pr>chi2   =   0.4276
## 
## Note: The Rate ratio estimates are approximations to the rate ratios
##       for a one-unit increase in cholgrp.
## 
## .

把上面這些分析結果總結成爲一個簡潔的表格來，可以寫作：

	Events	Person-years	Rate/ 1000py	Crude RR (CI)	Age Adjusted RR (95% CI)
Cholesterol (mmol) <150 150-199 200-249 250+	14 52 59 29	3952 11183 8915 3554	3.54 4.65 6.62 8.16	1.00 1.31 (0.73-2.37) 1.87 (1.04-3.35) 2.30 (1.22-4.36) Ptrend = 0.001	1.00 1.33 (0.73–2.45) 1.74 (0.97–3.13) 3.10 (1.56–6.18) Ptrend = 0.0003

所以，我們的分析結果提供了很強的證據證明，參加實驗的這些公務員的血液膽固醇水平，在沒有任何調整的情況下和冠心病死亡成正相關（傾向性 Ptrend 值爲 0.001）。雖然似乎有跡象表明膽固醇更高的人羣可能調整年齡因素之後和冠心病死亡的關係更加顯著（負混雜, negative confounding），總體來說年齡調整前後的各個膽固醇水平和冠心病死亡之間的死亡率比並沒有太多不一樣，其結果同樣提供了極強的證據證明膽固醇水平和冠心病死亡之間呈正相關，傾向性 Ptrend 值爲 0.0003。

116.7 Further Q2 你認爲，當我們在分析膽固醇和冠心病死亡率之間的關係的時候，需要考慮調整收縮期血壓嗎 (systolic blood pressure)？如果你不同意，爲什麼？

思考這個問題的時候，首先需要考慮的是，收縮期血壓本身是否處在膽固醇和冠心病死亡這二者之間的的因果關係通路上 (on the causal pathway between cholesterol and CHD mortality)？

如果收縮期血壓在生物學機制上不在膽固醇和冠心病死亡之間的因果關係通路上，但是收縮期血壓本身獨立地對冠心病死亡率造成影響 (it has an independent effect on the risk of CHD mortality)。你如果同意這一觀點，那麼調整收縮期血壓是值得建議的。
如果說，收縮期血壓由於某種機制，確實處在膽固醇和冠心病死亡之間的因果關係通路上。那它就不是一個獨立地和冠心病死亡率之間有關的危險因子 (so SBP is not an independent risk factor for CHD mortality)。如果你同意這一情況，那麼，調整SBP是多餘甚至是錯誤的。因爲這將會導致我們低估膽固醇和冠心病死亡率之間的真實關係。

References

Marmot, Michael G, Geoffrey Rose, Martin Shipley, and Peter J Hamilton. 1978. “Employment Grade and Coronary Heart Disease in British Civil Servants.” Journal of Epidemiology & Community Health 32 (4): 244–49.

第 116 章 粗率比和分層率比 Crude and stratified rate ratios