第 56 章 貝葉斯多層回歸模型2 隨機斜率模型 multilevel models – varying slopes models

56.1 模擬隨機斜率數據

先定義機器人可能造訪的咖啡數據人羣 (population of cafes)。我們需要定義的有早晨和下午的平均等待時間 (average wait time),還有二者之間的相關係數 (correlation coefficient)。先在 R 裏面定義這幾個關鍵的參數,然後從中採集咖啡店樣本:

a <- 3.5                # average morning wait time
b <- (-1)               # average difference afternoon wait time
sigma_a <- 1            # standard dev in intercepts
sigma_b <- 0.5          # standard dev in slopes
rho <- (-0.7)           # correlation between intercepts and slopes

使用上述的參數來模擬咖啡店數據的時候,需要把他們中的一部分整理成一個 \(2\times2\) 的2維多項式高斯分佈的方差協方差矩陣。二維高斯分佈的均值就是很簡單的向量:

Mu <- c( a, b )

方差協方差矩陣應該表達成的數學形式是:

\[ \begin{pmatrix} \text{variance of intercepts} & \text{covariance of intercepts & slopes} \\ \text{covariance of intercepts & slopes} & \text{variance of slopes} \\ \end{pmatrix} \]

簡單地說就是:

\[ \begin{pmatrix} \sigma_\alpha^2 & \sigma_\alpha\sigma_\beta\rho \\ \sigma_\alpha\sigma_\beta\rho & \sigma_\beta^2 \end{pmatrix} \]

處於該方差協方差矩陣的對角線上的分別是,隨機截距的方差 \(\sigma^2_\alpha\),和隨機斜率的方差 \(\sigma^2_\beta\)。剩下的兩個就是相同的 \(\sigma_\alpha\sigma_\beta\rho\) 部分,它是斜率和截距之間的協方差。協方差僅僅是二者的標準差乘積乘以二者之間的相關係數。

在 R 裏書寫或者計算這個方差協方差矩陣的方法最簡單的是:

cov_ab <- sigma_a * sigma_b * rho
Sigma <- matrix( c(sigma_a^2, cov_ab, cov_ab, sigma_b^2), ncol = 2 )
Sigma
##       [,1]  [,2]
## [1,]  1.00 -0.35
## [2,] -0.35  0.25

另一種常見的方法是:

sigmas <- c(sigma_a, sigma_b) # standard deviations
Rho <- matrix( c(1, rho, rho, 1) , nrow = 2 ) # correlation matrix

# Now matrix multiply to get covariance matrix

Sigma <- diag(sigmas) %*% Rho %*% diag(sigmas)
Sigma
##       [,1]  [,2]
## [1,]  1.00 -0.35
## [2,] -0.35  0.25

第二種方法看起來稍微複雜些,但是有助於理解我們在貝葉斯模型中對這些參數設定先驗概率分佈的方法。

接下來我們可以開始採集咖啡店數據樣本了。假設我們只想要採集20家咖啡店的等待數據,那麼先定義:

N_cafes <- 20

讓計算機模擬這20家咖啡店的等待時間數據,我們需要做的就是指定它們之間的關係,也就是使用上面定義好的斜率,截距之間的二維高斯分佈矩陣:

library(MASS)
set.seed(5)
vary_effects <- mvrnorm( N_cafes, Mu, Sigma )

現在看看我們採集的這個20家咖啡的樣本數據 vary_effect ,它現在應該是一個有 20 行 2 列的矩陣:

vary_effects
##            [,1]        [,2]
##  [1,] 4.2239624 -1.60935645
##  [2,] 2.0104980 -0.75177037
##  [3,] 4.5658109 -1.94826458
##  [4,] 3.3436353 -1.19265388
##  [5,] 1.7009705 -0.58556181
##  [6,] 4.1343727 -1.14445388
##  [7,] 3.7944692 -1.62646613
##  [8,] 3.9465976 -1.71527939
##  [9,] 3.8642671 -0.90716770
## [10,] 3.4676137 -0.68040541
## [11,] 2.2428755 -0.61815155
## [12,] 4.1595055 -1.65921204
## [13,] 4.3002830 -2.11254740
## [14,] 3.5069480 -1.44064299
## [15,] 4.3820864 -1.87989834
## [16,] 3.5211329 -1.35069860
## [17,] 4.2167127 -0.91927985
## [18,] 5.9130025 -1.23136238
## [19,] 3.4773057 -0.35703411
## [20,] 3.7748988 -1.05704571

每一行就是一家咖啡店的數據。第一列是截距,第二列是斜率。我們來把他們各自提取出來命名:

a_cafe <- vary_effects[, 1] # the intercepts
b_cafe <- vary_effects[, 2] # the slopes

下面的代碼把上面採集的截距和斜率數據之間的關係繪製成帶有橢圓形二維高斯分佈的散點關係圖:

plot( a_cafe, b_cafe, col = rangi2, 
      xlab = "intercepts (a_cafe)", 
      ylab = "slopes (b_cafe)")

# overlay population distribution

# library(ellipse)
for ( l in c(0.1, 0.3, 0.5, 0.8, 0.99) )
  lines(ellipse::ellipse(Sigma, centre = Mu, level = l), col = col.alpha('black', 0.2))
20 cafes sampled from a statistial population. The horizontal axis is the intercept (average morning wait) for each cafe. The vertical axis is the slope (average difference between afternnon and morning wait) for each cafe. The gray ellipses illustrate the multivariate Gaussian population of intercepts and slopes.

圖 56.1: 20 cafes sampled from a statistial population. The horizontal axis is the intercept (average morning wait) for each cafe. The vertical axis is the slope (average difference between afternnon and morning wait) for each cafe. The gray ellipses illustrate the multivariate Gaussian population of intercepts and slopes.

56.1.1 模擬觀察值 simulate observations

上文中的 vary_effects 模擬的是20家餐廳的特徵值(也就是參數),並非實際上午/下午等待時間的觀察值。接下來我們可以模擬採集每家咖啡店的等待時間了。我們假設派機器人去每家餐廳十次,5次上午,5次下午,記錄每次的點餐等待時間,一共200次的觀察結果:

set.seed(22)
N_visits <- 10

afternoon <- rep(0:1, N_visits * N_cafes / 2)

cafe_id <- rep( 1:N_cafes, each = N_visits )

mu <- a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
sigma <- 0.5 # std dev within cafes

wait <- rnorm( N_visits*N_cafes, mu, sigma )

d <- data.frame( cafe = cafe_id, afternoon = afternoon, wait = wait)