\[ \newcommand{\bv}[1]{\boldsymbol{\mathbf{#1}}} \]
Click here to download the Rmd file: week4-level-1-predictor.Rmd
# To install a package, run the following ONCE (and only once on your computer)
# install.packages("psych")
library(here) # makes reading data more consistent
library(tidyverse) # for data manipulation and plotting
library(haven) # for importing SPSS/SAS/Stata data
library(lme4) # for multilevel analysis
library(sjPlot) # for plotting effects
library(MuMIn) # for computing r-squared
library(broom.mixed) # for summarizing results
library(modelsummary) # for making tables
library(interactions) # for plotting interactions
theme_set(theme_bw()) # Theme; just my personal preference
msummary_mixed <- function(models, coef_map = NULL, ...) {
if (is.null(coef_map)) {
if (!"list" %in% class(models)) {
models <- list(models)
}
for (model in models) {
coef_map <- union(coef_map, tidy(model)$term)
}
ranef_index <- grep("^(sd|cor)__", x = coef_map)
coef_map <- c(coef_map[-ranef_index], coef_map[ranef_index])
names(coef_map) <- coef_map
} else {
ranef_index <- grep("^(sd|cor)__", x = names(coef_map))
}
rows <- data.frame(term = c("Fixed Effects", "Random Effects"))
rows <- cbind(rows, rbind(
rep("", length(models)),
rep("", length(models))
))
length_fes <- length(coef_map) - length(ranef_index)
if ("statistic" %in% names(list(...)) && is.null(list(...)$statistic)) {
attr(rows, "position") <- c(1, (length_fes + 1))
} else {
attr(rows, "position") <- c(1, (length_fes + 1) * 2)
}
modelsummary::msummary(models, coef_map = coef_map, add_rows = rows, ...)
}
To separate the effects of a lv-1 predictor into different levels, one needs to first center the predictor on the cluster means:
hsball <- hsball %>%
group_by(id) %>% # operate within schools
mutate(ses_cm = mean(ses), # create cluster means (the same as `meanses`)
ses_cmc = ses - ses_cm) %>% # cluster-mean centered
ungroup() # exit the "editing within groups" mode
# The meanses variable already exists in the original data, but it's slightly
# different from computing it by hand
hsball %>%
select(id, ses, meanses, ses_cm, ses_cmc)
># # A tibble: 7,185 × 5
># id ses meanses ses_cm ses_cmc
># <chr> <dbl> <dbl> <dbl> <dbl>
># 1 1224 -1.53 -0.428 -0.434 -1.09
># 2 1224 -0.588 -0.428 -0.434 -0.154
># 3 1224 -0.528 -0.428 -0.434 -0.0936
># 4 1224 -0.668 -0.428 -0.434 -0.234
># 5 1224 -0.158 -0.428 -0.434 0.276
># 6 1224 0.022 -0.428 -0.434 0.456
># 7 1224 -0.618 -0.428 -0.434 -0.184
># 8 1224 -0.998 -0.428 -0.434 -0.564
># 9 1224 -0.888 -0.428 -0.434 -0.454
># 10 1224 -0.458 -0.428 -0.434 -0.0236
># # … with 7,175 more rows
Lv-1:
\[\text{mathach}_{ij} = \beta_{0j} + \beta_{1j} \text{ses_cmc}_{ij} + e_{ij}\]
Lv-2:
\[ \begin{aligned} \beta_{0j} & = \gamma_{00} + \gamma_{01} \text{meanses}_j + u_{0j} \\ \beta_{1j} & = \gamma_{10} \end{aligned} \] where
ses
representing the expected
difference in student achievement between two students in the same
school with one unit difference in ses
,meanses
.We can specify the model as:
m_bw <- lmer(mathach ~ meanses + ses_cmc + (1 | id), data = hsball)
You can summarize the results using
summary(m_bw)
># Linear mixed model fit by REML ['lmerMod']
># Formula: mathach ~ meanses + ses_cmc + (1 | id)
># Data: hsball
>#
># REML criterion at convergence: 46569
>#
># Scaled residuals:
># Min 1Q Median 3Q Max
># -3.167 -0.725 0.017 0.756 2.945
>#
># Random effects:
># Groups Name Variance Std.Dev.
># id (Intercept) 2.69 1.64
># Residual 37.02 6.08
># Number of obs: 7185, groups: id, 160
>#
># Fixed effects:
># Estimate Std. Error t value
># (Intercept) 12.648 0.149 84.7
># meanses 5.866 0.362 16.2
># ses_cmc 2.191 0.109 20.2
>#
># Correlation of Fixed Effects:
># (Intr) meanss
># meanses -0.004
># ses_cmc 0.000 0.000
With a level-1 predictor like ses
, which has both
student-level and school-level variance, we can include both the level-1
variable and the school mean variable as predictors. When the level-1
predictor is present, the coefficient for the group mean variable
becomes the contextual effect.
Lv-1:
\[\text{mathach}_{ij} = \beta_{0j} + \beta_{1j} \text{ses}_{ij} + e_{ij}\]
Lv-2:
\[ \begin{aligned} \beta_{0j} & = \gamma_{00} + \gamma_{01} \text{meanses}_j + u_{0j} \\ \beta_{1j} & = \gamma_{10} \end{aligned} \] where
ses
representing the expected
difference in student achievement between two students in the same
school with one unit difference in ses
,ses
but from two
schools with one unit difference in meanses
.We can specify the model as:
contextual <- lmer(mathach ~ meanses + ses + (1 | id), data = hsball)
You can summarize the results using
summary(contextual)
># Linear mixed model fit by REML ['lmerMod']
># Formula: mathach ~ meanses + ses + (1 | id)
># Data: hsball
>#
># REML criterion at convergence: 46569
>#
># Scaled residuals:
># Min 1Q Median 3Q Max
># -3.167 -0.725 0.017 0.756 2.945
>#
># Random effects:
># Groups Name Variance Std.Dev.
># id (Intercept) 2.69 1.64
># Residual 37.02 6.08
># Number of obs: 7185, groups: id, 160
>#
># Fixed effects:
># Estimate Std. Error t value
># (Intercept) 12.661 0.149 84.76
># meanses 3.675 0.378 9.73
># ses 2.191 0.109 20.16
>#
># Correlation of Fixed Effects:
># (Intr) meanss
># meanses -0.005
># ses 0.004 -0.288
If you compare the REML criterion at convergence
number
you can see this is the same as the between-within model. The estimated
contextual effect is the coefficient of meanses
minus the
coefficient of ses_cmc
, which is the same as what you will
get in the contextual effect model.
hsball %>%
# randomly sample 16 schools
filter(id %in% sample(unique(id), 16)) %>%
ggplot(aes(x = ses, y = mathach)) +
geom_point(size = 0.5) +
geom_smooth(method = "lm") +
facet_wrap(~id)
Lv-1:
\[\text{mathach}_{ij} = \beta_{0j} + \beta_{1j} \text{ses_cmc}_{ij} + e_{ij}\]
Lv-2:
\[ \begin{aligned} \beta_{0j} & = \gamma_{00} + \gamma_{01} \text{meanses}_j + u_{0j} \\ \beta_{1j} & = \gamma_{10} + u_{1j} \end{aligned} \] The additional term is \(u_{1j}\), which represents the deviation of the slope of school \(j\) from the average slope (i.e., \(\gamma_{10}\)).
We have to put ses
in the random part:
ran_slp <- lmer(mathach ~ meanses + ses_cmc + (ses_cmc | id), data = hsball)
You can summarize the results using
summary(ran_slp)
># Linear mixed model fit by REML ['lmerMod']
># Formula: mathach ~ meanses + ses_cmc + (ses_cmc | id)
># Data: hsball
>#
># REML criterion at convergence: 46558
>#
># Scaled residuals:
># Min 1Q Median 3Q Max
># -3.177 -0.727 0.016 0.756 2.941
>#
># Random effects:
># Groups Name Variance Std.Dev. Corr
># id (Intercept) 2.693 1.641
># ses_cmc 0.686 0.828 -0.19
># Residual 36.713 6.059
># Number of obs: 7185, groups: id, 160
>#
># Fixed effects:
># Estimate Std. Error t value
># (Intercept) 12.645 0.149 84.7
># meanses 5.896 0.360 16.4
># ses_cmc 2.191 0.128 17.1
>#
># Correlation of Fixed Effects:
># (Intr) meanss
># meanses -0.004
># ses_cmc -0.088 0.004
meanses
# augmented data (adding EB estimates)
augment(ran_slp, data = hsball) %>%
ggplot(aes(x = ses, y = mathach, color = factor(id))) +
# Add points
geom_point(size = 0.2, alpha = 0.2) +
# Add within-cluster lines
geom_smooth(aes(y = .fitted),
method = "lm", se = FALSE, size = 0.5) +
# Add group means
stat_summary(aes(x = meanses, y = .fitted,
fill = factor(id)),
color = "red", # add border
fun = mean,
geom = "point",
shape = 24,
# use triangles
size = 2.5) +
# Add between coefficient
geom_smooth(aes(x = meanses, y = .fitted),
method = "lm", se = FALSE,
color = "black") +
labs(y = "Math Achievement") +
# Suppress legend
guides(color = "none", fill = "none")
Or on separate graphs:
# Create a common base graph
pbase <- augment(ran_slp, data = hsball) %>%
ggplot(aes(x = ses, y = mathach, color = factor(id))) +
# Add points
geom_point(size = 0.2, alpha = 0.2) +
labs(y = "Math Achievement") +
# Suppress legend
guides(color = "none")
# Lv-1 effect
p1 <- pbase +
# Add within-cluster lines
geom_smooth(aes(y = .fitted),
method = "lm", se = FALSE, size = 0.5)
# Lv-2 effect
p2 <- pbase +
# Add group means
stat_summary(aes(x = meanses, y = .fitted),
fun = mean,
geom = "point",
shape = 17,
# use triangles
size = 2.5) +
# Add between coefficient
geom_smooth(aes(x = meanses, y = .fitted),
method = "lm", se = FALSE,
color = "black")
# Put the two graphs together (need the gridExtra package)
gridExtra::grid.arrange(p1, p2, ncol = 2) # in two columns
As discussed in the previous week, we can obtain the effect size (\(R^2\)) for the model:
(r2_ran_slp <- r.squaredGLMM(ran_slp))
># R2m R2c
># [1,] 0.1684 0.2311
sector
is added to the level-2 intercept and slope
equations
Lv-1:
\[\text{mathach}_{ij} = \beta_{0j} + \beta_{1j} \text{ses_cmc}_{ij} + e_{ij}\]
Lv-2:
\[
\begin{aligned}
\beta_{0j} & = \gamma_{00} + \gamma_{01} \text{meanses}_j +
\gamma_{02} \text{sector}_j + u_{0j} \\
\beta_{1j} & = \gamma_{10} + \gamma_{11} \text{sector}_j +
u_{1j}
\end{aligned}
\] where - \(\gamma_{02}\) =
regression coefficient of school-level sector
variable
representing the expected difference in achievement between two students
with the same SES level and from two schools with the same school-level
SES, but one is a catholic school and the other a private school. -
\(\gamma_{11}\) = cross-level
interaction coefficient of the expected slope difference
between a catholic and a private school with the same school-level
SES.
We have to put the sector * ses
interaction to the fixed
part:
You can summarize the results using
summary(crlv_int)
># Linear mixed model fit by REML ['lmerMod']
># Formula: mathach ~ meanses + sector * ses_cmc + (ses_cmc | id)
># Data: hsball
>#
># REML criterion at convergence: 46514
>#
># Scaled residuals:
># Min 1Q Median 3Q Max
># -3.1156 -0.7291 0.0159 0.7528 3.0154
>#
># Random effects:
># Groups Name Variance Std.Dev. Corr
># id (Intercept) 2.381 1.543
># ses_cmc 0.272 0.521 0.24
># Residual 36.708 6.059
># Number of obs: 7185, groups: id, 160
>#
># Fixed effects:
># Estimate Std. Error t value
># (Intercept) 12.085 0.199 60.81
># meanses 5.245 0.368 14.24
># sectorCatholic 1.252 0.306 4.09
># ses_cmc 2.788 0.156 17.89
># sectorCatholic:ses_cmc -1.348 0.235 -5.74
>#
># Correlation of Fixed Effects:
># (Intr) meanss sctrCt ss_cmc
># meanses 0.245
># sectorCthlc -0.697 -0.355
># ses_cmc 0.071 -0.002 -0.046
># sctrCthlc:_ -0.048 -0.001 0.071 -0.664
By comparing the \(R^2\) of this model (with interaction) and the previous model (without the interaction), we can obtain the proportion of variance accounted for by the cross-level interaction:
(r2_crlv_int <- r.squaredGLMM(crlv_int))
># R2m R2c
># [1,] 0.1759 0.2284
# Change
r2_crlv_int - r2_ran_slp
># R2m R2c
># [1,] 0.007496 -0.002643
Therefore, the cross-level interaction between sector and SES accounted for an effect size of \(\Delta R^2\) = 0.7%. This is usually considered a small effect size (i.e., < 1%), but interpreting the magnitude of an effect size needs to take into account the area of research, the importance of the outcome, and the effect sizes when using other predictors to predict the same outcome.
With a cross-level interaction, the slope between
ses_cmc
and mathach
depends on a moderator,
sector
. To find out what the model suggests to be the slope
at a particular level of sector
, which is also called a
simple slope in the literature, you just need to look at the
equation for \(\beta_{1j}\), which
shows that the predicted slope is \[\hat
\beta_{1} = \hat \gamma_{10} + \hat \gamma_{11} \text{sector}.\]
So when sector
= 0 (public school), the simple slope is
\[\hat \beta_{1} = \hat \gamma_{10} + \hat
\gamma_{11} (0),\] or 2.7877. When sector
= 1
(private school), the simple slope is \[\hat
\beta_{1} = \hat \gamma_{10} + \hat \gamma_{11} (1),\] or 2.7877
+ (-1.3478)(1) = 1.44.
You can also use the interactions::interact_plot()
function for just the fixed effects:
interact_plot(crlv_int,
pred = "ses_cmc",
modx = "sector",
modx.labels = c("Public", "Catholic"),
plot.points = TRUE,
point.size = 0.5,
point.alpha = 0.2,
facet.modx = TRUE, # use this to make two panels
x.label = "SES (cluster-mean centered)",
y.label = "Math achievement")
You can again use the msummary()
function from the
modelsummary
package to quickly summarize the results in a
table. However, there is an issue as the order of the coefficients are
messed up, as shown below.
# Basic table from `modelsummary` (ordering messed up)
msummary(list(
"Between-within" = m_bw,
"Contextual" = contextual,
"Random slope" = ran_slp,
"Cross-level interaction" = crlv_int
))
Between-within | Contextual | Random slope | Cross-level interaction | |
---|---|---|---|---|
(Intercept) | 12.648 | 12.661 | 12.645 | 12.085 |
(0.149) | (0.149) | (0.149) | (0.199) | |
meanses | 5.866 | 3.675 | 5.896 | 5.245 |
(0.362) | (0.378) | (0.360) | (0.368) | |
ses_cmc | 2.191 | 2.191 | 2.788 | |
(0.109) | (0.128) | (0.156) | ||
sd__(Intercept) | 1.641 | 1.641 | 1.641 | 1.543 |
sd__Observation | 6.084 | 6.084 | 6.059 | 6.059 |
ses | 2.191 | |||
(0.109) | ||||
cor__(Intercept).ses_cmc | −0.195 | 0.244 | ||
sd__ses_cmc | 0.828 | 0.521 | ||
sectorCatholic | 1.252 | |||
(0.306) | ||||
sectorCatholic × ses_cmc | −1.348 | |||
(0.235) | ||||
Num.Obs. | 7185 | 7185 | 7185 | 7185 |
AIC | 46578.6 | 46578.6 | 46571.7 | 46532.5 |
BIC | 46613.0 | 46613.0 | 46619.8 | 46594.4 |
Log.Lik. | −23284.289 | −23284.290 | −23278.825 | −23257.264 |
REMLcrit | 46568.577 | 46568.580 | 46557.651 | 46514.529 |
I’ve made a function msummary_mixed()
, defined in the
beginning of this page, as a hack to get a nicer table. Basically, you
can just replace msummary()
with
msummary_mixed()
, with everything else being the same:
# Basic table using `msummary_mixed`
msummary_mixed(list(
"Between-within" = m_bw,
"Contextual" = contextual,
"Random slope" = ran_slp,
"Cross-level interaction" = crlv_int
))
Between-within | Contextual | Random slope | Cross-level interaction | |
---|---|---|---|---|
Fixed Effects | ||||
(Intercept) | 12.648 | 12.661 | 12.645 | 12.085 |
(0.149) | (0.149) | (0.149) | (0.199) | |
meanses | 5.866 | 3.675 | 5.896 | 5.245 |
(0.362) | (0.378) | (0.360) | (0.368) | |
ses_cmc | 2.191 | 2.191 | 2.788 | |
(0.109) | (0.128) | (0.156) | ||
ses | 2.191 | |||
(0.109) | ||||
sectorCatholic | 1.252 | |||
(0.306) | ||||
sectorCatholic:ses_cmc | −1.348 | |||
(0.235) | ||||
Random Effects | ||||
sd__(Intercept) | 1.641 | 1.641 | 1.641 | 1.543 |
sd__Observation | 6.084 | 6.084 | 6.059 | 6.059 |
cor__(Intercept).ses_cmc | −0.195 | 0.244 | ||
sd__ses_cmc | 0.828 | 0.521 | ||
Num.Obs. | 7185 | 7185 | 7185 | 7185 |
AIC | 46578.6 | 46578.6 | 46571.7 | 46532.5 |
BIC | 46613.0 | 46613.0 | 46619.8 | 46594.4 |
Log.Lik. | −23284.289 | −23284.290 | −23278.825 | −23257.264 |
REMLcrit | 46568.577 | 46568.580 | 46557.651 | 46514.529 |
Make it look like the table at https://apastyle.apa.org/style-grammar-guidelines/tables-figures/sample-tables#regression. You can see a recent recommendation in this paper published in the Journal of Memory and Language
# Step 1. Create a map for the predictors and the terms in the model.
# Need to use '\\( \\)' to show math.
# Rename and reorder the rows. Need to use '\\( \\)' to
# show math. If this does not work for you, don't worry about it.
cm <- c("(Intercept)" = "Intercept",
"meanses" = "school mean SES",
"ses_cmc" = "SES (cluster-mean centered)",
"sectorCatholic" = "Catholic school",
"sectorCatholic:ses_cmc" = "Catholic school x SES (cmc)",
"sd__(Intercept)" = "\\(\\tau_0\\)",
"sd__ses_cmc" = "\\(\\tau_1\\) (SES)",
"sd__Observation" = "\\(\\sigma\\)")
# Step 2. Create a list of models to have one column for coef,
# one for SE, one for CI, and one for p
models <- list(
"Estimate" = crlv_int,
"SE" = crlv_int,
"95% CI" = crlv_int
)
# Step 3. Add rows to say fixed and random effects
# (Need same number of columns as the table)
new_rows <- data.frame(
term = c("Fixed effects", "Random effects"),
est = c("", ""),
se = c("", ""),
ci = c("", "")
)
# Specify which rows to add
attr(new_rows, "position") <- c(1, 7)
# Step 4. Render the table
msummary(models,
estimate = c("estimate", "std.error",
"[{conf.low}, {conf.high}]"),
statistic = NULL, # suppress the extra rows for SEs
coef_map = cm,
add_rows = new_rows)
Estimate | SE | 95% CI | |
---|---|---|---|
Fixed effects | |||
Intercept | 12.085 | 0.199 | [11.695, 12.474] |
school mean SES | 5.245 | 0.368 | [4.523, 5.967] |
SES (cluster-mean centered) | 2.788 | 0.156 | [2.482, 3.093] |
Catholic school | 1.252 | 0.306 | [0.652, 1.852] |
Catholic school x SES (cmc) | −1.348 | 0.235 | [−1.808, −0.888] |
Random effects | |||
\(\tau_0\) | 1.543 | ||
\(\tau_1\) (SES) | 0.521 | ||
\(\sigma\) | 6.059 | ||
Num.Obs. | 7185 | 7185 | 7185 |
AIC | 46532.5 | 46532.5 | 46532.5 |
BIC | 46594.4 | 46594.4 | 46594.4 |
Log.Lik. | −23257.264 | −23257.264 | −23257.264 |
REMLcrit | 46514.529 | 46514.529 | 46514.529 |