class: center, middle, inverse, title-slide

# Model Diagnostics
## PSYC 575
### Mark Lai
### University of Southern California
### 2021/09/25 (updated: 2021-09-26)

---

`$$\newcommand{\bv}[1]{\boldsymbol{\mathbf{#1}}}$$`

# Week Learning Objectives

- Describe the major **assumptions** in basic multilevel models

- Conduct analyses to decide whether **cluster means** and **random slopes** should be included

- Use graphical tools to diagnose assumptions of **linearity**, **homoscedasticity** (equal variance), and **normality**

- Solve some basic **convergence issues**

- **Report** results of a multilevel analysis based on established guidelines

---
class: center, middle

# Multilevel "Model" . . .

What is a model?

It is a set of **assumptions** about how the data are generated

---

# Two Components of a Parametric Model

## Functional Form

.pull-left[

`$$\mathrm{E}(Y_{ij} | \mathbf{X}, \mathbf{W}) = \gamma_{00} + \gamma_{10} X_{1ij} + \ldots + \gamma_{01} W_{1j} + \ldots$$`

Versus:

`$$\mathrm{E}(Y_{ij} | \mathbf{X}, \mathbf{W}) = \exp(\gamma_{00} + \gamma_{10} X_{1ij} + \ldots + \gamma_{01} W_{1j} + \ldots)$$`

]

.pull-right[

]

---

# Two Components of a Parametric Model

## Random Component

I.e., distribution of random effects/errors

`$$\begin{bmatrix}
    u_{0j} \\
    u_{1j}
  \end{bmatrix} \sim N \left(
      \begin{bmatrix}
        0 \\
        0
      \end{bmatrix}, 
      \begin{bmatrix}
        \tau^2_0 & \tau_{01} \\
        \tau_{01} & \tau^2_1
      \end{bmatrix}
    \right)$$`

`$$e_{ij} \sim N(0, \sigma)$$`

Versus `$e_{ij} \sim t_3(0, \sigma)$`

Or `$e_{ij} \sim N(0, \sigma_\color{red}{j})$`, where different clusters `$j$` have a different SD `$\sigma_j$`
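
To make "how the data are generated" concrete, here is a minimal simulation sketch of the random component, using only the Python standard library. It assumes a random-slope model `$Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + u_{1j} X_{ij} + e_{ij}$`; all parameter values, seeds, and function names are hypothetical and chosen for illustration only.

```python
import math
import random
import statistics

def simulate_random_effects(n_clusters, tau0, tau1, rho, seed=575):
    """Draw (u0j, u1j) from a bivariate normal with mean zero.

    tau0 and tau1 are SDs; rho is the correlation, so tau01 = rho * tau0 * tau1.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_clusters):
        z0 = rng.gauss(0, 1)
        z1 = rng.gauss(0, 1)
        # Cholesky-style construction of two correlated normal variables
        u0 = tau0 * z0
        u1 = tau1 * (rho * z0 + math.sqrt(1 - rho ** 2) * z1)
        draws.append((u0, u1))
    return draws

def simulate_outcomes(draws, x_values, gamma00, gamma10, sigma, seed=2021):
    """Generate Y_ij = gamma00 + u0j + (gamma10 + u1j) * X_ij + e_ij."""
    rng = random.Random(seed)
    data = []
    for j, (u0, u1) in enumerate(draws):
        for x in x_values:
            e = rng.gauss(0, sigma)  # sigma is the level-1 SD
            data.append((j, x, gamma00 + u0 + (gamma10 + u1) * x + e))
    return data

def pearson_r(a, b):
    """Pearson correlation computed from sums of squares and cross-products."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

draws = simulate_random_effects(n_clusters=2000, tau0=1.0, tau1=0.5, rho=0.3)
data = simulate_outcomes(draws, x_values=[-1, 0, 1],
                         gamma00=10, gamma10=2, sigma=1.5)

# Sample summaries should land near the generating values
sd_u0 = statistics.pstdev([u0 for u0, _ in draws])
sd_u1 = statistics.pstdev([u1 for _, u1 in draws])
r_u = pearson_r([u0 for u0, _ in draws], [u1 for _, u1 in draws])
```

Swapping `rng.gauss(0, sigma)` for a heavier-tailed draw, or letting `sigma` depend on `j`, would give the alternative random components listed above.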

---
class: inverse, middle, center

# Assumptions of Basic MLM

---

# Five Assumptions in Normal Linear Models

### Linearity

### Independence of errors (at the highest level)

### Normality

### Equal variance of errors (i.e., homoscedasticity)

### Correct specification of the model

&zwj;Importance: Specification, Linearity, Independence > Equal variance, Normality

---

# Assumptions Are Important

Your result is only as good as the assumptions

- Garbage in, garbage out

---

# Correct Specification

Fixed effects

- Cluster means should be included (unless between coefficient = within coefficient)

* Otherwise, between and within coefficients are conflated

- Relevant predictors should be included to answer the target research question

* E.g., Gender gap vs. gender gap adjusting for profession

Random effects

- If random slope variance is not zero, omitting it leads to inflated Type I error rates for fixed effects

* Varying slopes can also be important information in their own right
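
The conflation of between and within coefficients can be illustrated with a small sketch. The toy data below are hypothetical and built so that the between-cluster slope is 2 and the within-cluster slope is -1; plain least squares stands in for a multilevel fit, so this shows only the decomposition idea, not a full MLM analysis.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical balanced data: 3 clusters, 3 observations each
rows = []  # (cluster id, x, y)
for j, xbar in enumerate([0.0, 2.0, 4.0]):
    for dev in [-1.0, 0.0, 1.0]:
        x = xbar + dev
        y = 2.0 * xbar - 1.0 * dev  # between slope 2, within slope -1
        rows.append((j, x, y))

# Cluster means of the predictor
x_by_cluster = defaultdict(list)
for j, x, _ in rows:
    x_by_cluster[j].append(x)
x_mean = {j: sum(v) / len(v) for j, v in x_by_cluster.items()}

x_raw = [x for _, x, _ in rows]
x_cm = [x_mean[j] for j, _, _ in rows]       # cluster mean
x_cmc = [x - x_mean[j] for j, x, _ in rows]  # cluster-mean centered
y = [yy for _, _, yy in rows]

between = ols_slope(x_cm, y)     # uses only cluster-level variation
within = ols_slope(x_cmc, y)     # uses only within-cluster variation
conflated = ols_slope(x_raw, y)  # raw predictor mixes the two
```

Here the raw-predictor slope falls between the two component slopes and matches neither, which is the sense in which the coefficients are conflated when cluster means are left out.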

---

# Linearity

Lack of linear association `$\neq$` lack of association
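
A minimal sketch of this point, with made-up numbers: `y` is completely determined by `x`, yet the linear (Pearson) correlation is zero because the relation is quadratic and `x` is symmetric around zero.

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # perfect, but nonlinear, dependence on x

r = pearson_r(x, y)  # the linear association vanishes
```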

---

# Independence of Errors

We use MLM because students within the same school are more similar (i.e., not independent)

If schools are from different school districts, they may also not be independent

- Need a three-level model

Or, student A in school 1 is from the same neighborhood as student B in school 2

- Cross-classified model

Temporal dependence

- E.g., Repeated measures closer in time are more similar

* Autoregressive model

---

# Equal Variance of Errors (Homoscedasticity)

Residual plots

---

# Normality

.pull-left[

Quantile-quantile (QQ) plot

* Whether the 1st, 5th, 10th, ... percentiles of the residuals correspond to the 1st, 5th, 10th, ... percentiles of a normal distribution

Need to check both level 1 `$(e)$` and level 2 `$(u_0 \text{ and } u_1)$`

]

.pull-right[

]
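
The percentile comparison behind a QQ plot can be sketched as follows. This is an illustration under assumptions: the residuals are hypothetical, the plotting positions `(i - 0.5) / n` are one common convention among several, and the function name is made up for this example.

```python
from statistics import NormalDist, mean, pstdev

def qq_pairs(residuals):
    """Pair each standardized, sorted residual with its normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    center, spread = mean(ordered), pstdev(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    pairs = []
    for i, r in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position for the i-th ordered value
        pairs.append((std_normal.inv_cdf(p), (r - center) / spread))
    return pairs

# Hypothetical residuals with one large positive value (long right tail)
residuals = [-1.2, -0.4, 0.1, 0.3, 0.9, 4.5]
pairs = qq_pairs(residuals)

# Under normality the two columns track each other; here the largest
# residual sits well above its theoretical quantile.
theoretical_max, sample_max = pairs[-1]
```

Plotting the second element of each pair against the first gives the QQ plot; points far from the 45-degree line flag departures from normality. The same function could be applied to estimated level-2 effects.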

---

# Example Data for Which a Normal Model Is Not Appropriate

- Binary/ordinal outcome with < 5 categories (including the homework)

- Count data (e.g., # of binge drinking episodes; # of successes in 5 trials)

- Bounded data with ceiling/floor effects (e.g., depressive symptoms)

- Reaction time

---

# Additional Issues

- Outliers/influential observations
    * Check for coding errors
    * Don't drop outliers unless you adjust the standard errors accordingly, or use robust models

- Reliability (e.g., `$\alpha$` coefficient)
    * Reliability may be high at one level but low at another level
    * See Lai (2021, doi: 10.1037/met0000287) for level-specific reliability
        * You can use the `multilevel_alpha()` function from https://github.com/marklhc/mcfa_reliability_supp/blob/master/multilevel_alpha.R

---
class: inverse, middle, center

# Dealing With Convergence Issues

## See R code

---
class: inverse, middle, center

# Reporting Results

---

References
- Chapter by McCoach (2019); Paper by Meteyard & Davies (2020)

Things to report:

.pull-left[

- Sample sizes
- Model equations
- Decisions and justifications for including or not including cluster means, centering, and random slopes
- Estimation methods, software program/package, and version number
- Intraclass correlation
- Convergence issues and handling
- Assumptions
- Tables of fixed and random effect coefficients
- Effect size

]

.pull-right[

- Model comparison criteria and indices
- Software code

]
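
For the intraclass correlation, a minimal worked example under a random-intercept model, with `$\tau^2_0$` as the intercept variance and `$\sigma^2$` as the level-1 error variance. The numbers are hypothetical, chosen only for illustration:

`$$\mathrm{ICC} = \frac{\tau^2_0}{\tau^2_0 + \sigma^2} = \frac{2}{2 + 8} = 0.2$$`

That is, 20% of the outcome variance would lie between clusters.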