Residuals with sdmTMB

class: center, middle, inverse, title-slide

.title[
# Residuals with sdmTMB
]
.subtitle[
## DFO DSAF workshop
]
.author[
### 
]
.date[
### January 12–16 2026
]

---

# Why look at residuals

- Assess whether the model is probabilistically consistent with the data

- Check for remaining spatial or temporal structure

- Evaluate missing covariate effects

- Detect over- or under- dispersion (from simulated observations)

- Assess zero inflation (from simulated observations)

---

# Good residuals, bad model

Residuals can only tell us whether the model is probabilistically consistent with the data or if we're ignoring major structural elements.

They don't tell us whether the model is any good at prediction!

---

# Types of residuals in sdmTMB

- *Many*, see `?residuals.sdmTMB`

- Most useful: "randomized quantile" (RQR) (Dunn & Smyth 1996)
- Sometimes called "probability integral transform" (PIT) residuals (Smith 1985).
  
.tiny[
Dunn, P.K. & Smyth, G.K. (1996). Randomized Quantile Residuals. Journal of Computational and Graphical Statistics, 5, 236–244.

Smith, J.Q. (1985). Diagnostic checks of non-standard time series models. Journal of Forecasting, 4, 283–291.
]

---

# Why use *randomized quantile residuals*?

Because they let us work with residuals that are expected to be:

- standard normal, N(0, 1)
- or uniform, U(0, 1)

(take your pick!)

regardless of the distribution family we work with.

---

# What does *randomized quantile* mean?

**Quantile**: Where does each observed value fall in its predicted distribution?

**Randomized**: Add randomization for any discrete data

E.g., for the gamma:

```r
u <- pgamma(q = y, shape = s1, scale = s2)
qnorm(u)
```

where `y`, are the observations and `s1` and `s2` are estimated in the model.
  
---

# What does *randomized quantile* mean?

**Quantile**: Where does each observed value fall in its predicted distribution?

**Randomized**: Add randomization for any discrete data

E.g., for the Poisson:

```r
a <- ppois(y - 1, mu)
b <- ppois(y, mu)
u <- runif(n = length(y), min = a, max = b)
qnorm(u)
```

where the `runif()` adds randomization.

.tiny[
See the [residuals vignette](https://pbs-assess.github.io/sdmTMB/articles/residual-checking.html) that expands on this.]

---

# Decisions within the type of residuals

1. How do we treat the fixed and random effects?

2. Do we use theoretical or simulation-based quantiles

---

# Fixed and random effects

*Many* options, useful ones to consider:

Hold fixed effects at their maximum likelihood estimates (MLEs)

Set random effects to one of:
  - Empirical Bayes estimates *(weird)*
  - One sample drawn from their precision matrix *(good enough, fast)*
  - One sample drawn from an MCMC chain *(great, slow)*

---

# An aside on empirical Bayes

.small[
What the heck is an empirical Bayes (EB) random effect!?
]

.small[
Find fixed effects that maximize the marginal log likelihood.
]

.small[
Hold the fixed effects at these MLEs.
]

.small[
Now use an optimizer to find the random effects that maximize the likelihood.
]

.small[
These are what you're used to getting out of lme4, mgcv, glmmTMB, sdmTMB, etc.!
]

.small[
Key concept: EB estimates are pulled towards the observations and are not expected to reflect the assumed random effect distribution
]

---

# MLE-EB residuals gone wrong

``` r
r1 <- residuals(fit, type = "mle-eb")
qqnorm(r1);abline(0, 1)
```

---

# MLE-MVN residuals gone right

``` r
r2 <- residuals(fit, type = "mle-mvn")
qqnorm(r2);abline(0, 1)
```

---

# Theoretical vs. simulation-based quantiles

Do we answer the question "Where does each observed value fall in its predicted distribution?" by

1. Using the theoretical CDF function, e.g., `pnorm()`, `pgamma()`, or

2. Simulating each observation many times and calculating the proportion of times our simulated values are `$\le$` the observation.

---

# Theoretical vs. simulation-based residual tradeoffs

- In sdmTMB, the theoretical RQRs are N(0, 1)

- In DHARMa, the simulation-based RQRs are U(0, 1)

- We haven't worked out the theoretical RQRs for all distributions or for delta models

- Simulation-based RQRs are stochastic (so set a seed and take enough draws)

---

# Normal(0, 1) vs. Uniform(0, 1)

Residuals from a perfectly matched model:

---

# Theoretical RQRs in sdmTMB

``` r
r <- residuals(fit, type = "mle-mvn")
qqnorm(r)
abline(0, 1)
```

---

# Simulation-based RSRs in sdmTMB

``` r
s <- simulate(fit, nsim = 200, type = "mle-mvn")
dharma_residuals(s, fit)

# or use the pipe:

simulate(fit, nsim = 200, type = "mle-mvn") |>
  dharma_residuals(fit)

# or return the DHARMa object itself:

simulate(fit, nsim = 200, type = "mle-mvn") |>
  dharma_residuals(fit, return_DHARMa = TRUE)
```

Note the `type = "mle-mvn"`!

---