class: center, middle, inverse, title-slide

.title[
# Model comparison and validation with sdmTMB
]
.subtitle[
## DFO DSAF workshop
]
.author[
###
]
.date[
### January 12–16 2026
]

---

<!-- Build with: xaringan::inf_mr() -->

# What is the goal of modeling?

Prediction?

* in-sample, out-of-sample?
* what kind of out-of-sample?

Hypothesis testing or causal inference

Not all metrics are appropriate for all applications

---

# Too many metrics to discuss

.small[
* Root mean squared error, median absolute relative error, ...
* AUC: true and false positive rates
* AIC
  * widely used tool in ecology
  * `\(\mathrm{AIC} = -2 \log L + 2K\)`, where `\(K\)` is the number of parameters
  * designed for fixed effects models (Burnham and Anderson 2002)
* Deviance explained
]

---

# AIC and likelihood with sdmTMB

* `AIC()` and `logLik()` methods work, just like `glm()`

``` r
mesh <- make_mesh(pcod, xy_cols = c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  present ~ 1,
  data = pcod,
  mesh = mesh,
  family = binomial(link = "logit"),
  spatial = "on"
)
logLik(fit) # log likelihood
AIC(fit)    # AIC
```

---

# When to use restricted maximum likelihood (REML)

* Integrates over random effects *and* fixed effects; sometimes helps convergence too
* *Can* use REML when comparing different random effect structures
* *Don't* use REML when comparing alternative fixed effects
* *Don't* use REML for index standardization

.small[
``` r
fit <- sdmTMB(..., reml = TRUE) # default is FALSE
```
]

---

# Reminder: fixed and random effects in sdmTMB

Random effects include

* random intercepts and slopes: `(1 + x | group)`
* smooth terms: `s(year)`
* time-varying coefficients
* all random fields
* spatially varying coefficients (also random fields)

---

# Limitations of (marginal) AIC

.small[
* Originally conceived for fixed effects
  * Burnham and Anderson (2002)
* Approximate & problematic for mixed effects
  * Vaida and Blanchard (2005)
  * Liang et al. (2008)
  * Great [FAQ on `glmmTMB` by Ben Bolker](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#can-i-use-aic-for-mixed-models-how-do-i-count-the-number-of-degrees-of-freedom-for-a-random-effect)
* Conditional AIC: less widely used, computationally harder
  * Thorson (2024) for GMRF models: <https://doi.org/10.1002/ecy.4327>
  * Also built into sdmTMB: see `?cAIC`
]

---

# Predictive model selection

* Ideal world: use cross validation to evaluate predictive accuracy
* Split data into train / test sets
* Objective function:
  * maximize the log likelihood of the test data
  * also known as "log predictive density" or "log score"
  * or use a domain-specific metric you care about

---

# Log predictive density

Interpreted as a theoretical expectation for a new dataset

Defined as the sum of the log likelihoods of the left-out data; sometimes averaged per data point

Analogous to ELPD in the Bayesian literature

Higher is better; only makes sense when comparing models fit to the same data!

Great [FAQ on cross validation](https://users.aalto.fi/~ave/CV-FAQ.html) by Aki Vehtari
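---

# The log score by hand

To make the log score concrete, here is a minimal sketch for a single train/test split; `sdmTMB_cv()` (next slides) automates this across folds. The 90/10 split, the seed, and the object names (`train`, `test`, `fit_train`) are arbitrary choices for illustration.

``` r
set.seed(1)
train_ids <- sample(seq_len(nrow(pcod)), size = round(0.9 * nrow(pcod)))
train <- pcod[train_ids, ]
test <- pcod[-train_ids, ]

# fit to the training data only:
mesh_train <- make_mesh(train, xy_cols = c("X", "Y"), cutoff = 10)
fit_train <- sdmTMB(
  present ~ 1,
  data = train,
  mesh = mesh_train,
  family = binomial(link = "logit"),
  spatial = "on"
)

# predict on the held-out data (link scale by default) and
# sum the log likelihoods of the left-out observations:
p <- predict(fit_train, newdata = test)
sum(dbinom(test$present, size = 1, prob = plogis(p$est), log = TRUE))
```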
---

# Cross validation in sdmTMB

.small[
``` r
*fit <- sdmTMB_cv(
  present ~ 1,
  data = pcod,
  mesh = mesh,
  family = binomial(link = "logit"),
*  k_folds = 8,
*  fold_ids = NULL,
*  parallel = TRUE,
*  use_initial_fit = FALSE
)
```
]

.xsmall[
* More folds = more computation time but less cross validation variance and bias
* `use_initial_fit = TRUE`: fits the first fold and initializes subsequent model fits from that fit
* Uses the same mesh for all folds if `mesh` is supplied (you probably want this)
]

---

# Cross validation in sdmTMB

`sdmTMB_cv()` returns:

- `models`: A list of models (each an `sdmTMB()` object)
- `data`: Original data plus columns for fold ID, CV predicted value, and CV log likelihood
- `fold_loglik`: sum of held-out log likelihoods for each fold
- `sum_loglik`: sum of held-out log likelihoods across all data
- and more...

---

# How to choose folds? How many?

.small[
* Words of wisdom:
  * Can be highly variable based on data, folds, sampling scheme
* Spatial sampling or random?
  * [blockCV R package](https://cran.r-project.org/web/packages/blockCV/index.html), [Valavi et al. (2019)](https://doi.org/10.1111/2041-210X.13107)
* How to sample with time / years?
  * LOOCV (leave-one-out...) vs. LFOCV (leave-future-out...)
* `sdmTMB_cv()` does random fold assignment
* Custom folds can be specified with `fold_ids` (see the sketch on the next slide)
]
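---

# Custom folds with `fold_ids`

A minimal sketch of grouped folds by year, so each held-out set is an entire survey year. Note this is grouped cross validation, not true leave-future-out; the fold assignment and the names `year_folds` and `m_cv_year` are illustrative only, and `pcod` and `mesh` are assumed from the earlier slides.

``` r
# one integer fold ID per year (1, 2, ...):
year_folds <- as.integer(factor(pcod$year))

m_cv_year <- sdmTMB_cv(
  present ~ 1,
  data = pcod,
  mesh = mesh,
  family = binomial(link = "logit"),
*  k_folds = length(unique(year_folds)),
*  fold_ids = year_folds
)
m_cv_year$fold_loglik # one held-out log likelihood per year
```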
---

# Basic cross validation example

Set a `future::plan()`; the folds will be fit in parallel

``` r
*library(future)
*plan(multisession)

mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)

*m_cv <- sdmTMB_cv(
  density ~ s(depth, k = 5),
  data = pcod,
  mesh = mesh,
  family = tweedie(link = "log"),
*  k_folds = 2
)

# Sum of log likelihoods of left-out data:
m_cv$sum_loglik # higher is better
#> [1] -6780.385
```

---

class: center, middle, inverse

# Deviance explained

---

# Deviance explained

* Deviance explained measures how much of the variation in the data is captured by the model compared to a null (usually intercept-only) model

--

* Analogous to an R² for models fit with likelihood methods (e.g., GLMs, GAMs, or GLMMs)

--

* Calculated as `1 − (model deviance / null deviance)`; higher values indicate a better fit

--

* Can also be used to compare the % increase in deviance explained between two models fit to the same data

---

# Deviance explained in sdmTMB

``` r
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ depth_scaled,
  spatial = "on",
  mesh = mesh,
  family = delta_gamma(),
  data = pcod
)

# model with *only* an intercept and no random fields:
fit_null <- update(
  fit,
*  formula. = density ~ 1,
*  spatial = "off"
)

# proportion deviance explained:
1 - deviance(fit) / deviance(fit_null)
#> [1] 0.3241151
```

---

# Proportional reduction in deviance by model component

We can quantify how much better the model fits when adding a smoother or spatial field by comparing model deviances.

Let's fit a model with a depth smoother and a spatial random field:

``` r
fit <- sdmTMB(
  present ~ s(depth_scaled),
  spatial = "on",
  mesh = mesh,
  family = binomial(),
  data = pcod
)
```

---

# Deviance reduction by component

Fit some sub-models for comparison:

``` r
# same model with no random fields:
fit_depth_only <- update(
  fit,
*  spatial = "off"
)

# same model with no depth smoother:
fit_rf_only <- update(
  fit,
*  formula. = . ~ 1
)

# null model:
fit_null <- update(
  fit,
*  spatial = "off",
*  formula. = . ~ 1
)
```

---

### Deviance reduction by component

.small[
Proportional deviance explained by depth:

``` r
1 - deviance(fit_depth_only) / deviance(fit_null)
#> [1] 0.1979381
```
]

--

.small[
Proportional reduction in model deviance by depth beyond what the spatial field explains:

``` r
1 - deviance(fit) / deviance(fit_rf_only)
#> [1] 0.08834777
```
]

--

.small[
Proportional reduction in model deviance by the random field beyond what the depth smoother explains:

``` r
1 - deviance(fit) / deviance(fit_depth_only)
#> [1] 0.2364397
```
]

---

# Caveats with deviance comparisons

.small[
Models must be fit to the same response data
]

--

.small[
Components can explain overlapping patterns in the data, so their deviance reductions don't add up neatly
]

--

.small[
Some families [require you to fix](https://github.com/pbs-assess/sdmTMB/discussions/471) distribution parameters between comparisons: negative binomial, Tweedie, generalized gamma, lognormal
]

--

.small[
But these common families are fine (shape or dispersion parameters affect variance linearly or don't exist): Gaussian, Gamma, Poisson, binomial
]

--

.small[
We may write a helper function for this soon! (See the sketch on the last slide.)
]

---

# But I want a pseudo-R²!

In non-Gaussian or spatial GLMMs, variance partitioning becomes hard to interpret because "variance" depends on the link and distribution.

--

Comparing proportional reductions in deviance is often clearer.

--

But, yes, we've been working on a pseudo-R²; some technical and theoretical issues remain.
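---

# A possible deviance helper

A rough sketch of the helper mentioned on the caveats slide (hypothetical; `prop_dev_explained()` is not currently part of sdmTMB). It simply wraps the deviance ratio, using the fits from the earlier slides:

``` r
# Hypothetical helper, not in sdmTMB: proportional reduction in
# deviance of `fit` relative to a reference (e.g., null) model.
# Both fits must share the same response data, and some families
# need fixed dispersion parameters (see the caveats slide).
prop_dev_explained <- function(fit, fit_reference) {
  1 - deviance(fit) / deviance(fit_reference)
}

prop_dev_explained(fit, fit_null)       # total deviance explained
prop_dev_explained(fit, fit_rf_only)    # added by the depth smoother
prop_dev_explained(fit, fit_depth_only) # added by the spatial field
```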