class: center, middle, inverse, title-slide

.title[
# Model comparison and validation with sdmTMB
]
.subtitle[
## DFO DSAF workshop
]
.author[
###
]
.date[
### January 12–16 2026
]

---

<!-- Build with: xaringan::inf_mr() -->

# What is the goal of modeling?

Prediction?

* in-sample, out-of-sample?
* what kind of out-of-sample?

Hypothesis testing or causal inference

Not all metrics are appropriate for all applications

---

# Too many metrics to discuss

.small[
* Root mean squared error, median absolute relative error, ...
* AUC: true and false positive rates
* AIC
  * widely used tool in ecology
  * `\(\mathrm{AIC} = -2 \log L + 2K\)`, where `\(K\)` is the number of parameters
  * designed for fixed effects models (Burnham and Anderson 2002)
* Deviance explained
]

---

# AIC and likelihood with sdmTMB

* `AIC()` and `logLik()` methods work, just like `glm()`

``` r
mesh <- make_mesh(pcod, xy_cols = c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  present ~ 1,
  data = pcod,
  mesh = mesh,
  family = binomial(link = "logit"),
  spatial = "on"
)
logLik(fit) # log likelihood
AIC(fit)    # AIC
```

---

# When to use restricted maximum likelihood (REML)

* Integrates over random effects *and* fixed effects; sometimes helps convergence too
* *Can* use REML when comparing different random effect structures
* *Don't* use REML when comparing alternative fixed effects
* *Don't* use REML for index standardization

.small[
``` r
fit <- sdmTMB(..., reml = TRUE) # default is FALSE
```
]

---

# Reminder: fixed and random effects in sdmTMB

Random effects include

* random intercepts and slopes: `(1 + x | group)`
* smooth terms: `s(year)`
* time-varying coefficients
* all random fields
* spatially varying coefficients (also random fields)

---

# Limitations of (marginal) AIC

.small[
* Originally conceived for fixed effects
  * Burnham and Anderson (2002)
* Approximate & problematic for mixed effects
  * Vaida and Blanchard (2005)
  * Liang et al. (2008)
  * Great [FAQ on `glmmTMB` by Ben Bolker](https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#can-i-use-aic-for-mixed-models-how-do-i-count-the-number-of-degrees-of-freedom-for-a-random-effect)
* Conditional AIC: less widely used, computationally harder
  * Thorson (2024) for GMRF models: <https://doi.org/10.1002/ecy.4327>
  * Also built into sdmTMB: see `?cAIC`
]

---

# Predictive model selection

* Ideal world: use cross validation to evaluate predictive accuracy
* Split data into train / test sets
* Objective function:
  * maximize the log likelihood of the test data
  * also known as "log predictive density" or "log score"
  * or use a domain-specific metric you care about

---

# Log predictive density

Interpreted as a theoretical expectation for a new dataset

Defined as the sum of the log likelihoods of the left-out data; sometimes averaged per data point

Analogous to ELPD in the Bayesian literature

Higher is better; only makes sense when comparing models fit to the same data!

Great [FAQ on cross validation](https://users.aalto.fi/~ave/CV-FAQ.html) by Aki Vehtari
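---

# The log score by hand

To make the log score concrete, here is a minimal sketch for a single train/test split; `sdmTMB_cv()` (next slides) automates this across folds. The 90/10 split, the seed, and the object names (`train`, `test`, `fit_train`) are arbitrary choices for illustration.

``` r
set.seed(1)
train_ids <- sample(seq_len(nrow(pcod)), size = round(0.9 * nrow(pcod)))
train <- pcod[train_ids, ]
test <- pcod[-train_ids, ]

# fit to the training data only:
mesh_train <- make_mesh(train, xy_cols = c("X", "Y"), cutoff = 10)
fit_train <- sdmTMB(
  present ~ 1,
  data = train,
  mesh = mesh_train,
  family = binomial(link = "logit"),
  spatial = "on"
)

# predict on the held-out data (link scale by default) and
# sum the log likelihoods of the left-out observations:
p <- predict(fit_train, newdata = test)
sum(dbinom(test$present, size = 1, prob = plogis(p$est), log = TRUE))
```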
---

# Cross validation in sdmTMB

.small[
``` r
*fit <- sdmTMB_cv(
  present ~ 1,
  data = pcod,
  mesh = mesh,
  family = binomial(link = "logit"),
*  k_folds = 8,
*  fold_ids = NULL,
*  parallel = TRUE,
*  use_initial_fit = FALSE
)
```
]

.xsmall[
* More folds = more computation time but less cross validation variance and bias
* `use_initial_fit = TRUE`: fits the first fold and initializes subsequent model fits from that fit
* Uses the same mesh for all folds if `mesh` is supplied (you probably want this)
]

---

# Cross validation in sdmTMB

`sdmTMB_cv()` returns:

- `models`: A list of models (each an `sdmTMB()` object)
- `data`: Original data plus columns for fold ID, CV predicted value, and CV log likelihood
- `fold_loglik`: sum of held-out log likelihoods for each fold
- `sum_loglik`: sum of held-out log likelihoods across all data
- and more...

---

# How to choose folds? How many?

.small[
* Words of wisdom:
  * Can be highly variable based on data, folds, sampling scheme
* Spatial sampling or random?
  * [blockCV R package](https://cran.r-project.org/web/packages/blockCV/index.html), [Valavi et al. (2019)](https://doi.org/10.1111/2041-210X.13107)
* How to sample with time / years?
  * LOOCV (leave-one-out...) vs. LFOCV (leave-future-out...)
* `sdmTMB_cv()` does random fold assignment
* Custom folds can be specified with `fold_ids` (see the sketch on the next slide)
]
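---

# Custom folds with `fold_ids`

A minimal sketch of grouped folds by year, so each held-out set is an entire survey year. Note this is grouped cross validation, not true leave-future-out; the fold assignment and the names `year_folds` and `m_cv_year` are illustrative only, and `pcod` and `mesh` are assumed from the earlier slides.

``` r
# one integer fold ID per year (1, 2, ...):
year_folds <- as.integer(factor(pcod$year))

m_cv_year <- sdmTMB_cv(
  present ~ 1,
  data = pcod,
  mesh = mesh,
  family = binomial(link = "logit"),
*  k_folds = length(unique(year_folds)),
*  fold_ids = year_folds
)
m_cv_year$fold_loglik # one held-out log likelihood per year
```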
---

# Basic cross validation example

Set a `future::plan()`; the folds will be fit in parallel

``` r
*library(future)
*plan(multisession)

mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)

*m_cv <- sdmTMB_cv(
  density ~ s(depth, k = 5),
  data = pcod,
  mesh = mesh,
  family = tweedie(link = "log"),
*  k_folds = 2
)

# Sum of log likelihoods of left-out data:
m_cv$sum_loglik # higher is better
#> [1] -6780.385
```

---

class: center, middle, inverse

# Deviance explained

---

# Deviance explained

* Deviance explained measures how much of the variation in the data is captured by the model compared to a null (usually intercept-only) model

--

* Analogous to an R² for models fit with likelihood methods (e.g., GLMs, GAMs, or GLMMs)

--

* Calculated as `1 − (model deviance / null deviance)`; higher values indicate a better fit

--

* Can also be used to compare the % increase in deviance explained between two models fit to the same data

---

# Deviance explained in sdmTMB

``` r
mesh <- make_mesh(pcod, c("X", "Y"), cutoff = 10)
fit <- sdmTMB(
  density ~ depth_scaled,
  spatial = "on",
  mesh = mesh,
  family = delta_gamma(),
  data = pcod
)

# model with *only* an intercept and no random fields:
fit_null <- update(
  fit,
*  formula. = density ~ 1,
*  spatial = "off"
)

# proportion deviance explained:
1 - deviance(fit) / deviance(fit_null)
#> [1] 0.3241151
```

---

# Proportional reduction in deviance by model component

We can quantify how much better the model fits when adding a smoother or spatial field by comparing model deviances.

Let's fit a model with a depth smoother and a spatial random field:

``` r
fit <- sdmTMB(
  present ~ s(depth_scaled),
  spatial = "on",
  mesh = mesh,
  family = binomial(),
  data = pcod
)
```

---

# Deviance reduction by component

Fit some sub-models for comparison:

``` r
# same model with no random fields:
fit_depth_only <- update(
  fit,
*  spatial = "off"
)

# same model with no depth smoother:
fit_rf_only <- update(
  fit,
*  formula. = . ~ 1
)

# null model:
fit_null <- update(
  fit,
*  spatial = "off",
*  formula. = . ~ 1
)
```

---

### Deviance reduction by component

.small[
Proportional deviance explained by depth:

``` r
1 - deviance(fit_depth_only) / deviance(fit_null)
#> [1] 0.1979381
```
]

--

.small[
Proportional reduction in model deviance by depth beyond what the spatial field explains:

``` r
1 - deviance(fit) / deviance(fit_rf_only)
#> [1] 0.08834777
```
]

--

.small[
Proportional reduction in model deviance by the random field beyond what the depth smoother explains:

``` r
1 - deviance(fit) / deviance(fit_depth_only)
#> [1] 0.2364397
```
]

---

# Caveats with deviance comparisons

.small[
Models must be fit to the same response data
]

--

.small[
Components can explain overlapping patterns in the data, so their deviance reductions don't add up neatly
]

--

.small[
Some families [require you to fix](https://github.com/pbs-assess/sdmTMB/discussions/471) distribution parameters between comparisons: negative binomial, Tweedie, generalized gamma, lognormal
]

--

.small[
But these common families are fine (shape or dispersion parameters affect variance linearly or don't exist): Gaussian, Gamma, Poisson, binomial
]

--

.small[
We may write a helper function for this soon! (See the sketch on the last slide.)
]

---

# But I want a pseudo-R²!

In non-Gaussian or spatial GLMMs, variance partitioning becomes hard to interpret because "variance" depends on the link and distribution.

--

Comparing proportional reductions in deviance is often clearer.

--

But, yes, we've been working on a pseudo-R²; some technical and theoretical issues remain.
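---

# A possible deviance helper

A rough sketch of the helper mentioned on the caveats slide (hypothetical; `prop_dev_explained()` is not currently part of sdmTMB). It simply wraps the deviance ratio, using the fits from the earlier slides:

``` r
# Hypothetical helper, not in sdmTMB: proportional reduction in
# deviance of `fit` relative to a reference (e.g., null) model.
# Both fits must share the same response data, and some families
# need fixed dispersion parameters (see the caveats slide).
prop_dev_explained <- function(fit, fit_reference) {
  1 - deviance(fit) / deviance(fit_reference)
}

prop_dev_explained(fit, fit_null)       # total deviance explained
prop_dev_explained(fit, fit_rf_only)    # added by the depth smoother
prop_dev_explained(fit, fit_depth_only) # added by the spatial field
```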