class: center, middle, inverse, title-slide .title[ # Introduction to sdmTMB ] .subtitle[ ## DFO DSAF workshop ] .author[ ### ] .date[ ### January 12–16 2026 ] --- <!-- Build with: xaringan::inf_mr() --> ### sdmTMB origins .xsmall[ A need for reproducible methods and simple code for assessments [(Anderson et al. 2019)](https://github.com/pbs-assess/gfsynopsis) ] .center[ <img src="images/synopsis-lingcod.png" width="56%" /> ] --- # sdmTMB contributors .center[ <img src="images/intro_slide2.jpg" width="85%" /> ] And many more! Including Julia Indivero, Joe Watson, Cole Monnahan, Joseph Barss, with code adapted from VAST, glmmTMB, mgcv, and brms --- # sdmTMB highlights User-friendly R package for modeling spatial and spatiotemporal processes -- Familiar syntax to widely used functions/packages; `glm()`, mgcv, glmmTMB, etc. -- Performs fast (marginal) maximum likelihood estimation via TMB -- Widely applicable to species distribution modelling, stock assessment inputs, bycatch models, estimating distribution shifts --- # Installing sdmTMB From CRAN: ``` r install.packages("sdmTMB", dependencies = TRUE) ``` From source on GitHub: * Install a C++ compiler * e.g., [Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools40.html) on Windows * e.g., Xcode command line tools on a Mac: `xcode-select --install` .small[ ``` r # install.packages("pak") pak::pak("pbs-assess/sdmTMB", dependencies = TRUE) ``` ] --- # sdmTMB workflow 1. Prepare data .xsmall[(convert to UTMs, scale covariates, ...)] 2. Construct a mesh 3. Fit the model 4. Inspect the model .xsmall[(and possibly refit the model)] 5. Predict from the model 6. Calculate any derived quantities --- # sdmTMB workflow 1. Prepare data: .blue[`add_utm_columns()`] 2. Construct a mesh: .blue[`make_mesh()`] 3. Fit the model: .blue[`sdmTMB()`] 4. Inspect the model: .blue[`print()`], .blue[`sanity()`], .blue[`tidy()`], .blue[`residuals()`, .blue[`coef()`], ...] 5. Predict from the model: .blue[`predict()`], .blue[`simulate()`] 6. Get derived quantities: .blue[`get_index()`, `get_cog()`, `get_eao()`, `get_range_edge()`, `get_weighted_average()`, ...] <!-- <img src="images/sdmTMB_workflow.png" height = "500px" width="900px" /> --> --- class: center, middle, inverse # Preparing data: getting into constant distance coordinates --- # Getting into UTMs Need a projection that preserves constant distance UTMs work well for regional analyses Helper function: `sdmTMB::add_utm_columns()` Internally, guesses at UTM zone and uses the sf package: `sf::st_as_sf()`, `sf::st_transform()`, and `sf::st_coordinates()` Or use the sf package yourself --- # Example of adding UTM columns ``` r d <- data.frame( lat = c(52.1, 53.4), lon = c(-130.0, -131.4) ) d <- sdmTMB::add_utm_columns(d, c("lon", "lat")) #> Detected UTM zone 9N; CRS = 32609. #> Visit https://epsg.io/32609 to verify. d #> lat lon X Y #> 1 52.1 -130.0 431.5034 5772.632 #> 2 53.4 -131.4 340.4411 5919.452 ``` * Note default `units = "km"` * Why? Range parameter estimated in units of X and Y * Should be not too big or small for estimation --- class: center, middle, inverse # Constructing a mesh --- # Constructing a mesh `make_mesh()` has 2 shortcuts to mesh construction 1. Cutoff: minimum allowed distance between vertices (e.g., `cutoff = 10`) 2. K-means algorithm: used to cluster points (e.g., `n_knots = 100`); sensitive to random `seed` argument! Alternatively, build any fmesher mesh and supply it to the `mesh` argument in `make_mesh()` --- # Constructing a mesh .small[ Size of mesh has the single largest impact on fitting speed `cutoff` is in units of x and y (minimum triangle size) ] .small[ ``` r d <- data.frame(x = runif(500), y = runif(500)) mesh <- make_mesh(d, xy_cols = c("x", "y"), cutoff = 0.1) mesh$mesh$n #> [1] 86 plot(mesh) ``` <img src="02-intro-sdmTMB_files/figure-html/make-mesh-1.png" width="280px" style="display: block; margin: auto;" /> ] --- # Mesh construction .xsmall[ * A unique mesh is generally made for each dataset * Rules of thumb: * More triangles = more computation time * More triangles = more fine-scale spatial predictions * Finer resolution isn't always better * Borders with coarser resolution reduce number of triangles * Use minimum edge size (cutoff) to avoid meshes becoming too fine * Triangle edge size needs to be smaller than spatial range * "How to make a bad mesh?" [Haakon Bakka's book](https://haakonbakkagit.github.io/btopic114.html) ] .tiny[ Ward, E.J. and Anderson, S.C. 2025. Approximating spatial processes with too many knots degrades the quality of probabilistic predictions. bioRxiv. <https://doi.org/10.1101/2025.11.14.688354>. ] --- # Building your own mesh * E.g., `fmesher::fm_mesh_2d()`: lets many arguments be customized * `INLA::meshbuilder()`: Shiny app for constructing a mesh, provides R code * Meshes can include barriers / islands / coastlines with shapefiles * INLA books <https://www.r-inla.org/learnmore/books> --- # Example: cutoff = 50km <img src="02-intro-sdmTMB_files/figure-html/mesh-example4-1.png" width="700px" style="display: block; margin: auto;" /> --- # Example: cutoff = 25km <img src="02-intro-sdmTMB_files/figure-html/mesh-example3-1.png" width="700px" style="display: block; margin: auto;" /> --- # Example: cutoff = 10km <img src="02-intro-sdmTMB_files/figure-html/mesh-example2-1.png" width="700px" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Fitting the model: sdmTMB() --- # sdmTMB() Set up is similar to `glmmTMB()`. Common arguments: ```r fit <- sdmTMB( formula, data, mesh, time = NULL, family = gaussian(link = "identity"), spatial = c("on", "off"), spatiotemporal = c("iid", "ar1", "rw", "off"), silent = TRUE, ... ) ``` See `?sdmTMB` --- class: center, middle, inverse # Syntax: formulas for non-spatial model components --- # Formula interface sdmTMB uses a similar formula interface to widely used R packages A formula is used to specify fixed effects and (optionally) random intercepts .small[ ``` r # linear effect of x1: formula = y ~ x1 # add smoother effect of x2: formula = y ~ x1 + s(x2) # add random intercept and slope by group g: formula = y ~ x1 + s(x2) + (1 + x1 | g) ``` ] --- # Smoothers (as in mgcv) .small[ ``` r # smoother effect of x: formula = y ~ s(x) # basis dimension of 5: formula = y ~ s(x, k = 5) # bivariate smoother effect of x & y: formula = y ~ s(x, y) # bivariate smoother effect of x & y: formula = y ~ t2(x, y) # smoother effect of x1 varying by x2: formula = y ~ s(x1, by = x2) # other kinds of mgcv smoothers: formula = ~ s(month, bs = "cc", k = 12) ``` Smoothers are penalized ('p-splines'), i.e. data determine 'wiggliness' ] --- # Other common R formula options Polynomials and omitting the intercept: ``` r # transformations using `I()` notation y ~ depth + I(depth^2) # polynomial functions using `poly` y ~ poly(depth, degree = 2) # omit intercept y ~ -1 + as.factor(year) y ~ 0 + as.factor(year) ``` --- # Breakpoint functions for threshold analyses ``` r cpue ~ breakpt(temperature) ``` <img src="02-intro-sdmTMB_files/figure-html/make-breakpt-1.png" width="400px" style="display: block; margin: auto;" /> .tiny[ Essington, T.E. S.C. Anderson, L.A.K. Barnett, H.M. Berger, S.A. Siedlecki, E.J. Ward. Advancing statistical models to reveal the effect of dissolved oxygen on the spatial distribution of marine taxa using thresholds and a physiologically based index. 2022. Ecography 2022(8): e06249. <https://doi.org/10.1111/ecog.06249> ] --- # Logistic functions for threshold analyses ``` r cpue ~ logistic(temperature) ``` <img src="02-intro-sdmTMB_files/figure-html/make-logistic-1.png" width="500px" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Syntax: families and links --- # Families Many of the same families used in `glm()`, `glmmTMB()`, `mgcv::gam()` can be used here Includes: `gaussian()`, `Gamma()`, `binomial()`, `poisson()`, `Beta()`, `student()`, `tweedie()`, `nbinom1()`, `nbinom2()`, `truncated_nbinom1()`, `truncated_nbinom2()`, `delta_gamma()`, `delta_lognormal()`, `delta_beta()`, and more... All have `link` arguments See `?sdmTMB::Families` We will cover less-common but useful families later --- class: center, middle, inverse # Syntax: spatial model components --- # Spatial vs. spatiotemporal fields * A spatial field can be thought of as a spatial intercept * a wiggly spatial process that is constant in time -- * Spatiotemporal variation represents separate fields estimated for each time slice (possibly correlated) * wiggly spatial processes that change through time -- * Refer to [model description](https://pbs-assess.github.io/sdmTMB/articles/model-description.html) vignette for math notation --- # Spatial fields can be turned on/off * By default `sdmTMB()` estimates a spatial field ``` r fit <- sdmTMB( y ~ x, family = gaussian(), data = dat, mesh = mesh, * spatial = "on", ... ) ``` --- # Why *not* estimate a spatial field? * If shared process across time slices isn't of interest * If magnitude of spatiotemporal variability >> spatial variation * If confounded with other parameters (more later) --- # Spatiotemporal fields can be turned on/off * By default `sdmTMB()` estimates a spatiotemporal field if the `time` argument is specified ``` r fit <- sdmTMB( y ~ x, family = gaussian(), data = dat, mesh = mesh, * time = "year", # column in `data` * spatiotemporal = "iid", ... ) ``` --- # Types of spatiotemporal fields * None (`spatiotemporal = "off"`) * Independent (`spatiotemporal = "iid"`) * Random walk (`spatiotemporal = "rw"`) * Autoregressive (`spatiotemporal = "ar1"`) --- # Independent (IID) spatiotemporal fields <img src="02-intro-sdmTMB_files/figure-html/iid-demo-1.png" width="700px" style="display: block; margin: auto;" /> --- # AR1 spatiotemporal fields <img src="02-intro-sdmTMB_files/figure-html/ar1-demo-1.png" width="700px" style="display: block; margin: auto;" /> .small[Random walk = AR1 with 1.0 correlation] --- # Spatiotemporal fields * Why include spatiotemporal fields? * If the data are collected in both space and time *and* there are 'latent' spatial processes that vary through time * E.g., effect of water temperature on abundance if temperature wasn't in the model * Represents all the missing spatial variables that vary through time -- * Why would a field be IID vs RW/AR1? * Do we expect hotspots to be independent with each time slice or adapt slowly over time? --- # After model fitting Inspecting, summarizing, predicting, etc. Covered in examples in next slides. * `predict()`: `?predict.sdmTMB` * `residuals()`: `?residuals.sdmTMB` * `tidy()`: `?tidy.sdmTMB` * `print()` * `sanity()` * `get_index()`, `get_cog()`, `get_eao()`, etc. * `...`