Introduction to sdmTMB

.title[
# Introduction to sdmTMB
]
.subtitle[
## DFO DSAF workshop
]
.author[
### 
]
.date[
### January 12–16 2026
]

---

### sdmTMB origins

.xsmall[
A need for reproducible methods and simple code for assessments [(Anderson et al. 2019)](https://github.com/pbs-assess/gfsynopsis)
]

---

# sdmTMB contributors

And many more! Including Julia Indivero, Joe Watson, Cole Monnahan, Joseph Barss, with code adapted from VAST, glmmTMB, mgcv, and brms

---

# sdmTMB highlights

User-friendly R package for modeling spatial and spatiotemporal processes

Familiar syntax to widely used functions/packages; `glm()`, mgcv, glmmTMB, etc.

--
  
Performs fast (marginal) maximum likelihood estimation via TMB
  
--
  
Widely applicable to species distribution modelling, stock assessment inputs, bycatch models, estimating distribution shifts

---

# Installing sdmTMB

From CRAN:

``` r
install.packages("sdmTMB", dependencies = TRUE)
```

From source on GitHub:

* Install a C++ compiler
  * e.g., [Rtools](https://cran.r-project.org/bin/windows/Rtools/rtools40.html) on Windows
  * e.g., Xcode command line tools on a Mac:  
  `xcode-select --install`

``` r
# install.packages("pak")
pak::pak("pbs-assess/sdmTMB", dependencies = TRUE)
```
]

---

# sdmTMB workflow

1. Prepare data .xsmall[(convert to UTMs, scale covariates, ...)]

2. Construct a mesh

3. Fit the model

4. Inspect the model .xsmall[(and possibly refit the model)]

5. Predict from the model

6. Calculate any derived quantities

---

# sdmTMB workflow

1. Prepare data: .blue[`add_utm_columns()`]

2. Construct a mesh: .blue[`make_mesh()`]

3. Fit the model: .blue[`sdmTMB()`]

4. Inspect the model: .blue[`print()`], .blue[`sanity()`], .blue[`tidy()`], .blue[`residuals()`, .blue[`coef()`], ...]

5. Predict from the model: .blue[`predict()`], .blue[`simulate()`]

6. Get derived quantities: .blue[`get_index()`, `get_cog()`, `get_eao()`, `get_range_edge()`, `get_weighted_average()`, ...]

---

# Preparing data: getting into constant distance coordinates

---

# Getting into UTMs

Need a projection that preserves constant distance

UTMs work well for regional analyses

Helper function: `sdmTMB::add_utm_columns()`

Internally, guesses at UTM zone and uses the sf package:  
`sf::st_as_sf()`, `sf::st_transform()`, and `sf::st_coordinates()`

Or use the sf package yourself

---

# Example of adding UTM columns

``` r
d <- data.frame(
  lat = c(52.1, 53.4), 
  lon = c(-130.0, -131.4)
)
d <- sdmTMB::add_utm_columns(d, c("lon", "lat"))
#> Detected UTM zone 9N; CRS = 32609.
#> Visit https://epsg.io/32609 to verify.
d
#>    lat    lon        X        Y
#> 1 52.1 -130.0 431.5034 5772.632
#> 2 53.4 -131.4 340.4411 5919.452
```

* Note default `units = "km"`
* Why? Range parameter estimated in units of X and Y
* Should be not too big or small for estimation

---

# Constructing a mesh

---

# Constructing a mesh

`make_mesh()` has 2 shortcuts to mesh construction

1. Cutoff: minimum allowed distance between vertices (e.g., `cutoff = 10`)

2. K-means algorithm: used to cluster points (e.g., `n_knots = 100`); sensitive to random `seed` argument!

Alternatively, build any fmesher mesh and supply it to the `mesh` argument in `make_mesh()`

---

# Constructing a mesh

`cutoff` is in units of x and y (minimum triangle size)
]

``` r
d <- data.frame(x = runif(500), y = runif(500))
mesh <- make_mesh(d, xy_cols = c("x", "y"), cutoff = 0.1)
mesh$mesh$n
#> [1] 86
plot(mesh)
```

<img src="02-intro-sdmTMB_files/figure-html/make-mesh-1.png" width="280px" style="display: block; margin: auto;" />
]

---

# Mesh construction

.xsmall[
* A unique mesh is generally made for each dataset
* Rules of thumb:
  * More triangles = more computation time
  * More triangles = more fine-scale spatial predictions
  * Finer resolution isn't always better
  * Borders with coarser resolution reduce number of triangles
  * Use minimum edge size (cutoff) to avoid meshes becoming too fine
  * Triangle edge size needs to be smaller than spatial range

* "How to make a bad mesh?" [Haakon Bakka's book](https://haakonbakkagit.github.io/btopic114.html)
]

.tiny[
Ward, E.J. and Anderson, S.C. 2025. Approximating spatial processes with too many knots degrades the quality of probabilistic predictions. bioRxiv. <https://doi.org/10.1101/2025.11.14.688354>.
]
---

# Building your own mesh

* E.g., `fmesher::fm_mesh_2d()`: lets many arguments be customized

* `INLA::meshbuilder()`: Shiny app for constructing a mesh, provides R code

* Meshes can include barriers / islands / coastlines with shapefiles

* INLA books
<https://www.r-inla.org/learnmore/books>

---

# Example: cutoff = 50km

---

# Example: cutoff = 25km

---

# Example: cutoff = 10km

---

# Fitting the model: sdmTMB()

---

# sdmTMB()

Set up is similar to `glmmTMB()`. Common arguments:

```r
fit <- sdmTMB(
  formula,
  data,
  mesh,
  time = NULL,
  family = gaussian(link = "identity"),
  spatial = c("on", "off"),
  spatiotemporal = c("iid", "ar1", "rw", "off"),
  silent = TRUE,
  ...
)
```

See `?sdmTMB`

---

# Syntax: formulas for non-spatial model components

---

# Formula interface

sdmTMB uses a similar formula interface to widely used R packages

A formula is used to specify fixed effects and (optionally) random intercepts

``` r
# linear effect of x1:
formula = y ~ x1

# add smoother effect of x2:
formula = y ~ x1 + s(x2)

# add random intercept and slope by group g:
formula = y ~ x1 + s(x2) + (1 + x1 | g)
```
]

---

# Smoothers (as in mgcv)

``` r
# smoother effect of x:
formula = y ~ s(x)

# basis dimension of 5:
formula = y ~ s(x, k = 5)

# bivariate smoother effect of x & y:
formula = y ~ s(x, y)

# bivariate smoother effect of x & y:
formula = y ~ t2(x, y)

# smoother effect of x1 varying by x2:
formula = y ~ s(x1, by = x2)

# other kinds of mgcv smoothers:
formula = ~ s(month, bs = "cc", k = 12)
```

Smoothers are penalized ('p-splines'), i.e. data determine 'wiggliness'
]

---

# Other common R formula options

Polynomials and omitting the intercept:

``` r
# transformations using `I()` notation
y ~ depth + I(depth^2)

# polynomial functions using `poly`
y ~ poly(depth, degree = 2)

# omit intercept
y ~ -1 + as.factor(year)
y ~ 0 + as.factor(year)
```

---

# Breakpoint functions for threshold analyses

``` r
cpue ~ breakpt(temperature)
```

.tiny[
Essington, T.E. S.C. Anderson, L.A.K. Barnett, H.M. Berger, S.A. Siedlecki, E.J. Ward. Advancing statistical models to reveal the effect of dissolved oxygen on the spatial distribution of marine taxa using thresholds and a physiologically based index. 2022. Ecography 2022(8): e06249. <https://doi.org/10.1111/ecog.06249>
]

---

# Logistic functions for threshold analyses

``` r
cpue ~ logistic(temperature)
```

---

# Syntax: families and links

---

# Families

Many of the same families used in `glm()`, `glmmTMB()`, `mgcv::gam()` can be used here

Includes:
  `gaussian()`, `Gamma()`, 
  `binomial()`, `poisson()`, `Beta()`, 
  `student()`, `tweedie()`, 
  `nbinom1()`, `nbinom2()`,
  `truncated_nbinom1()`, `truncated_nbinom2()`,
  `delta_gamma()`, `delta_lognormal()`, `delta_beta()`, and more...
  
All have `link` arguments
  
See `?sdmTMB::Families`

We will cover less-common but useful families later

---

# Syntax: spatial model components

---

# Spatial vs. spatiotemporal fields

* A spatial field can be thought of as a spatial intercept

* a wiggly spatial process that is constant in time
  
--

* Spatiotemporal variation represents separate fields estimated for each time
  slice (possibly correlated)
  
  * wiggly spatial processes that change through time

* Refer to [model description](https://pbs-assess.github.io/sdmTMB/articles/model-description.html) vignette for math notation

---

# Spatial fields can be turned on/off

* By default `sdmTMB()` estimates a spatial field

``` r
fit <- sdmTMB(
  y ~ x,
  family = gaussian(),
  data = dat,
  mesh = mesh,
* spatial = "on",
  ...
)
```

---

# Why *not* estimate a spatial field?

* If shared process across time slices isn't of interest

* If magnitude of spatiotemporal variability >> spatial variation

* If confounded with other parameters (more later)

---

# Spatiotemporal fields can be turned on/off

* By default `sdmTMB()` estimates a spatiotemporal field if the `time` argument is specified

``` r
fit <- sdmTMB(
  y ~ x,
  family = gaussian(),
  data = dat,
  mesh = mesh,
* time = "year", # column in `data`
* spatiotemporal = "iid",
  ...
)
```

---

# Types of spatiotemporal fields

* None (`spatiotemporal = "off"`)

* Independent (`spatiotemporal = "iid"`)

* Random walk (`spatiotemporal = "rw"`)

* Autoregressive (`spatiotemporal = "ar1"`)

---

# Independent (IID) spatiotemporal fields

---

# AR1 spatiotemporal fields

---

# Spatiotemporal fields

* Why include spatiotemporal fields?

* If the data are collected in both space and time *and* 
    there are 'latent' spatial processes that vary through time
  * E.g., effect of water temperature on abundance if temperature wasn't in the model
  * Represents all the missing spatial variables that vary through time

* Why would a field be IID vs RW/AR1?

* Do we expect hotspots to be independent with each time slice or adapt slowly over time?

---

# After model fitting

Inspecting, summarizing, predicting, etc.

Covered in examples in next slides.

* `predict()`: `?predict.sdmTMB`
* `residuals()`: `?residuals.sdmTMB`
* `tidy()`: `?tidy.sdmTMB`
* `print()`
* `sanity()`
* `get_index()`, `get_cog()`, `get_eao()`, etc.
*  `...`