class: center, middle, inverse, title-slide

.title[
# Data integration
]
.subtitle[
## DFO DSAF workshop
]
.author[
### 
]
.date[
### January 12–16 2026
]

---

<!-- Build with: xaringan::inf_mr() -->

# Data integration

Employing multiple data sources in the same population, stock assessment, or species distribution model.

--

Also known as "data fusion" or "integrated analysis".

--

Different data sources could be

.small[
- multiple surveys of the same kind
- surveys with different gear
- commercial CPUE
- different data "types" (e.g., presence-absence with count data)
]

.tiny[
[Grüss et al. (2025) Fisheries Research](https://doi.org/10.1016/j.fishres.2025.107321)
]

---

# Types of data integration

Spatial domain: same domain vs. "expanded domain"

Catchability: equal, constant difference, spatially varying

Any combination of these!

With (or sometimes without) comparative fishing

.tiny[
[Grüss et al. (2025) Fisheries Research](https://doi.org/10.1016/j.fishres.2025.107321)
]

---

# Expanded domain

Examples:

Two or more surveys by the same agency together covering a coastline (e.g., BC groundfish biennial surveys)

Two or more surveys run by neighbouring agencies with somewhat different gear but enough spatiotemporal proximity to calibrate

---

# Expanded domain example

.small[BC groundfish surveys: same gear, biennial sampling design (don't do this!)]

.center[
<img src="images/bc-gf-maps.png" width="700px"/>
]

.tiny[[Anderson et al. annual Pacific groundfish synopsis reports](https://github.com/pbs-assess/gfsynopsis)]

---

# Expanded domain example

.small[BC groundfish surveys: same gear, biennial sampling design]

.center[
<img src="images/bc-index-gfs.png" width="400px"/>
]

.tiny[[Anderson et al.
annual Pacific groundfish synopsis reports](https://github.com/pbs-assess/gfsynopsis)]

---

# Expanded domain example

.small[Northeast Pacific Ocean Pacific Spiny Dogfish:<br>NOAA + DFO trawl surveys]

.center[
<img src="images/dogfish-coast.png" width="390px"/>
]

.xtiny[
Davidson, L.N.K., English, P.A., King, J., Grant, P.B.C., Taylor, I.G., Barnett, L.A.K., Gertseva, V., Tribuzio, C.A., and Anderson, S.C. 2026. Mystery of the disappearing dogfish: transboundary analyses reveal steep population declines across the northeast Pacific with little evidence for regional redistribution. Fish and Fisheries 27(1): 1–12. https://doi.org/10.1111/faf.70028.
]

---

# Expanded domain example

.small[Northeast Pacific Ocean Pacific Spiny Dogfish:<br>NOAA + DFO trawl surveys]

.center[
<img src="images/dogfish-map.png" width="390px"/>
]

.xtiny[
Davidson, L.N.K., English, P.A., King, J., Grant, P.B.C., Taylor, I.G., Barnett, L.A.K., Gertseva, V., Tribuzio, C.A., and Anderson, S.C. 2026. Mystery of the disappearing dogfish: transboundary analyses reveal steep population declines across the northeast Pacific with little evidence for regional redistribution. Fish and Fisheries 27(1): 1–12. https://doi.org/10.1111/faf.70028.
]

---

class: center, middle, inverse

# Why data integration?

---

# Why data integration?

.small[
Model a species over a larger range (e.g., for transboundary distribution shift metrics, better estimates of how environmental covariates affect distribution)
]

--

.small[
Makes use of all available data, often yielding more accurate and precise distribution predictions
]

--

.small[
For single synthetic population indices:

- Assessment models aren't usually equipped to combine spatial data
- Often statistically convenient
- Sometimes necessary when designing management procedures
- Fish Stocks provisions "one stock, one LRP" guidance
]

---

class: center, middle, inverse

# How do we do data integration?
---

# Regression discontinuity design (RDD)

The ability to integrate data, especially in an expanded domain context, depends on the concept of regression discontinuity design.

<img src="11-integrated_files/figure-html/unnamed-chunk-1-1.png" width="70%" style="display: block; margin: auto;" />

---

# Ways of accounting for catchability

Constant catchability coefficient

``` r
formula = catch ~ factor(year) +
*  factor(gear)
```

Spatially varying catchability coefficient

``` r
formula = catch ~ factor(year) +
*  factor(gear),
*spatial_varying = ~ factor(gear)
```

Mind which gear is the base (reference) factor level; the other gears' coefficients are estimated relative to it

---

# Different likelihoods

Data integration can be with the same or different likelihoods.

Examples:

**Same**: two surveys that collect catch weight (e.g., delta-Gamma or Tweedie)

**Different**: catch count (Poisson) + presence-absence (Bernoulli with cloglog link)

---

# Combining all continuous positive, count, and 0-1 data (😲)

.small[
Idea: there's an underlying surface of fish population intensity. You observe it as one of:<br>- weight, count, presence/absence
]

.small[
Internally, the data types share linear predictors, which are converted into different observation likelihoods.
]

--

.small[
Key concepts:

- a cloglog-linked Bernoulli can be thought of as a "thinned" observation of a count process
- a Poisson-link delta model has an underlying "count" model and a weight per "count"
]

---

class: center, middle, inverse

# Food for thought on data integration

---

# Deciding whether you should combine data

.small[
Do they represent the same population?
]

--

.small[
Are the selectivities similar enough, if you're not controlling for them?
]

--

.small[
Do the datasets give a similar impression on their own?
]

--

.small[
Does it help or hinder your stock assessment to create an integrated index?
]

--

.small[
Does a secondary data source substantially change parameter estimates from those of a trusted data source? ([Rufener et al. 2021 Ecol.
Appl.](https://onlinelibrary.wiley.com/doi/abs/10.1002/eap.2453)).
]

---

# Caveats

Unless you model an interaction between fish length and catchability, you're assuming the gears have matching length selectivity

--

Expanded domain models can go awry in practice!

--

There may be good reasons to not integrate two datasets

--

This is pretty new in the fisheries world

--

Using different likelihoods per row of data isn't merged into the main branch of sdmTMB yet, but should be soon. See the [multiple-data](https://github.com/sdmTMB/sdmTMB/tree/multiple-data) branch and [this vignette](https://github.com/sdmTMB/sdmTMB/blob/multiple-data/vignettes/articles/multi-family.Rmd). It's already possible in tinyVAST.
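
---

# Sketch: why the data types can share a predictor

.small[
A minimal sketch of the two key concepts from the "combining" slide, written out as equations. This is not the sdmTMB or tinyVAST implementation. The symbols $\eta$, $\lambda$, $N$, $Z$, $n$, $w$, $p$, and $r$ are notation assumed here for illustration, and effort (area swept) is taken as one unit.
]

$$\begin{aligned}
&\text{Count:} & N &\sim \operatorname{Poisson}(\lambda), \quad \log \lambda = \eta \\
&\text{Presence-absence:} & Z &= \mathbb{1}(N > 0), \quad p = \Pr(Z = 1) = 1 - e^{-\lambda} \\
&\text{so} & \operatorname{cloglog}(p) &= \log\{-\log(1 - p)\} = \log \lambda = \eta \\
&\text{Poisson-link delta:} & n &= e^{\eta_1}, \quad w = e^{\eta_2}, \quad p = 1 - e^{-n}, \quad r = \frac{n\,w}{p} \\
&\text{so} & \mathbb{E}[\text{catch weight}] &= p\,r = n\,w
\end{aligned}$$

.small[
Under these assumptions, a count row (log link) and a presence-absence row (cloglog link) inform the same $\eta$, and a catch-weight row adds a weight-per-count predictor on top of the same underlying count intensity.
]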