Skip to contents

Thin Plate Spline Model to Detect Spatially Variable Genes

Usage

tpSVG(
  input,
  spatial_coords = NULL,
  X = NULL,
  family = poisson(),
  offset = log(input$sizeFactor),
  weights = NULL,
  assay_name = "counts",
  n_threads = 1,
  BPPARAM = NULL,
  verbose = FALSE,
  ...
)

Arguments

input

SpatialExperiment or numeric matrix: Input data, which can either be a SpatialExperiment object or a numeric matrix of values. If it is a SpatialExperiment object, it is assumed to have an assay slot containing either logcounts (e.g. from the scran package) or deviance residuals (e.g. from the scry package), and a spatialCoords slot containing spatial coordinates of the measurements. If it is a numeric matrix, the values are assumed to already be normalized and transformed (e.g. logcounts), formatted as rows = genes and columns = spots, and a separate numeric matrix of spatial coordinates must also be provided with the spatial_coords argument.

spatial_coords

numeric matrix: Matrix containing columns of spatial coordinates, formatted as rows = spots. This must be provided if input is provied as a numeric matrix of values, and is ignored if input is provided as a SpatialExperiment object. Default = NULL.

X

numeric matrix: Optional design matrix containing columns of covariates per spatial location, e.g. known spatial domains. Number of rows must match the number of spatial locations. Default = NULL, which fits an intercept-only model.

family

a description of the error distribution and link function to be used in the model. Currently support two distributions poisson and gaussian

offset

This can be used to account for technician variation when family = poisson model is used to model raw counts. offset should take in the log-transformed scale factor, e.g. offset = log(spe$sizeFactor), library size, or other normalization factor.

weights

Reserved for future development, e.g. correcting mean-var relationship for Gaussian models. Please use with caution.

assay_name

character: If input is provided as a SpatialExperiment object, this argument selects the name of the assay slot in the input object containing the preprocessed gene expression values. For example, logcounts for log-transformed normalized counts from the scran package, or binomial_deviance_residuals for deviance residuals from the scry package. Default = "logcounts", or ignored if input is provided as a numeric matrix of values.

n_threads

integer: Number of threads for parallelization. Default = 1. We recommend setting this equal to the number of cores available (if working on a laptop or desktop) or around 10 or more (if working on a compute cluster).

BPPARAM

BiocParallelParam: Optional additional argument for parallelization. This argument is provided for advanced users of BiocParallel for further flexibility for parallelization on some operating systems. If provided, this should be an instance of BiocParallelParam. For most users, the recommended option is to use the n_threads argument instead. Default = NULL, in which case n_threads will be used instead.

verbose

logical: Whether to display verbose output for model fitting and parameter estimation from BRISC. Default = FALSE.

...

Reserved for future arguments.

Value

If the input was provided as a SpatialExperiment object, the output values are returned as additional columns in the rowData slot of the input object. If the input was provided as a numeric matrix of values, the output is returned as a numeric matrix. The output values include p-values without any adjustment and statistics reporting reporting the thinplate spline model. The test_stat entry of the returned object is the test statistic for the corresponding model, that is F statistics for the gaussian model and the Chi-squared statistics for generalized models.

Examples

library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
#>     union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#> 
#>     findMatches
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> 
#> Attaching package: ‘IRanges’
#> The following object is masked from ‘package:nlme’:
#> 
#>     collapse
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#> 
#>     rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     anyMissing, rowMedians
library(STexampleData)
#> Loading required package: ExperimentHub
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#> 
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:Biobase’:
#> 
#>     cache
library(scran)
#> Loading required package: scuttle
library(nnSVG)

# load example dataset from STexampleData package
spe <- Visium_humanDLPFC()
#> see ?STexampleData and browseVignettes('STexampleData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache

# preprocessing steps

# keep only spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]

# skip spot-level quality control, since this has been performed previously
# on this dataset
# Add library size
spe <- addPerCellQCMetrics(spe)

# filter low-expressed and mitochondrial genes
spe <- filter_genes(spe)
#> Gene filtering: removing mitochondrial genes
#> removed 13 mitochondrial genes
#> Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 19) of spatial locations
#> removed 30216 out of 33525 genes due to low expression

# calculate logcounts (log-transformed normalized counts) using scran package
# using library size factors
spe <- computeLibraryFactors(spe)
spe <- logNormCounts(spe)



# select small number of genes for faster runtime in this example
set.seed(123)
ix <- sample(seq_len(nrow(spe)), 4)
spe <- spe[ix, ]

# run tpSVG
set.seed(123)

# Gaussian Model
spe_gaus <- tpSVG(
 spe,
 family = gaussian(),
 assay_name = "logcounts"
 )

# Poisson Model
spe_poisson  <- tpSVG(
 spe,
 family = poisson,
 assay_name = "counts",
 offset = log(spe$sizeFactor)   # Natural log library size
 )