Thin Plate Spline Model to Detect Spatially Variable Genes
Arguments
- input
SpatialExperiment
ornumeric
matrix: Input data, which can either be aSpatialExperiment
object or anumeric
matrix of values. If it is aSpatialExperiment
object, it is assumed to have anassay
slot containing either logcounts (e.g. from thescran
package) or deviance residuals (e.g. from thescry
package), and aspatialCoords
slot containing spatial coordinates of the measurements. If it is anumeric
matrix, the values are assumed to already be normalized and transformed (e.g. logcounts), formatted asrows = genes
andcolumns = spots
, and a separatenumeric
matrix of spatial coordinates must also be provided with thespatial_coords
argument.- spatial_coords
numeric
matrix: Matrix containing columns of spatial coordinates, formatted asrows = spots
. This must be provided ifinput
is provied as anumeric
matrix of values, and is ignored ifinput
is provided as aSpatialExperiment
object. Default = NULL.- X
numeric
matrix: Optional design matrix containing columns of covariates per spatial location, e.g. known spatial domains. Number of rows must match the number of spatial locations. Default = NULL, which fits an intercept-only model.- family
a description of the error distribution and link function to be used in the model. Currently support two distributions
poisson
andgaussian
- offset
This can be used to account for technician variation when
family = poisson
model is used to model raw counts.offset
should take in the log-transformed scale factor, e.g.offset = log(spe$sizeFactor)
, library size, or other normalization factor.- weights
Reserved for future development, e.g. correcting mean-var relationship for Gaussian models. Please use with caution.
- assay_name
character
: Ifinput
is provided as aSpatialExperiment
object, this argument selects the name of theassay
slot in the input object containing the preprocessed gene expression values. For example,logcounts
for log-transformed normalized counts from thescran
package, orbinomial_deviance_residuals
for deviance residuals from thescry
package. Default ="logcounts"
, or ignored ifinput
is provided as anumeric
matrix of values.- n_threads
integer
: Number of threads for parallelization. Default = 1. We recommend setting this equal to the number of cores available (if working on a laptop or desktop) or around 10 or more (if working on a compute cluster).- BPPARAM
BiocParallelParam
: Optional additional argument for parallelization. This argument is provided for advanced users ofBiocParallel
for further flexibility for parallelization on some operating systems. If provided, this should be an instance ofBiocParallelParam
. For most users, the recommended option is to use then_threads
argument instead. Default = NULL, in which casen_threads
will be used instead.- verbose
logical
: Whether to display verbose output for model fitting and parameter estimation fromBRISC
. Default = FALSE.- ...
Reserved for future arguments.
Value
If the input was provided as a SpatialExperiment
object, the
output values are returned as additional columns in the rowData
slot
of the input object. If the input was provided as a numeric
matrix
of values, the output is returned as a numeric
matrix. The output
values include p-values without any adjustment and statistics reporting
reporting the thinplate spline model. The test_stat
entry of the
returned object is the test statistic for the corresponding model,
that is F statistics for the gaussian model and the Chi-squared statistics
for generalized models.
Examples
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#>
#> Attaching package: ‘IRanges’
#> The following object is masked from ‘package:nlme’:
#>
#> collapse
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
library(STexampleData)
#> Loading required package: ExperimentHub
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#>
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:Biobase’:
#>
#> cache
library(scran)
#> Loading required package: scuttle
library(nnSVG)
# load example dataset from STexampleData package
spe <- Visium_humanDLPFC()
#> see ?STexampleData and browseVignettes('STexampleData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
# preprocessing steps
# keep only spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]
# skip spot-level quality control, since this has been performed previously
# on this dataset
# Add library size
spe <- addPerCellQCMetrics(spe)
# filter low-expressed and mitochondrial genes
spe <- filter_genes(spe)
#> Gene filtering: removing mitochondrial genes
#> removed 13 mitochondrial genes
#> Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 19) of spatial locations
#> removed 30216 out of 33525 genes due to low expression
# calculate logcounts (log-transformed normalized counts) using scran package
# using library size factors
spe <- computeLibraryFactors(spe)
spe <- logNormCounts(spe)
# select small number of genes for faster runtime in this example
set.seed(123)
ix <- sample(seq_len(nrow(spe)), 4)
spe <- spe[ix, ]
# run tpSVG
set.seed(123)
# Gaussian Model
spe_gaus <- tpSVG(
spe,
family = gaussian(),
assay_name = "logcounts"
)
# Poisson Model
spe_poisson <- tpSVG(
spe,
family = poisson,
assay_name = "counts",
offset = log(spe$sizeFactor) # Natural log library size
)