Thin Plate Spline Model to Detect Spatially Variable Genes
Arguments
- input
SpatialExperimentornumericmatrix: Input data, which can either be aSpatialExperimentobject or anumericmatrix of values. If it is aSpatialExperimentobject, it is assumed to have anassayslot containing either logcounts (e.g. from thescranpackage) or deviance residuals (e.g. from thescrypackage), and aspatialCoordsslot containing spatial coordinates of the measurements. If it is anumericmatrix, the values are assumed to already be normalized and transformed (e.g. logcounts), formatted asrows = genesandcolumns = spots, and a separatenumericmatrix of spatial coordinates must also be provided with thespatial_coordsargument.- spatial_coords
numericmatrix: Matrix containing columns of spatial coordinates, formatted asrows = spots. This must be provided ifinputis provied as anumericmatrix of values, and is ignored ifinputis provided as aSpatialExperimentobject. Default = NULL.- X
numericmatrix: Optional design matrix containing columns of covariates per spatial location, e.g. known spatial domains. Number of rows must match the number of spatial locations. Default = NULL, which fits an intercept-only model.- family
a description of the error distribution and link function to be used in the model. Currently support two distributions
poissonandgaussian- offset
This can be used to account for technician variation when
family = poissonmodel is used to model raw counts.offsetshould take in the log-transformed scale factor, e.g.offset = log(spe$sizeFactor), library size, or other normalization factor.- weights
Reserved for future development, e.g. correcting mean-var relationship for Gaussian models. Please use with caution.
- assay_name
character: Ifinputis provided as aSpatialExperimentobject, this argument selects the name of theassayslot in the input object containing the preprocessed gene expression values. For example,logcountsfor log-transformed normalized counts from thescranpackage, orbinomial_deviance_residualsfor deviance residuals from thescrypackage. Default ="logcounts", or ignored ifinputis provided as anumericmatrix of values.- n_threads
integer: Number of threads for parallelization. Default = 1. We recommend setting this equal to the number of cores available (if working on a laptop or desktop) or around 10 or more (if working on a compute cluster).- BPPARAM
BiocParallelParam: Optional additional argument for parallelization. This argument is provided for advanced users ofBiocParallelfor further flexibility for parallelization on some operating systems. If provided, this should be an instance ofBiocParallelParam. For most users, the recommended option is to use then_threadsargument instead. Default = NULL, in which casen_threadswill be used instead.- verbose
logical: Whether to display verbose output for model fitting and parameter estimation fromBRISC. Default = FALSE.- ...
Reserved for future arguments.
Value
If the input was provided as a SpatialExperiment object, the
output values are returned as additional columns in the rowData slot
of the input object. If the input was provided as a numeric matrix
of values, the output is returned as a numeric matrix. The output
values include p-values without any adjustment and statistics reporting
reporting the thinplate spline model. The test_stat entry of the
returned object is the test statistic for the corresponding model,
that is F statistics for the gaussian model and the Chi-squared statistics
for generalized models.
Examples
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
#> union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#>
#> Attaching package: ‘IRanges’
#> The following object is masked from ‘package:nlme’:
#>
#> collapse
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
library(STexampleData)
#> Loading required package: ExperimentHub
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#>
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:Biobase’:
#>
#> cache
library(scran)
#> Loading required package: scuttle
library(nnSVG)
# load example dataset from STexampleData package
spe <- Visium_humanDLPFC()
#> see ?STexampleData and browseVignettes('STexampleData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
# preprocessing steps
# keep only spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]
# skip spot-level quality control, since this has been performed previously
# on this dataset
# Add library size
spe <- addPerCellQCMetrics(spe)
# filter low-expressed and mitochondrial genes
spe <- filter_genes(spe)
#> Gene filtering: removing mitochondrial genes
#> removed 13 mitochondrial genes
#> Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 19) of spatial locations
#> removed 30216 out of 33525 genes due to low expression
# calculate logcounts (log-transformed normalized counts) using scran package
# using library size factors
spe <- computeLibraryFactors(spe)
spe <- logNormCounts(spe)
# select small number of genes for faster runtime in this example
set.seed(123)
ix <- sample(seq_len(nrow(spe)), 4)
spe <- spe[ix, ]
# run tpSVG
set.seed(123)
# Gaussian Model
spe_gaus <- tpSVG(
spe,
family = gaussian(),
assay_name = "logcounts"
)
# Poisson Model
spe_poisson <- tpSVG(
spe,
family = poisson,
assay_name = "counts",
offset = log(spe$sizeFactor) # Natural log library size
)
