Title: | Flexible and Interpretable Non-Parametric Tests of Exchangeability |
---|---|
Description: | Given a multivariate dataset and some knowledge about the dependencies between its features, it is important to ensure the observations or individuals are exchangeable before fitting a model to the data in order to make inferences from it, or assigning randomized treatments in order to estimate treatment effects. This package provides a flexible non-parametric test of exchangeability, allowing the user to specify the feature dependencies by hand. It can be used directly to evaluate whether a sample is exchangeable, and can also be piped into larger procedures that require exchangeable samples as outputs (e.g., clustering or community detection). See Aw, Spence and Song (2021+) for the accompanying paper. |
Authors: | Alan Aw [cre, aut] |
Maintainer: | Alan Aw <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.0.2 |
Built: | 2025-01-22 05:59:22 UTC |
Source: | https://github.com/alanaw1/flintyr |
Index of help topics:
blockGaussian | Approximate p-value for Test of Exchangeability (Assuming Large N and P with Block Dependencies) |
blockLargeP | Approximate p-value for Test of Exchangeability (Assuming Large P with Block Dependencies) |
blockPermute | p-value Computation for Test of Exchangeability with Block Dependencies |
buildForward | Map from Indices to Label Pairs |
buildReverse | Map from Label Pairs to Indices |
cacheBlockPermute1 | Resampling Many V Statistics (Version 1) |
cacheBlockPermute2 | Resampling Many V Statistics (Version 2) |
cachePermute | Permutation by Caching Distances |
distDataLargeP | Asymptotic p-value of Exchangeability Using Distance Data |
distDataPValue | A Non-parametric Test for Exchangeability and Homogeneity (Distance List Version) |
distDataPermute | p-value Computation for Test of Exchangeability Using Distance Data |
flintyR-package | Flexible and Interpretable Non-Parametric Tests of Exchangeability |
getBinVStat | V Statistic for Binary Matrices |
getBlockCov | Covariance Computations Between Pairs of Distances (Block Dependencies Case) |
getChi2Weights | Get Chi Square Weights |
getCov | Covariance Computations Between Pairs of Distances (Independent Case) |
getHammingDistance | A Hamming Distance Vector Calculator |
getLpDistance | A l_p^p Distance Vector Calculator |
getPValue | A Non-parametric Test for Exchangeability and Homogeneity |
getRealVStat | V Statistic for Real Matrices |
hamming_bitwise | Fast Bitwise Hamming Distance Vector Computation |
indGaussian | Approximate p-value for Test of Exchangeability (Assuming Large N and P) |
indLargeP | Approximate p-value for Test of Exchangeability (Assuming Large P) |
lp_distance | Fast l_p^p Distance Vector Computation |
naiveBlockPermute1 | Resampling V Statistic (Version 1) |
naiveBlockPermute2 | Resampling V Statistic (Version 2) |
weightedChi2P | Tail Probability for Chi Square Convolution Random Variable |
Alan Aw [cre, aut] (<https://orcid.org/0000-0001-9455-7878>), Jeffrey Spence [ctb]
Computes the large (N, P) asymptotic p-value for dataset X, assuming its P features are independent within specified blocks.
blockGaussian(X, block_boundaries, block_labels, p)
X |
The binary or real matrix on which to perform test of exchangeability |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. |
block_labels |
Length P vector recording the block label of each feature. |
p |
The power p of the l_p^p distance. |
This is the large N and large P asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getBlockCov, getChi2Weights
The asymptotic p-value
Computes the large P asymptotic p-value for dataset X, assuming its P features are independent within specified blocks.
blockLargeP(X, block_boundaries, block_labels, p = 2)
X |
The binary or real matrix on which to perform test of exchangeability |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. |
block_labels |
Length P vector recording the block label of each feature. |
p |
The power p of the l_p^p distance. |
This is the large P asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getChi2Weights, weightedChi2P, getBlockCov
The asymptotic p-value
Generates a block permutation p-value. Uses a heuristic to decide whether to use distance caching or simple block permutations.
blockPermute( X, block_boundaries = NULL, block_labels = NULL, nruns, type, p = 2 )
X |
The binary or real matrix on which to perform permutation resampling |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. Default is NULL. |
block_labels |
Length P vector recording the block label of each feature. Default is NULL. |
nruns |
The resampling number (use at least 1000) |
type |
Either an unbiased estimate ("unbiased", default), or exact ("valid") p-value (see Hemerik and Goeman, 2018), or both ("both"). Default is "unbiased". |
p |
The power p of the l_p^p distance. |
Dependencies: buildForward, buildReverse, cachePermute, cacheBlockPermute1, cacheBlockPermute2, getHammingDistance, getLpDistance, naiveBlockPermute1, naiveBlockPermute2
The block permutation p-value
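The two p-value types can be illustrated with a small base-R sketch. It assumes the usual definitions from the permutation-testing literature: the "unbiased" estimate is the fraction of resampled statistics at least as large as the observed one, and the "valid" estimate adds one to both numerator and denominator. The helper `perm_pvalue` and its arguments are hypothetical and not part of flintyR; the package's own computation may differ in detail.

```r
# Hypothetical helper (assumption, not flintyR code).
# v_obs: observed statistic; v_perm: statistics from the resampling runs.
perm_pvalue <- function(v_obs, v_perm, type = c("unbiased", "valid", "both")) {
  type <- match.arg(type)
  exceed <- sum(v_perm >= v_obs)                  # resamples at least as extreme
  unbiased <- exceed / length(v_perm)
  valid <- (1 + exceed) / (1 + length(v_perm))    # never exactly zero
  switch(type,
         unbiased = unbiased,
         valid = valid,
         both = c(unbiased = unbiased, valid = valid))
}

perm_pvalue(3, c(1, 2, 3, 4))            # 0.5
perm_pvalue(3, c(1, 2, 3, 4), "valid")   # 0.6
```

The "valid" form is slightly conservative, which is why a reasonably large nruns is recommended.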
Builds a map from indexes to pairs of labels. This is
for caching distances, to avoid recomputing Hamming distances
especially when dealing with high-dimensional (large P) matrices.
buildForward(N)
N |
Sample size, i.e., nrow(X) |
Dependencies: None
A matrix whose entries record the index corresponding to the pair of labels (indexed by the matrix dims)
Builds a map from pairs of labels to indexes. This is
for caching distances, to avoid recomputing Hamming distances
especially when dealing with high-dimensional (large P) matrices.
buildReverse(N)
N |
Sample size, i.e., nrow(X) |
Dependencies: None
A matrix whose entries record the index corresponding to the pair of labels (indexed by the matrix dims)
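The idea of mapping between label pairs and positions in a distance vector can be sketched in base R. The helpers `pair_to_index` and `index_to_pair` below are hypothetical names and assume pairs (i, j) with i < j are stored in lexicographic order; the matrices returned by buildForward and buildReverse may be laid out differently.

```r
# Hypothetical helpers (assumption, not flintyR code).
# N x N lookup table: entry [i, j] is the position of pair (i, j).
pair_to_index <- function(N) {
  pairs <- t(combn(N, 2))                    # rows: (1,2), (1,3), ..., (N-1,N)
  idx <- matrix(NA_integer_, N, N)
  idx[pairs] <- seq_len(nrow(pairs))         # upper triangle
  idx[pairs[, 2:1, drop = FALSE]] <- seq_len(nrow(pairs))  # mirror to lower triangle
  idx
}

# choose(N, 2) x 2 table: row k holds the pair stored at position k.
index_to_pair <- function(N) t(combn(N, 2))

N <- 4
to_index <- pair_to_index(N)
to_pair <- index_to_pair(N)
to_index[2, 3]   # 4
to_pair[4, ]     # 2 3
```

With both tables precomputed, relabelling samples only requires index lookups, not new distance computations.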
Generates a block permutation distribution of the V statistic.
Precomputes distances and some indexing arrays to quickly
generate samples from the block permutation distribution of the
V statistic of X.
cacheBlockPermute1(X, block_labels, nruns, p = 2)
X |
The binary or real matrix on which to perform permutation resampling |
block_labels |
Length P vector recording the block label of each feature. |
nruns |
The resampling number (use at least 1000) |
p |
The power p of the l_p^p distance. |
This version is with block labels specified.
Dependencies: buildForward, buildReverse, cachePermute, getHammingDistance, getLpDistance
A vector of resampled values of the statistic
Generates a block permutation distribution of the V statistic.
Precomputes distances and some indexing arrays to quickly
generate samples from the block permutation distribution of the
V statistic of X.
cacheBlockPermute2(X, block_boundaries, nruns, p = 2)
X |
The binary or real matrix on which to perform permutation resampling |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts |
nruns |
The resampling number (use at least 1000) |
p |
The power p of the l_p^p distance. |
This version is with block boundaries specified.
Dependencies: buildForward, buildReverse, cachePermute, getHammingDistance, getLpDistance
A vector of resampled values of the statistic
What do you do when you have to compute pairwise distances many times, and those damn distances take a long time to compute? Answer: You cache the distances and permute the underlying sample labels!
cachePermute(dists, forward, reverse)
dists |
|
forward |
|
reverse |
|
This function permutes the distances (Hamming, l_p^p, etc.) within blocks.
Permutations respect the fact that we are actually permuting the
underlying labels. Arguments forward and reverse should be
precomputed using buildForward and buildReverse.
Dependencies: buildForward, buildReverse
A matrix with same dimensions as dists containing the block-permuted pairwise distances
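The caching idea can be sketched for a single block in base R: compute the pairwise distances once, then obtain the distances of a relabelled sample by index lookup. This is only an illustration under the assumption that distances are stored in lexicographic pair order; the variable names are hypothetical and the code does not call cachePermute.

```r
set.seed(1)
N <- 5
X <- matrix(rbinom(N * 8, 1, 0.5), nrow = N)        # one block of 8 binary features
d <- as.vector(dist(X, method = "manhattan"))       # cached Hamming distances

pairs <- t(combn(N, 2))                             # (1,2), (1,3), ..., (N-1,N)
idx <- matrix(0L, N, N)
idx[pairs] <- seq_len(nrow(pairs))
idx <- idx + t(idx)                                 # symmetric pair-to-index lookup

perm <- sample(N)                                   # permute the sample labels
# distance between relabelled samples i and j, read from the cache
d_perm <- d[idx[cbind(perm[pairs[, 1]], perm[pairs[, 2]])]]

# agrees with recomputing distances on the row-permuted block
d_check <- as.vector(dist(X[perm, , drop = FALSE], method = "manhattan"))
all.equal(d_perm, d_check)
```

The lookup costs one pass over the choose(N, 2) pairs, independent of the number of features in the block.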
Generates an asymptotic p-value.
distDataLargeP(dist_list)
dist_list |
The list of pairwise distance data. Each element should be a distance matrix. |
Generates a weighted convolution of chi-squares distribution of the V statistic
by storing the provided list of distance data as an array,
and then using large-P theory to generate the asymptotic null distribution
against which the p-value of the observed V statistic is computed.
Each element of dist_list should be a distance matrix.
Dependencies: buildReverse, getChi2Weights, weightedChi2P
The asymptotic p-value obtained from the weighted convolution of chi-squares distribution.
Generates a block permutation p-value.
distDataPermute(dist_list, nruns, type)
dist_list |
The list of pairwise distance data. Each element should be a distance matrix. |
nruns |
The resampling number (use at least 1000) |
type |
Either an unbiased estimate ("unbiased", default), or exact ("valid") p-value (see Hemerik and Goeman, 2018), or both ("both"). Default is "unbiased". |
Generates a block permutation distribution of the V statistic by storing
the provided list of distance data as an array,
and then permuting the underlying indices of each individual to generate
resampled arrays. The observed V statistic is
also computed from the distance data.
Each element of dist_list should be a distance matrix.
Dependencies: buildForward, buildReverse, cachePermute
The p-value obtained from comparing the empirical tail cdf of the observed
V statistic computed from distance data.
Computes the p-value of a multivariate dataset, which informs the user if the sample is exchangeable at a given significance level, while simultaneously accounting for feature dependencies. See Aw, Spence and Song (2021) for details.
distDataPValue(dist_list, largeP = FALSE, nruns = 1000, type = "unbiased")
dist_list |
The list of distances. |
largeP |
Boolean indicating whether to use large P asymptotics. Default is FALSE. |
nruns |
Resampling number for exact test. Default is 1000. |
type |
Either an unbiased estimate of ("unbiased", default), or a valid but biased estimate of ("valid"), the p-value (see Hemerik and Goeman, 2018), or both ("both"). Default is "unbiased". |
This version takes in a list of distance matrices recording
pairwise distances between individuals across independent features.
Dependencies: distDataLargeP and distDataPermute from auxiliary.R
The p-value to be used to test the null hypothesis of exchangeability.
Computes the V statistic for a binary matrix X, as defined in
Aw, Spence and Song (2021+).
getBinVStat(X)
X |
The N x P binary matrix |
Dependencies: getHammingDistance
V(X), the variance of the pairwise Hamming distance between samples
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5))
getBinVStat(X)
Computes covariance matrix entries and associated alpha, beta
and gamma quantities defined in Aw, Spence and Song (2021),
for partitionable features that are grouped into blocks. Uses
precomputation to speed up computing the unique entries of the asymptotic
covariance matrix of the pairwise Hamming distances.
getBlockCov(X, block_boundaries, block_labels, p = 2)
X |
The binary or real matrix |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. |
block_labels |
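Length P vector recording the block label of each feature. |
p |
The power p of the l_p^p distance. |
This is used in the large P asymptotics of the permutation test.
Dependencies: buildReverse, getHammingDistance, getLpDistance
The three distinct entries of covariance matrix, (alpha, beta, gamma)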
Computes weights for the asymptotic random variable
from the alpha, beta and gamma
computed from the data array X.
getChi2Weights(alpha, beta, gamma, N)
alpha |
covariance matrix entry computed from getCov |
beta |
covariance matrix entry computed from getCov |
gamma |
covariance matrix entry computed from getCov |
N |
The sample size, i.e., nrow(X) where X is the original dataset |
This is used in the large P asymptotics of the permutation test.
Dependencies: None
The weights (w1, w2)
Computes covariance matrix entries and associated alpha, beta
and gamma quantities defined in Aw, Spence and Song (2021),
assuming the P features of the dataset X are independent.
getCov(X, p = 2)
X |
The binary or real matrix |
p |
The power p of the l_p^p distance. |
This is used in the large P asymptotics of the permutation test.
Dependencies: buildReverse, getLpDistance
The three distinct entries of covariance matrix, (alpha, beta, gamma)
Computes all pairwise Hamming distances for a binary matrix X.
getHammingDistance(X)
X |
The N x P binary matrix |
Dependencies: hamming_bitwise from fast_dist_calc.cpp
A length (N choose 2) vector of pairwise Hamming distances
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5))
getHammingDistance(X)
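For intuition, the pairwise Hamming distances and their variance can be reproduced in base R without the package. The helper `hamming_pairs` is a hypothetical name; the variance shown is the plain sample variance, and the scaling used by getBinVStat may differ.

```r
# Hypothetical helper (assumption, not the package's C++ routine).
# Hamming distances between rows, ordered (1,2), (1,3), ..., (N-1,N).
hamming_pairs <- function(X) {
  pairs <- combn(nrow(X), 2)
  apply(pairs, 2, function(ij) sum(X[ij[1], ] != X[ij[2], ]))
}

X_small <- rbind(c(1, 0), c(0, 1), c(1, 1))
d_small <- hamming_pairs(X_small)
d_small        # 2 1 1
var(d_small)   # unscaled sample variance of the distances
```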
Computes all pairwise l_p^p distances for a real matrix X,
for a specified choice of Minkowski norm exponent p.
getLpDistance(X, p)
X |
The N x P real matrix |
p |
The power p of the l_p^p distance. |
Dependencies: lp_distance from fast_dist_calc.cpp
A length (N choose 2) vector of pairwise l_p^p distances
X <- matrix(nrow = 5, ncol = 10, rnorm(50))
getLpDistance(X, p = 2)
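The l_p^p distance is the Minkowski distance raised to the power p, so it can be sketched with base R's dist(). The helper `lp_pairs` is a hypothetical name and this is not the package's C++ implementation.

```r
# Hypothetical helper (assumption): sum_k |x_k - y_k|^p for every pair of rows,
# ordered (1,2), (1,3), ..., (N-1,N).
lp_pairs <- function(X, p) {
  as.vector(dist(X, method = "minkowski", p = p))^p
}

X_real <- rbind(c(0.5, 0.5), c(0, 1), c(0.3, 0.7))
d_real <- lp_pairs(X_real, p = 2)
d_real   # 0.50 0.08 0.18
```

For p = 2 this is the squared Euclidean distance; for p = 1 it reduces to the Manhattan distance.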
Computes the p-value of a multivariate dataset X, which
informs the user if the sample is exchangeable at a given
significance level, while simultaneously accounting for feature
dependencies. See Aw, Spence and Song (2021) for details.
getPValue( X, block_boundaries = NULL, block_labels = NULL, largeP = FALSE, largeN = FALSE, nruns = 5000, type = "unbiased", p = 2 )
X |
The binary or real matrix on which to perform test of exchangeability. |
block_boundaries |
Vector denoting the positions where a new block of non-independent features starts. Default is NULL. |
block_labels |
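Length P vector recording the block label of each feature. Default is NULL. |
largeP |
Boolean indicating whether to use large P asymptotics. Default is FALSE. |
largeN |
Boolean indicating whether to use large N asymptotics. Default is FALSE. |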
nruns |
Resampling number for exact test. Default is 5000. |
type |
Either an unbiased estimate of ("unbiased", default), or a valid but biased estimate of ("valid"), the p-value (see Hemerik and Goeman, 2018), or both ("both"). Default is "unbiased". |
p |
The power p of the l_p^p distance. Default is 2. |
Automatically detects if dataset is binary, and runs the Hamming
distance version of test if so. Otherwise, computes the squared
Euclidean distance between samples and evaluates whether the
variance of Euclidean distances, V, is atypically large under the
null hypothesis of exchangeability. Note the user may tweak the
choice of power p if they prefer an l_p^p distance other than Euclidean.
Under the hood, the variance statistic, V, is computed efficiently.
Moreover, the user can specify their choice of block permutations,
large P asymptotics, or large N and large P asymptotics. The latter two
return reasonably accurate p-values for moderately large dimensionalities.
User recommendations: When the number of independent blocks B or number of
independent features P is at least 50, it is safe to use large P asymptotics.
If P or B is small, however, stick with permutations.
Dependencies: All functions in auxiliary.R
The p-value to be used to test the null hypothesis of exchangeability.
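The logic of the exact test for independent features can be sketched end to end in base R: compute the variance of pairwise distances, permute each feature independently across samples, and see how often the resampled variance is at least as large. The names `v_stat` and `perm_test` are hypothetical, the variance is unscaled, and this is only an illustration of the idea, not the package's implementation.

```r
set.seed(42)

# Hypothetical helpers (assumption, not flintyR code).
v_stat <- function(X, p = 2) {
  var(as.vector(dist(X, method = "minkowski", p = p))^p)
}

perm_test <- function(X, nruns = 1000, p = 2) {
  v_obs <- v_stat(X, p)
  v_perm <- replicate(nruns, {
    X_perm <- apply(X, 2, sample)   # shuffle each feature across samples
    v_stat(X_perm, p)
  })
  mean(v_perm >= v_obs)             # "unbiased"-style p-value
}

# Two shifted groups of samples: not exchangeable, so a small p-value is expected
X_het <- rbind(matrix(rnorm(5 * 50), nrow = 5),
               matrix(rnorm(5 * 50, mean = 2), nrow = 5))
p_het <- perm_test(X_het, nruns = 200)
p_het
```

When the samples form distinct groups, between-group distances are much larger than within-group ones, which inflates the variance relative to its permutation distribution.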
# Example 1 (get p-value of small matrix with independent features using exact test)
suppressWarnings(require(doParallel))
# registerDoParallel(cores = 2)
X1 <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5)) # binary matrix, small
getPValue(X1) # perform exact test with 5000 permutations
# should be larger than 0.05

# Example 2 (get p-value of high-dim matrix with independent features using asymptotic test)
X2 <- matrix(nrow = 10, ncol = 1000, rnorm(1e4)) # real matrix, large enough
getPValue(X2, p = 2, largeP = TRUE) # very fast
# should be larger than 0.05
# getPValue(X2, p = 2) # slower, do not run (Output: 0.5764)

# Example 3 (get p-value of high-dim matrix with partitionable features using exact test)
X3 <- matrix(nrow = 10, ncol = 1000, rbinom(1e4, 1, 0.5))
getPValue(X3, block_labels = rep(c(1,2,3,4,5), 200))
# Warning message:
# there are features that have zero variation (i.e., all 0s or 1s)
# In getPValue(X3, block_labels = rep(c(1, 2, 3, 4, 5), 200)) :
# There exist columns with all ones or all zeros for binary X.

# Example 4 (get p-value of high-dim matrix with partitionable features using asymptotic test)
## This elaborate example generates binarized versions of time series data.

# Helper function to binarize a marker
# by converting z-scores to {0,1} based on
# standard normal quantiles
binarizeMarker <- function(x, freq, ploidy) {
  if (ploidy == 1) {
    return((x > qnorm(1-freq)) + 0)
  } else if (ploidy == 2) {
    if (x <= qnorm((1-freq)^2)) {
      return(0)
    } else if (x <= qnorm(1-freq^2)) {
      return(1)
    } else return(2)
  } else {
    cat("Specify valid ploidy number, 1 or 2")
  }
}

getAutoRegArray <- function(B, N, maf_l = 0.38, maf_u = 0.5, rho = 0.5, ploid = 1) {
  # get minor allele frequencies by sampling from uniform
  mafs <- runif(B, min = maf_l, max = maf_u)
  # get AR array
  ar_array <- t(replicate(N, arima.sim(n = B, list(ar=rho))))
  # theoretical column variance
  column_var <- 1/(1-rho^2)
  # rescale so that variance per marker is 1
  ar_array <- ar_array / sqrt(column_var)
  # binarize each column of AR array
  for (b in 1:B) {
    ar_array[,b] <- sapply(ar_array[,b], binarizeMarker, freq = mafs[b], ploidy = ploid)
  }
  return(ar_array)
}

## Function to generate the data array with desired number of samples
## (50 independent blocks of B = 20 markers each, 1000 columns in total)
getExHaplotypes <- function(N) {
  array <- do.call("cbind", lapply(1:50, function(x) {getAutoRegArray(N, B = 20)}))
  return(array)
}

## Generate data and run test
X4 <- getExHaplotypes(10)
# a new block starts every 20 columns, matching B = 20 above
getPValue(X4, block_boundaries = seq(from = 1, to = 1000, by = 20), largeP = TRUE)
# stopImplicitCluster()
Computes the V statistic for a real matrix X,
where V(X) = scaled variance of l_p^p distances between the
row samples of X.
getRealVStat(X, p)
X |
The N x P real matrix |
p |
The power p of the l_p^p distance. |
Dependencies: getLpDistance
V(X), the variance of the pairwise l_p^p distance between samples
X <- matrix(nrow = 5, ncol = 10, rnorm(50))
getRealVStat(X, p = 2)
Takes in a binary matrix X, whose transpose t(X) has N rows, and computes a vector recording all N choose 2 pairwise Hamming distances of t(X), ordered lexicographically.
hamming_bitwise(X)
X |
binary matrix (IntegerMatrix class) |
vector of Hamming distances (NumericVector class)
# t(X) = [[1,0], [0,1], [1,1]] --> output = [2,1,1]
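A bitwise Hamming distance can be sketched in base R by packing each binary sample into an integer, XOR-ing two packed samples, and counting the set bits. The helpers `pack_bits`, `popcount` and `hamming_xor` are hypothetical names, and this sketch handles at most 31 features per sample; it illustrates the technique and is not the package's C++ routine.

```r
# Hypothetical helpers (assumption, not flintyR code).
pack_bits <- function(row) as.integer(sum(row * 2^(seq_along(row) - 1)))  # <= 31 features
popcount <- function(x) sum(as.integer(intToBits(x)))                     # number of set bits
hamming_xor <- function(a, b) popcount(bitwXor(a, b))                     # differing positions

samples <- rbind(c(1, 0), c(0, 1), c(1, 1))
codes <- apply(samples, 1, pack_bits)        # 1 2 3
pairs <- combn(nrow(samples), 2)
d_bits <- apply(pairs, 2, function(ij) hamming_xor(codes[ij[1]], codes[ij[2]]))
d_bits   # 2 1 1
```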
Computes the large (N, P) asymptotic p-value for dataset X,
assuming its P features are independent.
indGaussian(X, p = 2)
X |
The binary or real matrix on which to perform test of exchangeability |
p |
The power p of the l_p^p distance. |
This is the large N and large P asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getCov, getChi2Weights
The asymptotic p-value
Computes the large P asymptotic p-value for dataset X,
assuming its P features are independent.
indLargeP(X, p = 2)
X |
The binary or real matrix on which to perform test of exchangeability |
p |
The power p of the l_p^p distance. |
This is the large P asymptotics of the permutation test.
Dependencies: getBinVStat, getRealVStat, getChi2Weights, weightedChi2P, getCov
The asymptotic p-value
Takes in a double matrix X, whose transpose t(X)
has N rows, and computes a vector recording all
N choose 2 pairwise l_p^p distances of t(X),
ordered lexicographically.
lp_distance(X, p)
X |
double matrix (arma::mat class) |
p |
numeric Minkowski power (double class) |
vector of l_p^p distances (arma::vec class)
# X = [[0.5,0.5],[0,1],[0.3,0.7]] --> lPVec = [x,y,z]
# with x = (0.5^p + 0.5^p)
Generates a new array X' under the permutation null and then
returns the V statistic computed for X'.
naiveBlockPermute1(X, block_labels, p)
X |
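The N x P binary or real matrix |
block_labels |
A vector of length P, whose entries indicate the block label of each feature |
p |
The power p of the l_p^p distance. |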
This is Version 1, which takes in the block labels. It is suitable in
the most general setting, where the features are grouped by labels.
Given original X and a list denoting labels of each feature,
independently permutes the rows within each block of X
and returns resulting V(X').
If block labels are not specified, then features are assumed independent, which
is to say that block_labels is set to 1:ncol(X).
Dependencies: getBinVStat, getRealVStat
V(X'), where X' is a resampled version of X obtained by permuting entries blockwise
X <- matrix(nrow = 5, ncol = 10, rnorm(50)) # real matrix example
naiveBlockPermute1(X, block_labels = c(1,1,2,2,3,3,4,4,5,5), p = 2) # use Euclidean distance
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5)) # binary matrix example
naiveBlockPermute1(X, block_labels = c(1,1,2,2,3,3,4,4,5,5))
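The blockwise resampling step can be sketched in base R: for each block label, draw one permutation of the rows and apply it to all columns in that block, so dependencies within a block are preserved while blocks are shuffled independently. The helper `permute_by_block` is a hypothetical name and only produces the resampled matrix; it does not compute the V statistic.

```r
# Hypothetical helper (assumption, not flintyR code).
permute_by_block <- function(X, block_labels = seq_len(ncol(X))) {
  for (b in unique(block_labels)) {
    cols <- which(block_labels == b)
    X[, cols] <- X[sample(nrow(X)), cols]   # one row shuffle shared by the block
  }
  X
}

set.seed(1)
X_demo <- matrix(1:20, nrow = 5)                       # row i is (i, i+5, i+10, i+15)
X_resampled <- permute_by_block(X_demo, c(1, 1, 2, 2))
X_resampled[, 2] - X_resampled[, 1]                    # always 5: columns 1-2 move together
```

With the default labels every column is its own block, which corresponds to treating all features as independent.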
Generates a new array X' under the permutation null and then
returns the V statistic computed for X'.
naiveBlockPermute2(X, block_boundaries, p)
X |
The N x P binary or real matrix |
block_boundaries |
A vector of length at most P, whose entries indicate positions at which to demarcate blocks |
p |
The power p of the l_p^p distance. |
This is Version 2, which takes in the block boundaries. It is suitable
for use when the features are already arranged such that the block
memberships are determined by index delimiters. Given original X and
a list denoting labels of each feature, independently permutes the rows
within each block of X and returns resulting V(X').
If block labels are not specified,
then features are assumed independent, which is to say that block_labels is set to 1:ncol(X).
Dependencies: getBinVStat, getRealVStat
V(X'), where X' is a resampled version of X obtained by permuting entries blockwise
X <- matrix(nrow = 5, ncol = 10, rnorm(50)) # real matrix example
naiveBlockPermute2(X, block_boundaries = c(4,7,9), p = 2) # use Euclidean distance
X <- matrix(nrow = 5, ncol = 10, rbinom(50, 1, 0.5)) # binary matrix example
naiveBlockPermute2(X, block_boundaries = c(4,7,9))
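One way to relate the two block specifications is to convert boundaries into per-feature labels. The sketch below assumes that column 1 always opens the first block and that each boundary is the first column of a new block; the helper `boundaries_to_labels` is a hypothetical name, and the package may interpret boundaries differently.

```r
# Hypothetical helper (assumption, not flintyR code).
boundaries_to_labels <- function(block_boundaries, P) {
  starts <- sort(unique(c(1, block_boundaries)))   # first column of each block
  findInterval(seq_len(P), starts)                 # block index of every column
}

labels_demo <- boundaries_to_labels(c(4, 7, 9), P = 10)
labels_demo   # 1 1 1 2 2 2 3 3 4 4
```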
Computes P(X > val) where X = w1 Y + w2 Z,
where Y is chi square distributed with d1 degrees of freedom,
Z is chi square distributed with d2 degrees of freedom,
and w1 and w2 are weights with w2 assumed positive.
The probability is computed using numerical integration of the
densities of the two chi square distributions. (Method: trapezoidal rule)
weightedChi2P(val, w1, w2, d1, d2)
val |
observed statistic |
w1 |
weight of first chi square rv |
w2 |
weight of second chi square rv, assumed positive |
d1 |
degrees of freedom of first chi square rv |
d2 |
degrees of freedom of second chi square rv |
This is used in the large P asymptotics of the permutation test.
Dependencies: None
1 - CDF = P(X > val)
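The tail probability of a weighted sum of two independent chi square variables can be sketched by conditioning on one of them and integrating numerically. The helper `weighted_chi2_tail` is a hypothetical name; it assumes both weights are positive and uses R's adaptive integrate() rather than the trapezoidal rule mentioned above, so it illustrates the idea and is not the package's routine.

```r
# Hypothetical helper (assumption, not flintyR code).
# P(w1 * Y + w2 * Z > val) = E_Z[ P(Y > (val - w2 * Z) / w1) ]
weighted_chi2_tail <- function(val, w1, w2, d1, d2) {
  stopifnot(w1 > 0, w2 > 0)   # this sketch assumes both weights are positive
  integrand <- function(z) {
    dchisq(z, df = d2) *
      pchisq((val - w2 * z) / w1, df = d1, lower.tail = FALSE)
  }
  integrate(integrand, lower = 0, upper = Inf)$value
}

# Sanity check: with unit weights the sum is chi square with d1 + d2 degrees of freedom
tail_prob <- weighted_chi2_tail(val = 10, w1 = 1, w2 = 1, d1 = 3, d2 = 4)
exact <- pchisq(10, df = 7, lower.tail = FALSE)
c(numerical = tail_prob, exact = exact)
```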