Title: | Variable Selection in a Multivariate Linear Model |
---|---|
Description: | It performs variable selection in a multivariate linear model in three steps. First, it estimates the covariance matrix of the residuals. Then it uses this estimate to remove the dependence that may exist among the responses. Finally, it performs variable selection using the Lasso criterion. The method is described in the paper Perrot-Dockès et al. (2017) <arXiv:1704.00076>. |
Authors: | Marie Perrot-Dockès, Céline Lévy-Leduc, Julien Chiquet |
Maintainer: | Marie Perrot-Dockès <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.3 |
Built: | 2024-11-15 03:20:23 UTC |
Source: | https://github.com/cran/MultiVarSel |
MultiVarSel consists of four functions: whitening(), whitening_test(), whitening_choice() and variable_selection(), defined in the files "whitening.R", "whitening_test.R", "whitening_choice.R" and "variable_selection.R". For further information on how to use these functions, see the package vignette.
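For orientation, the model and the three steps in the description can be written compactly as follows. This is our own summary of the approach of Perrot-Dockès et al. (2017), and the notation is ours. Y is the n x q response matrix, X the n x p design matrix, B the p x q coefficient matrix, and \Sigma the q x q covariance matrix of each row of E. The exact scaling of the Lasso penalty used in the package may differ.

```latex
\begin{align}
Y &= XB + E, \\
Y\widehat{\Sigma}^{-1/2} &= XB\widehat{\Sigma}^{-1/2} + E\widehat{\Sigma}^{-1/2}, \\
\operatorname{vec}\big(Y\widehat{\Sigma}^{-1/2}\big) &= \big((\widehat{\Sigma}^{-1/2})^{\top} \otimes X\big)\operatorname{vec}(B) + \operatorname{vec}\big(E\widehat{\Sigma}^{-1/2}\big), \\
\widehat{B}(\lambda) &= \arg\min_{B}\ \big\| \operatorname{vec}\big(Y\widehat{\Sigma}^{-1/2}\big) - \big((\widehat{\Sigma}^{-1/2})^{\top} \otimes X\big)\operatorname{vec}(B) \big\|_2^2 + \lambda \,\|\operatorname{vec}(B)\|_1 .
\end{align}
```

The third line uses the identity vec(ACD) = (D^T \otimes A) vec(C). Once the residuals are whitened, the rows of E\widehat{\Sigma}^{-1/2} are approximately uncorrelated, so a standard Lasso can be applied to the vectorised problem.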
Marie Perrot-Dockes, Celine Levy-Leduc, Julien Chiquet
Maintainer: Marie Perrot-Dockes <[email protected]>
M. Perrot-Dockes et al. "A multivariate variable selection approach for analyzing LC-MS metabolomics data", arXiv:1704.00076
library(MultiVarSel)
data("copals_camera")
Y <- scale(Y[, 1:50])
X <- model.matrix(~ group + 0)
residuals <- lm(as.matrix(Y) ~ X - 1)$residuals
S12_inv <- whitening(residuals, "AR1", pAR = 1, qMA = 0)
Frequencies <- variable_selection(
  Y = Y, X = X, square_root_inv_hat_Sigma = S12_inv,
  nb_repli = 10, nb.cores = 1, parallel = FALSE
)
## Not run:
# Parallel computing
library(doMC)
registerDoMC(cores = 4)
Freqs <- variable_selection(Y, X, S12_inv,
  nb_repli = 10, parallel = TRUE, nb.cores = 4
)
## End(Not run)
A Liquid Chromatography Mass Spectrometry dataset made of African copals samples.
data("copals_camera")
It contains Y, a data frame with 30 observations on 1019 variables, and group, a qualitative variable indicating the type of tree for each row of Y.
M. Perrot-Dockes et al. "A multivariate variable selection approach for analyzing LC-MS metabolomics data", arXiv:1704.00076 https://arxiv.org/pdf/1704.00076.pdf
data(copals_camera)
This is a qualitative variable indicating the type of tree for each row of Y.
Marie Perrot-Dockes [email protected]
https://arxiv.org/pdf/1704.00076.pdf
This is a dataset containing the abundance of 199 metabolites in 9 seed samples just after germination. The seed maturation temperature varies between seeds.
Marie Perrot-Dockes [email protected]
This is a dataset containing the abundance of 724 proteins in 9 seed samples just after germination. The seed maturation temperature varies between seeds.
Marie Perrot-Dockes [email protected]
This function selects the most relevant variables by estimating their selection frequencies with the stability selection approach.
variable_selection(Y, X, square_root_inv_hat_Sigma, nb_repli = 1000, parallel = FALSE, nb.cores = 1)
Argument | Description |
---|---|
Y | a response matrix |
X | a matrix of covariables |
square_root_inv_hat_Sigma | estimate of the inverse of the square root of the covariance matrix of each row of the residuals matrix, obtained with the whitening function |
nb_repli | numerical, number of replications in the stability selection |
parallel | logical, if TRUE then a parallelized version of the code is used |
nb.cores | numerical, number of cores used |
A data frame containing the selection frequencies of the different variables obtained by the stability selection, the corresponding level in the design matrix and the associated column of the observations matrix.
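The short base-R sketch below illustrates stability selection. It is a generic illustration, not the internals of variable_selection(). The helpers soft_threshold(), lasso_cd() and stability_frequencies() are hypothetical. The Lasso is a plain coordinate-descent solver for a single response with a fixed penalty lambda, and random half-samples are assumed. In variable_selection(), the Lasso is instead applied to the whitened, vectorised problem, and nb_repli plays the role of the number of subsamples.

```r
# Soft-thresholding operator used in each coordinate update
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Hypothetical minimal coordinate-descent Lasso (not part of MultiVarSel):
# minimises (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1
lasso_cd <- function(X, y, lambda, n_iter = 100) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  col_sq <- colSums(X^2) / n
  r <- y                                  # residual when b = 0
  for (it in seq_len(n_iter)) {
    for (j in seq_along(b)) {
      r <- r + X[, j] * b[j]              # remove the current contribution of variable j
      b[j] <- soft_threshold(sum(X[, j] * r) / n, lambda) / col_sq[j]
      r <- r - X[, j] * b[j]              # add back the updated contribution
    }
  }
  b
}

# Hypothetical stability selection: fraction of random half-samples
# in which each variable receives a non-zero Lasso coefficient
stability_frequencies <- function(X, y, lambda, nb_repli = 50) {
  n <- nrow(X)
  counts <- numeric(ncol(X))
  for (k in seq_len(nb_repli)) {
    idx <- sample(n, floor(n / 2))
    b <- lasso_cd(X[idx, , drop = FALSE], y[idx], lambda)
    counts <- counts + (b != 0)
  }
  counts / nb_repli
}

# Simulated example: only variables 1 and 2 carry signal
set.seed(1)
X <- matrix(rnorm(100 * 10), 100, 10)
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(100, sd = 0.5)
freq <- stability_frequencies(X, y, lambda = 0.3)
round(freq, 2)
```

Variables 1 and 2 should be selected in every subsample, while the noise variables should be selected much less often. Keeping the variables whose frequency exceeds a chosen threshold is how stability selection retains the most relevant ones.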
library(MultiVarSel)
data("copals_camera")
Y <- scale(Y[, 1:50])
X <- model.matrix(~ group + 0)
residuals <- lm(as.matrix(Y) ~ X - 1)$residuals
S12_inv <- whitening(residuals, "AR1", pAR = 1, qMA = 0)
Frequencies <- variable_selection(
  Y = Y, X = X, square_root_inv_hat_Sigma = S12_inv,
  nb_repli = 10, nb.cores = 1, parallel = FALSE
)
This function provides an estimation of the inverse of the square root of the covariance matrix of each row of the residuals matrix.
whitening(residuals, typeDep, pAR = 1, qMA = 0)
Argument | Description |
---|---|
residuals | the residuals matrix obtained by fitting a linear model to each column of the response matrix as if the columns were independent |
typeDep | character in c("AR1", "ARMA", "nonparam") defining which type of dependence to use |
pAR | numerical, only used if typeDep = "ARMA": the parameter p of the ARMA(p, q) process |
qMA | numerical, only used if typeDep = "ARMA": the parameter q of the ARMA(p, q) process |
It returns an estimate of the inverse of the square root of the covariance matrix of each row of the residuals matrix.
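The sketch below makes the AR1 option concrete. It builds an AR(1) correlation matrix, estimates its parameter from the residuals, computes \Sigma^{-1/2} by eigendecomposition, and whitens the residuals by right-multiplication. It assumes the AR(1) structure \Sigma_{ij} = \phi^{|i-j|} (up to a variance factor) and uses a simple pooled lag-1 moment estimator. The helpers ar1_sigma(), inv_sqrt() and estimate_phi() are hypothetical, and whitening() may estimate the parameter differently.

```r
# Hypothetical helper: AR(1) correlation matrix, Sigma[i, j] = phi^|i - j|
ar1_sigma <- function(q, phi) phi^abs(outer(seq_len(q), seq_len(q), "-"))

# Inverse symmetric square root Sigma^{-1/2} via an eigendecomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Hypothetical moment estimate of phi: pooled lag-1 autocorrelation along the rows
estimate_phi <- function(residuals) {
  q <- ncol(residuals)
  mean(residuals[, -1] * residuals[, -q]) / mean(residuals^2)
}

set.seed(2)
n <- 200; q <- 20
Sigma <- ar1_sigma(q, 0.6)
# Each row of E has covariance Sigma, since t(chol(Sigma)) %*% chol(Sigma) equals Sigma
E <- matrix(rnorm(n * q), n, q) %*% chol(Sigma)
phi_hat <- estimate_phi(E)
S12_inv <- inv_sqrt(ar1_sigma(q, phi_hat))
E_white <- E %*% S12_inv   # multiply the residuals on the right by Sigma^{-1/2}
c(before = phi_hat, after = estimate_phi(E_white))
```

The lag-1 autocorrelation along the rows should drop from about 0.6 before whitening to about 0 after.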
library(MultiVarSel)
data(copals_camera)
Y <- scale(Y[, 1:100])
X <- model.matrix(~ group + 0)
residuals <- lm(as.matrix(Y) ~ X - 1)$residuals
whitening(residuals, "AR1")
This function helps choose the best whitening strategy among the following dependence models: AR1, ARMA, nonparametric, and no whitening.
whitening_choice(residuals, typeDeps = "AR1", pAR = 1, qMA = 0, threshold = 0.05)
Argument | Description |
---|---|
residuals | the residuals matrix obtained by fitting a linear model to each column of the response matrix as if the columns were independent |
typeDeps | character vector with elements in c("AR1", "ARMA", "nonparam", "no_whitening") defining which dependence structures to use to whiten the residuals |
pAR | numerical, only used if typeDeps contains "ARMA": the parameter p of the ARMA(p, q) process |
qMA | numerical, only used if typeDeps contains "ARMA": the parameter q of the ARMA(p, q) process |
threshold | significance level of the test |
It provides a table of p-values, one for each whitening strategy. Each p-value comes from the whitening test applied to the residuals after they have been multiplied on the right by the inverse of the square root of the estimated covariance matrix. A small p-value (typically below 0.05) means that the hypothesis that each row of the "whitened" residuals matrix is white noise is rejected.
library(MultiVarSel)
data(copals_camera)
Y <- scale(Y[, 1:100])
X <- model.matrix(~ group + 0)
residuals <- lm(as.matrix(Y) ~ X - 1)$residuals
whitening_choice(residuals, c("AR1", "nonparam", "ARMA", "no_whitening"),
  pAR = 1, qMA = 1
)
This function returns the p-value of a test for dependence within the rows of the residuals matrix passed as argument. The test is based on an adaptation of the Portmanteau statistic.
whitening_test(residuals)
Argument | Description |
---|---|
residuals | the residuals matrix obtained by fitting a linear model to each column of the response matrix as if they were independent |
The p-value of a whitening test. A small p-value (generally below 0.05) means that the hypothesis that each row of the residuals matrix is white noise is rejected.
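The sketch below illustrates the idea behind a Portmanteau-type whiteness test. The function pooled_portmanteau() is hypothetical and is not the exact statistic computed by whitening_test(). It implements a Box-Pierce style statistic whose lag-h autocorrelations are pooled over the rows of the residuals matrix. It assumes mean-zero residuals and an arbitrary maximal lag H = 5.

```r
# Hypothetical pooled Box-Pierce style test: each row is treated as a series of
# length q, and the lag-h autocorrelations are pooled over all rows
pooled_portmanteau <- function(residuals, H = 5) {
  n <- nrow(residuals); q <- ncol(residuals)
  denom <- sum(residuals^2)
  Q <- 0
  for (h in seq_len(H)) {
    r_h <- sum(residuals[, (1 + h):q] * residuals[, 1:(q - h)]) / denom
    # under white noise, r_h has variance close to (q - h) / (n * q^2)
    Q <- Q + n * q^2 / (q - h) * r_h^2
  }
  pchisq(Q, df = H, lower.tail = FALSE)   # approximate chi-squared(H) p-value
}

set.seed(3)
n <- 50; q <- 30
white <- matrix(rnorm(n * q), n, q)
Sigma <- 0.6^abs(outer(seq_len(q), seq_len(q), "-"))
ar1 <- matrix(rnorm(n * q), n, q) %*% chol(Sigma)   # rows with AR(1) dependence
p_white <- pooled_portmanteau(white)
p_ar1 <- pooled_portmanteau(ar1)
c(white = p_white, ar1 = p_ar1)
```

For the white-noise matrix, the p-value should typically be large. The AR(1) rows should give a p-value close to zero, so the white-noise hypothesis is rejected.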
library(MultiVarSel)
data(copals_camera)
Y <- scale(Y[, 1:100])
X <- model.matrix(~ group + 0)
residuals <- lm(as.matrix(Y) ~ X - 1)$residuals
whitening_test(residuals)
This is a metabolomic dataset of 30 copal samples from African trees.
Marie Perrot-Dockes [email protected]
https://arxiv.org/pdf/1704.00076.pdf