Title: | Tools Developed for Structured Sufficient Dimension Reduction (sSDR) |
---|---|
Description: | Performs structured OLS (sOLS) and structured SIR (sSIR). |
Authors: | Yang Liu <[email protected]>, Francesca Chiaromonte, Bing Li |
Maintainer: | Yang Liu <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.0 |
Built: | 2025-02-12 04:55:53 UTC |
Source: | https://github.com/cran/sSDR |
Center a vector
center(v)
v | A vector. |
This function centers any vector and returns a vector with mean zero.
A vector with mean zero.
data <- gen.data(n=100)
y.centered <- center(data$y)
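As a minimal base-R sketch of what centering means here (the helper name `center_sketch` is hypothetical and is not taken from the sSDR source), subtracting the sample mean gives a vector whose mean is zero:

```r
# Hypothetical helper (not the sSDR source): subtract the sample mean.
center_sketch <- function(v) v - mean(v)

v <- c(2, 4, 9)                  # mean is 5
v_centered <- center_sketch(v)
v_centered                       # -3 -1  4
mean(v_centered)                 # 0
```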
Covariance matrix
cov.x(X)
X | An n x p matrix of n observations and p predictors. |
This function returns a p x p covariance matrix for any n x p matrix.
A p x p covariance matrix.
data <- gen.data(n=100)
x.cov <- cov.x(data$X)
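The computation can be sketched in base R as follows. This is an illustration on toy data, not the package's code; in particular, the n - 1 denominator is an assumption (it matches `stats::cov`), since the text above does not say whether cov.x divides by n or by n - 1.

```r
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)   # toy data: n = 50, p = 3

# Center each column, then form the p x p cross-product.
Xc <- sweep(X, 2, colMeans(X))
S  <- crossprod(Xc) / (nrow(X) - 1)   # assumption: n - 1 denominator, as in stats::cov

dim(S)                  # 3 3
all.equal(S, cov(X))    # TRUE
```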
Subspace distance
disvm(v1, v2)
v1 | A matrix whose columns are p-dimensional vectors. |
v2 | A matrix whose columns are p-dimensional vectors. |
This function computes the distance between two subspaces using the formulation in Li, Zha, and Chiaromonte (2005): the Frobenius norm of the difference between the two orthogonal projection matrices defined by v1 and v2.
A scalar representing the distance between the two spaces spanned by v1 and v2, respectively.
Li, B., Zha, H., and Chiaromonte, F. (2005). Contour regression: a general approach to dimension reduction. Annals of Statistics, 33(4):1580-1616.
v1 <- c(1, 0, 0)
v2 <- c(0, 1, 0)
disvm(v1, v1)
disvm(v1, v2)
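The projection-matrix distance described above can be sketched in base R. The helpers `proj` and `subspace_dist` are hypothetical names, not the package's code, and the sketch assumes each input has linearly independent columns; disvm itself may differ in details such as scaling.

```r
# Hypothetical helpers (not the sSDR source).
proj <- function(V) {
  V <- as.matrix(V)                    # a plain vector becomes a one-column matrix
  V %*% solve(crossprod(V), t(V))      # P = V (V'V)^{-1} V'
}
subspace_dist <- function(V1, V2) norm(proj(V1) - proj(V2), type = "F")

v1 <- c(1, 0, 0)
v2 <- c(0, 1, 0)
subspace_dist(v1, v1)        # 0: identical subspaces
subspace_dist(v1, v2)        # sqrt(2): orthogonal one-dimensional subspaces
subspace_dist(v1, 5 * v1)    # 0: rescaling a basis vector does not change its span
```

Because the distance is defined through projection matrices, it depends only on the spans of v1 and v2, not on the particular basis vectors supplied.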
Simulate data
gen.data(n, rho = 0.5, theta = 1, binary = FALSE)
n | Sample size. |
rho | Pairwise correlation between covariates. |
theta | Standard deviation of the random error. |
binary | If TRUE, generate binary responses; otherwise, by default, create continuous responses. |
This function simulates data as presented in Liu (2015).
gen.data returns a list containing at least the following components: "X", a covariate matrix of n observations and p predictors; "y", a univariate response; "b.true", the actual coefficients for each predictor group.
Liu, Y. (2015). Approaches to reduce and integrate data in structured and high-dimensional regression problems in Genomics. Ph.D. Dissertation, The Pennsylvania State University, University Park, Department of Statistics.
data <- gen.data(n=100)
names(data)
Groupwise OLS (gOLS)
gOLS(X, Y, groups, dims)
X | A covariate matrix of n observations and p predictors. |
Y | A univariate response. |
groups | A vector with the number of predictors in each group. |
dims | A vector with the dimension (at most 1) for each predictor group. |
This function estimates directions for each predictor group using gOLS. Predictors must be organized in groups within the "X" matrix, in the same order as in "groups". Only continuous covariates are allowed in the "X" matrix; categorical covariates can be handled outside of gOLS, e.g. with structured OLS.
gOLS returns a list containing at least the following components: "b_est", the estimated directions for each group, with its own dimension, AFTER normalization; "B", the estimated directions for each group BEFORE normalization.
Liu, Y., Chiaromonte, F., and Li, B. (2015). Structured Ordinary Least Squares: a sufficient dimension reduction approach for regressions with partitioned predictors and heterogeneous units. Submitted.
data <- gen.data(n=1000, binary=FALSE) # generate data
dim(data$X) # covariate matrix of 1000 observations and 15 predictors
dim(data$y) # univariate response
groups <- c(5, 10) # two predictor groups and their numbers of predictors
dims <- c(1,1) # dimension of each predictor group
est_gOLS <- gOLS(data$X,data$y,groups,dims)
names(est_gOLS)
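To illustrate how a "groups" vector of group sizes lines up with the columns of "X", the base-R sketch below splits column indices into consecutive blocks. It only shows the layout convention described above; the toy matrix and the names `cols` and `X_blocks` are hypothetical, and this is not how gOLS is implemented internally.

```r
groups <- c(5, 10)        # group sizes, as in the example above
p <- sum(groups)          # 15 columns in total

# Label each column with its group, then split the column indices.
group_id <- rep(seq_along(groups), times = groups)
cols <- split(seq_len(p), group_id)

cols[[1]]                 # columns 1 to 5 belong to the first group
cols[[2]]                 # columns 6 to 15 belong to the second group

# Hypothetical toy matrix standing in for data$X.
set.seed(1)
X <- matrix(rnorm(20 * p), nrow = 20)
X_blocks <- lapply(cols, function(j) X[, j, drop = FALSE])
sapply(X_blocks, ncol)    # 5 and 10
```

Under this convention, reordering the columns of "X" without reordering "groups" would assign predictors to the wrong group.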
Groupwise OLS (gOLS) BIC criterion to estimate dimensions with eigen-decomposition
gOLS.comp.d(X, y, groups)
X | A covariate matrix of n observations and p predictors. |
y | A univariate response. |
groups | A vector with the number of predictors in each group. |
This function estimates the dimension for each predictor group using eigen-decomposition. Predictors must be organized in groups within the "X" matrix, in the same order as in "groups". Only continuous covariates are allowed in the "X" matrix; categorical covariates can be handled outside of gOLS, e.g. with structured OLS.
gOLS.comp.d returns a list containing at least the following components: "d", the estimated dimension (at most 1) for each predictor group; "crit", the BIC criterion from each iteration.
Liu, Y., Chiaromonte, F., and Li, B. (2015). Structured Ordinary Least Squares: a sufficient dimension reduction approach for regressions with partitioned predictors and heterogeneous units. Submitted.
data <- gen.data(n=1000, binary=FALSE) # generate data
dim(data$X) # covariate matrix of 1000 observations and 15 predictors
dim(data$y) # univariate response
groups <- c(5, 10) # two predictor groups and their numbers of predictors
dim_gOLS <- gOLS.comp.d(data$X,data$y,groups)
names(dim_gOLS)
Groupwise SIR (gSIR) for binary response
gSIR(X, Y, groups, dims)
X | A covariate matrix of n observations and p predictors. |
Y | A binary response. |
groups | A vector with the number of predictors in each group. |
dims | A vector with the dimension (at most 1) for each predictor group. |
This function estimates directions for each predictor group using gSIR. Predictors must be organized in groups within the "X" matrix, in the same order as in "groups". Only continuous covariates are allowed in the "X" matrix; categorical covariates can be handled outside of gSIR, e.g. with structured SIR.
gSIR returns a list containing at least the following components: "b_est", the estimated directions for each group, with its own dimension, AFTER normalization; "B", the estimated directions for each group BEFORE normalization.
Guo, Z., Li, L., Lu, W., and Li, B. (2014). Groupwise dimension reduction via envelope method. Journal of the American Statistical Association, accepted.
data <- gen.data(n=1000, binary=TRUE) # generate data
dim(data$X) # covariate matrix of 1000 observations and 15 predictors
length(data$y) # binary response
groups <- c(5, 10) # two predictor groups and their numbers of predictors
dims <- c(1,1) # dimension of each predictor group
est_gSIR <- gSIR(data$X,data$y,groups,dims)
names(est_gSIR)
Groupwise SIR (gSIR) BIC criterion to estimate dimensions with eigen-decomposition (binary response)
gSIR.comp.d(X, y, groups)
X | A covariate matrix of n observations and p predictors. |
y | A binary response. |
groups | A vector with the number of predictors in each group. |
This function estimates the dimension for each predictor group using eigen-decomposition. Predictors must be organized in groups within the "X" matrix, in the same order as in "groups". Only continuous covariates are allowed in the "X" matrix; categorical covariates can be handled outside of gSIR, e.g. with structured SIR.
gSIR.comp.d returns a list containing at least the following components: "d", the estimated dimension (at most 1) for each predictor group; "crit", the BIC criterion from each iteration.
Liu, Y. (2015). Approaches to reduce and integrate data in structured and high-dimensional regression problems in Genomics. Ph.D. Dissertation, The Pennsylvania State University, University Park, Department of Statistics.
data <- gen.data(n=1000, binary=TRUE) # generate data
dim(data$X) # covariate matrix of 1000 observations and 15 predictors
length(data$y) # univariate response
groups <- c(5, 10) # two predictor groups and their numbers of predictors
dim_gSIR <- gSIR.comp.d(data$X,data$y,groups)
names(dim_gSIR)
Power of a matrix
matpower(X, alpha)
X | A p x p square matrix. |
alpha | A scalar determining the order of the power. |
This function calculates the power of a square matrix.
A p x p square matrix.
data <- gen.data(n=100)
cov.squared <- matpower(cov.x(data$X), 2)
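One common way to define a real power of a symmetric matrix is through its eigen-decomposition: raise the eigenvalues to the power alpha and recombine. The sketch below illustrates that idea; `mat_power_sketch` is a hypothetical helper, the symmetric positive definite input is an assumption, and the text above does not confirm that matpower is implemented this way.

```r
# Hypothetical helper (not the sSDR source); assumes a symmetric matrix
# whose eigenvalues are positive when alpha is fractional or negative.
mat_power_sketch <- function(A, alpha) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% diag(e$values^alpha, nrow = length(e$values)) %*% t(e$vectors)
}

A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)   # eigenvalues 3 and 1

all.equal(mat_power_sketch(A, 2), A %*% A)     # TRUE
root <- mat_power_sketch(A, 0.5)               # symmetric square root
all.equal(root %*% root, A)                    # TRUE
all.equal(mat_power_sketch(A, -1), solve(A))   # TRUE
```

Fractional and negative powers such as -1/2 are the ones typically needed when standardizing predictors by a covariance matrix.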
Normalize a vector
norm1(v)
v | A vector. |
This function normalizes any non-zero vector and returns a vector with the norm equal to 1.
A vector with norm 1.
data <- gen.data(n=100)
y.norm1 <- norm1(data$y)
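A minimal base-R sketch of the normalization, assuming the norm in question is the Euclidean norm (the text above does not name it) and using the hypothetical helper `norm1_sketch`:

```r
# Hypothetical helper (not the sSDR source); assumes the Euclidean norm.
norm1_sketch <- function(v) v / sqrt(sum(v^2))

v <- c(3, 4)             # Euclidean norm is 5
u <- norm1_sketch(v)
u                        # 0.6 0.8
sqrt(sum(u^2))           # 1
```

The zero vector is excluded because its norm is zero, so the division is undefined.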
Gram-Schmidt orthonormalization
orthnormal(X)
X | An n x p matrix of n observations and p predictors. |
This function orthonormalizes any n x p matrix.
An n x p matrix of n observations and p predictors.
data <- gen.data(n=100)
x.orth <- orthnormal(data$X)
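For readers unfamiliar with Gram-Schmidt, the following base-R sketch orthonormalizes the columns of a toy matrix using the modified variant of the procedure. The helper `gram_schmidt_sketch` is hypothetical and assumes linearly independent columns; it is not the package's code, and whether orthnormal uses the classical or modified variant is not stated above.

```r
# Hypothetical helper (not the sSDR source): modified Gram-Schmidt on the columns.
gram_schmidt_sketch <- function(X) {
  Q <- matrix(0, nrow(X), ncol(X))
  for (j in seq_len(ncol(X))) {
    q <- X[, j]
    for (k in seq_len(j - 1)) {
      q <- q - sum(Q[, k] * q) * Q[, k]   # remove the component along earlier columns
    }
    Q[, j] <- q / sqrt(sum(q^2))          # scale to unit length
  }
  Q
}

set.seed(1)
X <- matrix(rnorm(20 * 3), nrow = 20)     # toy data: n = 20, p = 3
Q <- gram_schmidt_sketch(X)
all.equal(crossprod(Q), diag(3))          # TRUE: columns are orthonormal
```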
Structured OLS (sOLS) outer level BIC criterion to estimate dimension with eigen-decomposition
sOLS.comp.d(X, sizes)
X | A matrix containing directions estimated from all subpopulations. |
sizes | A vector with the sample sizes of all subpopulations. |
This function estimates the dimension across the subpopulations using eigen-decomposition. The order of the subpopulations in the "sizes" vector should match their order in the "X" matrix. The function also returns the linearly independent directions among all subpopulations.
sOLS.comp.d returns a list containing at least the following components: "d", the dimension estimated across subpopulations; "u", the "d" linearly independent directions in the matrix X.
Liu, Y., Chiaromonte, F., and Li, B. (2015). Structured Ordinary Least Squares: a sufficient dimension reduction approach for regressions with partitioned predictors and heterogeneous units. Submitted.
v1 <- c(1, 1, 0, 0)
v2 <- c(0, 1, 1, 0)
v3 <- c(0, 0, 1, 1)
v4 <- c(1, 1, 1, 1)
m1 <- cbind(v1, v2)
sizes1 <- c(100, 200)
sOLS.comp.d(m1, sizes1)
m2 <- cbind(v1, v2, v3)
sizes2 <- c(100, 200, 500)
sOLS.comp.d(m2, sizes2)
m3 <- cbind(v1, v3, v4)
sizes3 <- c(100, 500, 1000)
sOLS.comp.d(m3, sizes3)
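As a sanity check on the three example matrices, their exact linear-algebra ranks can be computed in base R with `qr()`. This is not the BIC criterion used by sOLS.comp.d, which also weighs the sample sizes; it only shows how many linearly independent directions each matrix contains.

```r
v1 <- c(1, 1, 0, 0)
v2 <- c(0, 1, 1, 0)
v3 <- c(0, 0, 1, 1)
v4 <- c(1, 1, 1, 1)

qr(cbind(v1, v2))$rank        # 2
qr(cbind(v1, v2, v3))$rank    # 3
qr(cbind(v1, v3, v4))$rank    # 2, because v4 = v1 + v3
```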
Matrix standardization
standmat(x)
x | An n x p matrix of n observations and p predictors. |
This function standardizes a matrix, treating each row as a random vector in an iid sample. It returns an n x p matrix with zero column means and identity covariance matrix.
An n x p matrix of n observations and p predictors.
data <- gen.data(n=100)
x.std <- standmat(data$X)
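The standardization described above (zero column means, identity covariance) can be sketched in base R by centering the columns and multiplying by the inverse square root of the sample covariance. The toy data, the symmetric inverse square root, and the n - 1 covariance denominator are assumptions made for illustration; standmat may differ in these details.

```r
set.seed(1)
n <- 200
p <- 3
mix <- matrix(c(1, 0.5, 0,
                0, 1,   0.5,
                0, 0,   1), nrow = p, byrow = TRUE)
X <- matrix(rnorm(n * p), n, p) %*% mix   # correlated toy data

Xc <- sweep(X, 2, colMeans(X))            # center each column
e  <- eigen(cov(X), symmetric = TRUE)
S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)  # Sigma^(-1/2)
Z  <- Xc %*% S_inv_sqrt                   # standardized matrix

max(abs(colMeans(Z))) < 1e-12             # TRUE: column means are zero
all.equal(cov(Z), diag(p))                # TRUE: sample covariance is the identity
```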