| Title: | High-Dimensional Principal Fitted Components and Abundant Regression |
|---|---|
| Description: | Fit and predict with the high-dimensional principal fitted components model. This model is described by Cook, Forzani, and Rothman (2012) <doi:10.1214/11-AOS962>. |
| Authors: | Adam J. Rothman |
| Maintainer: | Adam J. Rothman <[email protected]> |
| License: | GPL-2 |
| Version: | 1.2 |
| Built: | 2026-06-08 09:43:42 UTC |
| Source: | https://github.com/cran/abundant |
Fit and predict with the high-dimensional principal fitted components model.
The main functions are fit.pfc and pred.response.
Cook, R. D., Forzani, L., and Rothman, A. J. (2012). Estimating sufficient reductions of the predictors in abundant high-dimensional regressions. Annals of Statistics 40(1), 353-384.
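Let $(x_1, y_1), \ldots, (x_n, y_n)$ denote the $n$ measurements of
the predictor and response, where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$.
The model assumes that these measurements
are a realization
of $n$ independent copies of
the random vector $(X, Y)$, where

$$X = \mu_X + \Gamma \beta \{f(Y) - \mu_f\} + \epsilon,$$

$\mu_X \in \mathbb{R}^p$; $\Gamma \in \mathbb{R}^{p \times d}$ with rank $d$;
$\beta \in \mathbb{R}^{d \times r}$ with rank $d$; $f: \mathbb{R} \rightarrow \mathbb{R}^r$ is a known
vector valued function; $\mu_f = E\{f(Y)\}$;
$\epsilon \sim N_p(0, \Delta)$; and $Y$ is independent of $\epsilon$.
The central subspace is $\Delta^{-1} \mathrm{span}(\Gamma)$.

This function computes estimates of these model parameters
by imposing constraints for identifiability.
The mean parameters $\mu_X$ and $\mu_f$
are estimated with $\bar x = n^{-1} \sum_{i=1}^n x_i$ and
$\bar f = n^{-1} \sum_{i=1}^n f(y_i)$.
Let $\widehat\Phi = n^{-1} \sum_{i=1}^n \{f(y_i) - \bar f\}\{f(y_i) - \bar f\}'$,
which we require to be positive definite.
Given a user-specified weight matrix $\widehat W$,
let

$$(\widehat\Gamma, \widehat\beta) = \arg\min_{G \in \mathbb{R}^{p \times d},\, B \in \mathbb{R}^{d \times r}} \sum_{i=1}^n [x_i - \bar x - GB\{f(y_i) - \bar f\}]' \widehat W [x_i - \bar x - GB\{f(y_i) - \bar f\}],$$

subject to the constraints that $G' \widehat W G$ is diagonal and
$B \widehat\Phi B' = I$. The sufficient reduction estimate
$\widehat R: \mathbb{R}^p \rightarrow \mathbb{R}^d$ is defined by

$$\widehat R(x) = (\widehat\Gamma' \widehat W \widehat\Gamma)^{-1} \widehat\Gamma' \widehat W (x - \bar x).$$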
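The quantities in this description can be illustrated with a small base-R sketch. It is not the package's internal code: the data are simulated, the polynomial basis $f(y) = (y, y^2, \ldots, y^r)'$ is an assumption, and the matrices `G` and `W` are arbitrary stand-ins for the estimates $\widehat\Gamma$ and $\widehat W$. The sketch also assumes the reduction has the form $(G'WG)^{-1} G'W(x - \bar x)$, following the cited paper.

```r
# Illustrative base-R sketch; every object here is made up for the example.
set.seed(1)
n <- 20; p <- 6; r <- 3; d <- 2

y <- runif(n)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Assumed polynomial basis f(y) = (y, y^2, ..., y^r)'; row i holds f(y_i).
Fmat <- outer(y, seq_len(r), `^`)
fbar <- colMeans(Fmat)               # sample mean of the basis values
xbar <- colMeans(X)                  # sample mean of the predictors
Fc <- sweep(Fmat, 2, fbar)           # rows are f(y_i) - fbar
Xc <- sweep(X, 2, xbar)              # rows are x_i - xbar

# Phi.hat = n^{-1} * sum_i {f(y_i) - fbar}{f(y_i) - fbar}'
Phi.hat <- crossprod(Fc) / n
# The smallest eigenvalue must be positive for Phi.hat to be positive definite.
min.eig <- min(eigen(Phi.hat, symmetric = TRUE, only.values = TRUE)$values)

# Stand-ins for the estimates: an arbitrary G (p x d) and a diagonal weight W.
G <- matrix(rnorm(p * d), nrow = p, ncol = d)
W <- diag(1 / apply(X, 2, var))

# Assumed reduction R(x) = (G' W G)^{-1} G' W (x - xbar), applied to all rows.
GWG <- crossprod(G, W %*% G)
Rmat <- W %*% G %*% solve(GWG)       # p x d, so row i of Xc %*% Rmat is R(x_i)'
reduced <- Xc %*% Rmat               # n x d matrix of reduced predictors
```

Stacking the reduction into a single $p \times d$ matrix lets the reduced predictors for all observations be computed with one matrix product.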
    fit.pfc(X, y, r=4, d=NULL, F.user=NULL,
            weight.type=c("sample", "diag", "L1"), lam.vec=NULL, kfold=5,
            silent=TRUE, qrtol=1e-10, cov.tol=1e-4, cov.maxit=1e3,
            NPERM=1e3, level=0.01)
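| Argument | Description |
|---|---|
| `X` | The predictor matrix with $n$ rows and $p$ columns. The $i$th row is $x_i$ defined above. |
| `y` | The vector of measured responses with $n$ entries. The $i$th entry is $y_i$ defined above. |
| `r` | When polynomial basis functions are used (which is the case when `F.user=NULL`), `r` is the polynomial order, i.e., $f(y) = (y, y^2, \ldots, y^r)'$. The default is `r=4`. This argument is not used when `F.user` is specified. |
| `d` | The dimension of the central subspace defined above. This must be specified by the user when `weight.type="L1"`. If it is unspecified, the sequential permutation testing procedure of Cook, Forzani, and Rothman (2012) is used to select `d`. |
| `F.user` | A matrix with $n$ rows and $r$ columns, where the $i$th row is $f(y_i)$ defined above. This argument is optional and is typically used when polynomial basis functions are not desired. |
| `weight.type` | The type of weight matrix estimate $\widehat W$ to use. Let $\widehat\Delta$ be the residual sample covariance matrix from the multivariate regression of the predictors on $f(Y)$. `"sample"` uses a Moore-Penrose generalized inverse of $\widehat\Delta$, which is the inverse of $\widehat\Delta$ when it is nonsingular; `"diag"` uses the inverse of the diagonal matrix with the same diagonal as $\widehat\Delta$; `"L1"` uses an L1-penalized inverse of $\widehat\Delta$, in which case `lam.vec` and `d` must be specified by the user. |
| `lam.vec` | A vector of candidate tuning parameter values to use when `weight.type="L1"`. If this vector has more than one entry, cross-validation is performed to select the optimal tuning parameter value. |
| `kfold` | The number of folds to use in cross-validation to select the optimal tuning parameter when `lam.vec` has more than one entry. |
| `silent` | Logical. When `silent=FALSE`, progress updates are printed. |
| `qrtol` | The tolerance for calls to `qr.solve()`. |
| `cov.tol` | The convergence tolerance for the QUIC algorithm used when `weight.type="L1"`. |
| `cov.maxit` | The maximum number of iterations allowed for the QUIC algorithm used when `weight.type="L1"`. |
| `NPERM` | The number of permutations to use in the sequential permutation testing procedure to select `d`. Only used when `d` is unspecified. |
| `level` | The significance level to use to terminate the sequential permutation testing procedure to select `d`. |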
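To make the `weight.type` choices more concrete, here is a minimal base-R sketch of how weight matrices in the style of `"sample"` and `"diag"` could be formed from regression residuals. It is an illustration under stated assumptions, not the package's implementation: the data are simulated, the basis is polynomial, and the `n - r - 1` divisor for the residual covariance is an assumption of this sketch. The `"L1"` option needs a penalized inverse covariance solver and is not shown.

```r
# Illustrative base-R sketch with simulated data; not the package's code.
set.seed(1)
n <- 30; p <- 5; r <- 2
y <- rnorm(n)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Centered polynomial basis values and centered predictors.
Fmat <- outer(y, seq_len(r), `^`)
Fc <- sweep(Fmat, 2, colMeans(Fmat))
Xc <- sweep(X, 2, colMeans(X))

# Residuals from the least-squares regression of the predictors on the basis.
res <- qr.resid(qr(Fc), Xc)

# Residual covariance; the n - r - 1 divisor is an assumption of this sketch.
Delta.hat <- crossprod(res) / (n - r - 1)

# "sample"-style weight: the inverse of Delta.hat (nonsingular here, p < n - r - 1).
W.sample <- solve(Delta.hat)

# "diag"-style weight: the inverse of the diagonal part of Delta.hat.
W.diag <- diag(1 / diag(Delta.hat))
```

The diagonal weight is always available, which is why it is a natural choice when $p$ exceeds $n$ and the residual covariance is singular.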
See Cook, Forzani, and Rothman (2012) for more information.
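A list with the following components:

| Component | Description |
|---|---|
| `Gamhat` | this is $\widehat\Gamma$ described above. |
| `bhat` | this is $\widehat\beta$ described above. |
| `Rmat` | this is $\widehat W \widehat\Gamma (\widehat\Gamma' \widehat W \widehat\Gamma)^{-1}$. |
| `What` | this is $\widehat W$ described above. |
| `d` | this is the value of `d` that was specified or selected. |
| `r` | this is the value of `r` that was used. |
| `GWG` | this is $\widehat\Gamma' \widehat W \widehat\Gamma$. |
| `fc` | a matrix with $n$ rows and $r$ columns, where the $i$th row is $f(y_i) - \bar f$. |
| `Xc` | a matrix with $n$ rows and $p$ columns, where the $i$th row is $x_i - \bar x$. |
| `y` | the vector of $n$ measured responses. |
| `mx` | this is $\bar x$ described above. |
| `mf` | this is $\bar f$ described above. |
| `best.lam` | this is the selected tuning parameter value used when `weight.type="L1"`. |
| `lam.vec` | this is the vector of candidate tuning parameter values used when `weight.type="L1"`. |
| `err.vec` | this is the vector of validation errors from cross-validation, one error for each entry in `lam.vec`. |
| `test.info` | a data frame that summarizes the results from the sequential testing procedure. Will be `NULL` when `d` is specified by the user. |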
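The relationship between `lam.vec`, `err.vec`, and `best.lam` can be sketched in a few lines of base R. The numbers below are made up, the fold assignment is one common way to split observations, and the rule of choosing the candidate with the smallest validation error is an assumption about how the selected value relates to the error vector.

```r
# Hypothetical fold assignment for kfold-fold cross-validation on n observations.
set.seed(1)
n <- 20; kfold <- 5
fold.id <- sample(rep(seq_len(kfold), length.out = n))
fold.sizes <- table(fold.id)

# Made-up candidate values and validation errors, for illustration only.
lam.vec <- 10^seq(-2, 0, by = 0.5)
err.vec <- c(1.9, 1.4, 1.1, 1.3, 2.0)

# Assumed selection rule: the candidate with the smallest validation error.
best.lam <- lam.vec[which.min(err.vec)]
```

Plotting `err.vec` against `log10(lam.vec)` is a quick way to check that the selected value is not at the boundary of the candidate grid.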
Friedman, J., Hastie, T., and Tibshirani R. (2008). Sparse inverse covariance estimation with the lasso. Biostatistics 9(3), 432-441.
    library(abundant)

    set.seed(1)
    n=20
    p=30
    d=2
    y=sqrt(12)*runif(n)
    Gam=matrix(rnorm(p*d), nrow=p, ncol=d)
    beta=diag(2)
    E=matrix(0.5*rnorm(n*p), nrow=n, ncol=p)
    V=matrix(c(1, sqrt(12), sqrt(12), 12.8), nrow=2, ncol=2)
    tmp=eigen(V, symmetric=TRUE)
    ## inverse symmetric square root of V
    V.msqrt=tcrossprod(tmp$vec*rep(tmp$val^(-0.5), each=2), tmp$vec)
    Fyc=cbind(y-sqrt(3),y^2-4)%*%V.msqrt
    X=0+Fyc%*%t(beta)%*%t(Gam) + E

    fit=fit.pfc(X=X, y=y, r=3, weight.type="sample")

    ## display hypothesis testing information for selecting d
    fit$test.info

    ## make a response versus fitted values plot
    plot(pred.response(fit), y)
Let $x \in \mathbb{R}^p$ denote the values of the $p$ predictors.
This function computes $\widehat E(Y \mid X = x)$ using equation (8.1)
of Cook, Forzani, and Rothman (2012).
    pred.response(fit, newx=NULL)
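| Argument | Description |
|---|---|
| `fit` | The object returned by `fit.pfc()`. |
| `newx` | A matrix with $p$ columns, where each row is an instance of $x$ described above. If this argument is unspecified, the fitted values are returned, i.e., `newx=X`, where `X` was the predictor matrix used in the call to `fit.pfc()`. |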
See Cook, Forzani, and Rothman (2012) for more information.
A vector of response predictions with nrow(newx) entries.
    library(abundant)

    set.seed(1)
    n=25
    p=50
    d=1
    true.G = matrix(rnorm(p*d), nrow=p, ncol=d)
    y=rnorm(n)
    fy = y
    E=matrix(rnorm(n*p), nrow=n, ncol=p)
    X=fy%*%t(true.G) + E

    fit=fit.pfc(X=X, r=4, d=d, y=y, weight.type="diag")

    ## fitted values and training mean squared error
    fitted.values=pred.response(fit)
    mean((y-fitted.values)^2)
    plot(fitted.values, y)

    ## mean squared prediction error on new data from the same model
    n.new=100
    y.new=rnorm(n.new)
    fy.new=y.new
    E.new=matrix(rnorm(n.new*p), nrow=n.new, ncol=p)
    X.new = fy.new%*%t(true.G) + E.new
    mean((y.new - pred.response(fit, newx=X.new))^2)