conformalInt is a framework for weighted and unweighted conformal inference for interval outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as a special case. For either scheme, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.
X: covariates.
Y: interval outcomes. A matrix with two columns.
type: a string that takes values in {"CQR", "mean"}.
lofun: a function to fit the lower bound, or a valid string. See Details.
loquantile: the quantile to be fit by lofun. Used only when type = "CQR".
loparams: a list of other parameters to be passed into lofun.
upfun: a function to fit the upper bound, or a valid string. See Details.
upquantile: the quantile to be fit by upfun. Used only when type = "CQR".
upparams: a list of other parameters to be passed into upfun.
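wtfun: NULL for unweighted conformal inference; a function of the covariates that returns one weight per row, for weighted conformal inference when useCV = FALSE; or a list of nfolds such functions (one per fold) for weighted conformal inference when useCV = TRUE. A single function is also accepted when useCV = TRUE, in which case it is used for every fold (see the examples).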
useCV: FALSE for split conformal inference and TRUE for CV+.
trainprop: proportion of units used to train lofun and upfun. The default is 75%. Used only when useCV = FALSE.
trainid: indices of training units. The default is NULL, which generates random indices. Used only when useCV = FALSE.
nfolds: number of folds. The default is 10. Used only when useCV = TRUE.
idlist: a list of length nfolds whose elements are the indices of each fold. The default is NULL, which generates random indices. Used only when useCV = TRUE.
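When the split or the folds need to be reproducible or shared across runs, trainid and idlist can be built by hand. The sketch below is a minimal base-R illustration, assuming (as the descriptions above suggest) that trainid is a plain vector of row indices and idlist is an unnamed list of index vectors that partition 1:n; the variable values are illustrative only.

```r
set.seed(1)
n <- 1000      # number of units (rows of X and Y)
nfolds <- 10   # matches the documented default

# Split conformal: a random 75% of the units for training,
# mirroring the default trainprop = 0.75
trainid <- sample(n, floor(0.75 * n))

# CV+: a random partition of 1:n into nfolds folds of (nearly) equal size
idlist <- split(sample(n), rep(seq_len(nfolds), length.out = n))
idlist <- unname(idlist)  # drop the "1", ..., "10" names added by split()
```

These could then be passed as, for instance, conformalInt(X, Y, type = "CQR", lofun = "quantRF", upfun = "quantRF", useCV = TRUE, idlist = idlist), presumably overriding the random fold assignment.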
A conformalIntSplit object when useCV = FALSE, with the following attributes:
Yscore: a vector of non-conformity scores on the calibration fold;
wt: a vector of weights on the calibration fold;
Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles at X;
wtfun, type, loquantile, upquantile, trainprop, trainid: the same as the inputs;
or a conformalIntCV object when useCV = TRUE, with the following attributes:
info: a list of length nfolds; each element is a list with the attributes Yscore, wt and Ymodel described above, computed for the corresponding fold;
wtfun, type, loquantile, upquantile, nfolds, idlist: the same as the inputs.
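For the split version, Yscore and wt carry what is needed to compute the margin \(\eta\) of the interval described in Details. The sketch below illustrates one common form of weighted split conformal calibration; it is an assumption-labelled illustration, not the package's predict method. The test point's weight (wtest, e.g. the output of wtfun at that point) is given to a score of +Inf, and \(\eta\) is the smallest calibration score whose normalised cumulative weight reaches 1 - alpha. With all weights equal to 1 this reduces to the usual ceiling((1 - alpha)(n + 1))-th smallest score. The names weightedEta and conformalIntervalSketch are hypothetical.

```r
# Hypothetical helper: weighted conformal quantile of calibration scores.
#   Yscore: calibration non-conformity scores (as in conformalIntSplit)
#   wt:     calibration weights (all ones for unweighted inference)
#   wtest:  weight of a single test point (assumed, e.g. wtfun(x))
weightedEta <- function(Yscore, wt, wtest, alpha) {
  stopifnot(length(Yscore) == length(wt), length(wtest) == 1)
  ord <- order(Yscore)
  score <- Yscore[ord]
  # Normalise so that the test point's mass sits on a score of +Inf
  p <- wt[ord] / (sum(wt) + wtest)
  cump <- cumsum(p)
  # Small tolerance guards against floating-point error in cumsum()
  idx <- which(cump >= 1 - alpha - 1e-10)
  if (length(idx) == 0) return(Inf)
  score[idx[1]]
}

# Hypothetical helper: widen the fitted bounds by eta, as in Details
conformalIntervalSketch <- function(mlo, mup, eta) {
  cbind(lower = mlo - eta, upper = mup + eta)
}
```

For several test points, one \(\eta\) per row could be obtained with sapply(wtfun(Xtest), function(w) weightedEta(Yscore, wt, w, alpha = 0.1)), assuming Yscore and wt have been read from the fitted object (for instance with $ if it is a list); for unweighted inference, wt = rep(1, length(Yscore)) and wtest = 1.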
The conformal interval for a testing point x is in the form of \([\hat{m}^{L}(x) - \eta, \hat{m}^{R}(x) + \eta]\), where \(\hat{m}^{L}(x)\) is fit by lofun and \(\hat{m}^{R}(x)\) is fit by upfun.
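To see how \(\eta\) relates to the calibration scores, write \(Y^{L}\) and \(Y^{R}\) for the two columns of Y. Assuming the goal is for the whole outcome interval to lie inside the conformal interval, a natural non-conformity score for this interval form is the larger of the two boundary violations. This is a reading of the interval form above, not a statement of the package's internal code:

\[
V_i = \max\left\{\hat{m}^{L}(X_i) - Y_i^{L},\; Y_i^{R} - \hat{m}^{R}(X_i)\right\},
\]

\[
[Y^{L}, Y^{R}] \subseteq [\hat{m}^{L}(x) - \eta,\; \hat{m}^{R}(x) + \eta]
\iff \hat{m}^{L}(x) - Y^{L} \le \eta \ \text{and}\ Y^{R} - \hat{m}^{R}(x) \le \eta
\iff V \le \eta .
\]

Under this score, taking \(\eta\) as a (weighted) \(1 - \alpha\) quantile of the calibration scores \(V_i\) targets containment of \([Y^{L}, Y^{R}]\) with probability at least \(1 - \alpha\).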
lofun/upfun can be a valid string, including:
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean";
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR";
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean";
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR";
"BART" for Bayesian additive regression trees (BART) that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean";
"quantBART" for quantile BART that predicts the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR";
or a function object whose arguments must include, but are not limited to:
Y for the outcome in the training data;
X for the covariates in the training data;
Xtest for the covariates in the testing data.
When type = "CQR", lofun and upfun should also include an argument quantiles that is a scalar. The output of lofun and upfun must be a vector giving the conditional quantile estimates or conditional mean estimates, one per row of Xtest. Other optional arguments can be passed into lofun and upfun through loparams and upparams.
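As an illustration of this interface, here is a hypothetical base-R learner for type = "CQR" (knnQuantile is not part of the package) that returns k-nearest-neighbour empirical quantiles. It takes the required Y, X, Xtest and quantiles arguments, plus an extra argument k, which is the kind of option meant to travel through loparams and upparams.

```r
# Hypothetical user-defined learner: k-nearest-neighbour empirical quantiles.
# Required interface: Y, X, Xtest, quantiles (a scalar). Extra argument: k.
knnQuantile <- function(Y, X, Xtest, quantiles, k = 20) {
  X <- as.matrix(X)
  Xtest <- as.matrix(Xtest)
  Y <- as.numeric(Y)
  k <- min(k, length(Y))  # cannot use more neighbours than training units
  sapply(seq_len(nrow(Xtest)), function(i) {
    # Squared Euclidean distances from the i-th test point to all training points
    d2 <- colSums((t(X) - Xtest[i, ])^2)
    nbrs <- order(d2)[seq_len(k)]
    # Empirical quantile of the outcome among the k nearest neighbours
    as.numeric(quantile(Y[nbrs], probs = quantiles, names = FALSE))
  })
}
```

It could then be plugged in as conformalInt(X, Y, type = "CQR", lofun = knnQuantile, upfun = knnQuantile, loparams = list(k = 30), upparams = list(k = 30)), assuming the entries of loparams and upparams are forwarded as named arguments. The brute-force neighbour search costs O(n) per test point, which is fine at the sizes used in the examples but slow for large data.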
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, d)
Ylo <- X %*% beta + rnorm(n)
Yup <- Ylo + pmax(1, 2 * rnorm(n))
Y <- cbind(Ylo, Yup)
# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)
# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.5571458 3.928145
#> 2 0.2065284 6.665040
#> 3 -4.4381127 1.989639
#> 4 0.4809445 6.779876
#> 5 -1.4292453 5.020169
# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.2383038 3.862934
#> 2 0.3532264 6.401187
#> 3 -4.1811585 1.926886
#> 4 0.9984495 7.026955
#> 5 -1.7229759 4.404223
# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.2983936 3.981516
#> 2 0.5439503 6.660022
#> 3 -4.1578044 2.014425
#> 4 0.7584403 6.882226
#> 5 -1.2587786 4.959964
# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.3403715 3.648372
#> 2 0.4192234 6.349207
#> 3 -4.0142355 1.808601
#> 4 1.1723317 7.124246
#> 5 -1.7601795 4.403348
# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.5268106 4.315319
#> 2 -0.1750260 6.231053
#> 3 -4.2329081 2.224193
#> 4 0.3592586 6.833755
#> 5 -1.6656031 4.726196
# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        # recent versions of the grf package return a list with a predictions element
        res <- res$predictions
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = quantRF, upfun = quantRF,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.6141739 4.010033
#> 2 0.2688345 6.822655
#> 3 -4.2637191 2.273239
#> 4 0.4472719 6.885175
#> 5 -2.0250835 4.824113
# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalInt(X, Y, type = "mean",
                    lofun = linearReg, upfun = linearReg,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.271234 2.9187045
#> 2 1.789870 6.9687679
#> 3 -4.671487 0.4865889
#> 4 2.058177 7.2547775
#> 5 -1.255497 4.0560528
# Run weighted split CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.7320204 4.046570
#> 2 -0.3524280 6.581274
#> 3 -4.2541273 2.520488
#> 4 -0.1105241 6.688713
#> 5 -1.5985066 5.486312
# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions, one per fold (nfolds defaults to 10)
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformalInt(X, Y, type = "CQR",
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.4578761 3.801213
#> 2 0.2964787 6.616022
#> 3 -4.2342074 2.068502
#> 4 0.5935968 6.742122
#> 5 -1.7078680 4.642198
# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformalInt(X, Y, type = "CQR",
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.4578761 3.801213
#> 2 0.2964787 6.616022
#> 3 -4.2342074 2.068502
#> 4 0.5935968 6.742122
#> 5 -1.7078680 4.642198
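With only five test points the intervals above say little about coverage. A simple check draws a larger test set from the same model and counts how often the whole outcome interval lands inside the predicted one. The helper below is a hypothetical sketch (intervalCoverage is not a package function), assuming coverage means containment of [Ylo, Yup]:

```r
# Hypothetical helper: fraction of test units whose outcome interval
# [Ylo, Yup] lies entirely inside the predicted interval [lower, upper]
intervalCoverage <- function(lower, upper, Ylo, Yup) {
  mean(lower <= Ylo & Yup <= upper)
}
```

For example, after regenerating Xtest with ntest <- 1000 and drawing matching test outcomes exactly as Ylo and Yup were drawn above, intervalCoverage(ci[, "lower"], ci[, "upper"], Ylotest, Yuptest) with ci <- predict(obj, Xtest, alpha = 0.1) would be expected to land near or above 0.9, up to sampling noise.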