conformal.Rd
conformal is a framework for weighted and unweighted conformal inference for continuous
outcomes. It supports both weighted split conformal inference and weighted CV+,
including weighted Jackknife+ as a special case. For each type, it supports both conformalized
quantile regression (CQR) and standard conformal inference based on conditional mean estimation.
covariates.
outcome vector.
a string that takes values in {"CQR", "mean"}.
a string that takes values in {"two", "above", "below"}. See Details.
a scalar or a vector of length 2, depending on side. Used only when type = "CQR". See Details.
a function that models the conditional mean/quantiles, or a valid string.
The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details.
a list of other parameters to be passed into outfun.
NULL for unweighted conformal inference, a function for weighted conformal inference
when useCV = FALSE, or a function or a list of functions for weighted conformal inference when useCV = TRUE.
See Details.
FALSE for split conformal inference and TRUE for CV+.
proportion of units for training outfun. The default is 75%. Used only when useCV = FALSE.
indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE.
number of folds. The default is 10. Used only when useCV = TRUE.
a list of indices of length nfolds. The default is NULL, generating random indices. Used only when useCV = TRUE.
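For reproducible splits, trainid and idlist can be supplied explicitly instead of relying on the random defaults. The following is a minimal base R sketch. It assumes that idlist holds one integer index vector per fold and that the folds partition the sample; this layout is an assumption, not something stated in the argument descriptions above.

```r
set.seed(1)
n <- 1000          # sample size, as in the Examples
trainprop <- 0.75  # default training proportion
nfolds <- 10       # default number of folds

# Indices of training units for split conformal inference (useCV = FALSE)
trainid <- sample(n, floor(n * trainprop))

# One index vector per fold for CV+ (useCV = TRUE); assumed to partition 1:n
idlist <- unname(split(sample(n), rep_len(seq_len(nfolds), n)))
```

These objects could then be passed as trainid = trainid or idlist = idlist.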
a conformalSplit object when useCV = FALSE, with the following attributes:
Yscore: a vector of non-conformity scores on the calibration fold
wt: a vector of weights on the calibration fold
Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles at X
wtfun, type, side, quantiles, trainprop, trainid: the same as inputs
or a conformalCV object when useCV = TRUE, with the following attributes:
info: a list of length nfolds, each element being a list with the attributes Yscore, wt and Ymodel described above for the corresponding fold
wtfun, type, side, quantiles, nfolds, idlist: the same as inputs
When side = "two", CQR produces two-sided intervals of the form
$$[q_{\alpha_{lo}}(x) - \eta, q_{\alpha_{hi}}(x) + \eta]$$
where \(q_{\alpha_{lo}}(x)\) and \(q_{\alpha_{hi}}(x)\) are estimates of conditional
quantiles of Y given X, and standard conformal inference produces two-sided intervals of the form
$$[m(x) - \eta, m(x) + \eta]$$
where \(m(x)\) is an estimate of the conditional mean/median of Y given X. When side = "above",
intervals are of the form [-Inf, a(x)], and when side = "below", they are of the form [a(x), Inf].
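To make the role of \(\eta\) concrete, here is a minimal base R sketch of the standard unweighted split CQR recipe. The functions q_lo and q_hi are hypothetical stand-ins for fitted quantile estimates. Treating \(\eta\) as the ceiling((1 - alpha) * (n + 1))-th smallest calibration score is the textbook construction, assumed here for illustration; the package's internals may differ.

```r
# Toy calibration data
set.seed(1)
n <- 200
x <- rnorm(n)
y <- x + rnorm(n)

# Hypothetical estimates of the 5% and 95% conditional quantiles
q_lo <- function(x) x + qnorm(0.05)
q_hi <- function(x) x + qnorm(0.95)

# CQR non-conformity score: positive when y falls outside [q_lo(x), q_hi(x)]
score <- pmax(q_lo(x) - y, y - q_hi(x))

# eta is the k-th smallest score with k = ceiling((1 - alpha) * (n + 1))
alpha <- 0.1
k <- ceiling((1 - alpha) * (n + 1))
eta <- sort(score)[k]

# Two-sided intervals [q_lo(x) - eta, q_hi(x) + eta] at new points
xnew <- c(-1, 0, 1)
interval <- cbind(lower = q_lo(xnew) - eta, upper = q_hi(xnew) + eta)
```

Replacing the score with abs(y - m(x)) for a mean estimate m gives the corresponding sketch for type = "mean".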
quantiles should be given when type = "CQR". When side = "two", quantiles
should be a vector of length 2, giving \(\alpha_{lo}\) and \(\alpha_{hi}\). When side = "above"
or side = "below", only one quantile should be given.
outfun can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean".
"quantBART" for Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR".
or a function object whose inputs must include, but are not limited to,
Y for the outcome in the training data.
X for the covariates in the training data.
Xtest for the covariates in the testing data.
When type = "CQR", outfun should also include an argument quantiles that is either
a vector of length 2 or a scalar, depending on the argument side. The output of outfun
must be a matrix with two columns giving the conditional quantile estimates when quantiles
is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be
passed into outfun through outparams.
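The input and output contract above can be illustrated with a learner that needs only base R. The function lmQuantile below is a hypothetical example, not part of the package: it fits a linear regression and shifts the fitted values by empirical residual quantiles, which is a crude stand-in for a proper quantile regression.

```r
# Hypothetical learner: linear regression plus empirical residual quantiles.
# Y, X, Xtest and quantiles are the required inputs; ... absorbs extra arguments.
lmQuantile <- function(Y, X, Xtest, quantiles, ...){
    data <- data.frame(Y = as.numeric(Y), as.data.frame(X))
    fit <- lm(Y ~ ., data = data)
    center <- as.numeric(predict(fit, as.data.frame(Xtest)))
    shift <- quantile(residuals(fit), probs = quantiles, names = FALSE)
    if (length(quantiles) == 1){
        center + shift              # vector of length nrow(Xtest)
    } else {
        outer(center, shift, "+")   # matrix with nrow(Xtest) rows and 2 columns
    }
}

# Check the output shapes on toy data
set.seed(1)
X <- matrix(rnorm(200 * 2), ncol = 2)
Y <- X %*% c(1, 1) + rnorm(200)
Xtest <- matrix(rnorm(5 * 2), ncol = 2)
two <- lmQuantile(Y, X, Xtest, quantiles = c(0.05, 0.95))
one <- lmQuantile(Y, X, Xtest, quantiles = 0.95)
```

A function of this shape could then be supplied as outfun = lmQuantile together with type = "CQR".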
wtfun is NULL for unweighted conformal inference. For weighted split conformal inference, it is a
function with a required input X that produces a vector of non-negative reals of length nrow(X).
For weighted CV+, it can be a function as in the case useCV = FALSE, so that the same function will
apply to each fold, or a list of functions of length nfolds, so that wtfun[[k]] is applied to fold k.
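As a rough illustration of how the weights could enter the calibration step, the sketch below follows the usual weighted split conformal recipe: calibration scores are weighted by wtfun, the test point's own weight is placed on +Inf, and \(\eta\) is the weighted (1 - alpha) quantile. The helper weighted_eta is hypothetical, and whether the package computes \(\eta\) in exactly this way is an assumption.

```r
# Hypothetical helper: weighted (1 - alpha) quantile of the calibration scores,
# with the test point's own weight placed on +Inf
weighted_eta <- function(score, wt, wt_test, alpha){
    ord <- order(score)
    prob <- cumsum(wt[ord]) / (sum(wt) + wt_test)
    idx <- which(prob >= 1 - alpha)
    if (length(idx) == 0) Inf else score[ord][idx[1]]
}

set.seed(1)
n <- 200
Xcal <- matrix(rnorm(n * 2), ncol = 2)
score <- abs(rnorm(n))                # stand-in non-conformity scores
wtfun <- function(X){pnorm(X[, 1])}   # same weight function as in the Examples
xtest <- matrix(c(0.5, 0), nrow = 1)

eta_weighted <- weighted_eta(score, wtfun(Xcal), wtfun(xtest), alpha = 0.1)

# With constant weights the rule reduces to the unweighted order statistic
eta_unweighted <- weighted_eta(score, rep(1, n), 1, alpha = 0.1)
```

With constant weights the helper returns the ceiling((1 - alpha) * (n + 1))-th smallest score, so the unweighted procedure appears as a special case.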
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)
# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)
# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = NULL, useCV = FALSE)
#> Loading required namespace: grf
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.535434 2.843827
#> 2 -3.795258 2.056821
#> 3 -4.828133 2.730680
#> 4 -1.746706 3.945956
#> 5 -1.949787 2.934958
# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
                 outfun = "RF", wtfun = NULL, useCV = FALSE)
#> Loading required namespace: randomForest
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -1.7129412 2.1089487
#> 2 -3.3063890 0.5155009
#> 3 -3.2554484 0.5664415
#> 4 -0.4305554 3.3913345
#> 5 -1.4377467 2.3841433
# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -1.990873 2.712992
#> 2 -3.358873 1.512175
#> 3 -5.120096 1.751706
#> 4 -1.107740 3.572100
#> 5 -1.417252 2.552171
# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
                 outfun = "RF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -1.7869359 2.1462101
#> 2 -3.2330711 0.7126941
#> 3 -3.4989231 0.4552532
#> 4 -0.5622837 3.3968215
#> 5 -1.3402963 2.6123227
# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.007688 2.537079
#> 2 -3.267513 1.455911
#> 3 -5.082839 2.174463
#> 4 -1.201008 3.820256
#> 5 -1.613601 2.643617
# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    # Recent versions of the grf package return a list; extract the predictions
    if (is.list(res) && !is.data.frame(res)){
        res <- res$predictions
    }
    # Return a vector for a single quantile and a matrix otherwise
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = quantRF, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.338700 2.545836
#> 2 -3.488523 1.570238
#> 3 -4.484138 2.012845
#> 4 -1.387026 3.790043
#> 5 -1.614070 2.510640
# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformal(X, Y, type = "mean",
                 outfun = linearReg, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -1.53560580 1.8431881
#> 2 -2.95309352 0.4257004
#> 3 -3.89044926 -0.5116553
#> 4 -0.01023004 3.3685639
#> 5 -1.20526029 2.1735336
# Run weighted split CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.096538 2.965481
#> 2 -3.402480 1.513177
#> 3 -5.023213 1.366197
#> 4 -1.335975 3.870952
#> 5 -1.543778 2.898841
# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                  outfun = "quantRF", wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.067734 2.421420
#> 2 -3.356540 1.489078
#> 3 -5.116667 2.263265
#> 4 -1.244840 3.788403
#> 5 -1.458652 2.551119
# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                  outfun = "quantRF", wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.067734 2.421420
#> 2 -3.356540 1.489078
#> 3 -5.116667 2.263265
#> 4 -1.244840 3.788403
#> 5 -1.458652 2.551119