Conformal inference for continuous outcomes

conformal is a framework for weighted and unweighted conformal inference for continuous outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean estimation.

conformal(
  X,
  Y,
  type = c("CQR", "mean"),
  side = c("two", "above", "below"),
  quantiles = NULL,
  outfun = NULL,
  outparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)

Arguments

X: covariates.
Y: outcome vector.
type: a string that takes values in {"CQR", "mean"}.
side: a string that takes values in {"two", "above", "below"}. See Details.
quantiles: a scalar or a vector of length 2 depending on side. Used only when type = "CQR". See Details.
outfun: a function that models the conditional mean/quantiles, or a valid string. The default is random forest when type = "mean" and quantile random forest when type = "CQR". See Details.
outparams: a list of other parameters to be passed into outfun.
wtfun: NULL for unweighted conformal inference, or a function for weighted conformal inference when useCV = FALSE, or a list of functions for weighted conformal inference when useCV = TRUE. See Details.
useCV: FALSE for split conformal inference and TRUE for CV+.
trainprop: proportion of units for training outfun. The default is 75%. Used only when useCV = FALSE.
trainid: indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE.
nfolds: number of folds. The default is 10. Used only when useCV = TRUE.
idlist: a list of indices of length nfolds. The default is NULL, generating random indices. Used only when useCV = TRUE.
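
To make a run reproducible regardless of the random defaults, the index arguments can be built by hand. The following is a minimal base-R sketch, not taken from the package; it assumes that each element of idlist holds the indices of one fold and that the folds together partition 1:n.

```r
# Illustrative sketch: building index sets by hand (base R only)
set.seed(2024)
n <- 1000          # number of observations, i.e. nrow(X)
trainprop <- 0.75
nfolds <- 10

# Indices of training units for the split method (candidate for trainid)
trainid <- sort(sample(n, size = floor(n * trainprop)))

# Random partition of 1:n into nfolds groups (candidate for idlist);
# assumption: each list element holds the indices of one fold
idlist <- unname(split(sample(n), rep_len(seq_len(nfolds), n)))
```

These objects would then be supplied as trainid = trainid with useCV = FALSE, or as nfolds = nfolds, idlist = idlist with useCV = TRUE.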

Value

a conformalSplit object when useCV = FALSE with the following attributes:

Yscore: a vector of non-conformity scores on the calibration fold
wt: a vector of weights on the calibration fold
Ymodel: a function with required argument X that produces estimates of the conditional mean or quantiles of Y given X
wtfun, type, side, quantiles, trainprop, trainid: the same as inputs

or a conformalCV object when useCV = TRUE with the following attributes:

info: a list of length nfolds with each element being a list with attributes Yscore, wt and Ymodel described above for each fold
wtfun, type, side, quantiles, nfolds, idlist: the same as inputs

Details

When side = "two", two-sided CQR produces intervals of the form $$[q_{\alpha_{lo}}(x) - \eta, q_{\alpha_{hi}}(x) + \eta]$$ where $q_{\alpha_{lo}}(x)$ and $q_{\alpha_{hi}}(x)$ are estimates of the conditional quantiles of Y given X, and standard conformal inference produces two-sided intervals of the form $$[m(x) - \eta, m(x) + \eta]$$ where $m(x)$ is an estimate of the conditional mean/median of Y given X. When side = "above", intervals are of the form [-Inf, a(x)], and when side = "below", they are of the form [a(x), Inf].
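
The role of $\eta$ in the two-sided mean-based interval can be illustrated with the textbook unweighted split conformal recipe. The sketch below uses only base R and a linear model as $m(x)$; it is an assumption-laden illustration of the general technique, not the package's internal code, and the variable names are chosen here for exposition.

```r
# Illustrative sketch of unweighted split conformal inference (type = "mean")
set.seed(1)
n <- 1000
X <- matrix(rnorm(n * 5), nrow = n)
Y <- as.numeric(X %*% rep(1, 5) + rnorm(n))
alpha <- 0.1

# Split into a training part and a calibration part
trainid <- sample(n, floor(0.75 * n))
dat <- data.frame(Y = Y, X)               # columns Y, X1, ..., X5
fit <- lm(Y ~ ., data = dat[trainid, ])   # m(x): conditional mean estimate

# Non-conformity scores on the calibration part: absolute residuals
calib <- dat[-trainid, ]
score <- abs(calib$Y - as.numeric(predict(fit, calib)))

# eta is the ceiling((1 - alpha) * (ncal + 1))-th smallest score
ncal <- length(score)
k <- ceiling((1 - alpha) * (ncal + 1))
eta <- sort(score)[k]

# Two-sided interval [m(x) - eta, m(x) + eta] at new points
Xtest <- data.frame(matrix(rnorm(3 * 5), nrow = 3))
m <- as.numeric(predict(fit, Xtest))
interval <- cbind(lower = m - eta, upper = m + eta)
```

In the standard CQR construction, the absolute residual is replaced by the score max(q_lo(x) - y, y - q_hi(x)), which yields the quantile-based interval shown above.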

quantiles should be given when type = "CQR". When side = "two", quantiles should be a vector of length 2, giving $\alpha_{lo}$ and $\alpha_{hi}$. When side = "above" or side = "below", only one quantile should be given.

outfun can be a valid string, including

"RF" for random forest that predicts the conditional mean, a wrapper built on randomForest package. Used when type = "mean".
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on grf package. Used when type = "CQR".
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on gbm package. Used when type = "mean".
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on gbm package. Used when type = "CQR".
"BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on bartMachine package. Used when type = "mean".
"quantBART" for Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on bartMachine package. Used when type = "CQR".

or a function object whose inputs must include, but are not limited to,

Y for outcome in the training data.
X for covariates in the training data.
Xtest for covariates in the testing data.

When type = "CQR", outfun should also include an argument quantiles that is either a vector of length 2 or a scalar, depending on the argument side. The output of outfun must be a matrix with two columns giving the conditional quantile estimates when quantiles is a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into outfun through outparams.
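
The input/output contract described above can be checked on a toy learner before passing it to conformal. The function quantLM below is hypothetical and not part of the package: it fits a linear model and derives quantiles under a Gaussian-residual assumption, purely to show the expected shapes of the return value.

```r
# Hypothetical learner: linear fit plus Gaussian residual quantiles.
# It only illustrates the contract: matrix with two columns for two
# quantile levels, plain numeric vector for a single level.
quantLM <- function(Y, X, Xtest, quantiles, ...) {
    train <- data.frame(Y = as.numeric(Y), as.data.frame(X))
    fit <- lm(Y ~ ., data = train)
    center <- as.numeric(predict(fit, as.data.frame(Xtest)))
    sigma <- summary(fit)$sigma
    # One column per requested quantile level
    res <- sapply(quantiles, function(q) center + sigma * qnorm(q))
    res <- matrix(res, nrow = length(center))
    if (length(quantiles) == 1) as.numeric(res) else res
}

# Check the output shapes on simulated data
set.seed(1)
X <- matrix(rnorm(200 * 3), nrow = 200)
Y <- X %*% rep(1, 3) + rnorm(200)
Xtest <- matrix(rnorm(4 * 3), nrow = 4)
two <- quantLM(Y, X, Xtest, quantiles = c(0.05, 0.95))  # 4 x 2 matrix
one <- quantLM(Y, X, Xtest, quantiles = 0.95)           # vector of length 4
```

A function with this shape would be supplied as outfun = quantLM together with type = "CQR"; any extra tuning arguments would go through outparams.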

wtfun is NULL for unweighted conformal inference. For weighted split conformal inference, it is a function with a required input X that produces a vector of non-negative reals of length nrow(X). For weighted CV+, it can be a function as in the case useCV = FALSE so that the same function will apply to each fold, or a list of functions of length nfolds so that wtfun[[k]] is applied to fold k.
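
A quick sanity check of a candidate weight function against the requirements above can save a failed run. In this sketch the weight function and the helper check_wtfun are both hypothetical, written only for illustration; neither is provided by the package.

```r
# Hypothetical weight function: weights depend on the first covariate only
wtfun <- function(X) {
    X <- as.matrix(X)
    exp(0.5 * X[, 1])   # strictly positive, one weight per row
}

# Helper (not part of the package) that checks the stated contract:
# a numeric vector of non-negative reals of length nrow(X)
check_wtfun <- function(f, X) {
    w <- f(X)
    is.numeric(w) && length(w) == nrow(X) && all(is.finite(w)) && all(w >= 0)
}

set.seed(1)
X <- matrix(rnorm(50 * 2), nrow = 50)
ok <- check_wtfun(wtfun, X)

# For weighted CV+, a list with one function per fold
nfolds <- 10
wtfun_list <- rep(list(wtfun), nfolds)
```

Either wtfun or wtfun_list could then be passed as the wtfun argument when useCV = TRUE, while only the single function applies when useCV = FALSE.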

Examples

# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = NULL, useCV = FALSE)
#> Loading required namespace: grf
predict(obj, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -2.535434 2.843827
#> 2 -3.795258 2.056821
#> 3 -4.828133 2.730680
#> 4 -1.746706 3.945956
#> 5 -1.949787 2.934958

# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
                 outfun = "RF", wtfun = NULL, useCV = FALSE)
#> Loading required namespace: randomForest
predict(obj, Xtest, alpha = 0.1)
#>        lower     upper
#> 1 -1.7129412 2.1089487
#> 2 -3.3063890 0.5155009
#> 3 -3.2554484 0.5664415
#> 4 -0.4305554 3.3913345
#> 5 -1.4377467 2.3841433

# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -1.990873 2.712992
#> 2 -3.358873 1.512175
#> 3 -5.120096 1.751706
#> 4 -1.107740 3.572100
#> 5 -1.417252 2.552171

# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformal(X, Y, type = "mean",
                 outfun = "RF", wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#>        lower     upper
#> 1 -1.7869359 2.1462101
#> 2 -3.2330711 0.7126941
#> 3 -3.4989231 0.4552532
#> 4 -0.5622837 3.3968215
#> 5 -1.3402963 2.6123227

# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -2.007688 2.537079
#> 2 -3.267513 1.455911
#> 3 -5.082839 2.174463
#> 4 -1.201008 3.820256
#> 5 -1.613601 2.643617

# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        res <- res$predictions # recent versions of the grf package return a list instead of a matrix
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = quantRF, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -2.338700 2.545836
#> 2 -3.488523 1.570238
#> 3 -4.484138 2.012845
#> 4 -1.387026 3.790043
#> 5 -1.614070 2.510640

# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformal(X, Y, type = "mean",
                 outfun = linearReg, wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>         lower      upper
#> 1 -1.53560580  1.8431881
#> 2 -2.95309352  0.4257004
#> 3 -3.89044926 -0.5116553
#> 4 -0.01023004  3.3685639
#> 5 -1.20526029  2.1735336

# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                 outfun = "quantRF", wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -2.096538 2.965481
#> 2 -3.402480 1.513177
#> 3 -5.023213 1.366197
#> 4 -1.335975 3.870952
#> 5 -1.543778 2.898841

# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                  outfun = "quantRF", wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -2.067734 2.421420
#> 2 -3.356540 1.489078
#> 3 -5.116667 2.263265
#> 4 -1.244840 3.788403
#> 5 -1.458652 2.551119

# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformal(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
                  outfun = "quantRF", wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)
#>       lower    upper
#> 1 -2.067734 2.421420
#> 2 -3.356540 1.489078
#> 3 -5.116667 2.263265
#> 4 -1.244840 3.788403
#> 5 -1.458652 2.551119
