conformalInt is a framework for weighted and unweighted conformal inference for interval outcomes. It supports both weighted split conformal inference and weighted CV+, including weighted Jackknife+ as a special case. For each type, it supports both conformalized quantile regression (CQR) and standard conformal inference based on conditional mean regression.

conformalInt(
  X,
  Y,
  type = c("CQR", "mean"),
  lofun = NULL,
  loquantile = 0.5,
  loparams = list(),
  upfun = NULL,
  upquantile = 0.5,
  upparams = list(),
  wtfun = NULL,
  useCV = FALSE,
  trainprop = 0.75,
  trainid = NULL,
  nfolds = 10,
  idlist = NULL
)

Arguments

X

covariates.

Y

interval outcomes. A matrix with two columns.

type

a string that takes values in {"CQR", "mean"}.

lofun

a function to fit the lower bound, or a valid string. See Details.

loquantile

the quantile to be fit by lofun. Used only when type = "CQR".

loparams

a list of other parameters to be passed into lofun.

upfun

a function to fit the upper bound, or a valid string; see Details.

upquantile

the quantile to be fit by upfun. Used only when type = "CQR".

upparams

a list of other parameters to be passed into upfun.

wtfun

NULL for unweighted conformal inference, or a function for weighted conformal inference when useCV = FALSE, or a list of functions for weighted conformal inference when useCV = TRUE. See Details.

useCV

FALSE for split conformal inference and TRUE for CV+.

trainprop

proportion of units used for training lofun and upfun. The default is 75%. Used only when useCV = FALSE.

trainid

indices of training units. The default is NULL, generating random indices. Used only when useCV = FALSE.

nfolds

number of folds. The default is 10. Used only when useCV = TRUE.

idlist

a list of indices of length nfolds. The default is NULL, generating random indices. Used only when useCV = TRUE.
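
When a reproducible split is needed, the index arguments can be built by hand. The sketch below uses base R only. It assumes that trainid is a vector of row indices and that each element of idlist holds the row indices of one fold; the names n_obs, my_trainid, and my_idlist are illustrative and do not come from the package.

```r
set.seed(1)
n_obs <- 1000   # hypothetical sample size
nfolds <- 10

# Split conformal: 75% of the rows for training (mirrors trainprop = 0.75)
my_trainid <- sample(n_obs, floor(n_obs * 0.75))

# CV+: shuffle the rows, then deal them into nfolds groups of near-equal size
my_idlist <- unname(split(sample(n_obs), rep_len(seq_len(nfolds), n_obs)))
```

Under these assumptions, the objects could be supplied as trainid = my_trainid (with useCV = FALSE) or idlist = my_idlist (with useCV = TRUE).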

Value

a conformalIntSplit object when useCV = FALSE with the following attributes:

  • Yscore: a vector of non-conformity scores on the calibration fold

  • wt: a vector of weights on the calibration fold

  • Ymodel: a function with required argument X that produces estimates of the conditional mean or conditional quantiles at X

  • wtfun, type, loquantile, upquantile, trainprop, trainid: the same as inputs

or a conformalIntCV object when useCV = TRUE with the following attributes:

  • info: a list of length nfolds; each element is a list with the attributes Yscore, wt and Ymodel described above, computed for the corresponding fold

  • wtfun, type, loquantile, upquantile, nfolds, idlist: the same as inputs
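
As a rough illustration of how Yscore and wt could be combined at prediction time, the sketch below computes a weighted conformal cutoff in base R. It follows the usual weighted conformal recipe: normalize the calibration weights together with the test-point weight, place the test-point mass at +Inf, and take the 1 - alpha quantile. The function weighted_cutoff and its arguments are hypothetical names, and the package's internal computation may differ.

```r
weighted_cutoff <- function(score, wt, wt_test, alpha) {
  ord <- order(score)
  score <- score[ord]
  # Normalize calibration weights jointly with the test-point weight
  p <- wt[ord] / (sum(wt) + wt_test)
  # First score whose cumulative weight reaches 1 - alpha
  idx <- which(cumsum(p) >= 1 - alpha)
  # If the calibration mass never reaches 1 - alpha, the cutoff is infinite
  if (length(idx) == 0) Inf else score[idx[1]]
}

# With equal weights this reduces to the ceiling((n + 1) * (1 - alpha))-th
# smallest score (here n = 9, so the 8th smallest)
score <- c(0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.5, 1.1)
eta <- weighted_cutoff(score, wt = rep(1, 9), wt_test = 1, alpha = 0.25)

# A test point with a very large weight leaves too little calibration mass
eta_inf <- weighted_cutoff(score, wt = rep(1, 9), wt_test = 100, alpha = 0.25)
```

An infinite cutoff corresponds to an uninformative interval, which is what happens when the test point is far from the calibration sample in terms of its weight.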

Details

The conformal interval for a testing point x is in the form of \([\hat{m}^{L}(x) - \eta, \hat{m}^{R}(x) + \eta]\) where \(\hat{m}^{L}(x)\) is fit by lofun and \(\hat{m}^{R}(x)\) is fit by upfun.
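
To make the interval form concrete, here is a minimal base-R sketch of unweighted split conformal inference for interval outcomes, with linear regression standing in for lofun and upfun. It assumes the non-conformity score is the larger of \(\hat{m}^{L}(x) - Y^{L}\) and \(Y^{U} - \hat{m}^{R}(x)\), a natural choice for covering the whole outcome interval. The package's actual score is not stated here, and all variable names below are illustrative.

```r
set.seed(1)
n <- 400
x <- rnorm(n)
ylo <- x + rnorm(n)   # lower end of the outcome interval
yup <- ylo + 1        # upper end of the outcome interval
dat <- data.frame(x = x, ylo = ylo, yup = yup)

# Split into a training fold and a calibration fold
train <- sample(n, 300)
calib <- setdiff(seq_len(n), train)

# Fit the lower and upper bounds on the training fold
fit_lo <- lm(ylo ~ x, data = dat[train, ])
fit_up <- lm(yup ~ x, data = dat[train, ])

# Score: how far the outcome interval sticks out of [m_lo, m_up]
m_lo <- predict(fit_lo, dat[calib, ])
m_up <- predict(fit_up, dat[calib, ])
score <- pmax(m_lo - dat$ylo[calib], dat$yup[calib] - m_up)

# eta is the ceiling((n_cal + 1) * (1 - alpha))-th smallest score
alpha <- 0.1
k <- ceiling((length(calib) + 1) * (1 - alpha))
eta <- sort(score)[k]

# Conformal interval for new points: widen both fitted bounds by eta
xnew <- data.frame(x = c(-1, 0, 1))
interval <- cbind(lower = predict(fit_lo, xnew) - eta,
                  upper = predict(fit_up, xnew) + eta)
```

A single eta is subtracted from the lower fit and added to the upper fit, which matches the symmetric form of the interval given above.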

lofun/upfun can be a valid string, including

  • "RF" for random forest that predicts the conditional mean, a wrapper built on the randomForest package. Used when type = "mean";

  • "quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on the grf package. Used when type = "CQR";

  • "Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on the gbm package. Used when type = "mean";

  • "quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on the gbm package. Used when type = "CQR";

  • "BART" for Bayesian additive regression trees that predict the conditional mean, a wrapper built on the bartMachine package. Used when type = "mean";

  • "quantBART" for quantile Bayesian additive regression trees that predict the conditional quantiles, a wrapper built on the bartMachine package. Used when type = "CQR";

or a function object whose inputs must include, but are not limited to,

  • Y for outcome in the training data;

  • X for covariates in the training data;

  • Xtest for covariates in the testing data.

When type = "CQR", lofun and upfun should also include an argument quantiles that is a scalar. The output of lofun and upfun must be a vector giving the conditional quantile estimate or conditional mean estimate. Other optional arguments can be passed into lofun and upfun through loparams and upparams.
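
The following sketch shows what a user-defined mean learner with an extra tuning argument might look like. polyReg and its degree argument are hypothetical and not part of the package; the function only follows the input and output contract described above (Y, X, Xtest in, a numeric vector out).

```r
# Hypothetical learner: polynomial regression in each covariate.
# `degree` is an extra argument of the kind that would be supplied
# through loparams / upparams.
polyReg <- function(Y, X, Xtest, degree = 1) {
  # Expand every column into powers 1..degree with stable column names
  expand <- function(M) {
    M <- as.matrix(M)
    out <- do.call(cbind, lapply(seq_len(degree), function(p) M^p))
    colnames(out) <- paste0("V", seq_len(ncol(out)))
    as.data.frame(out)
  }
  fit <- lm(Y ~ ., data = data.frame(Y = as.numeric(Y), expand(X)))
  as.numeric(predict(fit, expand(Xtest)))
}

# Quick check on toy data: a quadratic signal is recovered with degree = 2
set.seed(1)
Xtoy <- matrix(rnorm(200), ncol = 1)
Ytoy <- Xtoy[, 1]^2 + rnorm(200, sd = 0.1)
pred <- polyReg(Ytoy, Xtoy, matrix(c(-1, 0, 1), ncol = 1), degree = 2)
```

Such a learner would be used with type = "mean", with the extra argument passed as loparams = list(degree = 2) and upparams = list(degree = 2).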

Examples

# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, d)
Ylo <- X %*% beta + rnorm(n)
Yup <- Ylo + pmax(1, 2 * rnorm(n))
Y <- cbind(Ylo, Yup)

# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)

# Run unweighted split CQR with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.5571458 3.928145
#> 2  0.2065284 6.665040
#> 3 -4.4381127 1.989639
#> 4  0.4809445 6.779876
#> 5 -1.4292453 5.020169

# Run unweighted standard split conformal inference with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.2383038 3.862934
#> 2  0.3532264 6.401187
#> 3 -4.1811585 1.926886
#> 4  0.9984495 7.026955
#> 5 -1.7229759 4.404223

# Run unweighted CQR-CV+ with the built-in quantile random forest learner
# grf package needs to be installed
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.2983936 3.981516
#> 2  0.5439503 6.660022
#> 3 -4.1578044 2.014425
#> 4  0.7584403 6.882226
#> 5 -1.2587786 4.959964

# Run unweighted standard CV+ with the built-in random forest learner
# randomForest package needs to be installed
obj <- conformalInt(X, Y, type = "mean",
                    lofun = "RF", upfun = "RF",
                    wtfun = NULL, useCV = TRUE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.3403715 3.648372
#> 2  0.4192234 6.349207
#> 3 -4.0142355 1.808601
#> 4  1.1723317 7.124246
#> 5 -1.7601795 4.403348

# Run weighted split CQR with w(x) = pnorm(x1)
wtfun <- function(X){pnorm(X[, 1])}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.5268106 4.315319
#> 2 -0.1750260 6.231053
#> 3 -4.2329081 2.224193
#> 4  0.3592586 6.833755
#> 5 -1.6656031 4.726196

# Run unweighted split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
    fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
    res <- predict(fit, Xtest, quantiles = quantiles)
    if (is.list(res) && !is.data.frame(res)){
        res <- res$predictions # recent versions of the grf package changed the output format
    }
    if (length(quantiles) == 1){
        res <- as.numeric(res)
    } else {
        res <- as.matrix(res)
    }
    return(res)
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = quantRF, upfun = quantRF,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.6141739 4.010033
#> 2  0.2688345 6.822655
#> 3 -4.2637191 2.273239
#> 4  0.4472719 6.885175
#> 5 -2.0250835 4.824113

# Run unweighted standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
    X <- as.data.frame(X)
    Xtest <- as.data.frame(Xtest)
    data <- data.frame(Y = Y, X)
    fit <- lm(Y ~ ., data = data)
    as.numeric(predict(fit, Xtest))
}
obj <- conformalInt(X, Y, type = "mean",
                    lofun = linearReg, upfun = linearReg,
                    wtfun = NULL, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>       lower     upper
#> 1 -2.271234 2.9187045
#> 2  1.789870 6.9687679
#> 3 -4.671487 0.4865889
#> 4  2.058177 7.2547775
#> 5 -1.255497 4.0560528

# Run weighted split-CQR with user-defined weights
wtfun <- function(X){
    pnorm(X[, 1])
}
obj <- conformalInt(X, Y, type = "CQR",
                    lofun = "quantRF", upfun = "quantRF",
                    wtfun = wtfun, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.7320204 4.046570
#> 2 -0.3524280 6.581274
#> 3 -4.2541273 2.520488
#> 4 -0.1105241 6.688713
#> 5 -1.5985066 5.486312

# Run weighted CQR-CV+ with user-defined weights
# Use a list of identical functions
set.seed(1)
wtfun_list <- lapply(1:10, function(i){wtfun})
obj1 <- conformalInt(X, Y, type = "CQR", 
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun_list, useCV = TRUE)
predict(obj1, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.4578761 3.801213
#> 2  0.2964787 6.616022
#> 3 -4.2342074 2.068502
#> 4  0.5935968 6.742122
#> 5 -1.7078680 4.642198

# Use a single function. Equivalent to the above approach
set.seed(1)
obj2 <- conformalInt(X, Y, type = "CQR", 
                     lofun = "quantRF", upfun = "quantRF",
                     wtfun = wtfun, useCV = TRUE)
predict(obj2, Xtest, alpha = 0.1)
#>        lower    upper
#> 1 -2.4578761 3.801213
#> 2  0.2964787 6.616022
#> 3 -4.2342074 2.068502
#> 4  0.5935968 6.742122
#> 5 -1.7078680 4.642198