conformalCf.Rd
conformalCf
computes intervals for counterfactuals or outcomes with ignorable missing values in general.
It supports both split conformal inference and CV+,
including weighted Jackknife+ as a special case. For each type, it supports both conformalized
quantile regression (CQR) and standard conformal inference based on conditional mean regression.
covariates.
outcome vector with missing values encoded as NA. See Details.
a string that takes values in {"unconditional", "nonmissing", "missing"}. See Details.
a string that takes values in {"CQR", "mean"}.
a string that takes values in {"two", "above", "below"}. See Details.
a scalar or a vector of length 2 depending on side
. Used only when type = "CQR"
. See Details.
a function that models the conditional mean or quantiles, or a valid string.
The default is random forest when type = "mean"
and quantile random forest when
type = "CQR"
. See Details.
a list of other parameters to be passed into outfun
.
a function that models the missing mechanism (probability of missing given X), or a valid string. The default is "Boosting". See Details.
a list of other parameters to be passed into psfun
.
FALSE for split conformal inference and TRUE for CV+.
proportion of units for training outfun
. The default if 75%. Used only when useCV = FALSE
.
number of folds. The default is 10. Used only when useCV = TRUE
.
a conformalSplit
object when useCV = FALSE
or a conformalCV
object
The outcome Y
must comprise both observed values and missing values encoded as NA.
The missing values are used to estimate the propensity score \(P(missing | X)\).
estimand
controls the type of coverage to be guaranteed:
(Default) when estimand = "unconditional"
, the interval has
\(P(Y \in \hat{C}(X))\ge 1 - \alpha\).
When estimand = "nonmissing"
, the interval has
\(P(Y \in \hat{C}(X) | nonmissing) \ge 1 - \alpha\).
When estimand = "missing"
, the interval has
\(P(Y \in \hat{C}(X) | missing) \ge 1 - \alpha\).
When side = "above"
,
intervals are of form [-Inf, a(x)] and when side = "below"
the intervals are of form [a(x), Inf].
outfun
can be a valid string, including
"RF" for random forest that predicts the conditional mean, a wrapper built on randomForest
package.
Used when type = "mean"
.
"quantRF" for quantile random forest that predicts the conditional quantiles, a wrapper built on
grf
package. Used when type = "CQR"
.
"Boosting" for gradient boosting that predicts the conditional mean, a wrapper built on gbm
package. Used when type = "mean"
.
"quantBoosting" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on
gbm
package. Used when type = "CQR"
.
"BART" for gradient boosting that predicts the conditional mean, a wrapper built on bartMachine
package. Used when type = "mean"
.
"quantBART" for quantile gradient boosting that predicts the conditional quantiles, a wrapper built on
bartMachine
package. Used when type = "CQR"
.
or a function object whose input must include, but not limited to
Y
for outcome in the training data.
X
for covariates in the training data.
Xtest
for covariates in the testing data.
When type = "CQR"
, outfun
should also include an argument quantiles
that is either
a vector of length 2 or a scalar, depending on the argument side
. The output of outfun
must be a matrix with two columns giving the conditional quantile estimates when quantiles
is
a vector of length 2; otherwise, it must be a vector giving the conditional quantile estimate or
conditional mean estimate. Other optional arguments can be passed into outfun
through outparams
.
psfun
can be a valid string, including
"RF" for random forest that predicts the propensity score, a wrapper built on randomForest
package.
Used when type = "mean"
.
"Boosting" for gradient boosting that predicts the propensity score, a wrapper built on gbm
package. Used when type = "mean"
.
or a function object whose input must include, but not limited to
Y
for treatment assignment, a binary vector, in the training data.
X
for covariates in the training data.
Xtest
for covariates in the testing data.
The output of psfun
must be a vector of predicted probabilities. Other optional arguments
can be passed into psfun
through psparams
.
# Generate data from a linear model
set.seed(1)
n <- 1000
d <- 5
X <- matrix(rnorm(n * d), nrow = n)
beta <- rep(1, 5)
Y <- X %*% beta + rnorm(n)
# Generate missing indicators
missing_prob <- pnorm(X[, 1])
if_missing <- missing_prob < runif(n)
Y[if_missing] <- NA
# Generate testing data
ntest <- 5
Xtest <- matrix(rnorm(ntest * d), nrow = ntest)
# Run weighted split CQR
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.694169 2.928830
#> 2 -2.644680 2.680240
#> 3 -4.768709 1.537865
#> 4 -5.136779 3.604356
#> 5 -3.036140 2.662865
# Run weighted standard conformal inference
obj <- conformalCf(X, Y, type = "mean",
outfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.554873 2.0402082
#> 2 -2.904058 1.7288508
#> 3 -5.072992 -0.4400834
#> 4 -4.330321 1.2443807
#> 5 -2.483102 2.1119798
# Run one-sided weighted split CQR
obj1 <- conformalCf(X, Y, type = "CQR", side = "above",
quantiles = 0.95, outfun = "quantRF", useCV = FALSE)
predict(obj1, Xtest, alpha = 0.1)
#> lower upper
#> 1 -Inf 2.2160363
#> 2 -Inf 0.7140021
#> 3 -Inf -0.1788658
#> 4 -Inf 3.2922861
#> 5 -Inf 1.4599988
obj2 <- conformalCf(X, Y, type = "CQR", side = "below",
quantiles = 0.05, outfun = "quantRF", useCV = FALSE)
predict(obj2, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.168817 Inf
#> 2 -1.705976 Inf
#> 3 -4.792848 Inf
#> 4 -5.133214 Inf
#> 5 -2.333960 Inf
# Run split CQR with a self-defined quantile random forest
# Y, X, Xtest, quantiles should be included in the inputs
quantRF <- function(Y, X, Xtest, quantiles, ...){
fit <- grf::quantile_forest(X, Y, quantiles = quantiles, ...)
res <- predict(fit, Xtest, quantiles = quantiles)
if (is.list(res) && !is.data.frame(res)){
res <- res$predictions # for the recent update of \code{grf} package that changes the output format
}
if (length(quantiles) == 1){
res <- as.numeric(res)
} else {
res <- as.matrix(res)
}
return(res)
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = quantRF, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.761585 3.086488
#> 2 -2.197869 2.702938
#> 3 -5.121073 1.504797
#> 4 -5.110667 2.836832
#> 5 -3.052736 2.679462
# Run standard split conformal inference with a self-defined linear regression
# Y, X, Xtest should be included in the inputs
linearReg <- function(Y, X, Xtest){
X <- as.data.frame(X)
Xtest <- as.data.frame(Xtest)
data <- data.frame(Y = Y, X)
fit <- lm(Y ~ ., data = data)
as.numeric(predict(fit, Xtest))
}
obj <- conformalCf(X, Y, type = "mean",
outfun = linearReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -1.786284 1.3121660
#> 2 -3.318391 -0.1431223
#> 3 -5.035091 -1.9366408
#> 4 -4.353584 -0.8314229
#> 5 -2.067116 1.0313344
# Run split CQR with a built-in psfun
# Y, X, Xtest, should be included in the inputs
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", psfun = "RF", useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.368242 2.929165
#> 2 -2.247054 2.731080
#> 3 -4.193513 1.532939
#> 4 -4.795010 2.283152
#> 5 -2.895413 2.522138
# Run split CQR with a self-defined function to estimate propensity scores
# Y, X, Xtest, should be included in the inputs
logitReg <- function(Y, X, Xtest, ...){
X <- as.data.frame(X)
Xtest <- as.data.frame(Xtest)
data <- data.frame(Y = Y, X)
fit <- glm(Y ~ ., data = data, family = "binomial", ...)
as.numeric(predict(fit, Xtest, type = "response"))
}
obj <- conformalCf(X, Y, type = "CQR", quantiles = c(0.05, 0.95),
outfun = "quantRF", psfun = logitReg, useCV = FALSE)
predict(obj, Xtest, alpha = 0.1)
#> lower upper
#> 1 -2.571811 3.460430
#> 2 -3.163423 3.629465
#> 3 -5.717934 2.079342
#> 4 -6.034055 4.501632
#> 5 -3.048383 2.546090