The tisthemachinelearner package provides a simple R interface to scikit-learn models through Python’s tisthemachinelearner package. This vignette demonstrates how to use the package with the Boston housing dataset that ships with the MASS package.
We’ll use the classic Boston housing dataset (MASS::Boston) to predict the median home value (medv) from the other neighborhood characteristics:
# Load data: Boston housing dataset from the MASS package
# Split features and target
X <- as.matrix(MASS::Boston[, -14]) # all columns except medv (column 14)
y <- MASS::Boston[, 14] # medv column (median home value)
# Create train/test split
set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
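As a quick sanity check on the split above, the following sketch rebuilds it and confirms its shape. It assumes only that the MASS package is installed; the names `n_train`, `n_test`, and `test_idx` are introduced here for illustration and do not appear in the vignette.

```r
# Rebuild the split from the vignette and check its shape
X <- as.matrix(MASS::Boston[, -14])  # 13 predictors
y <- MASS::Boston[, 14]              # medv, the response

set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))

n_train <- length(train_idx)
n_test  <- nrow(X) - n_train

# Train and test indices should not overlap
test_idx <- setdiff(seq_len(nrow(X)), train_idx)
stopifnot(length(intersect(train_idx, test_idx)) == 0)

cat("Training rows:", n_train, "\n")
cat("Test rows:", n_test, "\n")
cat("Response column:", colnames(MASS::Boston)[14], "\n")
```

With 506 rows in the dataset, an 80% split leaves 404 rows for training and 102 for testing, so each coverage value reported later is a fraction out of 102 test observations.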
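Now let’s fit a Ridge regression with calibration = TRUE and compare the prediction intervals produced by three methods (split conformal, surrogate, and bootstrap). The alphas grid is commented out, so no hyperparameter tuning takes place here: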
# Fit ridge regression model
reg_ridge <- tisthemachinelearner::regressor(X_train, y_train, "Ridge",
                                             # alphas = c(0.01, 0.1, 1, 10),
                                             calibration = TRUE)
# Make predictions
predictions_ridge_splitconformal <- predict(reg_ridge, X_test, method = "splitconformal")
predictions_ridge_surrogate <- predict(reg_ridge, X_test, method = "surrogate")
#> Registered S3 method overwritten by 'quantmod':
#> method from
#> as.zoo.data.frame zoo
predictions_ridge_bootstrap <- predict(reg_ridge, X_test, method = "bootstrap")
# Calculate empirical coverage: share of test responses inside [lwr, upr]
coverage_ridge_splitconformal <- mean(y_test >= predictions_ridge_splitconformal[, "lwr"] & y_test <= predictions_ridge_splitconformal[, "upr"])
coverage_ridge_surrogate <- mean(y_test >= predictions_ridge_surrogate[, "lwr"] & y_test <= predictions_ridge_surrogate[, "upr"])
coverage_ridge_bootstrap <- mean(y_test >= predictions_ridge_bootstrap[, "lwr"] & y_test <= predictions_ridge_bootstrap[, "upr"])
cat("Ridge Regression Split Conformal Coverage:", coverage_ridge_splitconformal, "\n")
#> Ridge Regression Split Conformal Coverage: 0.9411765
cat("Ridge Regression Surrogate Coverage:", coverage_ridge_surrogate, "\n")
#> Ridge Regression Surrogate Coverage: 0.8921569
cat("Ridge Regression Bootstrap Coverage:", coverage_ridge_bootstrap, "\n")
#> Ridge Regression Bootstrap Coverage: 0.9117647
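To make the idea behind the "splitconformal" method more concrete, here is a minimal base-R sketch of split conformal prediction intervals. It is an illustration under stated assumptions, not the tisthemachinelearner implementation: it uses `lm` as a stand-in model, simulated data, a 50/50 fit/calibration split, and a nominal level of 0.95, none of which come from the package. The function `split_conformal_lm` is hypothetical.

```r
# Split conformal intervals around a plain linear model (illustrative sketch only;
# the package's internals may differ)
split_conformal_lm <- function(X_train, y_train, X_new, level = 0.95) {
  n <- nrow(X_train)
  # Split the training data into a fitting half and a calibration half
  idx_fit <- sample(n, size = floor(n / 2))
  df_fit <- data.frame(y = y_train[idx_fit], X_train[idx_fit, , drop = FALSE])
  df_cal <- data.frame(X_train[-idx_fit, , drop = FALSE])
  fit <- lm(y ~ ., data = df_fit)
  # Conformity scores: absolute residuals on the calibration half
  scores <- abs(y_train[-idx_fit] - predict(fit, newdata = df_cal))
  n_cal <- length(scores)
  # Conformal quantile: the ceiling((n_cal + 1) * level)-th smallest score.
  # Simplification: the index is capped at n_cal, where the exact method
  # would return an infinite interval.
  k <- min(ceiling((n_cal + 1) * level), n_cal)
  q <- sort(scores)[k]
  pred <- predict(fit, newdata = data.frame(X_new))
  cbind(fit = pred, lwr = pred - q, upr = pred + q)
}

# Simulated data (assumption: linear signal plus Gaussian noise)
set.seed(123)
n <- 500
X_sim <- matrix(rnorm(n * 3), ncol = 3,
                dimnames = list(NULL, c("x1", "x2", "x3")))
y_sim <- drop(X_sim %*% c(1, -2, 0.5)) + rnorm(n)
sim_train <- sample(n, size = 400)

pred_int <- split_conformal_lm(X_sim[sim_train, ], y_sim[sim_train],
                               X_sim[-sim_train, ], level = 0.95)

# Same coverage computation as in the vignette
coverage_sim <- mean(y_sim[-sim_train] >= pred_int[, "lwr"] &
                     y_sim[-sim_train] <= pred_int[, "upr"])
cat("Empirical coverage on simulated test set:", coverage_sim, "\n")
```

The interval has the same half-width for every test point because the calibration quantile is a single number. On a small test set, empirical coverage fluctuates around the nominal level, which is worth keeping in mind when comparing the three values printed above.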
sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Paris
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] reticulate_1.41.0 tisthemachinelearner_0.3.1
#>
#> loaded via a namespace (and not attached):
#> [1] cli_3.6.4 knitr_1.49 rlang_1.1.5 xfun_0.50
#> [5] png_0.1-8 textshaping_1.0.0 jsonlite_1.9.0 zoo_1.8-12
#> [9] TTR_0.24.4 xts_0.14.1 htmltools_0.5.8.1 ragg_1.3.3
#> [13] sass_0.4.9 rmarkdown_2.29 quadprog_1.5-8 grid_4.3.3
#> [17] evaluate_1.0.3 jquerylib_0.1.4 MASS_7.3-60.0.1 fastmap_1.2.0
#> [21] yaml_2.3.10 lifecycle_1.0.4 compiler_4.3.3 fs_1.6.5
#> [25] Rcpp_1.0.14 htmlwidgets_1.6.4 systemfonts_1.1.0 lattice_0.22-5
#> [29] digest_0.6.37 R6_2.6.1 curl_6.2.0 quantmod_0.4.26
#> [33] bslib_0.9.0 Matrix_1.6-5 tools_4.3.3 tseries_0.10-58
#> [37] pkgdown_2.1.1 cachem_1.1.0 desc_1.4.3