Introduction

The tisthemachinelearner package provides a simple R interface to scikit-learn models through Python’s tisthemachinelearner package. This vignette demonstrates how to use the package with the Boston housing dataset that ships with the MASS package.

Setup

First, let’s load the required packages:

library(tisthemachinelearner)
#> Loading required package: reticulate
#> Loading required package: Matrix
#> Loading required package: memoise
#> Python environment detected: venv
library(reticulate)

Data Preparation

We’ll use the classic Boston housing dataset (MASS::Boston) to predict the median home value (medv) from the other neighborhood characteristics:

# Split features and target (Boston housing data from the MASS package)
X <- as.matrix(MASS::Boston[, -14])  # all columns except medv (column 14)
y <- MASS::Boston[, 14]              # medv column

# Create train/test split
set.seed(42)
train_idx <- sample(nrow(X), size = floor(0.8 * nrow(X)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
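
A quick sanity check on the split can catch indexing mistakes before a long fit. The sketch below repeats the same 80/20 split using only base R and MASS, then verifies that the training and test indices partition the rows. The helper name `split_indices` is hypothetical and is not part of tisthemachinelearner.

```r
# Hypothetical helper (not part of tisthemachinelearner): draw a
# reproducible train/test split and return both index sets.
# Note: calling set.seed() here resets the global RNG state.
split_indices <- function(n, train_frac = 0.8, seed = 42) {
  set.seed(seed)
  train <- sample(n, size = floor(train_frac * n))
  list(train = train, test = setdiff(seq_len(n), train))
}

X <- as.matrix(MASS::Boston[, -14])  # 13 predictors
y <- MASS::Boston[, 14]              # medv (target)

idx <- split_indices(nrow(X))

# The two index sets should be disjoint and together cover every row.
stopifnot(length(intersect(idx$train, idx$test)) == 0)
stopifnot(length(idx$train) + length(idx$test) == nrow(X))

cat("Train rows:", length(idx$train), " Test rows:", length(idx$test), "\n")
```

Because `idx$test` holds exactly the rows that are not in `idx$train`, `X[idx$test, ]` selects the same rows as the negative indexing `X[-train_idx, ]` used above.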

Boosting with ExtraTreeRegressor

Now let’s fit a boosting model with booster(), using scikit-learn’s ExtraTreeRegressor as the base learner, with n_estimators = 100 and learning_rate = 0.1:

# Fit booster model
time <- proc.time()[3]
reg_booster <- tisthemachinelearner::booster(X_train, y_train, "ExtraTreeRegressor",
                                            n_estimators = 100L,
                                            learning_rate = 0.1,
                                            show_progress = FALSE,
                                            verbose = FALSE, venv_path = "../venv")
time <- proc.time()[3] - time
cat("Time taken:", time, "seconds\n")
#> Time taken: 169.705 seconds

# Make predictions
time <- proc.time()[3]
predictions <- predict(reg_booster, X_test)
time <- proc.time()[3] - time
cat("Time taken:", time, "seconds\n")
#> Time taken: 0.571 seconds

# RMSE 
rmse <- sqrt(mean((y_test - predictions)^2))
cat("RMSE:", rmse, "\n")
#> RMSE: 2.916688
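
An RMSE value is easier to judge next to a reference point. As a minimal sketch, the hypothetical helper `regression_metrics()` below (base R only, not part of tisthemachinelearner) reports RMSE, MAE and R-squared for a vector of predictions, and the same helper is then applied to a constant baseline that always predicts the mean. The toy vectors are made up for illustration and are not results from this vignette.

```r
# Hypothetical helper (not part of tisthemachinelearner): common
# regression error metrics for observed vs. predicted values.
regression_metrics <- function(y_true, y_pred) {
  stopifnot(length(y_true) == length(y_pred))
  err <- y_true - y_pred
  c(
    rmse = sqrt(mean(err^2)),                               # root mean squared error
    mae  = mean(abs(err)),                                  # mean absolute error
    r2   = 1 - sum(err^2) / sum((y_true - mean(y_true))^2)  # share of variance explained
  )
}

# Toy data for illustration only
y_true <- c(3, 5, 7, 9)
y_pred <- c(2, 5, 8, 9)

# Metrics for the "model" and for a constant mean predictor
model_metrics    <- regression_metrics(y_true, y_pred)
baseline_metrics <- regression_metrics(y_true, rep(mean(y_true), length(y_true)))

print(round(model_metrics, 4))
print(round(baseline_metrics, 4))
```

With the objects from this vignette, one would call `regression_metrics(y_test, predictions)` and use `rep(mean(y_train), length(y_test))` as the baseline, so the baseline only sees training data.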

Session Info

sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tisthemachinelearner_0.9.0 memoise_2.0.1             
#> [3] Matrix_1.6-5               reticulate_1.43.0         
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.5         knitr_1.50        rlang_1.1.6       xfun_0.52        
#>  [5] png_0.1-8         textshaping_1.0.0 jsonlite_2.0.0    htmltools_0.5.8.1
#>  [9] ragg_1.3.3        sass_0.4.10       rmarkdown_2.29    grid_4.3.3       
#> [13] evaluate_1.0.4    jquerylib_0.1.4   MASS_7.3-60.0.1   fastmap_1.2.0    
#> [17] yaml_2.3.10       lifecycle_1.0.4   compiler_4.3.3    fs_1.6.6         
#> [21] Rcpp_1.1.0        htmlwidgets_1.6.4 rstudioapi_0.17.1 systemfonts_1.1.0
#> [25] lattice_0.22-5    digest_0.6.37     R6_2.6.1          bslib_0.9.0      
#> [29] tools_4.3.3       pkgdown_2.1.1     cachem_1.1.0      desc_1.4.3