Introduction

The tisthemachinelearner package provides a simple R interface to scikit-learn models through the Python package of the same name, accessed via reticulate. This vignette demonstrates how to fit regression models and obtain predictions with prediction intervals using R’s built-in mtcars dataset.

Setup

First, let’s load the required packages:

library(tisthemachinelearner)
#> Loading required package: reticulate
#> Loading required package: Matrix
library(reticulate)

Data Preparation

We’ll use the classic mtcars dataset to predict miles per gallon (mpg) from the other ten variables (number of cylinders, displacement, horsepower, weight, and so on):

# Load data
data(mtcars)
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

# Split features and target
X <- as.matrix(mtcars[, -1])  # all columns except mpg
y <- mtcars[, 1]              # mpg column

# Create an 80/20 train/test split (25 training rows, 7 test rows)
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
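If you repeat this split across several experiments, it can be wrapped in a small function. The sketch below uses a hypothetical helper, split_train_test(), which is not part of tisthemachinelearner. With the default seed it draws the same indices as the code above, because it makes the same set.seed() and sample() calls.

```r
# Hypothetical helper (not part of tisthemachinelearner): random train/test
# split of a feature matrix X and a target vector y.
split_train_test <- function(X, y, train_prop = 0.8, seed = 42) {
  stopifnot(nrow(X) == length(y), train_prop > 0, train_prop < 1)
  set.seed(seed)  # note: this resets the global RNG state
  n <- nrow(X)
  train_idx <- sample(n, size = floor(train_prop * n))
  list(
    X_train   = X[train_idx, , drop = FALSE],  # drop = FALSE keeps a matrix
    X_test    = X[-train_idx, , drop = FALSE],
    y_train   = y[train_idx],
    y_test    = y[-train_idx],
    train_idx = train_idx
  )
}

# Usage: reproduces the split above
parts <- split_train_test(as.matrix(mtcars[, -1]), mtcars[, 1])
dim(parts$X_train)
#> [1] 25 10
dim(parts$X_test)
#> [1]  7 10
```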

# R6 interface: create a model object, then call its $fit() and $predict() methods
model <- Regressor$new(model_name = "BayesianRidge")
start <- proc.time()[3]
model$fit(X_train, y_train)
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.005 seconds
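The proc.time() bookkeeping around each call can be replaced by base R’s system.time(), which times a single expression. The wrapper below, timed(), is a hypothetical convenience function (not part of the package) that returns both the expression’s value and the elapsed seconds. Timings of a few milliseconds vary noticeably between runs, so treat them as rough indications only.

```r
# Hypothetical helper: evaluate an expression once and report elapsed time.
timed <- function(expr) {
  # system.time() evaluates `value <- expr` in this function's frame, which
  # forces the lazily evaluated argument `expr` and stores its result.
  elapsed <- system.time(value <- expr)[["elapsed"]]
  list(value = value, elapsed = elapsed)
}

# Usage with the R6 model object created above (not run here):
# res <- timed(model$fit(X_train, y_train))
# cat("Time taken:", res$elapsed, "seconds\n")
```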

start <- proc.time()[3]
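# method = "bayesian" returns a matrix with the point prediction (fit)
# and the lower/upper prediction-interval bounds (lwr, upr)
preds <- model$predict(X_test, method = "bayesian")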
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.003 seconds
print(preds)
#>           fit       lwr      upr
#> [1,] 20.94090 14.831066 27.05074
#> [2,] 24.01947 18.119071 29.91988
#> [3,] 26.18171 20.377673 31.98574
#> [4,] 26.02409 20.167570 31.88061
#> [5,] 17.79989 11.414835 24.18495
#> [6,] 14.03148  6.834385 21.22857
#> [7,] 21.59554 15.063960 28.12713
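Since y_test was held out, the predictions can be scored against it. The sketch below is a hypothetical helper (not provided by tisthemachinelearner) that takes a matrix with the fit, lwr and upr columns printed above and returns the root mean squared error, the mean absolute error, the share of test points falling inside their interval (empirical coverage), and the mean interval width. With only 7 test rows, these numbers are noisy.

```r
# Hypothetical helper: point accuracy and interval quality for a prediction
# matrix with columns "fit", "lwr" and "upr".
evaluate_predictions <- function(preds, y_true) {
  stopifnot(all(c("fit", "lwr", "upr") %in% colnames(preds)),
            nrow(preds) == length(y_true))
  err <- y_true - preds[, "fit"]
  c(
    rmse     = sqrt(mean(err^2)),
    mae      = mean(abs(err)),
    # fraction of observed values inside [lwr, upr]
    coverage = mean(y_true >= preds[, "lwr"] & y_true <= preds[, "upr"]),
    width    = mean(preds[, "upr"] - preds[, "lwr"])
  )
}

# Usage with the predictions above (not run here):
# evaluate_predictions(preds, y_test)
```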

# Same workflow with ARDRegression (automatic relevance determination)
model <- Regressor$new(model_name = "ARDRegression")
start <- proc.time()[3]
model$fit(X_train, y_train)
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.011 seconds

start <- proc.time()[3]
preds <- model$predict(X_test, method = "bayesian")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.022 seconds
print(preds)
#>           fit       lwr      upr
#> [1,] 20.30561 10.130095 30.48113
#> [2,] 22.57227 11.540950 33.60360
#> [3,] 25.68348 16.446742 34.92021
#> [4,] 26.94130 17.062289 36.82032
#> [5,] 18.65716  8.945064 28.36925
#> [6,] 22.64235 10.774920 34.50977
#> [7,] 20.05272  9.285493 30.81995
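The ARDRegression intervals printed above are noticeably wider than the BayesianRidge ones (roughly 18 to 24 units against roughly 12 to 14). To compare models side by side, keep each model’s predictions in its own variable instead of overwriting preds, and tabulate the widths. The function compare_intervals() and the names model_br, model_ard, preds_br and preds_ard below are hypothetical.

```r
# Hypothetical helper: summarise prediction-interval widths for several
# models, given a named list of fit/lwr/upr matrices.
compare_intervals <- function(pred_list) {
  stopifnot(length(pred_list) > 0, !is.null(names(pred_list)))
  widths <- lapply(pred_list, function(p) p[, "upr"] - p[, "lwr"])
  data.frame(
    model      = names(pred_list),
    mean_width = unname(vapply(widths, mean, numeric(1))),
    min_width  = unname(vapply(widths, min, numeric(1))),
    max_width  = unname(vapply(widths, max, numeric(1))),
    stringsAsFactors = FALSE
  )
}

# Usage, with hypothetical names for the two fitted R6 objects:
# preds_br  <- model_br$predict(X_test, method = "bayesian")
# preds_ard <- model_ard$predict(X_test, method = "bayesian")
# compare_intervals(list(BayesianRidge = preds_br, ARDRegression = preds_ard))
```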


# S3 interface: regressor() fits the model in one call, and the generic predict() is used on the result
start <- proc.time()[3]
model <- regressor(X_train, y_train, model_name = "GaussianProcessRegressor")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.662 seconds

start <- proc.time()[3]
preds <- predict(model, X_test, method = "bayesian")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.003 seconds
print(preds)
#>                fit       lwr      upr
#> [1,] 1.823916e-241 -1.959964 1.959964
#> [2,] 3.073627e-245 -1.959964 1.959964
#> [3,]  2.195172e-44 -1.959964 1.959964
#> [4,]  4.246636e-12 -1.959964 1.959964
#> [5,]  3.586947e-42 -1.959964 1.959964
#> [6,]  1.262932e-79 -1.959964 1.959964
#> [7,]  0.000000e+00 -1.959964 1.959964
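The GaussianProcessRegressor predictions above are not usable as they stand: every fit is essentially 0 and every interval is exactly 0 ± 1.959964 (qnorm(0.975)), i.e. a 95% interval with standard deviation 1. That is what the Gaussian process prior alone would give. A likely cause, assuming the package passes scikit-learn’s defaults through, is that scikit-learn’s default kernel (a constant kernel times an RBF kernel with length scale 1.0, both with fixed hyperparameters) and its default normalize_y=False do not suit unscaled features such as disp and hp, which are in the hundreds: the kernel between a test point and every training point is then practically 0. A common remedy is to standardize features and target with training-set statistics, fit on the standardized data, and map predictions back. The helpers below are hypothetical sketches, not package functions, and whether standardization alone gives good GP predictions on mtcars has not been checked here.

```r
# Hypothetical helpers (not part of tisthemachinelearner).

# Scale features with the training means and SDs, and apply the same
# transformation to the test set so no test information leaks into training.
standardize_features <- function(X_train, X_test) {
  X_train_s <- scale(X_train)
  center <- attr(X_train_s, "scaled:center")
  sds    <- attr(X_train_s, "scaled:scale")
  if (any(sds == 0)) stop("constant column in training data; drop it before scaling")
  list(train = X_train_s,
       test  = scale(X_test, center = center, scale = sds))
}

# Standardize the target and keep the moments needed to undo it.
standardize_target <- function(y_train) {
  mu <- mean(y_train)
  s  <- sd(y_train)
  list(y = (y_train - mu) / s, mean = mu, sd = s)
}

# Map a standardized fit/lwr/upr matrix back to the original units.
# The map is affine with a positive slope, so interval bounds stay ordered.
unstandardize_predictions <- function(preds, y_mean, y_sd) {
  preds * y_sd + y_mean
}

# Usage with the S3 interface shown above (not run here):
# xs <- standardize_features(X_train, X_test)
# ys <- standardize_target(y_train)
# gp <- regressor(xs$train, ys$y, model_name = "GaussianProcessRegressor")
# preds_gp <- unstandardize_predictions(
#   predict(gp, xs$test, method = "bayesian"), ys$mean, ys$sd)
```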

Conclusion

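This example demonstrates how to:

  1. Prepare R data (a numeric feature matrix and a target vector) for use with the regressor
  2. Fit different regression models (BayesianRidge, ARDRegression, GaussianProcessRegressor) through both the R6 and the S3 interfaces
  3. Make predictions with prediction intervals on new data using method = "bayesian"
  4. Time model fitting and prediction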

The tisthemachinelearner package lets you fit scikit-learn models directly on R matrices and vectors, combining familiar R data structures with Python’s machine learning ecosystem.

Session Info

sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tisthemachinelearner_0.3.0 Matrix_1.6-5              
#> [3] reticulate_1.42.0         
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.4         knitr_1.49        rlang_1.1.6       xfun_0.51        
#>  [5] png_0.1-8         textshaping_1.0.0 jsonlite_2.0.0    htmltools_0.5.8.1
#>  [9] ragg_1.3.3        sass_0.4.9        rmarkdown_2.29    grid_4.3.3       
#> [13] evaluate_1.0.3    jquerylib_0.1.4   fastmap_1.2.0     yaml_2.3.10      
#> [17] lifecycle_1.0.4   compiler_4.3.3    fs_1.6.5          htmlwidgets_1.6.4
#> [21] Rcpp_1.0.14       systemfonts_1.1.0 lattice_0.22-5    digest_0.6.37    
#> [25] R6_2.6.1          bslib_0.9.0       tools_4.3.3       pkgdown_2.1.1    
#> [29] cachem_1.1.0      desc_1.4.3