The tisthemachinelearner package provides a simple R
interface to scikit-learn models through Python’s
tisthemachinelearner package. This vignette demonstrates
how to use the package with R’s built-in mtcars
dataset.
First, let’s load the required packages:
library(tisthemachinelearner)
#> Loading required package: reticulate
#> Loading required package: Matrix
library(reticulate)
We’ll use the classic mtcars dataset to predict miles
per gallon (mpg) based on other car characteristics:
# Load data
data(mtcars)
head(mtcars)
#>                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Split features and target
X <- as.matrix(mtcars[, -1])  # all columns except mpg
y <- mtcars[, 1]              # mpg column
# Create train/test split
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]
y_train <- y[train_idx]
y_test <- y[-train_idx]
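As a quick check on the split above, the following sketch rebuilds it with base R only and confirms the set sizes: mtcars has 32 rows, so an 80% training share gives 25 training cars and 7 test cars, which matches the seven prediction rows printed later. The names `n_train`, `n_test` and `overlap` are introduced here for illustration.

```r
# Sanity-check the 80/20 split (base R only, no Python needed).
data(mtcars)
X <- as.matrix(mtcars[, -1])
y <- mtcars[, 1]
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))
X_train <- X[train_idx, ]
X_test <- X[-train_idx, ]

n_train <- nrow(X_train)  # floor(0.8 * 32) = 25
n_test <- nrow(X_test)    # 32 - 25 = 7

# No car should appear in both sets
overlap <- intersect(rownames(X_train), rownames(X_test))
stopifnot(n_train == 25, n_test == 7, length(overlap) == 0)
```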
# R6 interface
model <- Regressor$new(model_name = "BayesianRidge")
start <- proc.time()[3]
model$fit(X_train, y_train)
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.005 seconds
start <- proc.time()[3]
preds <- model$predict(X_test, method="bayesian")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.003 seconds
print(preds)
#>           fit       lwr      upr
#> [1,] 20.94090 14.831066 27.05074
#> [2,] 24.01947 18.119071 29.91988
#> [3,] 26.18171 20.377673 31.98574
#> [4,] 26.02409 20.167570 31.88061
#> [5,] 17.79989 11.414835 24.18495
#> [6,] 14.03148  6.834385 21.22857
#> [7,] 21.59554 15.063960 28.12713
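The printed matrix has a point prediction (`fit`) and lower and upper bounds (`lwr`, `upr`). One way to judge such output is to compare it with the held-out targets. The sketch below is a minimal example that assumes a matrix with those three column names; `interval_summary` is a hypothetical helper, not part of tisthemachinelearner, and the toy numbers are for illustration only.

```r
# Hypothetical helper (not part of tisthemachinelearner): summarise a
# prediction matrix with columns fit, lwr, upr against observed values.
interval_summary <- function(preds, y) {
  stopifnot(all(c("fit", "lwr", "upr") %in% colnames(preds)),
            nrow(preds) == length(y))
  # TRUE where the observed value falls inside [lwr, upr]
  covered <- y >= preds[, "lwr"] & y <= preds[, "upr"]
  c(rmse = sqrt(mean((y - preds[, "fit"])^2)),
    coverage = mean(covered),
    mean_width = mean(preds[, "upr"] - preds[, "lwr"]))
}

# Toy values for illustration only; they are not results from the models.
toy_preds <- cbind(fit = c(20, 25), lwr = c(15, 20), upr = c(25, 30))
toy_y <- c(22, 31)
toy_summary <- interval_summary(toy_preds, toy_y)
print(toy_summary)
```

With the objects from this vignette, the call would be `interval_summary(preds, y_test)`. With only seven test cars, the resulting coverage is a rough indication rather than a reliable estimate.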
model <- Regressor$new(model_name = "ARDRegression")
start <- proc.time()[3]
model$fit(X_train, y_train)
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.011 seconds
start <- proc.time()[3]
preds <- model$predict(X_test, method="bayesian")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.022 seconds
print(preds)
#>           fit       lwr      upr
#> [1,] 20.30561 10.130095 30.48113
#> [2,] 22.57227 11.540950 33.60360
#> [3,] 25.68348 16.446742 34.92021
#> [4,] 26.94130 17.062289 36.82032
#> [5,] 18.65716  8.945064 28.36925
#> [6,] 22.64235 10.774920 34.50977
#> [7,] 20.05272  9.285493 30.81995
# S3 interface
start <- proc.time()[3]
model <- regressor(X_train, y_train, model_name = "GaussianProcessRegressor")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.662 seconds
start <- proc.time()[3]
preds <- predict(model, X_test, method="bayesian")
end <- proc.time()[3]
cat("Time taken:", end - start, "seconds\n")
#> Time taken: 0.003 seconds
print(preds)
#>                fit       lwr      upr
#> [1,] 1.823916e-241 -1.959964 1.959964
#> [2,] 3.073627e-245 -1.959964 1.959964
#> [3,]  2.195172e-44 -1.959964 1.959964
#> [4,]  4.246636e-12 -1.959964 1.959964
#> [5,]  3.586947e-42 -1.959964 1.959964
#> [6,]  1.262932e-79 -1.959964 1.959964
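#> [7,]  0.000000e+00 -1.959964 1.959964
This example demonstrates how to fit Bayesian regression models
(BayesianRidge, ARDRegression and GaussianProcessRegressor) through
the R6 and S3 interfaces, and how to obtain predictions with lower
and upper bounds.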
The tisthemachinelearner package makes it easy to use
scikit-learn models with R data, combining the familiarity of R data
structures with the power of Python’s machine learning ecosystem.
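The GaussianProcessRegressor output above deserves a second look: every fit is essentially zero and every bound is ±1.959964, which equals `qnorm(0.975)`. That pattern is consistent with a 95% normal interval around a zero-mean, unit-variance prediction, as one would expect if the model were falling back to its prior on unscaled features. The sketch below assumes that standardising the features and centring the target might help. It uses base R only, the names `X_train_s`, `X_test_s`, `y_mean` and `y_train_c` are introduced here, and the commented-out model calls are untested.

```r
# The bound 1.959964 matches the 97.5% standard normal quantile
z <- qnorm(0.975)

# Rebuild the split used in this vignette
data(mtcars)
X <- as.matrix(mtcars[, -1])
y <- mtcars[, 1]
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.8 * nrow(mtcars)))

# Standardise features using training-set statistics only,
# then apply the same centre and scale to the test set
X_train_s <- scale(X[train_idx, ])
X_test_s <- scale(X[-train_idx, ],
                  center = attr(X_train_s, "scaled:center"),
                  scale = attr(X_train_s, "scaled:scale"))

# Centre the target; add y_mean back to fit, lwr and upr afterwards
y_mean <- mean(y[train_idx])
y_train_c <- y[train_idx] - y_mean

# Then, as in the vignette (requires tisthemachinelearner; untested here):
# model <- regressor(X_train_s, y_train_c,
#                    model_name = "GaussianProcessRegressor")
# preds <- predict(model, X_test_s, method = "bayesian") + y_mean
```

Taking the scaling statistics from the training rows only keeps information about the test cars out of the fitted model.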
sessionInfo()
#> R version 4.3.3 (2024-02-29)
#> Platform: x86_64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.2
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tisthemachinelearner_0.3.0 Matrix_1.6-5              
#> [3] reticulate_1.42.0         
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.4         knitr_1.49        rlang_1.1.6       xfun_0.51        
#>  [5] png_0.1-8         textshaping_1.0.0 jsonlite_2.0.0    htmltools_0.5.8.1
#>  [9] ragg_1.3.3        sass_0.4.9        rmarkdown_2.29    grid_4.3.3       
#> [13] evaluate_1.0.3    jquerylib_0.1.4   fastmap_1.2.0     yaml_2.3.10      
#> [17] lifecycle_1.0.4   compiler_4.3.3    fs_1.6.5          htmlwidgets_1.6.4
#> [21] Rcpp_1.0.14       systemfonts_1.1.0 lattice_0.22-5    digest_0.6.37    
#> [25] R6_2.6.1          bslib_0.9.0       tools_4.3.3       pkgdown_2.1.1    
#> [29] cachem_1.1.0      desc_1.4.3