Split data into training and test sets — train_test_split • unifiedml

Randomly splits a feature matrix or data.frame and its corresponding response vector into training and test subsets.

train_test_split(X, y, test_size = 0.2, seed = NULL)

Arguments

X: A matrix or data.frame of features.
y: A vector of responses (numeric or factor). Its length must equal the number of rows of X.
test_size: Proportion of observations to use as the test set. A number in (0, 1). Default is 0.2 (80/20 split).
seed: An optional integer random seed for reproducibility. If NULL (the default), the current RNG state is used.
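
To illustrate how test_size and seed interact, the base R sketch below draws test-row indices twice from the same seed. It does not call unifiedml. The rounding rule for the test-set size and the use of sample.int() are assumptions made for illustration, not confirmed details of train_test_split.

```r
# Sketch only: how a proportion and a seed could determine the test rows.
n <- 150                         # number of observations (e.g. nrow(iris))
test_size <- 0.3                 # proportion held out for testing

# Assumption: the test-set size is the rounded product n * test_size.
n_test <- round(n * test_size)

# Resetting the same seed before each draw reproduces the same indices.
set.seed(42)
idx_a <- sample.int(n, n_test)
set.seed(42)
idx_b <- sample.int(n, n_test)

identical(idx_a, idx_b)          # the two draws select the same rows
```

Without a seed, each call continues from the current RNG state, so repeated calls would generally select different rows.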

Value

A named list with four elements:

X_train: Training features (same type as X).
X_test: Test features (same type as X).
y_train: Training response.
y_test: Test response.
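
The shape of the returned list can be reproduced with a small base R function. The name split_sketch is hypothetical, and its internals (rounding, index sampling, drop = FALSE subsetting) are assumptions rather than the package's actual implementation. Only the four element names and the documented defaults come from the description above.

```r
# Hypothetical helper mirroring the documented return structure.
split_sketch <- function(X, y, test_size = 0.2, seed = NULL) {
  n <- NROW(X)
  stopifnot(length(y) == n, test_size > 0, test_size < 1)
  if (!is.null(seed)) set.seed(seed)      # NULL keeps the current RNG state

  n_test <- round(n * test_size)          # assumed rounding rule
  stopifnot(n_test >= 1, n_test < n)      # both subsets must be non-empty

  test_idx  <- sample.int(n, n_test)
  train_idx <- setdiff(seq_len(n), test_idx)

  list(
    # drop = FALSE keeps X a matrix or data.frame even with one column
    X_train = X[train_idx, , drop = FALSE],
    X_test  = X[test_idx, , drop = FALSE],
    y_train = y[train_idx],
    y_test  = y[test_idx]
  )
}

s <- split_sketch(as.matrix(iris[, 1:4]), iris$Species,
                  test_size = 0.3, seed = 42)
names(s)
dim(s$X_train)
```

Subsetting with drop = FALSE is what keeps the feature subsets the same type as the input in this sketch; without it, a single-column matrix would collapse to a vector.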

Examples

# matrix input
X <- as.matrix(iris[, 1:4])  # iris[, 1:4] alone is a data.frame, so convert it
y <- iris$Species
d <- unifiedml::train_test_split(X, y, test_size = 0.3, seed = 42)
dim(d$X_train)  # 105 x 4
#> [1] 105   4
dim(d$X_test)   #  45 x 4
#> [1] 45  4

# data.frame input
d2 <- unifiedml::train_test_split(iris[, 1:4], iris$Species, test_size = 0.2)
is.data.frame(d2$X_train)  # TRUE
#> [1] TRUE