Creates approximately balanced folds for cross-validation. For numeric outcomes, the response is first discretized into quantile-based groups, and folds are stratified on those groups so that each fold has a similar distribution of the response. For categorical outcomes, folds are stratified on the classes so that each fold contains approximately the same class proportions.

create_folds(y, k = 10)

Arguments

y

A numeric or categorical response vector.

k

Integer. The number of folds to create. Defaults to 10.

Value

A named list of integer vectors. Each element contains the row indices for a fold.
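
The page does not say whether each element holds the held-out rows or the training rows of a fold. The sketch below assumes held-out rows, which is the default behaviour of caret::createFolds(); if create_folds() returns training rows instead, swap the two subsetting expressions. The list folds and the data frame dat are hand-built stand-ins for illustration, not output of this function.

```r
# Hand-built stand-in for the return value; real folds would come from create_folds().
folds <- list(Fold01 = c(1L, 4L), Fold02 = c(2L, 5L), Fold03 = c(3L, 6L))
dat <- data.frame(x = 1:6, y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))

# ASSUMPTION: each element holds the held-out (test) rows for that fold.
rmse <- vapply(folds, function(test_idx) {
  fit <- lm(y ~ x, data = dat[-test_idx, ])           # train on all other rows
  pred <- predict(fit, newdata = dat[test_idx, ])     # predict the held-out rows
  sqrt(mean((dat$y[test_idx] - pred)^2))
}, numeric(1))

rmse  # one error estimate per fold, named like the list elements
```

Because vapply() keeps the list names, the per-fold results stay labelled by fold.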

Details

Inspired by caret::createFolds().

If y is numeric, its values are binned into quantile-based intervals, and folds are then stratified on those bins. The number of bins is chosen automatically from the sample size and the number of folds.
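
The general idea can be sketched as follows. This is a minimal illustration and not the package's implementation: sketch_numeric_folds() is a hypothetical helper, and its fixed n_bins argument stands in for the automatic choice of the number of quantile groups, whose rule is not documented here.

```r
# Hypothetical helper, NOT the package's implementation: stratified fold
# assignment for a numeric response via quantile bins.
sketch_numeric_folds <- function(y, k = 5, n_bins = 4) {
  # Cut points at evenly spaced quantiles; unique() guards against tied cut points.
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = n_bins + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)

  fold_id <- integer(length(y))
  for (idx in split(seq_along(y), bins)) {
    # Within each bin, hand out the fold labels 1..k in random order.
    fold_id[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }

  # Collect the row indices that were assigned to each fold.
  split(seq_along(y), factor(fold_id, levels = seq_len(k)))
}

set.seed(1)
y_demo <- rnorm(100)
folds_demo <- sketch_numeric_folds(y_demo, k = 5)
lengths(folds_demo)  # fold sizes
```

In this sketch rep_len() starts every bin at fold 1, so when a bin size is not a multiple of k the leftover observations accumulate in the low-numbered folds. That simplification is one reason the resulting folds are only approximately balanced.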

List elements are named with zero-padded fold numbers: "Fold01", "Fold02", and so on.
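
Names in this format can be generated with sprintf(), as in the sketch below. The two-digit width is taken from the examples above; how the function pads names when k is 100 or more is not documented here, so treat the width as an assumption.

```r
k <- 12
# "%02d" pads each fold number with a leading zero to two digits (assumed width).
fold_names <- sprintf("Fold%02d", seq_len(k))
fold_names[c(1, 2, 12)]
# "Fold01" "Fold02" "Fold12"
```

Zero-padding keeps the names in fold order when they are sorted alphabetically, which unpadded names such as "Fold2" and "Fold10" would not.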

Examples

# Classification example
set.seed(123)
y <- sample(c("A", "B"), size = 100, replace = TRUE)
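folds <- create_folds(y, k = 5)
lengths(folds)          # number of row indices in each fold
table(y[folds[[1]]])    # class counts within the first fold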

# Regression example
set.seed(123)
y_num <- rnorm(100)
folds_num <- create_folds(y_num, k = 5)
sapply(folds_num, function(idx) mean(y_num[idx]))  # mean response within each fold