Probabilistic stock forecasting often relies on parametric models like ARIMA for the mean and GARCH for volatility. The mlarchf function in the ahead package offers a flexible hybrid alternative to ARMA-GARCH, combining machine learning approaches with ARCH effects.
The model decomposes the time series into two components:

$$y_t = \mu_t + \sigma_t \varepsilon_t$$

where $\mu_t$ is the conditional mean of the series, $\sigma_t$ its conditional volatility, and $\varepsilon_t$ the standardized residuals.
The key innovation is using machine learning methods and conformal prediction to model the volatility component, allowing for more flexible and potentially more accurate volatility forecasts than traditional GARCH models. The function supports various machine learning methods, either through the parameters fit_func and predict_func, as in other ahead models, or through the caret package.
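For instance, a custom pair could look like the following minimal sketch, assuming mlarchf uses the same fit_func(x, y)/predict_func(fit, newx) convention as other ahead models (the names fit_ols and predict_ols are hypothetical):

# Minimal sketch of a custom fit/predict pair (assumption: mlarchf trains
# with fit_func(x, y) and predicts with predict_func(fit, newx), as in
# other ahead models)
fit_ols <- function(x, y, ...) {
  x <- as.matrix(x)
  stats::lm.fit(cbind(1, x), y)  # ordinary least squares with intercept
}
predict_ols <- function(fit, newx, ...) {
  drop(cbind(1, as.matrix(newx)) %*% fit$coefficients)
}
# obj_ols <- ahead::mlarchf(y, fit_func = fit_ols, predict_func = predict_ols, h = 20L)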
The forecasting process involves: (1) forecasting the conditional mean of the series, by default with auto.arima; (2) modeling the volatility of the resulting residuals with a machine learning model, under the assumption that an ARCH-type regression (basically a supervised regression of squared residuals on their lags) is a good approximation of the latent conditional volatility; and (3) quantifying the uncertainty of the standardized residuals through conformal prediction.
This new approach combines the interpretability of traditional time series models with the flexibility of machine learning, while maintaining proper uncertainty quantification through conformal prediction.
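To make the volatility step concrete, here is an illustrative sketch (not the package's internals) of an ARCH-type supervised regression of squared residuals on their own lags:

# Illustrative sketch (not ahead's internals): approximate the latent
# conditional volatility by regressing squared residuals on their lags
resids <- as.numeric(residuals(forecast::auto.arima(fpp2::goog200)))
eps2 <- resids^2
p <- 2L                      # number of ARCH lags, an arbitrary choice here
X <- embed(eps2, p + 1L)     # column 1: eps2 at time t; columns 2..p+1: its lags
fit <- lm(X[, 1] ~ X[, -1])  # linear stand-in for the ML volatility model
sigma2_hat <- pmax(fitted(fit), .Machine$double.eps)  # keep variances positive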
Let’s start with a simple example using the Google stock price data from the fpp2 package:
y <- fpp2::goog200
# Default model for volatility (Ridge regression)
(obj_ridge <- ahead::mlarchf(y, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 532.8683 510.6072 549.0701
## 202 534.1620 510.1836 555.2200
## 203 537.8519 455.6482 621.4205
## 204 535.4597 523.3184 548.0685
## 205 536.2701 515.9482 557.4350
## 206 536.8562 516.9855 553.6221
## 207 537.0966 522.2134 548.3146
## 208 536.8214 509.8663 550.7663
## 209 538.5965 519.8896 550.6361
## 210 538.6741 514.6596 558.2802
## 211 539.4124 510.6087 557.9464
## 212 541.5197 524.6351 558.9229
## 213 541.3757 514.1783 567.5858
## 214 542.3874 520.2966 559.7851
## 215 542.7441 515.4962 562.8581
## 216 543.6913 518.4352 562.3159
## 217 544.1679 531.4607 553.9185
## 218 546.0748 513.9685 582.5379
## 219 545.6681 511.9970 571.1057
## 220 546.4334 524.3504 566.2234
The package supports various machine learning methods for volatility modeling. Here are some examples:
# Random Forest
(obj_rf <- ahead::mlarchf(y, fit_func = randomForest::randomForest,
                          predict_func = predict, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 532.9943 514.9818 546.3583
## 202 535.6430 473.9838 583.8987
## 203 536.6626 516.8293 586.4722
## 204 537.2098 512.6205 563.7898
## 205 535.3415 528.9124 541.6989
## 206 538.4757 504.1888 565.2346
## 207 536.5940 529.0149 541.9653
## 208 536.8695 456.9157 560.6926
## 209 538.3905 528.9728 544.5427
## 210 539.0829 488.7497 569.5273
## 211 539.1325 530.3422 544.4237
## 212 542.0685 521.4105 561.6385
## 213 541.0877 533.4657 548.2760
## 214 542.6126 507.4853 571.4893
## 215 542.0972 533.1655 547.1384
## 216 545.2356 495.8112 578.3687
## 217 543.8493 538.2177 547.9920
## 218 551.4629 485.5071 721.1598
## 219 544.8177 535.2052 554.2283
## 220 547.5296 508.4202 575.9079
# Support Vector Machine
(obj_svm <- ahead::mlarchf(y, fit_func = e1071::svm,
                           predict_func = predict, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 532.4934 522.4649 539.7515
## 202 533.7948 517.0382 548.4818
## 203 535.8943 491.5423 582.5843
## 204 535.3236 524.5294 546.6604
## 205 535.8114 522.6526 549.5445
## 206 536.6484 520.1038 550.2097
## 207 536.8970 526.1589 544.8919
## 208 536.8676 515.4748 547.6928
## 209 538.3806 524.2927 547.3659
## 210 539.1035 517.5743 557.9289
## 211 539.5177 515.8533 554.2971
## 212 541.1323 528.2824 555.1481
## 213 541.1741 519.9923 560.9940
## 214 542.0895 524.9697 555.6496
## 215 542.6036 521.4297 559.3157
## 216 543.4293 523.8594 557.6665
## 217 543.9761 534.1496 551.4794
## 218 545.5987 520.7796 573.8206
## 219 545.4529 519.2999 565.1699
## 220 546.2045 528.9343 561.0275
# Elastic Net
(obj_glmnet <- ahead::mlarchf(y, fit_func = glmnet::cv.glmnet,
                              predict_func = predict, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 533.0081 506.2610 552.4747
## 202 534.0312 512.4848 552.9534
## 203 537.9979 452.9923 624.4150
## 204 535.4434 523.4675 547.8804
## 205 536.2787 515.8231 557.5828
## 206 536.8552 517.0005 553.6076
## 207 537.0983 522.1824 548.3408
## 208 536.8211 509.8375 550.7809
## 209 538.5978 519.8626 550.6556
## 210 538.6744 514.6277 558.3068
## 211 539.4128 510.5685 557.9729
## 212 541.5220 524.6141 558.9493
## 213 541.3769 514.1416 567.6234
## 214 542.3890 520.2676 559.8108
## 215 542.7453 515.4595 562.8872
## 216 543.6928 518.4017 562.3432
## 217 544.1691 531.4442 553.9332
## 218 546.0777 513.9268 582.5913
## 219 545.6694 511.9516 571.1424
## 220 546.4348 524.3211 566.2522
Let’s visualize the forecasts:
par(mfrow=c(1, 2))
plot(obj_svm, main="Support Vector Machine")
plot(obj_glmnet, main="Elastic Net")
The package also supports models from the caret package, which provides access to hundreds of machine learning methods. Here’s how to use them:
y <- window(fpp2::goog200, start=100)
# Random Forest via caret
(obj_rf <- ahead::mlarchf(y, ml_method="ranger", h=20L))
## Point Forecast Lo 95 Hi 95
## 201 534.0896 527.2105 549.2119
## 202 540.8033 506.2501 676.2634
## 203 535.4651 525.0756 554.5970
## 204 536.0222 512.5343 561.4383
## 205 534.7667 513.1644 552.6010
## 206 536.8632 523.8917 577.7776
## 207 534.4352 523.6236 550.2029
## 208 540.1830 514.0565 591.3863
## 209 534.2758 525.7503 549.4391
## 210 539.9345 506.9003 611.3457
## 211 536.3750 518.6688 553.7465
## 212 538.4359 525.6782 618.8693
## 213 535.9024 517.3023 564.3173
## 214 538.7200 502.9221 580.7261
## 215 535.2246 524.7153 566.6379
## 216 536.5341 526.6102 586.9456
## 217 541.5191 520.9008 576.9608
## 218 541.0114 506.6884 575.3034
## 219 535.0434 527.2991 549.4959
## 220 544.4411 516.0606 639.3218
# Gradient Boosting via caret
(obj_glmboost <- ahead::mlarchf(y, ml_method="glmboost", h=20L))
## Point Forecast Lo 95 Hi 95
## 201 536.7254 524.0021 561.1012
## 202 537.2241 516.4879 576.7179
## 203 536.0656 523.2692 561.0059
## 204 536.3933 516.0502 562.9614
## 205 535.4724 513.0202 553.4245
## 206 539.5897 520.5008 594.8892
## 207 538.9055 509.9366 581.2882
## 208 541.8347 516.0594 595.3970
## 209 536.6201 523.1016 561.5969
## 210 546.8758 502.1903 678.2426
## 211 540.0372 509.7394 586.5420
## 212 537.0641 524.2422 568.2773
## 213 537.6658 517.8344 564.1649
## 214 537.6244 516.2911 565.5337
## 215 538.7529 521.8397 602.8892
## 216 541.5688 523.7738 668.0133
## 217 541.0185 523.9721 603.8553
## 218 542.2963 517.5191 584.6708
## 219 537.7170 524.9537 562.6072
## 220 557.4956 515.8869 751.6655
Visualizing the forecasts:
par(mfrow=c(1, 2))
plot(obj_rf, main="Random Forest (caret)")
plot(obj_glmboost, main="Gradient Boosting (caret)")
Looking at the simulation paths:
par(mfrow=c(1, 2))
matplot(obj_rf$sims, type='l', main="RF Simulation Paths")
matplot(obj_glmboost$sims, type='l', main="GBM Simulation Paths")
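Because the simulation paths are returned, other summaries can be derived from them, for example custom prediction intervals. A small sketch, assuming the rows of sims index the forecast horizon (as the matplot calls above suggest):

# Custom 80% prediction intervals and median from the simulation paths
# (assumption: rows of obj_rf$sims index the forecast horizon)
t(apply(obj_rf$sims, 1, quantile, probs = c(0.1, 0.5, 0.9)))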
You can also customize both the mean forecasting model and the model for forecasting standardized residuals:
# Using RW + Theta method for mean and residuals along with SVM for volatility
(obj_svm <- ahead::mlarchf(y, fit_func = e1071::svm,
                           predict_func = predict, h=20L,
                           mean_model=forecast::rwf,
                           model_residuals=forecast::thetaf))
## Point Forecast Lo 95 Hi 95
## 201 536.7001 524.0427 560.8327
## 202 538.0314 514.4173 583.0770
## 203 535.8869 523.5741 559.8654
## 204 537.4383 512.7645 569.5894
## 205 535.4157 513.2780 553.1443
## 206 540.9776 518.5788 605.5379
## 207 539.0655 509.5605 582.2023
## 208 543.2671 513.9431 604.0251
## 209 536.8391 522.7220 562.9176
## 210 548.6580 498.8522 695.3473
## 211 540.5741 508.3049 590.1480
## 212 537.6249 523.5462 571.9791
## 213 538.0943 516.8791 566.4469
## 214 538.1972 514.8895 568.6467
## 215 539.2980 521.1176 608.3200
## 216 542.4556 523.0970 680.1538
## 217 541.8127 523.3560 609.9607
## 218 543.2392 516.3275 589.2513
## 219 538.2161 524.4095 565.0870
## 220 559.7157 514.5775 769.9032
# Using Theta + Theta method for mean and residuals along with GLMNET for volatility
(obj_glmnet <- ahead::mlarchf(y, fit_func = glmnet::cv.glmnet,
                              predict_func = predict, h=20L,
                              mean_model=forecast::thetaf,
                              model_residuals=forecast::thetaf))
## Point Forecast Lo 95 Hi 95
## 201 539.7210 521.7443 571.0620
## 202 536.1702 523.4416 562.4592
## 203 537.3178 525.2986 563.1430
## 204 537.1867 520.3207 560.1881
## 205 537.4467 516.6405 554.9163
## 206 540.9844 525.3315 584.7582
## 207 541.9160 515.3199 579.7897
## 208 544.2088 523.5243 589.6411
## 209 540.4673 529.1910 562.9970
## 210 550.2739 510.0462 671.8512
## 211 544.6859 517.0620 588.9166
## 212 542.2312 530.6821 571.1815
## 213 543.1535 527.9466 566.9051
## 214 543.4203 526.1855 568.1847
## 215 544.9848 530.6582 585.6213
## 216 547.9162 533.7306 662.0766
## 217 548.6211 532.6393 608.8480
## 218 549.8988 532.7408 588.1457
## 219 546.3406 535.2334 568.0789
## 220 564.5126 529.8817 740.9622
plot(obj_svm, main="SVM with RW + Theta")
plot(obj_glmnet, main="Elastic Net with Theta + Theta")
When using non-ARIMA models for the mean forecast, it’s important to check whether the residuals of the mean forecasting model are centered and stationary:
# Diagnostic tests for residuals
print(obj_svm$resids_t_test)
##
## One Sample t-test
##
## data: resids
## t = 1.0148, df = 99, p-value = 0.3127
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.7180739 2.2214961
## sample estimates:
## mean of x
## 0.7517111
print(obj_svm$resids_kpss_test)
##
## KPSS Test for Level Stationarity
##
## data: resids
## KPSS Level = 7.5764e-76, Truncation lag parameter = 4, p-value = 0.1
print(obj_glmnet$resids_t_test)
##
## One Sample t-test
##
## data: resids
## t = 1.0992, df = 100, p-value = 0.2743
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.6460748 2.2513707
## sample estimates:
## mean of x
## 0.8026479
print(obj_glmnet$resids_kpss_test)
##
## KPSS Test for Level Stationarity
##
## data: resids
## KPSS Level = 0.26089, Truncation lag parameter = 4, p-value = 0.1