Probabilistic stock forecasting often relies on parametric models like ARIMA for the mean and GARCH for volatility. The mlarchf function in the ahead package offers a flexible hybrid alternative to ARMA-GARCH by combining machine learning approaches with ARCH effects.
The model decomposes the time series into two components:

$$y_t = \mu_t + \sigma_t \varepsilon_t$$

where:

- $\mu_t$ is the conditional mean of the series, forecast by a time series model
- $\sigma_t$ is the conditional volatility, modeled here with machine learning
- $\varepsilon_t$ are the standardized residuals
The key innovation is using machine learning methods and conformal prediction to model the volatility component, allowing for more flexible and potentially more accurate volatility forecasts than traditional GARCH models. The function supports various machine learning methods through the parameters fit_func and predict_func, as in other ahead models, and through the caret package.
The forecasting process involves:

1. Forecasting the mean component with a time series model (by default auto.arima)
2. Forecasting the volatility component with a machine learning model, under the assumption that an ARCH-type regression (basically a supervised regression of squared residuals on their lags) is a good approximation of the latent conditional volatility
3. Combining both components with standardized residuals simulated via conformal prediction
This approach combines the interpretability of traditional time series models with the flexibility of machine learning, while maintaining proper uncertainty quantification through conformal prediction.
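To make the decomposition concrete, here is a minimal sketch of the idea, not the package's actual implementation: a plain lm() stands in for the package's ridge regression default, and the lag order p is an arbitrary choice for illustration.

# Minimal sketch (not ahead's internal code): fit a mean model, then
# approximate the conditional volatility by regressing squared residuals
# on their own lags.
y <- fpp2::goog200
fit_mean <- forecast::auto.arima(y)        # mean component
eps <- as.numeric(residuals(fit_mean))     # mean-model residuals
eps2 <- eps^2                              # squared residuals as a volatility proxy
p <- 2L                                    # number of lags (arbitrary choice here)
X <- embed(eps2, p + 1L)                   # columns: eps2_t, eps2_{t-1}, ..., eps2_{t-p}
df <- data.frame(y = X[, 1L], X[, -1L, drop = FALSE])
fit_vol <- lm(y ~ ., data = df)            # any learner could replace lm()
sigma2_hat <- pmax(fitted(fit_vol), 1e-8)  # keep estimated variances positive
z <- tail(eps, length(sigma2_hat)) / sqrt(sigma2_hat)  # standardized residuals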
Let’s start with a simple example using the Google stock price data from the fpp2 package:
y <- fpp2::goog200
# Default model for volatility (Ridge regression)
(obj_ridge <- ahead::mlarchf(y, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 534.2541 512.7005 552.3607
## 202 534.6423 502.2771 557.7232
## 203 531.6017 511.4929 553.6944
## 204 536.8212 506.3934 562.6331
## 205 537.1214 515.9948 556.2671
## 206 537.2112 513.0587 558.7980
## 207 537.8388 503.1394 558.0224
## 208 539.0939 516.2791 559.2532
## 209 539.7927 518.1542 561.0321
## 210 538.6715 511.1846 562.7483
## 211 540.9258 501.6186 565.0454
## 212 541.2588 517.6220 564.5185
## 213 541.4309 508.8563 563.8305
## 214 542.2194 514.9613 564.8986
## 215 543.3127 516.1817 564.1709
## 216 544.5975 524.2303 562.8741
## 217 545.2311 522.7227 569.1356
## 218 546.4706 526.8154 564.1152
## 219 547.0501 513.2194 571.0685
## 220 546.4400 519.9851 569.2394
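The object prints like a forecast object; assuming it follows the usual forecast-class conventions (mean, lower, upper components), the pieces can be accessed directly:

# Assuming forecast-class conventions for the returned object
head(obj_ridge$mean)   # point forecasts
head(obj_ridge$lower)  # lower 95% bounds
head(obj_ridge$upper)  # upper 95% bounds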
The package supports various machine learning methods for volatility modeling. Here are some examples:
# Random Forest
(obj_rf <- ahead::mlarchf(y, fit_func = randomForest::randomForest,
                          predict_func = predict, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 534.0159 520.9542 543.1924
## 202 537.2902 490.3225 587.6412
## 203 531.6931 526.9703 540.6487
## 204 542.5556 502.2263 590.5716
## 205 535.8584 527.4970 542.2554
## 206 539.3621 498.8091 579.2929
## 207 537.4664 522.6626 548.8695
## 208 542.0397 508.8091 575.6179
## 209 539.0960 532.4078 548.1384
## 210 539.6872 485.6650 580.9918
## 211 539.9266 523.4049 547.2282
## 212 543.0664 507.7779 578.5372
## 213 540.8806 530.2707 546.8225
## 214 543.9351 479.6184 587.3773
## 215 542.3772 533.9899 549.5630
## 216 547.3840 517.4481 581.2031
## 217 544.2022 530.3728 553.5204
## 218 550.0734 512.6420 586.6612
## 219 545.5816 532.6964 553.7377
## 220 546.9096 510.5368 570.9965
# Support Vector Machine
(obj_svm <- ahead::mlarchf(y, fit_func = e1071::svm,
                           predict_func = predict, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 533.1339 523.3062 541.6084
## 202 534.1918 510.2528 553.2471
## 203 532.5771 521.6773 545.4500
## 204 536.6727 508.9950 561.2681
## 205 536.4719 520.1975 552.7401
## 206 536.9900 516.5820 555.7390
## 207 537.5020 511.5053 554.4331
## 208 538.7870 519.8940 555.9879
## 209 539.3159 520.2680 555.5655
## 210 538.5694 517.2322 557.7531
## 211 540.6991 510.2075 560.7600
## 212 540.9974 522.4648 560.0051
## 213 541.3215 514.9596 560.0394
## 214 542.1177 520.7079 560.3652
## 215 542.5232 520.7394 561.3015
## 216 544.2347 528.5013 558.9817
## 217 544.8473 524.9007 564.6974
## 218 545.9705 530.3819 562.0458
## 219 546.6228 519.6270 565.9742
## 220 546.2731 523.5324 565.2891
# Elastic Net
(obj_glmnet <- ahead::mlarchf(y, fit_func = glmnet::cv.glmnet,
                              predict_func = predict, h=20L, B=500L))
## Point Forecast Lo 95 Hi 95
## 201 534.6730 508.7761 556.4285
## 202 534.4628 505.3801 555.2027
## 203 531.5346 510.7405 554.3804
## 204 536.7864 506.7732 562.2466
## 205 537.1356 515.8700 556.4073
## 206 537.2099 513.0769 558.7794
## 207 537.8421 503.0666 558.0698
## 208 539.0961 516.2570 559.2768
## 209 539.7957 518.1246 561.0673
## 210 538.6718 511.1481 562.7808
## 211 540.9283 501.5659 565.0819
## 212 541.2608 517.5914 564.5526
## 213 541.4322 508.8123 563.8629
## 214 542.2208 514.9249 564.9314
## 215 543.3147 516.1459 564.2017
## 216 544.6002 524.2048 562.9022
## 217 545.2338 522.6941 569.1714
## 218 546.4740 526.7915 564.1431
## 219 547.0533 513.1757 571.1051
## 220 546.4414 519.9498 569.2724
Let’s visualize the forecasts:
par(mfrow=c(1, 2))
plot(obj_svm, main="Support Vector Machine")
plot(obj_glmnet, main="Elastic Net")
The package also supports models from the caret package, which provides access to hundreds of machine learning methods. Here’s how to use them:
y <- window(fpp2::goog200, start=100)
# Random Forest via caret
(obj_rf <- ahead::mlarchf(y, ml_method="ranger", h=20L))
## Point Forecast Lo 95 Hi 95
## 201 532.9162 521.2100 545.0998
## 202 542.0707 513.7720 606.8034
## 203 536.6156 525.8142 559.7745
## 204 537.8822 521.1372 580.7287
## 205 537.1546 509.4761 565.7132
## 206 537.8189 517.3974 572.0399
## 207 535.0668 525.2457 564.1666
## 208 539.2166 516.1806 579.3233
## 209 536.6969 524.8034 561.7561
## 210 536.2132 520.7161 570.8188
## 211 536.2587 525.6450 561.5486
## 212 538.5170 520.3597 577.5364
## 213 536.3716 526.6066 574.9143
## 214 539.2696 512.3047 593.9620
## 215 536.0739 526.4491 570.0661
## 216 538.3305 521.9893 574.8221
## 217 542.6543 520.9201 601.0671
## 218 540.3616 516.5601 567.7309
## 219 538.6467 524.4831 587.5485
## 220 537.7820 525.1554 565.3565
# Gradient Boosting via caret
(obj_glmboost <- ahead::mlarchf(y, ml_method="glmboost", h=20L))
## Point Forecast Lo 95 Hi 95
## 201 534.2694 518.8609 558.4107
## 202 541.2883 514.8659 583.6918
## 203 539.6626 523.8154 559.4926
## 204 539.1105 519.7944 584.2025
## 205 535.8913 516.3941 560.6932
## 206 539.4092 516.3546 589.0730
## 207 540.7346 519.2800 594.9133
## 208 540.6068 517.8368 585.0502
## 209 540.0550 521.5931 565.6433
## 210 538.6147 517.7842 569.1785
## 211 539.5135 523.7022 584.3756
## 212 542.4832 521.8640 613.1290
## 213 540.5787 519.1449 601.2747
## 214 540.8575 510.3267 603.5387
## 215 539.7381 525.0678 591.3527
## 216 544.0127 514.9348 588.6763
## 217 541.3015 525.6672 594.0188
## 218 539.8837 520.8755 586.3994
## 219 544.7746 520.9124 586.5594
## 220 541.1185 522.0672 588.7918
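Since ml_method appears to map to caret's model codes ("ranger" and "glmboost" above are both caret codes), the available names can be queried from caret itself:

# List model codes known to caret
head(names(caret::getModelInfo()), 10)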
Visualizing the forecasts:
par(mfrow=c(1, 2))
plot(obj_rf, main="Random Forest (caret)")
plot(obj_glmboost, main="Gradient Boosting (caret)")
Looking at the simulation paths:
par(mfrow=c(1, 2))
matplot(obj_rf$sims, type='l', main="RF Simulation Paths")
matplot(obj_glmboost$sims, type='l', main="GBM Simulation Paths")
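Assuming $sims stores one simulated trajectory per column (as matplot plots them), the 95% prediction intervals can also be recomputed empirically from the paths:

# Empirical 95% interval at each horizon from the simulated paths
apply(obj_rf$sims, 1, quantile, probs = c(0.025, 0.975))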
You can also customize both the mean forecasting model and the model for forecasting standardized residuals:
# Using RW + Theta method for mean and residuals along with SVM for volatility
(obj_svm <- ahead::mlarchf(y, fit_func = e1071::svm,
                           predict_func = predict, h=20L,
                           mean_model=forecast::rwf,
                           model_residuals=forecast::thetaf))
## Point Forecast Lo 95 Hi 95
## 201 534.2761 518.8773 558.5154
## 202 542.6780 512.5850 591.2690
## 203 539.3825 524.1203 559.0725
## 204 540.7554 517.3005 595.8953
## 205 535.8410 516.5428 560.4248
## 206 540.8266 513.6361 599.7767
## 207 540.9625 519.0747 595.3212
## 208 541.9211 515.9533 592.4186
## 209 540.4952 521.1336 567.1889
## 210 539.4892 516.2908 573.8738
## 211 540.0182 523.2295 587.9505
## 212 543.6659 520.8964 621.9065
## 213 541.2562 518.1388 606.5975
## 214 541.7677 508.1439 610.7653
## 215 540.4008 524.5452 596.4184
## 216 545.2035 513.4671 593.9008
## 217 542.0959 525.1215 599.6508
## 218 540.6441 520.0346 591.9502
## 219 545.9244 520.0419 591.3528
## 220 541.9762 521.2350 594.5949
# Using Theta + Theta method for mean and residuals along with GLMNET for volatility
(obj_glmnet <- ahead::mlarchf(y, fit_func = glmnet::cv.glmnet,
                              predict_func = predict, h=20L,
                              mean_model=forecast::thetaf,
                              model_residuals=forecast::thetaf))
## Point Forecast Lo 95 Hi 95
## 201 535.8135 514.1987 571.9909
## 202 538.5835 521.6369 565.5215
## 203 541.3194 524.8385 563.3388
## 204 539.5938 524.8692 578.2095
## 205 537.8688 521.4224 561.3728
## 206 540.9987 524.2488 578.8469
## 207 543.3177 526.9597 589.3057
## 208 543.3685 524.5164 583.1396
## 209 543.1559 525.8245 567.4223
## 210 542.4533 527.2598 570.6816
## 211 544.0698 530.7752 584.4625
## 212 546.8171 528.3885 605.0747
## 213 546.1443 529.2912 599.9006
## 214 546.5470 522.5503 601.7464
## 215 546.1457 534.3143 593.9578
## 216 550.3745 533.2594 590.1594
## 217 548.5356 535.3032 595.9426
## 218 547.7789 530.5011 589.1992
## 219 552.5480 530.9739 591.8940
## 220 549.5759 534.1405 592.6026
plot(obj_svm, main="SVM with RW + Theta")
plot(obj_glmnet, main="Elastic Net with Theta + Theta")
When using non-ARIMA models for the mean forecast, it’s important to check whether the residuals of the mean forecasting model are centered and stationary:
# Diagnostic tests for residuals
print(obj_svm$resids_t_test)
##
## One Sample t-test
##
## data: resids
## t = 1.0148, df = 99, p-value = 0.3127
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.7180739 2.2214961
## sample estimates:
## mean of x
## 0.7517111
print(obj_svm$resids_kpss_test)
##
## KPSS Test for Level Stationarity
##
## data: resids
## KPSS Level = 7.5912e-76, Truncation lag parameter = 4, p-value = 0.1
print(obj_glmnet$resids_t_test)
##
## One Sample t-test
##
## data: resids
## t = 1.0992, df = 100, p-value = 0.2743
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.6460748 2.2513707
## sample estimates:
## mean of x
## 0.8026479
print(obj_glmnet$resids_kpss_test)
##
## KPSS Test for Level Stationarity
##
## data: resids
## KPSS Level = 0.26089, Truncation lag parameter = 4, p-value = 0.1
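Here both mean models pass: the t-tests do not reject a zero mean (p-values 0.31 and 0.27), and the KPSS tests do not reject level stationarity (p-values of 0.1). The same checks can be reproduced by hand; a sketch using the random walk mean model:

# Manual versions of the stored diagnostics (sketch; rwf as the mean model)
resids <- na.omit(residuals(forecast::rwf(y)))
t.test(resids)             # H0: residuals are centered at zero
tseries::kpss.test(resids) # H0: residuals are level-stationary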