Hi Egor, thank you so much for all the work you've put into this! I completed your time series crash course in just two days, and I found it incredibly helpful and well-explained. I've also noticed that your YouTube channel is packed with fascinating data science content, and I can't wait to dive into all of it!
Additionally, I wanted to ask if you could recommend any materials for a deeper dive into time series analysis and more advanced and recent techniques.
Thanks again!
Hi Roberto! Glad you liked the course :)
This book is a gold mine and covers pretty much everything you need: otexts.com/fpp3/
Hey Egor, great video! Watched the entirety of the crash course for an upcoming interview! Please continue making YouTube videos, they've been extremely helpful.
Thank you!!!
Hey man, thanks for the playlist. I wish you could make some videos about time series forecasting with models such as LSTM. Good luck with your work.
Hey, yeah, I thought about extending the series, but my content has changed a bit.
I echo others' appreciation for this series. A wonderful job, young man, thank you.
Glad you enjoy it!
It was a really great tutorial, I also saw other tutorials from your channel and learned so much, thanks
Happy to hear that!
Hi Egor, just went through your time series playlist. Appreciate the absolutely great work! Hope you get even more subscribers soon, well deserved. Would be nice if you could drop some machine learning videos :)
Thanks for the feedback! A couple of people have suggested this, so I have it in the works!
Wow, really a great video! The intuitions you have provided are especially fantastic. Thank you!!
Glad you enjoyed it!
This is the best Time Series course I have found on YT, thanks a million for sharing! Would you please consider adding another episode on the Prophet model from FB?
Thanks! And yeah sure, I will probably do an article on it in the future!
@@egorhowell That would be superb! Looking forward to your new article!
Thanks Alan :)
Hi! Thank you so much, great tutorial! I have one question. What's the reasoning behind choosing 9 sin-terms and 9 cos-terms?
Are you doing a search over different values and choosing the highest number of terms before the AICc value starts increasing? :)
Hi Kelly, there is no real reason, it's just an arbitrary number. The goal is to let the model find which Fourier terms are the most valuable. So if, say, sin_3 is not important, its coefficient will be close to 0. If cos_5 is important, its coefficient will be bigger. I am relying on the model to find this relationship.
This is just one way of doing it, but I've found it's one of the best and simplest. You can also try to find the real harmonic frequencies using an FFT, but that's quite tough and hasn't really worked for me in the past.
Does that make sense?
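A minimal sketch of the idea in the reply above, not the code from the video. It assumes a made-up weekly series in which only sin_1 and cos_5 carry signal, and it swaps the ARIMA regression for plain least squares (np.linalg.lstsq) so that it runs with NumPy alone. The helper name fourier_features is hypothetical.

```python
import numpy as np

def fourier_features(t, period, max_order):
    """Build sin/cos columns for orders 1..max_order; return (matrix, names)."""
    cols, names = [], []
    for order in range(1, max_order + 1):
        angle = 2 * np.pi * order * t / period
        cols.append(np.sin(angle))
        names.append(f"sin_{order}")
        cols.append(np.cos(angle))
        names.append(f"cos_{order}")
    return np.column_stack(cols), names

# Synthetic data (an assumption for illustration): ~4 years of weekly points
# where only the sin_1 and cos_5 terms contribute, plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(208)
period = 52.0
y = (
    3.0 * np.sin(2 * np.pi * 1 * t / period)
    + 1.5 * np.cos(2 * np.pi * 5 * t / period)
    + rng.normal(0.0, 0.1, t.size)
)

# Offer the model 9 sin and 9 cos terms and let the fit decide which matter.
X, names = fourier_features(t, period, max_order=9)
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
coef_by_name = dict(zip(names, coefs))

for name in ("sin_1", "sin_3", "cos_5"):
    print(name, round(float(coef_by_name[name]), 3))
```

On this toy series the fitted coefficients for sin_1 and cos_5 should land near 3 and 1.5, while terms such as sin_3 stay near zero, which is the behaviour the reply relies on.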
@@egorhowell Hi! Thanks for explaining! I see, yeah, that makes sense! :)
I have two more questions that I'm currently doing research on/testing, and thought it's worth asking you as well :)
1. The case I'm working with has weekly data for the past 4 years or so. In the first iteration I've set the Period P = 52. How do I tell the model that the period is 52.143 rather than 52? Just by dividing by 52.143 in the sine and cosine terms? Like this:
data[f'fourier_sin_order_{order}'] = np.sin(2 * np.pi * order * data['week_num'] / 52.143)
data[f'fourier_cos_order_{order}'] = np.cos(2 * np.pi * order * data['week_num'] / 52.143)
2. I would like to, in addition to the Fourier features, also add other exogenous features. Am I doing that in the right way by adding the Fourier terms to X = ... and then adding any additional variables to exogenous=... ? Like this:
model = pm.auto_arima(y=..., X=train[fourier_features], exogenous= train[exogenous_features], seasonal = False, ...)
Thank you!!
1. Exactly that, but make sure your target and lagged variables are weekly time-indexed. Also, remove the period of 52 from the pm.auto_arima call (the seasonal period argument there is m)!
2. Ah no, the Fourier terms are exogenous features, and that is what X refers to.
Reading the auto_arima docs:
""" y : array-like or iterable, shape=(n_samples,)
The time-series to which to fit the ARIMA estimator. This may either be a Pandas Series object (statsmodels can internally use the dates in the index), or a numpy array. This should be a one-dimensional array of floats, and should not contain any np.nan or np.inf values."""
""" X : array-like, shape=[n_obs, n_vars], optional (default=None)
An optional 2-d array of exogenous variables. If provided, these variables are used as additional features in the regression operation. This should not include a constant or trend. Note that if an ARIMA is fit on exogenous features, it must be provided exogenous features for making predictions."""
So X is the exogenous features. Pass your Fourier features and your other exog features into X together!
Make sure all your exog features have the same time index as well, which should be weekly given your context!
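A rough sketch of what that could look like, assuming pandas and a recent pmdarima release where the exogenous argument is named X. The data, the promo column, and the helper names add_fourier_terms and fit_with_exog are hypothetical. The period of 52.143 and the column naming follow the question above.

```python
import numpy as np
import pandas as pd

def add_fourier_terms(data, max_order, period=52.143):
    """Return a copy of data with sin/cos Fourier columns built from week_num."""
    out = data.copy()
    for order in range(1, max_order + 1):
        angle = 2 * np.pi * order * out["week_num"] / period
        out[f"fourier_sin_order_{order}"] = np.sin(angle)
        out[f"fourier_cos_order_{order}"] = np.cos(angle)
    return out

# Hypothetical weekly training data; "promo" stands in for any other exog feature.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-05", periods=208, freq="W")
train = pd.DataFrame(
    {
        "week_num": np.arange(208),
        "promo": rng.integers(0, 2, 208),
        "y": rng.normal(size=208),
    },
    index=index,
)
train = add_fourier_terms(train, max_order=3)

fourier_features = [c for c in train.columns if c.startswith("fourier_")]
exogenous_features = ["promo"]

# One matrix holding every exogenous column, sharing the weekly index of y.
X_train = train[fourier_features + exogenous_features]

def fit_with_exog(y, X):
    """Fit auto_arima with all exogenous columns passed through X (needs pmdarima)."""
    import pmdarima as pm
    return pm.auto_arima(y=y, X=X, seasonal=False)

# model = fit_with_exog(train["y"], X_train)  # uncomment if pmdarima is installed
```

When forecasting, the same columns have to be built for the future weeks and passed as X to predict, as the docs quoted above point out.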
@@egorhowell ok, thank you so much! :)
Hey, you are doing a great job. I see you are not getting much credit for your work, but thank you so much, it helped me a lot. Soon you will be a big YouTuber 💟
Hi, thanks for the really kind feedback! I just enjoy making useful content, it doesn't really matter where it takes me :)
Like the flex about your physics degree 😜
haha thanks :)
Why don't you just grab the spectrum with an FFT, filter out the HF noise, and use the remaining harmonics?
Hey, that is another option, you are right! However, from personal experience it's a bit more fiddly and involved.
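For reference, a minimal sketch of the FFT route being discussed, under generous assumptions: a synthetic series whose cycles fit the sample length exactly, and a hypothetical helper dominant_periods with a hand-picked cut-off max_freq. It is not code from the video.

```python
import numpy as np

def dominant_periods(y, n_harmonics, max_freq):
    """Return the periods (in samples) of the strongest FFT peaks below max_freq."""
    y = np.asarray(y, dtype=float)
    spectrum = np.fft.rfft(y - y.mean())
    freqs = np.fft.rfftfreq(y.size, d=1.0)  # cycles per sample
    power = np.abs(spectrum) ** 2
    # Skip the zero frequency and ignore everything above the cut-off ("HF noise").
    candidates = np.flatnonzero((freqs > 0) & (freqs <= max_freq))
    strongest = candidates[np.argsort(power[candidates])[::-1][:n_harmonics]]
    return sorted((1.0 / freqs[strongest]).tolist(), reverse=True)

# Synthetic weekly series (an assumption): a yearly cycle plus its 5th harmonic.
rng = np.random.default_rng(0)
t = np.arange(208)
y = (
    3.0 * np.sin(2 * np.pi * t / 52.0)
    + 1.5 * np.cos(2 * np.pi * 5 * t / 52.0)
    + rng.normal(0.0, 0.1, t.size)
)

periods = dominant_periods(y, n_harmonics=2, max_freq=0.25)
print(periods)
```

The recovered periods can then feed the sin and cos terms directly. On real data the cycles rarely fit the sample length this neatly, so energy leaks across neighbouring frequency bins and the peaks are harder to pick out, which may be part of why this route feels more fiddly.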