I don't often leave the comments anywhere, but this video is just excellent. The best one, that builds a good intuition behind the process and that describes the process in the most simple and yet efficient way. Thank you!
Glad you enjoyed it!
what to say i am short on words , impressive video , super intuitive understanding delivered in such a short video !! Please keep going , appreciate your work
I have been stuck on understanding SARIMA models for months and your videos have cleared everything up, thank you!!
Happy to help!
You are simply amazing at explaining such complex topics. Expecting more of these from you!
Thank you so much!
you are really great. I have never found any other videos like yours can explain ARMA in such an excellent way.
Video from 4 years ago, so idk if you read it man, but just so you know, I sincerely consider you a genuinely wonderful person for doing this series
I do try to still check comments. Thank you for the kind words!
Really appreciate your videos, literally had 2 doubts before the video. You literally mentioned that people have these 2 confusions. You are a great teacher!
What are those 2 confusions that u had ?
these are the best time series short lectures I have found on YouTube, thanks for being here
Love the energy! Thanks for bringing the electricity to what I would have otherwise thought was a dry topic.
Best channel I've seen for intuition
Thank you very much!
Seeing myself learning this in 5 minutes is still a shock
Thank you❤
Happy to hear that!
best video in youtube for MA, one really outstanding style of you that you keep it short. please do more video.
The whole playlist was so useful! Thanks a lot
Great video!! stats world need more of these simple, straightforward explanations
hey your method of teaching is really good . plz upload more videos on time series analysis
Your videos on time series are so well done. And in 5 mins! amazing. Keep it up.
There is one thing I don't really understand: You say the error term et-1 disappears and this is what the equations at 3:48 seem to indicate. However, the prediction Yt directly depends on et-1, and since the error et directly depends on your prediction, you still have the error term et-1 in Yt+1. Did I miss something?
No problem at all! We can do exactly what you are saying to show that MA models are actually infinitely long AR models (just recursively plugging things in over and over like you did for a single step). The fun part is the same can be said for AR models - they are infinitely long MA models. You can almost think about it like those previous actual lags are cancelling out with each other to only leave the error term at the very end of the series.
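A minimal sketch of the "recursively plugging things in" idea for a zero-mean MA(1), Y_t = e_t + theta * e_(t-1). The function names, the value theta = 0.5, and the simulated data are assumptions made for illustration, not anything from the video. Substituting repeatedly gives e_t as a weighted sum of current and past Y values with weights (-theta)^j, which is the long AR form; the sketch checks that a truncated version of that sum recovers the true error.

```python
import random

def simulate_ma1(n, theta, seed=0):
    """Simulate a zero-mean MA(1) series and return (y, errors)."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    y = [e[t] + theta * e[t - 1] for t in range(1, n + 1)]
    return y, e[1:]  # e[1:] lines up with y index by index

def error_from_past_values(y, t, theta, lags):
    """Approximate e_t using only Y values: sum of (-theta)^j * y[t-j]."""
    return sum((-theta) ** j * y[t - j] for j in range(lags + 1))

theta = 0.5
y, e = simulate_ma1(200, theta)
t = 199
approx = error_from_past_values(y, t, theta, lags=30)
gap = abs(approx - e[t])
print(gap)  # tiny, because the leftover term shrinks like theta**31
```

The weights only die out when |theta| < 1, which is why this rewriting is usually stated for invertible MA models.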
Looking forward to the next video!
You are energetic and the content is complete and relevant. Just one thing made me uncomfortable: you should speak a bit more slowly and pause now and then, to give people time to think about and understand the concepts. But really nice videos, thank you :- )
the concepts explained are absolutely clear!!
3:35 So when I predict for Y_t+1, how will I know the value of error e_t+1? I need it for predicting the value of tomorrow, yet I don't know it, and it's not based on the previous errors.
Happy to help! e_t+1 will never be known when you make your prediction for t+1. That is the point of the random error. Your prediction will never be perfect because of that. You can use all the information up until then, but that e_t+1 accounts for the differences between what you predict and what actually happens.
Thanks a ton! You're a genius at explaining!!
you're a king and my professor should learn some didactics by watching your videos
Everyone has different teaching styles :-), but thank you!
Great videos! Super insightful thank you
You really deserve Nobel prize 🏆
Please do more videos on stochastic series! Really good videos
Very professional, well explained videos.
Hello Aric!!! I have subscribed, excellent explanation
I love this channel!!!!
I've read in textbooks that errors can be interpreted as difference of previous values (y_t - y_t-1) or even as difference of other time series (temperature variation in predicting lemonade demand variation). Are those interpretations wrong?
Nice video! Want a little bit more intuition as to what does lagged errors capture that is explaining Yt. Are they some hidden factors?
You can think about them that way! Once they occur, they are no longer unmeasurable. Without getting into too much of the math, they are essentially one long term effect minus some slightly less long term effect, which gives us the short term effect. This actually equates to just the error components (the only difference between the long term and slightly less long term model). Hope this helps!
I understand that e_(t-1) is the error between actual value and the model estimate one lag in the past. But what about e_t? Is that the error between our current actual value and estimate? How can you use current estimate in the error calculation when you need to know the error calculation to come up with that estimate? Isn't that impossibly recursive?
Every model has an e_t in it for the error of the current time period that we can never know. Even AR models have the same thing. That error is never known until after that time period is up.
So when is that ARMA model video coming out 🥺
It has just been posted!
Thank you Dr. Aric. I would like to see more videos from you. But, I also advise you to keep the videos empty from background music, as they are. It is a pitfall many content makers fall into.
An MA process seems to be kind of a noumenon, quite a tricky thing. Are there some examples of a pure MA process?
You are not wrong! It is hard to find pure processes of either AR or MA models because of how complicated real world data is.
I have seen MA processes in things that react in a short term, but have no real long term pattern. Things like economic indices for example. Not that they don't sometimes have long term pattern, but a lot of times they can be affected by short term things and then new short term things change them.
Fun fact! A simple exponential smoothing model is an MA model on a single difference.
@@AricLaBarr Thank You, Aric. Both for the answer and for your videos )
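A small numerical check of the "fun fact" above, as a sketch under stated assumptions. Here the MA(1) on a single difference is written as Y_t - Y_(t-1) = e_t + theta * e_(t-1), and the smoothing weight alpha maps to theta = alpha - 1 under that sign convention. The function names, the toy data, and alpha = 0.3 are made up for illustration.

```python
def ses_forecasts(y, alpha, first_forecast):
    """One-step-ahead simple exponential smoothing forecasts."""
    forecasts = [first_forecast]
    for obs in y:
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return forecasts

def differenced_ma1_forecasts(y, theta, first_forecast):
    """One-step-ahead forecasts from an MA(1) on the first difference."""
    forecasts = [first_forecast]
    for obs in y:
        error = obs - forecasts[-1]            # latest one-step forecast error
        forecasts.append(obs + theta * error)  # last value plus weighted error
    return forecasts

y = [3.0, 5.0, 4.0, 6.0, 7.0]
alpha = 0.3
ses = ses_forecasts(y, alpha, first_forecast=y[0])
ma = differenced_ma1_forecasts(y, alpha - 1, first_forecast=y[0])
max_gap = max(abs(a - b) for a, b in zip(ses, ma))
print(max_gap)  # zero up to floating-point rounding
```

Both recursions produce the same forecasts because y + (alpha - 1) * (y - f) simplifies to alpha * y + (1 - alpha) * f.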
Very well explained!
Glad it was helpful!
So before you can fit a MA model, don't you need a separate model to make the predictions which result in the errors?
Not really! We do this iteratively. For example, your very first prediction of time point 2 might be just repeating time point 1. Then you have a single error to use to predict time point 3. Then you have two errors to predict time point 4, etc. This process is then iteratively optimized to find the "best" parameter to solve the MA equation. It's been shown that if you have a long time series, the original prediction/guess really doesn't impact anything.
Hope this helps!
excellent explanation! hope you make videos more often
Thank you! Trying to make more, but the job gets in the way sometimes :-)
@@AricLaBarr oh I get it haha
One question: to get this model we need the values of e(t-1). So in practice how can we find e(1), e(2), etc.? Because after that we can do regression and find "w and theta".
That's the beautiful part! Most software will take care of that for you. You don't have to create them yourself. The way the software does this is that it gives a prediction for the first time point (average for example) and now you have the first error (e(1)) and then you can use it to build a model for the second time point which gives you an error (e(2)) and it grows from there!
@@AricLaBarr Thank you, but if I want to do this using Excel, I need the series of e(t), so is there a way to find the values of e(t)?
@@aramyesayan1979 Unfortunately, this is not one of those "do by hand" techniques since you have to fit a new model at each iteration to build out the best coefficients. Excel cannot really do this by default. The XLMiner package you can add to Excel can do this for you, but not base Excel. Otherwise, you will have to add a bunch of AR terms to try and account for this since AR and MA models are opposites.
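To make the error build-up described above concrete, here is a minimal sketch for an MA(1) with the coefficients held fixed. It assumes the mean `mu` and weight `theta` are already known and that the error before the first observation is 0; real software also searches for the best coefficients, which this does not do. The function name and the toy numbers are hypothetical.

```python
def ma1_residuals(y, mu, theta):
    """Build e(1), e(2), ... one step at a time for fixed mu and theta."""
    residuals = []
    prev_error = 0.0  # assumed start-up value: no error known before time 1
    for obs in y:
        prediction = mu + theta * prev_error  # uses only the previous error
        prev_error = obs - prediction         # this becomes e(t)
        residuals.append(prev_error)
    return residuals

series = [10.0, 12.0, 9.0, 11.0]
res = ma1_residuals(series, mu=10.0, theta=0.5)
print(res)  # [0.0, 2.0, -2.0, 2.0]
```

This single pass is simple enough for a spreadsheet column; the part that is hard by hand is repeating it for many candidate coefficients to find the best ones.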
thanks for the video. How can the dependence on previous errors completely disappear? The previous predicted value (t-1) depends on its preceding errors (t-2), but now we use that previous predicted value's error (t-1) to predict the next value (t) - but the previous predicted value's error is still indirectly affected by the (t-2) error... wouldn't that mean the dependency of the far-enough errors becomes marginally small, but doesn't completely disappear?
Hey Hans! Close! So the errors from one time period to the next are random and independent from each other. We essentially assume that missing today doesn't impact how much you miss tomorrow. So those effects do last, but will disappear the further in time that we go! Remember, that our observed errors are just estimates of the "actual" errors (think theoretical things we cannot see) which have all the nice properties we need. Does this help?
Thanks Aric for this awesome explanation. I have one query about this video, though. You said that the solid line represents the actual value of Y, but shouldn't it be the dotted line instead? It seems that Y represents the forecast and the solid line represents the historical actuals.
Sorry for any confusion. Historical actuals are the solid line and the forecasts are the dashed. Y is actuals, but predicted Y would be dashed. I think the confusion might be what we are thinking of Y as, but the important piece is exactly what you got out of it as the solid line is true and dashed is predicted.
@@AricLaBarr Thanks Aric
This is amazing
I just finished your playlist and subscribed, too bad there aren't more videos about ARIMA .. Otherwise really appreciated
They are on the way! Full time job takes up my time :-)
Please help me with this: if you increase the order of the MA, do you have to increase the order of the AR? Or do you just use the AR(1) errors for any order of the MA you want?
Hey Eliezer! They are treated separately. There are signals for MA terms and for AR terms. Take a look at the ARIMA video where I talk about model selection and that can help!
Instead of saying that a moving average process adjusts to a forecast error, you can say that it adjusts to latest information to correct its error.
Great lectures
I can't understand how it is different from AR. I mean, Y_t+1 values still depend on e_t-1 if you plug in the previous equation (expressed in e_t). I have no doubt I'm wrong, but what am I missing? I'm having a hard time; if you can explain it to me I would be very grateful!
You are correct that we still depend on things from the past, but the question is what we depend on. In AR models, we compare to previous values of Y. In MA models, we compare to previous values of error.
The ARIMA model video also talks about the comparison between AR and MA models! That might help with the understanding as well.
great explanations thank you
Next video on random walk and mcmc method please
thank you for the video. It is very useful
thank you for the video.
But, I do not know how to calculate from the data.
Is this understanding correct?
When using trained MA(q) model to predict time point n (n>>q).
First, we predict time points from n-q to n-1 using time points from n-2q to n-q-1 by trained MA(q) model.
Next, we calculate residuals between predicted and observed values(time points from n-q to n-1).
Finally, We predict time point n using residuals.
I thought while writing.
When predicting from n-2q to n-q-1, we also need more previous predicted values.
Therefore, it should be necessary to calculate previous values recursively.
But, there is no predicted values to calculate the first error, it cannot be predicted?
So this is something that cannot be easily calculated by hand recursively unfortunately. For the first prediction we can use something simple like the overall average. That will give us the prediction for the first observation and therefore the residual from the first observation. Then we can build up the regression model. However, remember, the whole point of software isn't just building up the model recursively, but also optimizing the coefficients in the MA(q) model to be the "best" coefficients.
In terms of predictions, the MA(q) model has a specific solution mathematically. Anything beyond q time points in the future gets the prediction of the mean. That is because we run out of observations to go back and build off the error for. For example, for an MA(1) model, I can predict the next time point (t+1), but beyond that, my best guess of the errors is 0 and therefore, all I am left with is the average.
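The "anything beyond q time points gets the mean" point can be sketched as follows. The function name, coefficients, and residual values are assumptions chosen only to illustrate; future errors are replaced by their best guess of 0.

```python
def ma_forecast(mu, thetas, recent_errors, steps):
    """Forecast an MA(q) model `steps` periods ahead.

    thetas[j-1] is the coefficient on the error j periods back.
    recent_errors[k] is the observed residual k periods before the
    forecast origin (recent_errors[0] is the latest one).
    """
    q = len(thetas)
    forecasts = []
    for h in range(1, steps + 1):
        value = mu
        for j in range(1, q + 1):
            k = j - h  # how far before the forecast origin this error sits
            if 0 <= k < len(recent_errors):
                value += thetas[j - 1] * recent_errors[k]
            # otherwise the error is in the future (or unavailable): use 0
        forecasts.append(value)
    return forecasts

ma1 = ma_forecast(mu=10.0, thetas=[0.5], recent_errors=[2.0], steps=3)
ma2 = ma_forecast(mu=10.0, thetas=[0.5, 0.25], recent_errors=[2.0, 4.0], steps=4)
print(ma1)  # [11.0, 10.0, 10.0]
print(ma2)  # [12.0, 10.5, 10.0, 10.0]
```

The MA(1) forecast is back at the mean from step 2 onward, and the MA(2) forecast from step 3 onward.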
long live and prosper!
Appreciate for ur lectures
How are the coefficients of the error terms determined? Is there any rule? Are they given arbitrarily?
All done through either maximum likelihood estimation or conditional least squares estimation. Either way, we are basically letting the computer find the optimal solution to get our predictions as close to the observed values as we can!
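A rough sketch of the conditional least squares idea for an MA(1): for each candidate theta, rebuild the errors recursively and keep the theta with the smallest sum of squared errors. The plain grid search, the function names, and the simulated series (true theta = 0.6) are assumptions for illustration; real software uses a numerical optimizer and may use maximum likelihood instead.

```python
import random

def simulate_ma1(n, theta, seed=42):
    """Simulate a zero-mean MA(1) series."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + theta * e[t - 1] for t in range(1, n + 1)]

def sum_squared_errors(y, mu, theta):
    """Rebuild the errors recursively (start-up error assumed 0) and sum squares."""
    prev_error, total = 0.0, 0.0
    for obs in y:
        prev_error = obs - (mu + theta * prev_error)
        total += prev_error * prev_error
    return total

def fit_ma1_grid(y, grid_size=199):
    """Pick the theta in (-1, 1) with the smallest sum of squared errors."""
    mu = sum(y) / len(y)  # simple estimate of the mean
    candidates = [-0.99 + 1.98 * i / (grid_size - 1) for i in range(grid_size)]
    best_theta = min(candidates, key=lambda th: sum_squared_errors(y, mu, th))
    return mu, best_theta

y = simulate_ma1(2000, theta=0.6)
mu_hat, theta_hat = fit_ma1_grid(y)
print(mu_hat, theta_hat)  # theta_hat should land near the true 0.6
```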
You are great. Thank you!!!
so in the MA models is it only depends on the past errors or past errors plus the current errors?
The current error really exists in all models! Every observation has some current error (unseen thing) that influences its value. Even the AR model has error in the current observation. When we talk about forecasting, we typically refer to past things that influence current observations. Hope this helps!
@@AricLaBarr thank you for the reply
super useful tysm~!!
Waiting for ARIMA and seasonal ARIMA
Just posted the ARIMA one! Seasonal ARIMA is next in line!
@@AricLaBarr really nice.
What is the meaning of average in MA model?
So it is definitely a little confusing of a term because it is NOT moving average smoothing where you take the average of a moving window of observations. MA models can be thought of as a weighted moving average of the past few forecast errors. I didn't name them ;-)
If the future error is completely random, why does it even matter to incorporate past error into our model? Shouldn't it be completely irrelevant? We could just incorporate a random amount of error into our model and it would be just as good, no?
Close! Future error is completely random because we haven't observed it and we don't know where it will go. However, previous errors are observable and in time series anything that is observable in the past can be tested if it has correlation over time. In the MA model we can show that current observations are related to actual measurable errors. You can think of this as how we account for short term effects. You are correct in the intuition that errors don't last long which is why the effects don't last long either!
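One way to see the short-lived effect described above is to simulate an MA(1) and look at how strongly each value is correlated with earlier ones. This is a sketch on made-up data (theta = 0.6, 5000 points); the function names are hypothetical. For an MA(1), the theoretical lag-1 autocorrelation is theta / (1 + theta^2) and the lag-2 autocorrelation is 0.

```python
import random

def simulate_ma1(n, theta, seed=1):
    """Simulate a zero-mean MA(1) series."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t] + theta * e[t - 1] for t in range(1, n + 1)]

def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    mean = sum(y) / len(y)
    dev = [v - mean for v in y]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, len(y)))
    den = sum(d * d for d in dev)
    return num / den

theta = 0.6
y = simulate_ma1(5000, theta)
r1 = autocorr(y, 1)
r2 = autocorr(y, 2)
expected_r1 = theta / (1 + theta ** 2)
print(r1, expected_r1)  # lag 1: clearly nonzero, close to the theory
print(r2)               # lag 2: close to zero
```

The past error matters for exactly one step here, so using it improves the next prediction, while a random guess at the error would not.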
amazing
Thank you!
omfg i finally understand MA processes...
me too, finally from this video
next is LSTM i guess :)
You're good 👍
you got me at hallelujah
🙏 Sir from India...
Thank you!
Any explanation of Statistics must be backed by numbers and demonstration using a Spreadsheet. PPTs have a nice fluff value to it and gives a false sense of understanding. I can't understand why many teachers are "theorizing" what is essentially useful only when done practically with some data.
We will have to agree to disagree on that one! A lot of times you have to explain a concept to someone in a quick and concise manner without the luxury of going through an example.
These videos are meant to help people understand the concept quickly and concisely, not to go through an example of how to use them with software.
hallelujah, my errors are gone!!!