Suppose the month attribute is missing and you only have a year attribute. In that case, how can you make the data stationary? I mean, you only have year and passenger attributes; how do you make the data stationary in that case? Please reply.
Stationarity can be checked on a yearly basis as well. When you're dealing with time series data that only has a yearly frequency, the approach to making the data stationary is similar to what you'd do with more frequent data, with a few specifics to consider.

Visualizing the data: start by plotting it. This will give you an idea of the overall trend, seasonality, and variance. Since the data is yearly, you might not observe any distinct seasonality.

import matplotlib.pyplot as plt
plt.plot(year, passenger)
plt.xlabel('Year')
plt.ylabel('Passenger')
plt.title('Yearly Passenger Count')
plt.show()

Differencing: a common approach to making time series data stationary is differencing, which helps remove trends. You subtract the previous year's observation from the current year's observation.

passenger_diff = passenger.diff().dropna()

After differencing, plot the data again to see if it appears more stationary.

Checking for stationarity: the Augmented Dickey-Fuller test is commonly used to check the stationarity of a time series.

from statsmodels.tsa.stattools import adfuller
result = adfuller(passenger_diff)
print('ADF Statistic:', result[0])
print('p-value:', result[1])

A low p-value (typically ≤ 0.05) indicates that the time series is stationary.

Transformations: if differencing isn't enough, consider other transformations, such as:

Log transformation, to stabilize variance:
import numpy as np
passenger_log = np.log(passenger)

Rolling means, to smooth out short-term fluctuations and highlight longer-term trends:
rolling_mean = passenger.rolling(window=5).mean()  # 5-year window as an example
passenger_detrended = passenger - rolling_mean
passenger_detrended.dropna(inplace=True)

Decomposition: even though the data is yearly, if you suspect any seasonality or a strong trend, you can use decomposition. The Seasonal-Trend decomposition (STL) from the statsmodels library can be useful.

from statsmodels.tsa.seasonal import STL
stl = STL(passenger, seasonal=13)
result = stl.fit()
trend = result.trend
seasonal = result.seasonal
residual = result.resid

You can then work with the residuals from the decomposition, which should ideally be stationary.
You can mention the function name used in the video from statsmodels that you can no longer find in the module. We will try to find it and help you with the closest alternative function if it no longer exists.
The dataset is part of the seaborn library. You can just run the code:
import seaborn as sns
df = sns.load_dataset('flights')
You can also download the notebook from the GitHub link provided in the description.
One of the best ARIMA implementation tutorials I have seen. I’m a bit frustrated I found it after I had used ARIMA for a project. I can’t even tell you how much time I had wasted going online and on forums, trying to understand how it works.
But hey, now that I learned it the hard way it better be sticking. 😂
Appreciate it!
Glad it helped!
This is one of the best videos on time series on YouTube. Well explained, and the content is very nice.
Glad you liked it
I have come across many blogs and videos to understand the time series process, but I didn't get a clear picture. However, this video gave me a clear understanding of the process. Really great work! Much appreciated.
Glad it was helpful!
One of the best videos I have ever seen on time series on YouTube. Thanks for making it.
Glad it was helpful
You really did justice to this topic. Very well done!
thank you very much
@learnerea Please, can you make a video on how to use the transformed data, especially data obtained using log, sqrt, and shift? I have been trying to figure that out. The part that confused me is how to transform the data back to the original format. Thank you
Incredible video, thank you! I kept trying to train my model with the Differenced data and was not getting good results but I caught my error because of this video.
Glad it helped!
It's really a crazy explanation. I would recommend this in my org, Jio. Keep it up man. God bless you!
Thank you, I will
Really appreciate this video !!👍
Thanks for this amazing video!!!
Glad it was helpful!!
Hello sir,
I don't know what your mistake was,
but I got the desired results using the ARIMA model at time 1:13:45.
Instead of the line at the bottom, I got the desired results,
and I followed everything you taught.
Using the ARIMA model I got 43 as my mean squared error.
Super
you are great! helped me with my project last minute thanks for the video!!
Glad I could help!
So informative. I do not see the relation between the transformations (log, sqrt, shift) that make the data stationary and the ARIMA model you build. I'm so confused at this step. I tried with my data and noted that the shift-diff transformation makes my data stationary, but when it comes to building the model, it does not fit well. Thanks in advance.
Thank you so much for this video. I've been studying for the last 3 years and have taken some expensive courses, but this is the best explanation; it kept me motivated to explore and learn throughout the video. Let us know how we can support you to make more learning videos. Thanks.
You are most welcome, and I'm glad that it was helpful..
keep watching
ARIMA Model Building starts here: 56:47
If I am dealing with time series data with an hourly frequency, collected over 2 years, what should I take as the lag (shift) value?
If you use the time shift method, d will be the interval for the shift. What happens if you use any other method like the log or square root? What will d be?
As you said, you were trying to keep it at a beginner's level, which is why it's understandable to the smallest degree possible. Except you got one thing wrong about the model: it's not the ARIMA model that is working badly, it's that you are trying to predict a whole range of values with the same training data. That means it would work well on the first few values but not on all of them. You have to use the walk-forward variation, which is basically updating your training set each time you predict a new value. That's my idea.
And thank you for the good video.
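The walk-forward idea described in the comment above can be sketched in a few lines of pure Python. A naive last-value forecaster stands in here for re-fitting an ARIMA at each step; the function and variable names are just for illustration, and you would swap in a real model fit where indicated:

```python
def walk_forward_forecast(series, n_test, fit_predict):
    """Walk-forward validation: at each step, train on all data seen
    so far, predict one step ahead, then add the true value to the
    training history before making the next prediction."""
    split = len(series) - n_test
    history = list(series[:split])
    predictions = []
    for t in range(split, len(series)):
        predictions.append(fit_predict(history))  # e.g. re-fit ARIMA here
        history.append(series[t])                 # reveal the true value
    return predictions

# Naive stand-in model: predict the last observed value.
naive = lambda history: history[-1]

series = [10, 12, 13, 15, 16, 18, 21]
preds = walk_forward_forecast(series, n_test=3, fit_predict=naive)
# preds == [15, 16, 18]  (each prediction uses the updated history)
```

The key point is that each prediction only ever sees data up to the step before it, which is what keeps the evaluation honest compared to forecasting the whole test range at once.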
Your content is too good. I am not able to understand why you have such low views on this video. One suggestion: please make the thumbnail a little more eye-catching.
Noted
Thanks for the video. There is a minor mistake in the ADF discussion I noticed: you cannot "accept" the null hypothesis; you can only reject it or fail to reject it.
great tutor thanks for the video ❤❤
Glad you liked it!
Thanks very much for this video, really helpful. However, I have a question: ARIMAX can be used with a panel dataset, right? Do you have a tutorial on implementing ARIMAX?
Hello sir, at 35:35 I didn't get the same result as you when I executed the line df.head().
>> You may like to revisit the code, you have created
>> You can put the code here as well, we will analyze the diff. and can help
really good work👌, keep it up
Thanks a lot 😊
Basic question... why did we run the model on the original set, while towards the end you mentioned running the model on the altered data set (basically diff/square root)?
Did you mistakenly plot the PACF of airP['arimaPred'] at time stamp - 1:15:52 ?
I am not sure why you would plot PACF of predicted values. 😕
For that we have to take airP['12diff'], as it is the seasonal difference.
Hi, the content is very good and very well explained; thanks for sharing it. Can you please help me understand: we tried to identify stationarity but did not use it in the modelling, and even the stationarity check was not concluded, since we did not get the desired results.
Thank you very much for watching it. Yes, that was primarily because it was a beginner level and hence we did not want to spend a lot of time in reverting it back. Certainly we will make another one where we conclude and utilize the stationarity.
I have sales data consisting of a time period and other features, including different schemes as features; almost 7-8 of those are active in some months, so basically they are categorical variables containing 0 or 1. Should I go ahead with ARIMA for forecasting, and if yes, how do I handle those categorical variables?
You are amazing. I love the way you explain. Can you do the same for multidimensional data sets?
Yes, soon
the best tutorials bro
Glad it was helpful
In a SARIMA model, during an analysis I found that for d=0, D=1 (I did one seasonal difference and no non-seasonal differencing), the prediction fits the whole data except the initial 22 values (it predicts almost 0 for those), and 22 is the seasonality of my data.
Can you explain why this is happening?
I hope you got my question.
Assuming you are using the same data as in the video, please share your code at learnerea.edu@gmail.com so that we can have a look and guide you more specifically. Include the data as well if it's different from the video.
Hi, I was using your tutorial to learn how to implement ARIMA models. I then went about and implemented my own with some of my own data that I'm using for a school project. However, while my model fit my data very well, my forecasts are flat and they're strange. Could you help me in any way?
Can't thank you enough 🙏
Glad it was helpful
Brother, your work is extremely helpful. I looked for the rolling statistics video link but couldn't find it; please share it. Thanks in anticipation.
This was so informative. Thank you a bunch! I understood time series. Do you have similar videos for regressions? Thank you!
Subscribed
Glad it was helpful. the below one is on linear regression -
th-cam.com/video/IigoyVON0eM/w-d-xo.html
here is a problem we solved using the regression and other best fit models -
th-cam.com/video/2YAheiIHNzI/w-d-xo.html
I recommend you to have a look at the whole datascience playlist -
th-cam.com/play/PL4GjoPPG4VqOmyh7hQ730evtLaz04LwSf.html
@@learnerea Thank you so much. Love you guys!
Hello Dr, thanks a lot for sharing the information and teaching us.
I have a little question, with your permission.
The question is: if we estimate our ARIMA model and find that there is autocorrelation in the residuals of the model, how can we fix this problem?
Thanks again 🤗🙏🙏🧡❤
There are several potential approaches you can take if you find autocorrelation in the residuals of your ARIMA model. Here are a few options you could consider:
Adding additional AR or MA terms to the model: If the autocorrelation is due to a pattern that has not been captured by the current model, adding additional terms may help to capture this pattern and improve model performance.
Differencing the data: If the autocorrelation is due to a trend in the data, differencing the data may help to remove this trend and improve model performance.
Using a different model: If the ARIMA model is not suitable for the data, you may need to consider using a different model altogether. For example, a seasonal ARIMA (SARIMA) model may be more appropriate for data with seasonal patterns.
Modeling the residuals: If none of the above approaches work, you can try modeling the residuals as a separate time series. This can help to capture any remaining patterns in the data that are not accounted for by the primary model.
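Before trying any of the fixes above, it helps to confirm that the residuals really are autocorrelated. The sample autocorrelation can be hand-rolled in pure Python; a minimal sketch, where the residual values are made up for illustration:

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag: the lagged
    covariance divided by the variance, giving a value in [-1, 1]."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Residuals from a well-specified model should look like white noise:
# autocorrelations near zero at every lag. Values well away from zero
# suggest adding AR/MA terms or differencing, as discussed above.
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, -0.2, 0.3, -0.1]  # toy values
r1 = autocorrelation(residuals, 1)
```

In practice you would compute this for several lags (or use a formal test such as Ljung-Box from statsmodels) rather than eyeballing a single lag.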
@@learnerea 🥰🥰🥰🥰🥰❤❤🧡💛 thanks a lot 🙏🙏
I have a situation where I can make reasonable training and predictions with the original (non-stationary) data. When I transform the data, I am able to successfully make it stationary, BUT it loses all autocorrelation, so the predictions are junk. Have you ever seen this? I have found some things online that say this is possible, but it depends very much on the characteristics of the time series.
Yes, the situation you're describing is not uncommon in time series analysis, and it's often a delicate balance to strike between achieving stationarity and preserving important characteristics like autocorrelation.
When you difference or transform a time series to achieve stationarity, you are essentially altering the original data to make it more amenable to modeling. However, as you've observed, too aggressive a transformation can result in the loss of autocorrelation, which is crucial for capturing temporal dependencies in the data.
Here are a few considerations and potential approaches to handle this situation:
Selective Transformation:
Instead of applying a uniform transformation to the entire time series, consider selectively applying transformations to specific components. For example, you might difference the data only where it's necessary or apply different transformations to different seasonal components.
Partial Transformation:
Rather than making the entire time series stationary, consider transforming only certain parts of it. For instance, you might apply differencing or another transformation to the trend component while leaving the seasonal component untouched.
Different Models for Different Components:
If your time series exhibits both trend and seasonality, you might consider using models that can handle each component separately. Seasonal decomposition of time series (STL) is one such approach where the time series is decomposed into trend, seasonal, and residual components, and each can be modeled independently.
Advanced Models:
Explore advanced models that can handle non-stationary data more effectively. Long Short-Term Memory (LSTM) networks and other recurrent neural networks (RNNs) are known for their ability to capture temporal dependencies in data.
Ensemble Approaches:
Combine predictions from models trained on the original data and models trained on the transformed data. Ensemble methods can sometimes capture the strengths of different models.
Grid Search and Cross-Validation:
Systematically experiment with different combinations of transformations and models. Use grid search and cross-validation to evaluate the performance of various configurations and find the optimal solution.
It's worth noting that the ideal approach can vary depending on the specific characteristics of your time series data. Experimentation and a deep understanding of the data's behavior are key. If possible, consider consulting with domain experts or seeking feedback from colleagues who have experience with similar time series patterns.
Remember that achieving stationarity is a means to an end (better model performance), and the goal is to strike a balance that preserves the essential characteristics of the data while making it amenable to modeling.
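On a related point that comes up repeatedly in this thread: once you forecast on differenced data, predictions can be mapped back to the original scale with a cumulative sum anchored at the last observed level. A minimal pure-Python sketch, with illustrative names and assuming first-order differencing (for a log transform you would apply exp afterwards):

```python
def invert_difference(last_observed, diff_predictions):
    """Map predictions made on first-differenced data back to the
    original scale by cumulatively adding each predicted change to
    the last observed level of the original series."""
    out, level = [], last_observed
    for d in diff_predictions:
        level += d
        out.append(level)
    return out

# The original series ends at 140; the model predicted these *changes*:
diff_preds = [5.0, -2.0, 3.0]
restored = invert_difference(140.0, diff_preds)
# restored == [145.0, 143.0, 146.0]
```

This is also why the anchor value matters: an error in the last observed level shifts every restored prediction by the same amount.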
Thanks for your detailed reply. How do you conduct a partial transformation? For example, do I difference only a section of the source data that I'm training the model on? And how would I then reverse-transform the predictions?
thanks
Welcome
I have a doubt... at 54 minutes, when you are using the ARIMA model, you started with the original data. Then why did you transform the data to stationary data, since you used the original data instead? Thank you so much.
Exactly, I have this question as well, because we were taught to fit the model on the transformed data. Please reply, it would be very helpful.
great
thank you very much for watching
👌
If we were not going to use the stationarity stuff, why did we calculate it?
Being a data scientist, you've got to explore all the possibilities.
As explained in the video as well, the decision was taken based on analysis, where it was observed that it wouldn't perform better comparatively. It has also been suggested that we will try making another video where we utilize the stationary data to see how it performs.
As a learner, your question makes sense; keep asking questions for clarity.
i was wondering the same
Hi, can you please help me understand why the lag for PACF is 20?
It will be great if you can share the time stamp where you spot this point
What does diff(12) mean?
diff computes the difference between each value and the value a given number of positions before it; diff(12) subtracts the value 12 positions earlier, which for monthly data is the same month of the previous year (a seasonal difference). If you can provide the timestamp here, we will be able to give you specific guidance.
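To make that concrete, here is a pure-Python sketch of what pandas' diff(12) computes on monthly data (the variable names and toy values are just for illustration):

```python
def diff(values, periods=1):
    """Return a list where out[i] = values[i] - values[i - periods].
    The first `periods` entries are None (pandas would give NaN)."""
    return [None if i < periods else values[i] - values[i - periods]
            for i in range(len(values))]

# Two years of toy monthly passenger counts:
passengers = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
              115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140]

seasonal = diff(passengers, 12)
# seasonal[12] == 115 - 112 == 3  (Jan of year 2 minus Jan of year 1)
```

So diff(12) is a seasonal difference: it removes the yearly pattern by comparing each month to the same month one year earlier, which is why the first 12 entries are undefined.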
The original data series is not stationary yet; I see you have applied some methods to convert it to a stationary series. But why do you use the initial data when training the model, when it is not a stationary sequence?
I have the same query as well. I can understand the section on checking on stationarity, but I don't see how that's getting incorporated into the subsequent training and model fitting. If the original dataset can be used for training rather than the transformed dataset, what's the use of determining if the data is stationary or not? Did I miss something ? Otherwise, excellent video, clearly explained. Would be interested to see videos on Time Series Analysis using other models such as XGBoost, Prophet. Thank you sir.
Hi, you did not upload a video where the stationary data was used.
I think not yet.
How should we approach forecasting with the lockdown data?
That's an excellent problem statement to choose. A little more detail you might have provided is -
>> what sort of model you want to develop
>> what is the main purpose/scope of the model etc.
Let's assume that you want to build a credit risk model and the data you are taking under consideration includes the COVID period as well. (Before you start, make sure that the data is of a relatively balanced quantity and period.) Below is the approach you can take -
Data Collection:
Gather historical credit risk data, including loan performance, defaults, delinquencies, and relevant economic indicators.
Include data specific to the COVID-19 period, such as unemployment rates, government stimulus programs, and financial relief measures.
Data Preprocessing:
Clean and preprocess the data by addressing missing values, outliers, and data inconsistencies.
Create relevant features, such as lagged values of credit risk indicators and economic variables, to capture time dependencies.
Exploratory Data Analysis (EDA):
Perform EDA to understand the data's characteristics and relationships.
Explore trends, seasonality, and patterns, paying specific attention to changes during the COVID-19 period.
Define the Target Variable:
Define the credit risk metric you want to predict, such as default probability or loan delinquency.
Feature Selection:
Identify relevant features that may influence credit risk. This includes economic indicators, loan characteristics, borrower information, and external factors.
Time Series Decomposition:
Decompose the time series data to understand underlying trends, seasonality, and residuals, considering the effects of COVID-19.
Create a Historical Train-Test Split:
Split the data into training and testing sets, ensuring that the testing set includes the COVID-19 period.
Model Selection:
Choose a suitable forecasting model. In this case, time series models like ARIMA, SARIMA, or Prophet may be appropriate.
Consider using machine learning models like Gradient Boosting, Random Forest, or LSTM if you have sufficient data.
Model Training:
Train the selected model on the historical data, excluding the testing period.
Model Validation:
Evaluate the model's performance using the testing data, specifically during the COVID-19 period.
Use appropriate evaluation metrics, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or classification metrics for binary outcomes.
Model Interpretation:
Interpret the model's predictions to understand which factors contribute to credit risk during the COVID-19 period.
Feature Importance:
Analyze feature importance to identify key drivers of credit risk during the pandemic.
Model Refinement:
Fine-tune the model and hyperparameters if the initial model's performance is suboptimal.
Scenario Analysis:
Conduct scenario analysis to assess credit risk under different economic conditions related to COVID-19, such as varying levels of unemployment or government interventions.
Model Deployment:
Deploy the trained model for ongoing credit risk assessment and predictions.
Monitoring and Feedback Loop:
Continuously monitor the model's performance and retrain it as new data becomes available.
Regulatory Compliance:
Ensure that your credit risk model complies with regulatory requirements and standards relevant to your industry.
Documentation:
Document the entire modeling process, including data sources, preprocessing steps, model selection, and evaluation metrics.
Keep in mind that the unique challenges posed by the COVID-19 pandemic may require you to adapt your model and data sources to reflect changing economic conditions and government policies. Regularly update and refine your credit risk prediction model to account for these dynamics.
Suppose the month attribute is missing and you only have the year attribute; in that case, how can you make the data stationary? Can you explain, please? I mean, if you only have year and passenger attributes, how do you make the data stationary? Please reply.
Stationarity can be assessed on a yearly basis as well.
When you're dealing with time series data that only has a yearly frequency, the approach to making the data stationary is similar to what you'd do with more frequent data, but with some specifics to consider.
Visualizing the Data:
Start by plotting the data. This will give you an idea of the overall trend, seasonality, and variance. Since the data is yearly, you might not observe any distinct seasonality.
python code -
import matplotlib.pyplot as plt
plt.plot(year, passenger)
plt.xlabel('Year')
plt.ylabel('Passenger')
plt.title('Yearly Passenger Count')
plt.show()
Differencing:
A common approach to making time series data stationary is by differencing the data. Differencing helps to remove trends in the data. You subtract the previous year's observation from the current year's observation.
python code-
passenger_diff = passenger.diff().dropna()
After differencing, plot the data again to see if it appears more stationary.
Checking for Stationarity:
The Augmented Dickey-Fuller test is commonly used to check the stationarity of a time series.
python code -
from statsmodels.tsa.stattools import adfuller
result = adfuller(passenger_diff)
print('ADF Statistic:', result[0])
print('p-value:', result[1])
A low p-value (typically ≤ 0.05) indicates that the time series is stationary.
Transformations:
If differencing isn't enough, consider other transformations like:
Log transformation: To stabilize variance.
python code -
import numpy as np
passenger_log = np.log(passenger)
Rolling means: To smooth out short-term fluctuations and highlight longer-term trends.
python code -
rolling_mean = passenger.rolling(window=5).mean() # 5-year window as an example
passenger_detrended = passenger - rolling_mean
passenger_detrended.dropna(inplace=True)
Decomposition:
Even though the data is yearly, if you suspect any cyclicality or a strong trend, you can use decomposition. Seasonal-Trend decomposition using LOESS (STL) from the statsmodels library can be useful.
python code -
from statsmodels.tsa.seasonal import STL
stl = STL(passenger, period=5, seasonal=13)  # period = length of the suspected cycle (here assumed 5 years); set it to match your data
result = stl.fit()
trend = result.trend
seasonal = result.seasonal
residual = result.resid  # the detrended, deseasonalized remainder
You can then work with the residuals from the decomposition process, which should ideally be stationary.
Can you then forecast this?
But sir, the new statsmodels seems to have different functions.
You can mention the name of a statsmodels function used in the video that you cannot find in the library now.
We will try to find the closest alternative function and help you with it if the original no longer exists.
@learnerea can you make a new video on the implementation of ARIMA, on a share market dataset or a weather dataset?
Hi, I cannot find the dataset, could you help me please! =D
The dataset is part of the seaborn library; you can just run this code -
import seaborn as sns
df = sns.load_dataset('flights')
You can also download the notebook from the GitHub link provided in the description.
Where is the dataset?
share your python notebook sir @Learnerea
Here you go -
github.com/LEARNEREA/Data_Science/blob/main/Scripts/time_series_air_passengers.py
My data is in the form of year, week.
The d parameter is the number of differences you take on your data, which is not what you said. This is as basic as it gets, man, come on.
Can you share this jupyter notebook with me?
via mail
Hi Meronika,
you can find that using -
file name - time_series_air_passengers.py
url - github.com/LEARNEREA/Data_Science/tree/main/Scripts