Hi everyone! You can find the code for this tutorial here - github.com/dataquestio/project-walkthroughs/tree/master/weather .
Can't open the link.
Truly excellent tutorial - thank you so much!
That was great curriculum material in 42 minutes!
Great tutorial, step by step very very well explained. I really thank you!
Great video, it really helped me generate missing data for my project.
Really informative and helpful video, Vik... keep up the good work :)
Thank you for sharing this.
Thank YOU very much! I am from Sri Lanka.
Thanks a ton! It was of great help!
Hi! First of all thank you for this great tutorial!
I have a question about train-test split while using lag/window features.
When you apply lag/window features to the whole dataset and then make the split, doesn't that lead to data leakage, since you're using the test data's information in the training set?
I understand that in this case, 30 days of unseen test data were used in training through the lag features. Am I wrong?
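A quick way to check the leakage concern, as a minimal sketch on made-up data (the series, split date, and 30-day window are assumptions, not the tutorial's exact features). A trailing rolling window only reads the current and earlier rows, so training rows get identical values whether the feature is computed before or after the split. Test rows do borrow up to 30 days of training history, but that is past information you would also have at prediction time. A centered window, by contrast, reads future rows and does leak across the split.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for core_weather["temp_max"]
dates = pd.date_range("2020-01-01", periods=120, freq="D")
rng = np.random.default_rng(0)
temp_max = pd.Series(60 + 10 * np.sin(np.arange(120) / 10) + rng.normal(0, 2, 120), index=dates)

split_date = "2020-03-31"

# Trailing 30-day mean computed on the FULL series, then compared on training rows only
full_roll = temp_max.rolling(30, min_periods=1).mean()
train_only_roll = temp_max.loc[:split_date].rolling(30, min_periods=1).mean()
same_for_train = np.allclose(full_roll.loc[:split_date], train_only_roll)

# A centered window looks ahead, so its values near the split change once test rows exist
centered_full = temp_max.rolling(30, center=True, min_periods=1).mean()
centered_train = temp_max.loc[:split_date].rolling(30, center=True, min_periods=1).mean()
leaks = not np.allclose(centered_full.loc[:split_date], centered_train)

print(same_for_train, leaks)
```

Assuming the target is the next day's temp_max, the one boundary effect is that the last training row's target is the first test day's value; dropping that single row from training removes it if you want to be strict.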
Interesting, but could this be done in VBA or C++? Have you done a similar example in VBA? Thanks.
Super helpful!!!!
Excellent
So will this predict the weather for tomorrow?
First of all, that was a well-explained project. However, I have a problem with my code. When I run line 45 of your notebook in my own notebook, I receive the following error:
Expected 2D array, got 1D array instead:
array=[6.0e-02 6.2e+01 4.4e+01].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
HOW CAN I FIX THIS??
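That error usually means a single day of values was passed to predict as a flat 1D array, while scikit-learn expects a 2D input of shape (n_samples, n_features). A minimal sketch with made-up training data; the column names precip, temp_max, and temp_min are assumptions for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical training data with three predictors
predictors = ["precip", "temp_max", "temp_min"]
train = pd.DataFrame(
    {"precip": [0.0, 0.1, 0.0, 0.3], "temp_max": [60, 62, 58, 65], "temp_min": [40, 44, 39, 47]}
)
target = pd.Series([62, 58, 65, 63])

reg = Ridge(alpha=0.1)
reg.fit(train[predictors], target)

# One day's values as a flat array of shape (3,): reg.predict(today) raises "Expected 2D array"
today = np.array([6.0e-02, 6.2e01, 4.4e01])

# Fix 1: reshape to (1, 3), i.e. one sample with three features
pred_from_array = reg.predict(today.reshape(1, -1))

# Fix 2: wrap the values in a one-row DataFrame that keeps the column names
pred_from_frame = reg.predict(pd.DataFrame([today], columns=predictors))
print(pred_from_array, pred_from_frame)
```

Because the model was fitted on a DataFrame, the plain-array version may emit a warning about missing feature names; the one-row DataFrame avoids it.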
excellent!
this is so dope!
Just noticed that at longer forecast times, a lag appears to develop in the model. Is this normal or an issue with my coding? For example, at a forecast time of 9 months instead of 1 month, the MSE is quite high. However, when I shift the predictions back 9 months it matches up much better with what actually happened.
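One common explanation (not confirmed from your code) is that at long horizons the model mostly echoes the latest observation. In other words, it behaves like a persistence forecast, so its output looks like the actual series delayed by the horizon. The hypothetical helper below finds the shift that makes the predictions line up best. On a pure 9-step persistence forecast of made-up monthly data it recovers 9. If your model's best shift is close to its horizon, the features probably carry little information about that far ahead.

```python
import numpy as np
import pandas as pd

def best_alignment_lag(actual: pd.Series, predicted: pd.Series, max_lag: int) -> int:
    """Hypothetical helper: shift k in 0..max_lag for which predicted.shift(-k) has the lowest MSE."""
    errors = {}
    for k in range(max_lag + 1):
        shifted = predicted.shift(-k)  # move predictions k steps earlier
        mask = shifted.notna() & actual.notna()
        errors[k] = ((actual[mask] - shifted[mask]) ** 2).mean()
    return min(errors, key=errors.get)

# Made-up monthly series and a 9-step-ahead persistence forecast (value observed 9 months earlier)
idx = pd.date_range("2000-01-01", periods=240, freq="MS")
rng = np.random.default_rng(1)
actual = pd.Series(np.cumsum(rng.normal(0, 1, 240)), index=idx)
horizon = 9
persistence = actual.shift(horizon)

print(best_alignment_lag(actual, persistence, max_lag=12))
```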
too sweet.
Why didn't you check the correlation of the variables with the target?
Sir, can you update this code to produce future forecasts?
Firstly, I would like to express my sincere gratitude for the invaluable tutorial you provided. It has been incredibly helpful in our coding journey so far.
However, while implementing the concepts from the tutorial, we encountered a small issue related to the following code snippet:
weather["month_day_max"] = weather["month_max"] / weather["t_max"]
weather["max_min"] = weather["t_max"] / weather["t_min"]
Unfortunately, we noticed that some values in our dataset for t_min or t_max are zero, resulting in division by zero and producing infinite values. As a consequence, we encounter errors later when running our code.
I would greatly appreciate your guidance on how to overcome this problem. Are there any alternative approaches or modifications we can make to the code in order to avoid these errors?
Thank you once again for your time and assistance. I eagerly await your response.
I know I may be a little late, but you could use np.where to add a condition that ensures the denominator is not zero. If the denominator is zero, you can set the value to np.nan and then fill it properly later.
@@RafeuLopo Figured out how to check for 0 using .where():
core_weather["month_day_max"] = core_weather["month_max"] / core_weather["temp_max"].where(core_weather["temp_max"] != 0)
core_weather.loc[core_weather["month_day_max"].isnull(), "month_day_max"] = core_weather["month_max"] / 0.1
core_weather["max_min"] = core_weather["temp_max"] / core_weather["temp_min"].where(core_weather["temp_min"] != 0)
core_weather.loc[core_weather["max_min"].isnull(), "max_min"] = core_weather["temp_max"] / 0.1
Is dividing by 0.1 the "proper" result though?
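A minimal sketch of two alternatives on made-up values, using the t_max/t_min column names from the question. Either turn zero denominators into NaN so no inf is produced and then fill the gaps, or replace the ratio with a difference, which never divides.

```python
import numpy as np
import pandas as pd

# Made-up rows; the second day has t_min == 0
weather = pd.DataFrame({"t_max": [50.0, 32.0, 0.0, 45.0], "t_min": [30.0, 0.0, -5.0, 28.0]})

# Option 1: zero denominators become NaN instead of inf, then fill with the previous day's value
ratio = weather["t_max"] / weather["t_min"].replace(0, np.nan)
weather["max_min_ratio"] = ratio.ffill()

# Option 2: the daily temperature range has no denominator, so it can never be inf
weather["max_min_diff"] = weather["t_max"] - weather["t_min"]
print(weather)
```

On the 0.1 question: dividing by 0.1 is the same as multiplying by 10, so those rows get arbitrary and often extreme values that a linear model can latch onto. A ratio of temperatures is awkward in general, since it flips sign and blows up as the denominator approaches zero. The difference avoids both problems. Note that ffill still leaves a NaN if the very first row has a zero denominator.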
Nice video tutorial. Can you please do something on predicting for the next day, week, month or year using time series?
Thanks, I'll add this to the list for potential future videos! - Vik
Which programming languages were used?
Getting an error while fitting the model:
Found array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required by Ridge.
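That message means the training slice had zero rows by the time reg.fit was called. A frequent cause is a date range in .loc that doesn't overlap the data, or a dropna() that removed every row. This is an assumption, since the code wasn't shared. A minimal sketch on made-up data showing the symptom and a guard before fitting:

```python
import pandas as pd

# Made-up data whose index covers 1960-1961
core_weather = pd.DataFrame(
    {"temp_max": [50, 52, 49, 55]},
    index=pd.to_datetime(["1960-01-01", "1960-06-01", "1961-01-01", "1961-06-01"]),
)

# A slice entirely before the data silently returns zero rows
empty_train = core_weather.loc[:"1959-12-31"]
print(empty_train.shape)  # (0, 1): fitting Ridge on this raises "Found array with 0 sample(s)"

# Check the available date range, slice inside it, and guard before fitting
print(core_weather.index.min(), core_weather.index.max())
train = core_weather.loc[:"1960-12-31"]
assert len(train) > 0, "training slice is empty; check the dates and the index type"
```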
Thank you. There are a lot of examples like this, but they are not useful. You can't reliably predict tomorrow's temperature by using previous days. You must assess weather patterns, and for that you need all the variables you can get for your inputs (features), like solar radiation, geopotential heights, wind directions at various levels, humidity at various levels, temperature at various levels, convergence, divergence, ideally surface and soil temperatures and moisture, and so on. Then you need to find which of those have an impact on temperature by checking correlations, and remove all the inputs that aren't useful. Then you might really get somewhere...
Getting a KeyError in reg.fit(train[predictors],train["target"])
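A KeyError there usually means one of the names in predictors, or "target" itself, is not a column of the DataFrame. Typical causes are a typo, different capitalization, a column that was dropped, or one that hasn't been created yet. The last line of the traceback shows which key is missing. A minimal sketch with a deliberately misnamed hypothetical column:

```python
import pandas as pd

# Made-up frame: the column is "temp max" (with a space), not "temp_max"
core_weather = pd.DataFrame({"precip": [0.0, 0.2], "temp max": [50, 52], "target": [52, 51]})
predictors = ["precip", "temp_max"]

# List every name reg.fit would look up that is not actually a column
missing = [col for col in predictors + ["target"] if col not in core_weather.columns]
print(missing)  # ['temp_max']
```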
Hello
Thanks for this video. I'm getting an error on line 66/67 saying "TypeError: incompatible index of inserted column with frame index".
Here is my line of code:
core_weather["monthly_avg"] = core_weather["temp_max"].groupby(core_weather.index.month).apply(lambda x: x.expanding(1).mean())
If it makes a difference, I'm running this from VS Code. Everything has worked fine so far, except that I didn't get the plots.
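A likely cause, assuming a recent pandas version (2.0 or later), is that groupby(...).apply now prepends the group key (the month) to the result's index. The result then no longer lines up with core_weather's index, and the assignment fails. A minimal sketch of two workarounds on made-up data. The first uses transform, which returns a result aligned to the original index. The second passes group_keys=False to groupby.

```python
import pandas as pd

# Made-up daily data spanning two years
idx = pd.date_range("2020-01-01", "2021-12-31", freq="D")
core_weather = pd.DataFrame({"temp_max": range(len(idx))}, index=idx, dtype=float)

# Workaround 1: transform keeps the original date index, so the column assignment aligns
core_weather["monthly_avg"] = (
    core_weather["temp_max"]
    .groupby(core_weather.index.month)
    .transform(lambda x: x.expanding(1).mean())
)

# Workaround 2: keep apply, but stop pandas from adding the month as an extra index level
alt = (
    core_weather["temp_max"]
    .groupby(core_weather.index.month, group_keys=False)
    .apply(lambda x: x.expanding(1).mean())
)
print(core_weather.head())
```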
I really enjoyed watching this, even though I'm new to coding. Is it possible to predict the weather with any machine learning technique?
While I'm deeply into AI and I really like the content and the speaker, is there anything MORE BORING than predicting the weather?
Imagine a 16-year-old interested in coding, AI, etc.: what would he rather predict? The weather or the outcome of his soccer games? 😀
Guys, I'm a beginner. What kind of algorithm is used? Is this linear regression?
This is a good example of people not understanding anything about weather. You can never forecast more than 14 days in advance.
When I try to create a train and test set, it shows an attribute error: function object has no attribute 'loc'. Why is that?
Hi Ajumon - can you share a few lines of code before and after the error, as well as the error traceback? That will help me answer your question.
@@Dataquestio I'm also getting the same error. How do I fix this? Please suggest.
*PLEASE* explain in detail, sir, how I can use this to predict future weather values 🤕🙏
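That message means the variable you call .loc on holds a function rather than a DataFrame. A common cause is a missing pair of parentheses, for example assigning pd.read_csv instead of calling it. Writing .copy instead of .copy() gives the similar "'method' object" message. This is an assumption, since no traceback was shared. A minimal sketch with inline CSV text standing in for the weather file:

```python
import io

import pandas as pd

csv_text = "DATE,TMAX\n2022-01-01,50\n2022-01-02,52\n2022-01-03,49\n"

# Bug: no call, so `weather` is the pd.read_csv function itself
weather = pd.read_csv
print(type(weather).__name__)  # function, so weather.loc would raise AttributeError

# Fix: call the function so `weather` holds a DataFrame with a date index
weather = pd.read_csv(io.StringIO(csv_text), index_col="DATE", parse_dates=True)
train = weather.loc[:"2022-01-02"]
print(train)
```

Reusing a DataFrame's name for a def later in the notebook has the same effect, so it is also worth searching the notebook for a function with that name.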
Hey Dataquest ! I have a question :) !
I followed your video and it was pretty straightforward and well explained. Buuuuut, I'm trying to adapt this to a personal case for my studies.
I took another dataset with 3 values (Temperature / Humidity / Wind) and "randomized" them. By random, I mean Temperature is always between 18 and 25, and Humidity is Temperature + 10.
When I get my predictions (I'm trying to predict Temperature), they are all around 19.5. So when I plot them, I get nearly a flat line.
Any idea why this happens? I thought that with a relation like Humidity = Temperature + 10 between my values, I could get a decent prediction range, but it looks like I'm not understanding something.
Thank you for the answer :) !
Machine learning models can't predict if the values are random. Tomorrow's temperature would need to be correlated with today's temperature to be able to make future predictions. I would check the correlations between what you're using to predict and what you're trying to predict.
I want to predict the climate based on 3 parameters [temp max, wind speed, precipitation] for the next 7 days. How do I get a forecast for the other 2 variables?
It's not prediction. It should be forecasting.
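Following that suggestion, here is a minimal sketch with data generated the way the question describes (the exact ranges and sample size are assumptions). When each day's temperature is drawn independently, humidity = temperature + 10 is a same-day relationship and says nothing about tomorrow. The correlations with the next-day target therefore sit near zero, and a ridge model falls back to roughly the average, which plots as a nearly flat line.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 365
# Made-up "randomized" data: independent daily temperatures, humidity = temperature + 10
temp = pd.Series(rng.uniform(18, 25, n))
df = pd.DataFrame({"temperature": temp, "humidity": temp + 10, "wind": rng.uniform(0, 10, n)})

# The target is TOMORROW's temperature, as in the tutorial
df["target"] = df["temperature"].shift(-1)
df = df.dropna()

# Correlation of each predictor with the target; values near 0 mean almost no linear signal
print(df.corr()["target"].drop("target"))
```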
Post a useful comment or Keep quiet 🤫
Hi, first, thanks for this tutorial, but I'm having some difficulty getting the same CSV as you in my notebook. Mine has no date column, STATION NAME, ACMH, etc. Could you please help me?
And when I try to run my code, it doesn't run and just adds another cell.
Did you actually answer your question? You did create a model, but what will the weather be tomorrow?
Thank you, this was really helpful. Where can I find the local weather dataset? I am unable to download it.
The data and code are linked in the project description.
Great video! One question I have is about how to make a forecast using this. Right now we can only see the model's predictions for the test time frame and how accurate they are. For example, my dataset ends 07-01-22, so the last value predicted by the model is for June 30th. What code should I use to let the model make a forecast for 07-02?
So if you want to make a prediction for tomorrow, just feed in the data for today. So if the max temp today was 50, and the min temp was 40, you can feed that into the algorithm. The prediction you get will be for the next day. So if you're using data for 7-1-2022 to generate the predictions, your prediction will be for 7-2-2022.
@@Dataquestio Oh ok, that makes sense. So if I remove the line
coreweather = coreweather.iloc[:-1,:].copy(), will I then get the forecast for the next day?
@@firstinweather6504 Did you figure out how to make a future prediction? :( If yes, please help me out.
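A minimal sketch of that idea on made-up data (the dates, columns, and alpha are assumptions). Keep the last row, whose target is unknown, out of training, then pass that one row to predict. Rather than simply deleting the line that drops the last row, this trains on every row that has a target and uses the final row only for the forecast.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up data ending 2022-07-01
idx = pd.date_range("2022-06-01", "2022-07-01", freq="D")
rng = np.random.default_rng(0)
core_weather = pd.DataFrame(
    {"temp_max": 80 + rng.normal(0, 3, len(idx)), "temp_min": 60 + rng.normal(0, 3, len(idx))},
    index=idx,
)
predictors = ["temp_max", "temp_min"]

# Target = next day's temp_max; the last row's target is NaN because tomorrow is unknown
core_weather["target"] = core_weather["temp_max"].shift(-1)

# Train on every row that has a known target
history = core_weather.iloc[:-1]
reg = Ridge(alpha=0.1).fit(history[predictors], history["target"])

# Predict from the last row (2022-07-01); the result is the forecast for 2022-07-02
latest = core_weather.iloc[[-1]]  # double brackets keep a one-row DataFrame
forecast = reg.predict(latest[predictors])[0]
forecast_date = latest.index[0] + pd.Timedelta(days=1)
print(forecast_date.date(), round(forecast, 1))
```

If you use the tutorial's rolling features, compute them before this step so the last row has them too.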
@@lakshya6909 Yes I did,
train = df.loc['1950-01-01':'2000-12-01']
test = df.loc['2001-01-01':]
reg.fit(train[predictors], train['target'])
predictions = reg.predict(test[predictors])
To generate a prediction, you use the code above. Lmk if you have any questions.
@@firstinweather6504 how to make a future prediction bro?
Very good and informative video, but what can we do to predict the weather for the next day?
Hi Vikki - the video shows how to predict the weather for the next day. This is in the second half of the video, when we're training a machine learning algorithm.
@@Dataquestio thanks
@@Dataquestio Which video, sir?
Hi Dataquest, may I ask how to predict the future max and min temperature? For example, my data runs from 1990 to 2021, and I want to get predictions for 2030-2060. How would I do that? Is there an example in the video?
Hi - we'll have a new video up next week that will show how to do this.
Hello Dataquest... I have a question. I want to predict 90 days of temperature and rain. Do you have a script to predict a series of many days with this model? Regards, Friend
Hi Magno - I don't have the code, but you can modify this code to make predictions for several days out. You just have to change the target being predicted. -Vik
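Following that suggestion, here is a minimal sketch of how the target could be changed: shift the column by h rows instead of 1. The helper name add_horizon_targets is hypothetical. For several horizons you would typically train one model per target column (for example target_7d). You would also drop the rows whose target is NaN (the last h rows) before fitting.

```python
import pandas as pd

def add_horizon_targets(df: pd.DataFrame, column: str, horizons: list[int]) -> pd.DataFrame:
    """Hypothetical helper: add one target column per horizon, the value of `column` h rows later."""
    out = df.copy()
    for h in horizons:
        out[f"target_{h}d"] = out[column].shift(-h)
    return out

# Made-up daily data
idx = pd.date_range("2022-01-01", periods=10, freq="D")
weather = pd.DataFrame({"temp_max": range(50, 60)}, index=idx)

with_targets = add_horizon_targets(weather, "temp_max", [1, 7])
print(with_targets)
```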
Auto regression?
Hey Vik, very informative video. Which category of machine learning does this method fall under, like random forest or CNN, etc.?
Hi Jaswant - in this project, we're using ridge regression, a linear model. You can modify the code to use random forests, though.
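Both estimators share scikit-learn's fit/predict interface, so swapping ridge regression for a random forest is mostly a one-line change. A minimal sketch on synthetic data; the features, split, and hyperparameters are assumptions, and the scores say nothing about real weather data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic features and target
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({"temp_max": rng.normal(60, 10, n), "temp_min": rng.normal(40, 10, n)})
y = 0.8 * X["temp_max"] + 0.1 * X["temp_min"] + rng.normal(0, 3, n)

# Chronological split (no shuffling), as you would do with a time series
X_train, X_test = X.iloc[:300], X.iloc[300:]
y_train, y_test = y.iloc[:300], y.iloc[300:]

models = {
    "ridge": Ridge(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=1),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # identical call for both estimators
    scores[name] = mean_squared_error(y_test, model.predict(X_test))
print(scores)
```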
Excellent video, it's my first contact with machine learning. I have a question: I work with 10 years of meteorological data, and I would like to reconstruct the past time series back about 20 years (the climatological normal) and then forecast the coming years. Would that be possible? What would be the best approach? I currently work with hourly wind speed data in Brazil. Thank you. Regards
You mean you want to "create" data for the past? If so, use a GAN model.
Hi! Sorry about this, I have another question! Is it normal that my MSE (mean_squared_error) on the model is 20.5? It seems pretty high, right?
I'm also getting the same, about 20.
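One way to judge that number, as a sketch: MSE is in squared units, so 20.5 corresponds to a root mean squared error of about 4.5 degrees (°F, if that is the unit of your data). Whether that is high depends on a baseline. A common one is persistence, which predicts tomorrow's temp_max as today's. The helper below is hypothetical; run it on your test-period temp_max and compare the two numbers.

```python
import numpy as np
import pandas as pd

mse = 20.5
rmse = np.sqrt(mse)  # about 4.53, in the same units as temp_max

def persistence_mse(temp_max: pd.Series) -> float:
    """Hypothetical baseline: MSE of predicting each day's temp_max with the previous day's value."""
    actual = temp_max.iloc[1:]
    naive = temp_max.shift(1).iloc[1:]
    return float(((actual - naive) ** 2).mean())

print(round(rmse, 2))
```

If the model's MSE is not clearly below the persistence MSE over the same test period, the features add little beyond today's temperature.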
I am getting the following error. I am not sure where it is coming from or how to fix it: ValueError: Input X contains infinity or a value too large for dtype('float64').
This is extraordinary in every way. I recently read a similar book, and it was extraordinary in every way. "The Art of Meaningful Relationships in the 21st Century" by Leo Flint
Hey, I am new to programming and ML. In fact, this was my first project. Can anyone please tell me where I should input today's data to obtain predictions for tomorrow? Basically, I understood how we trained the model and all, but how do I now use it to obtain results?
If you feed data for today into the predict method (max temp, min temp, etc), it will return the prediction for tomorrow.
Is the data free? I mean, will they charge us for the data?
Hi Mariri - downloading the data is completely free.
29:00
sir
Is an INDIAN dataset available??
Hi Vik, thanks for this video! I used the dataset from JFK Airport and wanted to keep snow and snow_depth in. However, towards the end of the project, when I write:
error, combined = create_predictions(predictors, core_weather, reg) # I get the following error
ValueError Traceback (most recent call last)
/var/folders/d7/q_fznsr95_97r6lp_mx_vp640000gn/T/ipykernel_57500/1727150671.py in
----> 1 error, combined = create_predictions(predictors, core_weather, reg)
... and then...
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Any ideas how to solve this? I think I have some large numbers somewhere - everything up until this point is fine
You can use pd.isna or pd.isnull to filter the dataframe and check for missing or invalid data. For very large values, you can filter to check for numbers above a certain value. You can also use the fillna method to replace any missing data.
Did you eventually resolve this? I had the same issue. I looked for min and max values for the new predictors.
max(core_weather['month_max'])
min(core_weather['month_max'])
max(core_weather['month_day_max'])
min(core_weather['month_day_max'])
max(core_weather['max_min'])  # inf
min(core_weather['max_min'])
Then I changed max_min from a ratio to a difference (it makes more sense to me that way):
core_weather["max_min"] = core_weather["temp_max"] - core_weather["temp_min"]
Problem solved.
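Building on the replies above, here is a minimal sketch that reports which columns contain NaN or inf, then replaces inf with NaN and fills; the values and column names are made up. Snow columns are often sparse in station data, which could explain NaNs once snow and snow_depth are kept. Filling with 0 is only an assumption and may not suit every column.

```python
import numpy as np
import pandas as pd

# Made-up rows: a zero temp_min produces inf in the ratio, and snow has a gap
core_weather = pd.DataFrame({
    "temp_max": [50.0, 32.0, 45.0],
    "temp_min": [30.0, 0.0, 28.0],
    "snow": [0.0, np.nan, 1.2],
})
core_weather["max_min"] = core_weather["temp_max"] / core_weather["temp_min"]  # 32 / 0 -> inf

# Count NaN and inf per numeric column and show only the problem columns
numeric = core_weather.select_dtypes("number")
report = pd.DataFrame({"n_nan": numeric.isna().sum(), "n_inf": np.isinf(numeric).sum()})
print(report[(report["n_nan"] > 0) | (report["n_inf"] > 0)])

# Turn inf into NaN, then fill (0 here; pick a fill that makes sense for each column)
cleaned = core_weather.replace([np.inf, -np.inf], np.nan).fillna(0)
```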