Very clear and straightforward, I wish all tutorials were like that
Very helpful, intuitive and instructive. Thank you so much Venelin
Hello, thanks for the course. I'm confused at 38:00: your X_train and X_test should not include the output column 'cnt', right? Otherwise you will be training and making predictions using the output as one of your features.
I also had a doubt while watching him code. As you pointed out, the training data should not include the 'cnt' column. I agree with you.
But since he has chosen different scaler methods for training and test data, the neural network may not notice that the data is repeated. Still, I think it should not be used.
I also have the same question...
The model uses previous values to predict a future value. When we know the number of shared bikes in the past, we can use those past values as input to make a prediction.
In the method "create_dataset", we take the previous values of both the independent and dependent variables as input (X), because we know all of the past data.
At each iteration, the method uses only one future value from the label column as the output (y).
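A minimal sketch of the windowing described above, assuming a helper shaped like the tutorial's "create_dataset". It uses plain Python lists instead of pandas/NumPy to stay dependency-free, and the toy rows are made up for illustration only.

```python
def create_dataset(rows, labels, time_steps=1):
    """Build (window, next-label) pairs from a time-ordered table.

    rows   : list of feature rows (a row may include the past 'cnt')
    labels : list of target values aligned with rows
    """
    Xs, ys = [], []
    # Stop time_steps short of the end so every window has a label after it.
    for i in range(len(rows) - time_steps):
        Xs.append(rows[i:i + time_steps])   # past values only
        ys.append(labels[i + time_steps])   # the single next value
    return Xs, ys


# Toy data (hypothetical): each row is [cnt, temperature]; the label is cnt.
rows = [[10, 1.0], [20, 1.5], [30, 2.0], [40, 2.5], [50, 3.0]]
labels = [r[0] for r in rows]

X, y = create_dataset(rows, labels, time_steps=3)
```

Each window in X ends one step before its label in y, so the 'cnt' values the model sees are always from the past relative to the value it has to predict.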
Thanks, I watched the video through and have it bookmarked. Just what I was looking for, champion :)
At 37:01 you pass train as your X. Since train already contains the count, the whole model and training become meaningless: the model will only use the count variable from X to predict y with near 100% accuracy.
X_train will contain the count from the previous 24h, while y is the count of the next hour (24+1 if you wish). So the question is more: is this really what would happen in real life? Would we really need to predict only 1h in advance?
Since we are predicting the future value of cnt from the actual past values that we have at our disposal at the time we predict the next count, there is nothing wrong with including the previous values of cnt in the training data.
This is yet another tutorial that explains how to use an LSTM model to do time series prediction. Yet, IMHO the results are quite modest in view of the fact that we are bringing in the big guns. A simple test I have previously done against a similar model was to compare what R^2 value we would get by just predicting that the next value is equal to the previous one, essentially shifting the graph to the right. I bet in this particular case the error values would not be as great in the extremes. Note that if the extremes are missed, who cares about the low values in a real life situation?
Please can you answer this question: @36:43, in your create_dataset method, you are taking a couple of future data points along with the current ones, but after that you say that you will take the history and predict the future, and that didn't happen!
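The naive comparison described above can be sketched as follows. This is only an illustration: the series values are invented, and r2_score here is a hand-written helper, not the scikit-learn function of the same name.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def persistence_forecast(series):
    """Predict each value as the one before it (shifts the graph right)."""
    return series[:-1]


# Hypothetical hourly counts with one sharp peak.
series = [100, 120, 150, 400, 380, 200, 180, 160]

y_true = series[1:]                      # values we try to predict
y_naive = persistence_forecast(series)   # previous value as the forecast
baseline_r2 = r2_score(y_true, y_naive)
```

If the LSTM's R^2 on the test set is not clearly above baseline_r2 computed on the same data, the model has learned little beyond "repeat the last value".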
At 17:39 you were trying to change your data with df.iloc[-200:] but you left your x values as df.index, which has a different length.
Thanks for the video. Super helpful! Yes, yes, we need more videos on time series analysis on the mentioned topics. Also, can you please analyse time series using different techniques like CNNs, attention models and transformers?
I was facing problems creating a time series dataset for LSTM. Thank you for making it clear.
Excellent tutorial. Very clear and understandable. Thanks.
How would you expand this to predict future values rather than just predicting and checking against existing data? Additionally how do we get the actual datetime of the prediction instead of just the time steps?
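On the datetime part of the question, a minimal sketch, assuming windows are built as in the tutorial (window i covers rows i to i + time_steps - 1 and predicts row i + time_steps). The hourly timestamps are made up; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def prediction_timestamps(timestamps, time_steps):
    """Timestamp of the value each window predicts.

    Window i covers timestamps[i : i + time_steps], so its target
    sits at timestamps[i + time_steps].
    """
    return timestamps[time_steps:]


# Hypothetical hourly index: 30 consecutive hours.
start = datetime(2017, 1, 1, 0)
timestamps = [start + timedelta(hours=h) for h in range(30)]

pred_times = prediction_timestamps(timestamps, time_steps=24)

# For a forecast beyond the data, the next timestamp is one step past the end.
next_time = timestamps[-1] + timedelta(hours=1)
```

The list pred_times lines up one-to-one with the model's predictions, so it can be used as the x-axis when plotting.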
View the dataset: 12:14
Another Question (@47:00): Is there any reason, why you did not use 'validation_data=(X_test, y_test)' but rather 'validation_split'?
Ah, I now realize that you use the test data for prediction later on.
Does time series data need to be split into train, validation and test sets, or are train and test sets enough to train the model? (Very useful video.)
Good video, but can I ask why cnt wasn't converted to NumPy?
How can I predict FUTURE values? (model.predict(start='xx/xx/xx', end='xx/xx/xx')?)
Thanks for sharing. I was looking for a bidirectional layer example.
You are the best in this field..
Hey Venelin, thank you very much for your videos! While reconstructing and "copying" your code, I am wondering (@31:00) why you fit the RobustScaler() only on the training data and use that fitted RobustScaler afterwards for the testing data. Shouldn't we fit a new RobustScaler to use with the testing dataset?
Need to try ;) I think Venelin assumes the test dataset has never been encountered by the model, and we need to prepare our data as if the test dataset were never given. Because of that, the scaling process (RobustScaler.fit()) is conducted using only the training set.
In the real world, the model won't take the new data and use it for rescaling itself. I think it is related to the continuous (or real-time) learning. In this example, the model learns one time and predicts only with learned weights.
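To make the point above concrete, here is a small sketch. SimpleRobustScaler is a hypothetical toy stand-in written for this example (median and interquartile range, the same idea a robust scaler is based on); it is not scikit-learn's RobustScaler, and the numbers are invented.

```python
import statistics


class SimpleRobustScaler:
    """Toy stand-in for a robust scaler: (x - median) / IQR."""

    def fit(self, values):
        # Statistics are learned from the data passed to fit() only.
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        self.center_ = statistics.median(values)
        self.scale_ = q3 - q1
        return self

    def transform(self, values):
        # Reuse the stored statistics; nothing is re-estimated here.
        return [(v - self.center_) / self.scale_ for v in values]


train = [10, 20, 30, 40, 50]   # hypothetical training values
test = [60, 500]               # hypothetical unseen values

scaler = SimpleRobustScaler().fit(train)     # fit on training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # same median/IQR applied
```

Fitting a second scaler on the test set would map the same raw value to different scaled values in training and testing, and it would let information from the "unseen" data shape the preprocessing.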
46:00 Does batch_size=32 mean that we input 32 windows at once to our network?
Thanks for making this video. I have a few questions, though:
1. Is there continuity of sequences across batches? It looks like there is. It would be good if you could confirm.
2. You are using a sequence length of 24. Does this mean the BPTT/gradient calculation goes back to the 24th time step?
3. By using stateful=True, we can copy the last hidden state of the previous batch to the first hidden state of the new batch, so there seems to be continuity.
Is this a good idea to explore? If I use a stateful LSTM, can I get better results than without it?
4. Are BPTT and sequence length the same thing? I am confused between them.
Thanks for the tutorial.
But bro, please, PLEASE - why do you use uppercase in name variables (X, Xs)?
why do you have the cnt column in the X_train matrix? Isn't this the label?
In your training set you had 13 features. Shouldn't there be only 5: cnt, t1, t2, hum and wind_speed?
Thanks a lot for this tutorial, it's very informative and helpful
thanks a lot for the video. I wanted to ask, if it's possible to use model.predict() for the data we don't have ? In such a case, what should be the arguments for model.predict()?
I'm also interested in learning how to predict FUTURE values (outside the dataset). For that I want to share a small GitHub file, with an intro to LSTM and an applied example for predicting future values (as you asked).
Here is the link file: github.com/luismgar/Predict_Future_Values/blob/master/Future_Predict_LSTM.ipynb
Main folder: github.com/luismgar/Predict_Future_Values
Inside the file you'll find the original source-code idea, and related info.
I'm not an expert, just a hobbyist trying to learn data analysis and forecasting. I hope this helps a bit.
This is very informative. Thanks a lot.
Thanks for the video. How would you change the model to predict 24 hours into the future instead of one? I understand how to change the dataset to contain 24 steps of input and 24 steps of output, but I'm not quite sure how to convey that to the model.
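One common way to set this up is to make each target a vector of the next "horizon" values. The sketch below covers only the data side, with a hypothetical helper and toy numbers. The assumption on the model side is that the final layer then needs as many outputs as the horizon (for example, a Dense layer with 24 units in Keras instead of 1).

```python
def create_multistep_dataset(rows, labels, time_steps, horizon):
    """Pair each input window with the next `horizon` label values."""
    Xs, ys = [], []
    # Leave room for both the input window and the full target horizon.
    for i in range(len(rows) - time_steps - horizon + 1):
        Xs.append(rows[i:i + time_steps])
        ys.append(labels[i + time_steps:i + time_steps + horizon])
    return Xs, ys


# Toy series 0..9 with a single feature per row (illustrative only).
labels = list(range(10))
rows = [[v] for v in labels]

X, y = create_multistep_dataset(rows, labels, time_steps=4, horizon=3)
```

With the tutorial's setting this would be time_steps=24 and horizon=24, giving targets of shape (N, 24).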
Very helpful tutorial!
Hi, great walkthrough! Could the RobustScaler be the reason your model is not learning the peaks well enough? (It might be treating those points as outliers and shrinking them even more.) Best,
Is it possible to create a Real Time Time Series Analysis? (Where the Time Series is updated in Real Time?)
your features array contains the output as a feature which is a massive problem
In the Preprocessing step he specifies the columns to be scaled and drops the cnt column from the training set
@@muhamedfarooq8936 That is not true. If you see, initially his data has 13 columns and then when it's passed to LSTM as input it has also 13 columns. Can you see it also?
@@alchemication He had 13 features to pass through the LSTM. Shouldn't there be only 5: cnt, t1, t2, hum and wind_speed?
Agreed! He is predicting a column which already exists in the training dataset. He didn't drop any column; he used all 13 columns in his training datasets.
Really cool tutorial! Thanks!
rcParams['figure', 'figsize']= 22, 10
This line in the second cell doesn't work, and my plots are small and confusing. I tried to set the figsize in the seaborn parameters but it gave me an error. I can't be searching around for a solution, that's why I asked here directly.
Amazing video very well explained
Hi, where are the concepts of stationarity used that you introduced in the beginning? Also, what if there were no trends or signal in the data, can you just model it with LSTM?
Really, really nice tutorial. Excellent job. Just one question: in X_train and y_train, the target value 'cnt' was not dropped. Is that a problem?
Since we are predicting the future value of cnt from the actual past values that we have at our disposal at the time we predict the next count, there is nothing wrong with including the previous values of cnt in the training data.
This tutorial is excellent. Thank you! However, I think there's an inconspicuous mistake in the code. If I understand correctly, the arrays X_train and X_test contain not only the features but also the labels? Thus the y-values (count) to be predicted are already included in the X data?
In the Preprocessing step, he specifies which columns to scale ==> drops the cnt column
Ah, I see. Thank you! and sorry for my false accusation.
@@rayadagio where? i didnt see that D:
@@rayadagio where did he drop it?
I confirm that the label "cnt" was not deleted from train and test, nor from X_train or X_test. Applying the same logic as him but properly deleting the labels from the features dramatically increases the mean squared error on the train set to 0.5, and it plateaus at this quite bad score.
Hello there! Very nice walkthrough! I tried implementing this method with vibration data that I acquired from a motor, about 12,000 rows. Unfortunately, in my case the model does not recognize the spikes in vibration at all. I tried different LSTM architectures but the result is much the same. My goal is to predict when the next spike will occur. Now I do not know which path to take: get more examples to train on? Change the LSTM architecture (add more LSTM cells or layers)? Increase the epochs? I know it is difficult to advise me since you do not have the dataset, but it should not matter so much. The data doesn't have an upward or downward trend and has roughly the same mean and standard deviation throughout, but it does show seasonality. I chose LSTM especially because LSTM does not care whether the data is stationary or not. Should I use a VAR or ARIMA model instead?
Hi Venelin, I have a question. During the test period, we won't have the true values, so we can't create sequences. Isn't it correct to append the forecast to the sequences as we predict one value at a time?
Why do you supply the whole "train" dataset as first argument of "create dataset" function? Should you not be removing the train.cnt from this dataframe first? What am I missing?
I think the same way @Venelin Valkov
Great video!
Could you please show the code of how to add a second layer (another bidirectional LSTM) to see if you can improve the performance. Thanks !!
How do I include the other variables, like is_holiday, wind speed etc., to improve predictions using LSTM?
Thanks a lot. Suppose we have the same type of data and are given 'Bike ID' as one of the features, with, say, 10 bikes, so for each unique bike ID there is data from 2015 to 2017. In that case, what should our approach be, and how should we aggregate or work on the data? Please help me.
Can't help but think that your random seed is a reference to The Hitchhiker's Guide to the Galaxy.
Just like we import numpy as np it's just a convention that everyone does.
Surely whoever started using 42 intended the reference; now people just blindly use it because everyone else does lol.
Why didn't you use train_test_split to separate the data?
I saw many videos about time series forecasting. Could you predict 1 week forward? Any market, hourly time frame..
This is too good to be true...
Nice tutorial! However, why do you not shuffle the data while training? I thought that the sequence data itself has to be in time order, but the sequence samples can be shuffled? Or do the sequence samples also have to be in order?
The order within each sequence must be respected for time series. If stateful=True, don't shuffle the order of the batches; otherwise you can shuffle the order of the batches if you want.
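A small sketch of the distinction drawn above: shuffling changes the order of the samples, not the order of the time steps inside each sample. The windows and targets are toy values, and the non-stateful case is assumed. (In Keras, model.fit exposes this choice through its shuffle argument.)

```python
import random

# Toy windows over the series 1, 2, 3, ...; each target is the next value.
windows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
targets = [4, 5, 6, 7]

# Shuffle (window, target) pairs together so they stay aligned.
pairs = list(zip(windows, targets))
random.Random(42).shuffle(pairs)

shuffled_windows = [w for w, _ in pairs]
shuffled_targets = [t for _, t in pairs]
```

Every window keeps its internal time order and its own target; only the order in which the samples are presented to the model changes.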
Thanks for sharing, so helpful.
Great stuff, thanks!
How to choose the number of epochs and the size of batches so that we have a high accuracy?
Trial and error.
Those are hyperparameters, and according to the book "Neural Networks from Scratch in Python", adjusting hyperparameters is done by hand and takes time.
That's why it takes so long to train a NN.
Btw I am kind of afraid of how much time I will have to spend on training my own NN when it takes my PC to train 3 hours with 10 epochs and 468 steps in each epoch on MNIST_fashion dataset, which consists of 60 000 training samples xD ( it's the example from the said book)
What can be done to capture the extremes? I'm struggling with that..
Any suggestions....
Hi Venelin, why have you subtracted time_steps from the number of training rows? Please let me know, I am not able to convince myself :)
Because it iterates with a window, [i:(i + time_steps)], and the label is taken at index i + time_steps. The loop has to stop at len(X) - time_steps so that this label index still exists; otherwise you will get an index-out-of-range error.
can i use the same model for wind speed forecasting, with the data given in the same file?
Hi, have you tried LSTM for predicting ETFs or stocks? If so, I'm curious what accuracy you got.
The other day I tried a different approach to predict the next-day price of silver, but on the test data I got 50% accuracy, which is a pure coin toss :).
great tutorial, thanks
Hi Venelin! Great video! I am working on a project, just for fun because I want to get better at deep learning, about predicting sales prices at auctions based on a number of features over time and also the state of the economy, probably represented by the stock market or GDP. So it's a time series prediction project. I want to use transfer learning, finding a good pretrained model I can use. Do you have any idea about a model I can use?
Superb video
Thank you very much, very nice video! What if my test data have no label? Maybe I could use the last "n" values to predict the first step and then use the predicted value (and n-1 previous ones) to feed the model and predict following one (and so on so forth). What do you think?
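The idea in the comment above (feed each prediction back in as the newest input) can be sketched like this. The function names are hypothetical, and mean_model is a deliberately trivial stand-in for a trained model's one-step prediction. Note the assumption of a univariate series: with extra input features such as weather, their future values would also have to be supplied or forecast.

```python
def recursive_forecast(predict_next, history, n_steps, time_steps):
    """Forecast n_steps ahead by feeding predictions back as inputs."""
    window = list(history[-time_steps:])   # last known values
    forecasts = []
    for _ in range(n_steps):
        y_hat = predict_next(window)       # one-step-ahead prediction
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]      # drop oldest, append prediction
    return forecasts


def mean_model(window):
    """Stand-in for a trained model: predicts the mean of the window."""
    return sum(window) / len(window)


history = [10.0, 20.0, 30.0]
future = recursive_forecast(mean_model, history, n_steps=2, time_steps=3)
```

Errors compound with this approach, since each later step is predicted from earlier predictions rather than from observed values.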
Why did you use 128 units for your lstm? Thanks for the video!
units = dimension of output (hidden layer). I guess it's problem-specific number. Probably taken from literature, where it was tested by trial-and-error method or sensitivity analysis. If I'm wrong then I would like to hear correct answer from someone in the future :)).
This model is only predicting one time step into the future at a time? How can I modify this model to predict 13 time steps into the future?
www.tensorflow.org/tutorials/structured_data/time_series
This might help
@@madhavjariwala4548 I tried this, but I was not able to interpret the results. I mean, get the actual forecasted values back and calculate r-square or plots. Please, if you have a code that does so (especially, many inputs, multistep output), share it with us. Thanks.
Why do all the tutorials say "predict" but do it on already known results? It's more like building a model that gets as close to KNOWN results as possible. Prediction is different from forecasting, I guess.
You generally train a model with 90% of the data you have, then test it on the 10% it has never seen.
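For time series, that 90/10 split is usually taken in time order rather than at random, so the test set lies strictly after the training set. A minimal sketch with a hypothetical helper and toy data:

```python
def chronological_split(rows, train_fraction=0.9):
    """Split time-ordered rows: the earliest part trains, the rest tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy data: 100 time-ordered observations.
rows = list(range(100))
train, test = chronological_split(rows)
```

A random split would mix future observations into the training set, which makes the test score look better than a real forecast would be.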
How do you incorporate tensorboard?
Thanks mate very nice tutorial. Just a small advice please increase the font size next time.
Nice video, Thanks
Cool, but how do you extend the prediction into the future?
it is, really, perfect!
why did you fit only some of the columns and not all?
That's why it's called training. Please search train and test data
Nice! Thanks! Kudos for the tutorial! But what if we needed to predict more than one step ahead? Say 10 steps? Do we need a 10-neuron output layer and an Nx10-shaped y_train? I'm asking about some stock prices :)
why 13 features ?
Another thing: why didn't you use the fit_transform method all at once, instead of doing the two steps separately? I know both ways are legitimate. I'm asking because I'm not sure myself, and I'd like to confirm that fit_transform is just a shorter way to fit and transform at the same time.
How do I plot the predictions with datetime on the x-axis?
Can you make one with stock prices?
Why was there no activation function here? I am new, so can anyone please explain?
Thanks a lot!!! You Rock!!
And I pretty much like your accent :'D
How do I measure the accuracy of my model? Code for that?
You monitor the model against the future (I.e what you’re trying to predict)
Time series prediction uses only time. But you are using a multivariate model. Am I wrong?
@@pjaynem191 What a shame for me! Exponential smoothing and ARIMA are methods for time series analysis, and they use ONLY TIME. In this video, he is using the time series as one variable along with many other variables. Does that still count as time series prediction?
Bravo :)
The most valuable thing I learned is how to set up bidirection in LSTM ))
GOD
T-series prediction
"Stashionarity", damn)
I thought I had misheard)
Please pronounce "Have" correctly.
Fantastic! thank you man!