Neural Transformer Encoders for Timeseries Data in Keras (10.5)
- Published Oct 15, 2024
- In this video we see how the encoder portion of a transformer can be used to predict timeseries data.
Code for This Video:
github.com/jef...
Course Homepage: sites.wustl.ed...
Follow Me/Subscribe:
/ heatonresearch
github.com/jef...
/ jeffheaton
Support Me on Patreon: / jeffheaton
You literally saved my life (the only tutorial on transformers for time series)
Jeff, even if you win the lottery or figure out Bitcoin, we need you to keep on teaching us, please
Aw, thanks!
Fantastic video, congrats. I understand you don't strictly need positional encoding here, but including it would make the example more complete; it is a very important part of more realistic, deeper examples.
Absolutely loved your video: short, concise, to the point. I am viewing your video because I am preparing for a proposal defense, and one of the questions I am trying to answer is whether or not an RNN-LSTM approach for time series prediction is better than transformers. I would appreciate it if you could point me in the right direction for such information. Thank you!
What did you find out?
Hello Jeff. Must say I sure am glad you are creating this great content. Hope you get to chill on a beach somewhere as well.
Thanks for the amazing content Jeff, can you please let us know how we can incorporate the position embedding as part of this architecture.
Great video. I understood it in only 8 minutes. Other videos are like an hour long
Hi Jeff. Thank you so much for your amazing videos! In your prior transformer video you mentioned the importance of positional encoding, but I notice that it isn't built into this time series model, where I'd imagine the relative position is important for accurate prediction. Is it already baked into the Keras MultiHeadAttention component?
I agree that this is a questionable omission from the design. I suspect it would impact performance.
This channel was a great discovery! Thanks a lot for all you share, Jeff
This is awesome Jeff!! As always, thank you so much.
Dear Jeff, something confused me. If we have a univariate sunspot feature here, why is the transformer's head size 256? I mean, shouldn't the head size be the number of features here? Please explain
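Replying with a sketch: in Keras's MultiHeadAttention, `key_dim` (the "head size") is the width of the learned query/key projections, not the number of input features; a 1-feature input is projected up to that width, so 256 is a capacity choice. A NumPy illustration of the shapes involved (names and random weights are mine, for illustration only):

```python
# A single head's query/key projections for a univariate window:
# the 1-dimensional input is mapped up to key_dim = 256.
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_features, key_dim = 100, 1, 256

x = rng.normal(size=(seq_len, n_features))     # univariate window
W_q = rng.normal(size=(n_features, key_dim))   # learned projection (1 -> 256)
W_k = rng.normal(size=(n_features, key_dim))

q, k = x @ W_q, x @ W_k                        # (100, 256) each
scores = q @ k.T / np.sqrt(key_dim)            # (100, 100) attention scores
```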
Thank you so much! Your material is amazing!
Is it normal that the MultiHeadAttention layer in Keras is really, I mean really, slow? I checked the sample transformer model for time series prediction in the Keras documentation, and just that layer makes the model take about 7 minutes per epoch instead of the 2 seconds I get if I remove the MultiHeadAttention layer. Is it because of a poor implementation, or because the multi-head algorithm is THAT complex no matter what you do? I'm using a GPU for the training (RTX 2070).
Hi, can you help me? What about building the decoder part? I want to do forecasting using transformers in Keras, but I could not find any documentation. I would be thankful if you could help me.
Thanks for your explanation. It seems that for time series prediction you only need the transformer encoder and not the transformer decoder part, is that right? How do you predict multiple steps?
Hi Prof, why is the decoder not required for time series prediction? Thanks so much
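For what it's worth, an encoder-only model can still forecast several steps ahead recursively: predict one step, slide the window forward with the prediction appended, and repeat. A hedged sketch (the `model` object and the univariate window shape are assumptions matching the setup in the video, not code from the repo):

```python
# Recursive multi-step forecasting with a one-step-ahead model.
import numpy as np

def forecast(model, window, steps):
    # window: (seq_size, 1) array of the most recent observations
    window = window.copy()
    preds = []
    for _ in range(steps):
        # model predicts one step from the current window
        yhat = model.predict(window[np.newaxis, ...], verbose=0)[0, 0]
        preds.append(float(yhat))
        # drop the oldest point, append the prediction
        window = np.vstack([window[1:], [[yhat]]])
    return preds
```

Note that errors compound with each recursive step, which is one reason an encoder-decoder or direct multi-output head is sometimes preferred for long horizons.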
Yes this was useful to me. Thank you for sharing.
Thank you Jeff! Question: can this be used for text (non-numeric) sequences? For example, an observed sequence of pizza-making events 🙂 {Dough Sauce Toppings Cheese Bake Cut Box Deliver}. If we prompt with Dough Sauce Toppings Cheese ... we should get Bake, not Cut Box Deliver. Thank you!
Thank you for the nice video. I have a question. When using the function to_sequences(), you discarded the first x observations, where x = sequence length, right? So if we choose sequence length = 100, we will discard the first 100 data points for both the train and test sets. Is there any way to keep those data points? Thank you
The loop he has in line 7 of that cell is written to create a range from 0 to the end of the sequence minus the window length of 100. No data should be lost here.
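For anyone following along, here is the kind of windowing that loop performs (this is my own sketch of it, not the exact code from the repo): the first `seq_size` points are not discarded as inputs; they form the first window. What is lost is `seq_size` *targets* at the start of each split, since the first point that can be predicted is index `seq_size`.

```python
# Sliding-window construction: each window of seq_size observations
# becomes one input row, and the next observation is its target.
import numpy as np

def to_sequences(seq_size, obs):
    x, y = [], []
    for i in range(len(obs) - seq_size):
        x.append(obs[i : i + seq_size])  # window of seq_size inputs
        y.append(obs[i + seq_size])      # next value is the target
    return np.array(x), np.array(y)
```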
Hey Jeff, I want to ask you something about the code. I am getting an error in the build-and-train-model part; it says that tensorflow.keras.layers has no attribute LayerNormalization. Please help me with this. Thank you so much
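In case it helps: LayerNormalization was added to tf.keras.layers around TensorFlow 2.0, so that AttributeError usually means an older TensorFlow is installed; upgrading (`pip install --upgrade tensorflow`) is the usual fix (this diagnosis is an assumption based on the error message alone). A quick check:

```python
# Verify the TensorFlow version and that the layer is available.
import tensorflow as tf

print(tf.__version__)  # should be >= 2.0
layer = tf.keras.layers.LayerNormalization(epsilon=1e-6)
```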
What about the decoder part ?
Thanks for the well explained video
Can we have the decoder part
Hey Jeff, can you comment on your validation loss being significantly lower than your training loss? This intuitively makes no sense to me. I've seen this come up in the past and it wasn't a big issue but I can't find a satisfying explanation for it.
I agree it is odd. It is often a side effect of dropout. This captures it pretty well. towardsdatascience.com/what-your-validation-loss-is-lower-than-your-training-loss-this-is-why-5e92e0b1747e
@@HeatonResearch Thank you!
I am in the process of building a transformer to forecast time series of sales for different stores. I have about 500 time series, and in my case I am trying to predict three months of sales. What do you think should change relative to the implementation shown in this video? Any advice would help, thanks.
fantastic!
You would still be teaching this course. The beach gets boring after about a month
I think you missed the position embedding layers
Yes, I don't see the position embedding either
@@saeedrahman8362 actually, I’m wondering if we really need positional encoding layers for signal study.
what if I have multiple time series as input
do you have any idea of multiple time series? thanks a alot for your reply
What do you mean by multiple time series?
@@knowledgelover2736 Want to input an MxN matrix instead of 1xN
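A sketch of how the windowing extends to a multivariate series: Keras expects input shaped (samples, timesteps, features), so an M-feature series just makes the last dimension M instead of 1, and the encoder architecture is otherwise unchanged. The function name and shapes below are mine, for illustration (note it takes the series as timesteps x features):

```python
# Multivariate sliding windows: each sample is (seq_size, features),
# and the target is the next full row of features.
import numpy as np

def to_sequences_multivariate(seq_size, obs):
    # obs: (timesteps, features) array
    x, y = [], []
    for i in range(len(obs) - seq_size):
        x.append(obs[i : i + seq_size, :])  # (seq_size, features) window
        y.append(obs[i + seq_size, :])      # next multivariate observation
    return np.array(x), np.array(y)
```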
Sinkwince...
It took me a while to figure out that a sinkwence is just a sequence
You don't seem confident in what you're telling us and the decoder is still missing. Did you steal the code from somewhere?!