Neural Transformer Encoders for Timeseries Data in Keras (10.5)

  • Published Oct 15, 2024
  • In this video we see how the encoder portion of a transformer can be used to predict timeseries data.
    Code for This Video:
    github.com/jef...
    Course Homepage: sites.wustl.ed...
    Follow Me/Subscribe:
    / heatonresearch
    github.com/jef...
    / jeffheaton
    Support Me on Patreon: / jeffheaton

Comments • 43

  • @azizshaba9283 • 1 year ago • +6

    You literally saved my life (the only tutorial on transformers for time series).

  • @Aguiraz • 1 year ago • +6

    Jeff, even if you win the lottery or figure out Bitcoin, we need you to keep on teaching us, please.

  • @pverd1 • 2 years ago • +2

    Fantastic video, congrats. I understand you don't strictly need positional encoding here, but including it would make the example more complete; it is a very important part of more realistic, deeper examples.

  • @mikedramatologist9484 • 11 months ago

    Absolutely loved your video: short, concise, to the point. I am viewing your video because I am preparing for a proposal defense, and one of the questions I am trying to answer is whether or not an RNN-LSTM approach for time series prediction is better than transformers. I would appreciate it if you could point me in the right direction of where I could find such information. Thank you!

    • @MrSurati_ • 9 months ago • +1

      What did you find out?

  • @niaguilar1994 • 2 years ago

    Hello Jeff. I must say I sure am glad you are creating this great content. Hope you get to chill on a beach somewhere as well.

  • @saeedrahman8362 • 1 year ago • +2

    Thanks for the amazing content, Jeff. Can you please let us know how we can incorporate position embedding into this architecture?

  • @naasvanrooyen2894 • 1 year ago

    Great video. I understood it in only 8 minutes; other videos are like an hour long.

  • @thomastran3040 • 2 years ago • +5

    Hi Jeff. Thank you so much for your amazing videos! In your prior transformer video you mentioned the importance of positional encoding, but I notice it isn't built into this time series model, where I'd imagine relative position is important for accurate prediction. Is it already baked into the Keras multi-head attention component?

    • @liamroche1473 • 11 months ago • +1

      I agree that this is a questionable omission from the design. I suspect it would impact performance.
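
To answer the thread above: Keras's MultiHeadAttention does not add positional information on its own, so without an explicit encoding the encoder treats the window essentially as an unordered set. A minimal sketch of the standard sinusoidal encoding follows; the helper name, the Dense lift to d_model channels, and all sizes are editor assumptions, not code from the video.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    def positional_encoding(seq_len, d_model):
        # Sinusoidal encoding from "Attention Is All You Need":
        # even channels get sine, odd channels get cosine.
        pos = np.arange(seq_len)[:, np.newaxis]
        dim = np.arange(d_model)[np.newaxis, :]
        angles = pos / np.power(10000.0, (2 * (dim // 2)) / np.float32(d_model))
        angles[:, 0::2] = np.sin(angles[:, 0::2])
        angles[:, 1::2] = np.cos(angles[:, 1::2])
        return tf.cast(angles[np.newaxis, ...], tf.float32)  # (1, seq_len, d_model)

    # Hypothetical wiring: lift the single feature to d_model channels,
    # add the encoding, then continue with the encoder blocks from the video.
    seq_len, d_model = 100, 64
    inputs = layers.Input(shape=(seq_len, 1))
    x = layers.Dense(d_model)(inputs)
    x = x + positional_encoding(seq_len, d_model)
    # x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout) ...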

  • @LeanTaoTe • 1 year ago

    This channel was a great discovery! Thanks a lot for all you share, Jeff

  • @r00t_sh3ll • 2 years ago • +1

    This is awesome Jeff!! As always, thank you so much.

  • @abdi715 • 1 year ago

    Dear Jeff, something confused me. If we have a univariate sunspot feature here, why is the transformer's head size 256? I mean, shouldn't the head size be the number of features here? Please explain.
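
On the head-size question: in Keras's MultiHeadAttention, key_dim (which the code passes as head_size, assuming it follows the standard Keras timeseries example) is the size of each head's internal query/key projection, not the number of input features. The layer learns dense projections from however many channels you feed it, so a univariate series with key_dim=256 is perfectly legal. A quick shape check, illustrative only:

    import tensorflow as tf
    from tensorflow.keras import layers

    x = tf.random.normal((32, 100, 1))   # 32 windows, 100 steps, 1 feature

    # key_dim is the per-head projection size; it is independent of the
    # single input feature. The output is projected back to the query's size.
    mha = layers.MultiHeadAttention(num_heads=4, key_dim=256)
    y = mha(x, x)                        # self-attention
    print(y.shape)                       # (32, 100, 1)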

  • @jairsales5501 • 6 months ago

    Thank you so much! Your material is amazing!

  • @davidcristobal7152 • 2 years ago • +1

    Is it normal that the MultiHeadAttention layer in Keras is really, I mean really, slow? I checked the sample transformer model for time series prediction in the Keras documentation, and that one layer makes the model take about 7 minutes per epoch instead of the 2 seconds I get if I remove the MultiHeadAttention layer. Is it a poor implementation, or is the multi-head algorithm THAT complex no matter what you do? I'm using a GPU (RTX 2070) for training.
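
One thing worth ruling out first is whether TensorFlow is actually using the GPU (tf.config.list_physical_devices('GPU') should be non-empty). Beyond that, self-attention computes a score for every pair of time steps, so its cost grows roughly quadratically with window length, and long windows can make this one layer dominate training time. A rough probe of that scaling, with all sizes chosen arbitrarily by the editor:

    import time
    import tensorflow as tf
    from tensorflow.keras import layers

    mha = layers.MultiHeadAttention(num_heads=4, key_dim=64)

    for seq_len in (100, 400, 1600):
        x = tf.random.normal((8, seq_len, 64))
        mha(x, x)                              # warm-up / build weights
        start = time.perf_counter()
        for _ in range(10):
            mha(x, x)                          # attention scores are (seq_len, seq_len)
        print(seq_len, round(time.perf_counter() - start, 3))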

  • @somayehseifi8269 • 2 years ago • +1

    Hi, can you help me? What about building the decoder part? I want to do forecasting using transformers in Keras, but I could not find any documentation. I would be thankful if you could help me.

  • @amyzimmermann5335 • 1 year ago • +1

    Thanks for your explanation. It seems that for time series prediction you only need the transformer encoder and not the decoder part; is that right? How do you predict multiple steps?
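
On multi-step prediction: a common encoder-only approach is to forecast autoregressively, i.e., predict one step, append the prediction to the window, and repeat. A sketch assuming a trained one-step model like the video's; the function and variable names are the editor's, and history is assumed to be scaled the same way as the training data:

    import numpy as np

    def forecast(model, history, n_steps, seq_len=100):
        # history: 1-D array of (scaled) observations, length >= seq_len.
        # Returns an array of n_steps predictions.
        window = list(history[-seq_len:])
        preds = []
        for _ in range(n_steps):
            x = np.array(window[-seq_len:]).reshape(1, seq_len, 1)
            yhat = float(model.predict(x, verbose=0)[0, 0])
            preds.append(yhat)
            window.append(yhat)        # feed the prediction back in
        return np.array(preds)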

  • @dbgm12 • 1 year ago • +1

    Hi Prof, why is the decoder not required in time series prediction? Thanks so much.

  • @edgetrading2 • 1 year ago

    Yes, this was useful to me. Thank you for sharing.

  • @TheUltimateBaccaratApp • 1 year ago

    Thank you Jeff! Question: can this be used for text (non-numeric) sequences? For example, a pizza's observed sequence of events 🙂 {Dough Sauce Toppings Cheese Bake Cut Box Deliver}. If we prompt with Dough Sauce Toppings Cheese ... we should get Bake, not Cut Box Deliver. Thank you!
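
In principle, yes: map each event to an integer id and put an Embedding layer in front of the same encoder blocks, finishing with a softmax over the event vocabulary. A sketch using the pizza events above; all layer sizes are arbitrary, and transformer_encoder stands in for the video's encoder block rather than being a real library function:

    import tensorflow as tf
    from tensorflow.keras import layers

    events = ["Dough", "Sauce", "Toppings", "Cheese", "Bake", "Cut", "Box", "Deliver"]
    vocab = {e: i for i, e in enumerate(events)}

    seq_len, d_model = 4, 32
    inputs = layers.Input(shape=(seq_len,), dtype="int32")
    x = layers.Embedding(len(vocab), d_model)(inputs)    # event ids -> vectors
    # x = transformer_encoder(x, ...)                    # encoder blocks as in the video
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(len(vocab), activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    # Prompt with "Dough Sauce Toppings Cheese"; after training, the argmax
    # of model.predict(prompt) should ideally be the id for "Bake".
    prompt = tf.constant([[vocab[e] for e in ["Dough", "Sauce", "Toppings", "Cheese"]]])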

  • @lzdddd • 2 years ago • +2

    Thank you for the nice video. I have a question: when using the function to_sequences(), you discarded the first x observations, where x = sequence length, right? So if we choose sequence length = 100, we will discard the first 100 data points for both the train and test sets. Is there any way to keep those data points? Thank you.

    • @randomhkkid • 2 years ago

      The loop he has in line 7 of that cell is written to create a range from 0 to (end of the sequence - 100). No data should be lost here.
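
For reference, the kind of sliding-window conversion being discussed looks roughly like this (an editor's reconstruction, not the exact notebook code). Every observation is used as input; only the first seq_len points never appear as prediction targets, because they have no full window of history behind them:

    import numpy as np

    def to_sequences(seq_len, data):
        # x[i] = the window data[i : i+seq_len]; y[i] = the next value.
        x, y = [], []
        for i in range(len(data) - seq_len):
            x.append(data[i : i + seq_len])
            y.append(data[i + seq_len])
        return np.array(x).reshape(-1, seq_len, 1), np.array(y)

    # A series of 1,000 points with seq_len=100 yields 900 (window, target)
    # pairs: no point is discarded, but the first 100 only appear inside windows.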

  • @priyankatomar6636 • 1 year ago

    Hey Jeff, I want to ask you something about the code. I am getting an error in the build-and-train-model part; it says that tensorflow.keras.layers has no attribute LayerNormalization. Please help me with this. Thank you so much.
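
That error usually means an outdated TensorFlow (or a stale standalone keras package): LayerNormalization has shipped in tf.keras.layers since TensorFlow 2.0. Checking the version and importing from the layers module normally resolves it:

    import tensorflow as tf
    print(tf.__version__)                     # LayerNormalization needs TF >= 2.0

    from tensorflow.keras.layers import LayerNormalization
    norm = LayerNormalization(epsilon=1e-6)   # as used in the encoder blocks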

  • @ButchCassidyAndSundanceKid • 1 year ago • +1

    What about the decoder part?

  • @allalzaid1872 • 1 year ago

    Thanks for the well-explained video.

  • @allalzaid1872 • 1 year ago • +1

    Can we have the decoder part?

  • @isaacgroen3692 • 1 year ago

    Hey Jeff, can you comment on your validation loss being significantly lower than your training loss? That intuitively makes no sense to me. I've seen this come up in the past, and it wasn't a big issue, but I can't find a satisfying explanation for it.

    • @HeatonResearch • 1 year ago • +1

      I agree it is odd. It is often a side effect of dropout. This article captures it pretty well: towardsdatascience.com/what-your-validation-loss-is-lower-than-your-training-loss-this-is-why-5e92e0b1747e

    • @isaacgroen3692 • 1 year ago

      @@HeatonResearch Thank you!
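
The dropout effect mentioned in the reply above is easy to see directly: dropout is active while the training loss is computed but disabled at validation time, so the validation pass runs on the full, un-dropped network. A minimal demonstration:

    import tensorflow as tf
    from tensorflow.keras import layers

    drop = layers.Dropout(0.5)
    x = tf.ones((1, 8))

    print(drop(x, training=True))    # ~half the units zeroed, survivors scaled by 2
    print(drop(x, training=False))   # identity: this is what validation sees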

  • @viktorql364 • 1 year ago

    I am in the process of building a transformer to forecast time series of sales for different stores. I have about 500 time series, and in my case I am trying to predict three months of sales. What do you think should change from the implementation shown in this video? Any advice would help. Thanks.

  • @shaheerzaman620 • 2 years ago

    Fantastic!

  • @joshuakessler8346 • 1 year ago

    You would still be teaching this course. The beach gets boring after about a month.

  • @Alan-hs2wn • 1 year ago • +1

    I think you missed the position embedding layers.

    • @saeedrahman8362 • 1 year ago

      Yes, I don't see the position embedding either.

    • @Alan-hs2wn • 1 year ago

      @@saeedrahman8362 Actually, I'm wondering if we really need positional encoding layers for signal study.

  • @lennard4454 • 1 year ago

    What if I have multiple time series as input?

    • @pengyue6131 • 1 year ago

      Do you have any ideas about multiple time series? Thanks a lot for your reply.

    • @knowledgelover2736 • 1 year ago

      What do you mean by multiple time series?

    • @trueToastedCode • 1 year ago

      @@knowledgelover2736 I want to input an MxN matrix instead of 1xN.
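
For the multivariate questions in this thread: the same encoder handles an MxN input directly; change the last dimension of the input shape from 1 to the number of features and window all features together. A sketch, where the pooling/Dense head and all sizes are editor placeholders and transformer_encoder stands in for the video's encoder block:

    import numpy as np
    from tensorflow.keras import layers, Model

    seq_len, n_features = 100, 5                       # 5 parallel series
    inputs = layers.Input(shape=(seq_len, n_features)) # was (seq_len, 1)
    # x = transformer_encoder(inputs, ...)             # encoder blocks unchanged
    x = layers.GlobalAveragePooling1D()(inputs)
    outputs = layers.Dense(1)(x)                       # one-step target
    model = Model(inputs, outputs)

    # Windows become (num_windows, seq_len, n_features) instead of (..., 1).
    batch = np.random.rand(32, seq_len, n_features).astype("float32")
    print(model.predict(batch, verbose=0).shape)       # (32, 1)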

  • @benripka6977 • 1 year ago • +3

    Sinkwince...

    • @ianl5560 • 7 months ago

      It took me a while to figure out that a "sinkwence" is just a sequence.

  • @SimonZimmermann82 • 1 year ago

    You don't seem confident in what you're telling us, and the decoder is still missing. Did you steal the code from somewhere?!