Time Series Forecasting with XGBoost - Advanced Methods

  • Published on 2 Jun 2024
  • This video is a continuation of the previous video on the topic where we cover time series forecasting with xgboost. In this video we cover more advanced methods such as outlier removal, time series cross validation, lag features, and a bonus feature!
    Check out part 1 here: • Time Series Forecastin...
    The notebook used in this video here: www.kaggle.com/code/robikscub...
    Timeline:
    00:00 Start
    01:05 Outline
    02:20 Outlier Removal
    04:25 Time Series Cross Validation
    10:15 Lag Features
    13:15 Training Cross Validation
    14:52 Predicting the Future
    20:09 Bonus!
    Follow me on twitch for live coding streams: / medallionstallion_
    My other videos:
    Speed Up Your Pandas Code: • Make Your Pandas Code ...
    Speed up Pandas Code: • Make Your Pandas Code ...
    Intro to Pandas video: • A Gentle Introduction ...
    Exploratory Data Analysis Video: • Exploratory Data Analy...
    Working with Audio data in Python: • Audio Data Processing ...
    Efficient Pandas Dataframes: • Speed Up Your Pandas D...
    * YouTube: youtube.com/@robmulla?sub_con...
    * Discord: / discord
    * Twitch: / medallionstallion_
    * Twitter: / rob_mulla
    * Kaggle: www.kaggle.com/robikscube
    #xgboost #python #machinelearning
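
The time series cross-validation step from the timeline can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the synthetic hourly `load` column, the two folds, the one-year validation window, and the 24-hour gap are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical hourly series standing in for the energy data (3 years).
idx = pd.date_range("2015-01-01", periods=24 * 365 * 3, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"load": np.random.default_rng(0).normal(size=len(idx))}, index=idx)

# Assumed settings: 2 folds, one-year validation windows, and a 24-sample gap
# between the end of each training window and the start of its validation window.
tss = TimeSeriesSplit(n_splits=2, test_size=24 * 365, gap=24)

folds = []
for fold, (train_pos, val_pos) in enumerate(tss.split(df)):
    train, val = df.iloc[train_pos], df.iloc[val_pos]
    folds.append((train.index.max(), val.index.min(), val.index.max()))
    print(f"fold {fold}: train ends {train.index.max()}, "
          f"validation {val.index.min()} to {val.index.max()}")
```

Each fold trains only on data that comes before its validation window, which is the property ordinary shuffled K-fold does not give you on time series.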

Comments • 307

  • @hasanovmaqsud
    @hasanovmaqsud a year ago +150

    Believe it or not, this is potentially the best tutorial about time series forecasting out there. Definitely worth attention. Please keep up the good work👍

    • @robmulla
      @robmulla a year ago +20

      Wow, that means a lot to me! Glad you found it so helpful. I have no plans of slowing down any time soon!

    • @yassssssssssss
      @yassssssssssss a year ago +3

      I second what @Maqsud Hasanov said. Thanks for sharing, and more about TS please 😁

    • @GeorgeCherian7
      @GeorgeCherian7 9 months ago

      😊

  • @al_s_
    @al_s_ a year ago +9

    These are such good videos Rob. You cover so much material quickly and clearly, with straightforward language that a novice like me can understand. Much appreciated 👍

  • @lashlarue7924
    @lashlarue7924 6 months ago +5

    Hands down the best how-to I have yet seen. THANK YOU.

  • @amedeotalks
    @amedeotalks a year ago

    Every single one of your vids is inspiring, helpful, and informative. You are wonderful. Thank you so much for everything, man.

  • @gabrielmoreno2554
    @gabrielmoreno2554 a year ago +2

    Nice to see you made a second part to your video. Awesome job!!!

    • @robmulla
      @robmulla a year ago

      Thanks. Glad you liked it!

  • @TimelyTimeSeries
    @TimelyTimeSeries 4 months ago +1

    Thank you for showing how to train a forecasting model with cross validation. I've never truly understood it, until I saw your video. I'll apply it to my own projects!

  • @rohitvenkatesan1895
    @rohitvenkatesan1895 a year ago

    I am glad I found your channel, every morning before I start my work (Work From Home Days) I watch at least one of your videos and damn! my productivity and skills have improved by a lot (especially pandas data pipelines) at work. Thanks Rob! Keep up the good work!

  • @lolmatt9
    @lolmatt9 4 months ago

    Really good tutorial. Easy to follow but also quick and succinct. Thanks!

  • @welverd
    @welverd 19 days ago

    Man, thanks for the content. This gave me real good insights.

  • @EndikaMT
    @EndikaMT 6 months ago

    I am currently dealing with time series and this is one of the best video tutorials on YouTube on this topic. Amazing work, thanks!

  • @DataDeepDive-yh4rf
    @DataDeepDive-yh4rf 3 months ago +1

    Great video, thanks Rob!

  • @hasanovmaqsud
    @hasanovmaqsud a year ago +1

    Oh, thank you very much, Rob ! Thanks for answering the inquiry about time series cross validation! It makes things much clearer now! God bless you, man!

    • @robmulla
      @robmulla a year ago +2

      Glad you found it helpful. Time series validation can become more of an art than a science especially when you have non-stable trends. Time series is hard!

  • @davoodastaraky7608
    @davoodastaraky7608 a year ago +4

    Amazing work as always. Your content is super helpful. Please keep making videos. Thanks for spending your time to make these videos.

    • @robmulla
      @robmulla a year ago

      Thanks, will do! I have ideas for more videos coming out soon!

  • @mpfiesty
    @mpfiesty 7 months ago +2

    Love your teaching style. Practical and to the point, it makes it really easy to understand these features.
    My mind is exploding with ideas on how to apply this.
    I would love to see this used but across multiple categories in a data set. For example, for financial data, creating predictions for COGS, expenses, revenue, assets, liabilities, etc., even adding future data like a large asset being purchased or an acquisition of some sort to create financial statements for years to come.
    Thank you so much for the video!

  • @user-gg6tr9ic5d
    @user-gg6tr9ic5d 2 months ago +1

    This is gold! Thank you :) This validates my approach for my current time series project.

    • @robmulla
      @robmulla 2 months ago

      Glad to hear that. Thanks for commenting.

  • @mangeshmehendale4139
    @mangeshmehendale4139 4 months ago

    This video is, in equal parts, fascinating and concerning. Fascinating because of how easy you make it look. Concerning because there is nowhere to hide in terms of how far I need to go...

  • @CarlosReyes-ku6ub
    @CarlosReyes-ku6ub a year ago +1

    I'm so glad I just discovered you. Keep up the great work.

    • @robmulla
      @robmulla a year ago +1

      Glad you found me too! Let me know if you have any feedback and share my videos anywhere you think others might appreciate them.

  • @nicholasbeaton2940
    @nicholasbeaton2940 a year ago +1

    Love your explanations. Thanks!

    • @robmulla
      @robmulla a year ago

      Thanks so much for watching. Share it with anyone you think might learn from it!

  • @PandemicGameplay
    @PandemicGameplay a year ago

    Buddy your videos are excellent, good stuff.

  • @jakstrike1
    @jakstrike1 a year ago +2

    Great vid man! Perfect intro for someone with other ML experience.

    • @robmulla
      @robmulla a year ago

      Glad you enjoyed it! Thanks for leaving a comment.

  • @danielleshelton6706
    @danielleshelton6706 a month ago +1

    As a person making a career change from an entirely different industry I really appreciate your videos. Finishing up class, with your help! Will be back to learn on my own this summer...thanks again!

  • @yBlade05
    @yBlade05 a year ago +2

    Just watched both parts, and I have got to say this was a very good tutorial.

    • @robmulla
      @robmulla a year ago

      So glad you found it helpful. Please share it anywhere you think people might find it helpful.

  • @niloufarmsv3815
    @niloufarmsv3815 a year ago

    Thank you so much for your awesome and clear tutorials. I have learned a lot!

    • @robmulla
      @robmulla a year ago +1

      Great to hear!

  • @JuanManuelBerros
    @JuanManuelBerros a year ago +1

    Amazing vids Rob!! I'm carefully following part 1 & 2, learning lots of minor tricks along the way, love them. Quick question: any reason why the XGBRegressor objective is "reg:linear", and not just the default? Other params seem non-defaults as well.

  • @PG-iq6zv
    @PG-iq6zv a year ago +1

    This is pure G O L D ! You deserve more views and subs. Subbed!

    • @robmulla
      @robmulla a year ago

      Thanks so much for the feedback. Just share it with 99k friends and have them subscribe 😉

  • @massimothormann272
    @massimothormann272 a year ago +5

    Once again: great video, thanks a lot!
    I would love to see you showcase some more advanced feature engineering and hyperparameter tuning. Where to start? Which rules of thumb do you use (for example, for the starting hyperparameters)? What do you look for while tuning, etc.?
    However: can't wait to see your next video!
    Have a nice day :)

    • @robmulla
      @robmulla a year ago +2

      Great suggestion! I have considered an XGBoost tuning guide; however, it really depends on the dataset. There are some packages out there that can help automate that process. For feature engineering you will be limited with a dataset like this because we only have the target value. Fun feature engineering would come if you also had other correlated features (like weather forecasts, etc.).

  • @kristinaarsova5946
    @kristinaarsova5946 4 months ago

    Really nice video indeed. I am currently using the XGBoost model for time series prediction of water consumption, and it does a better job than the ARIMA family so far. Thank you for sharing how to continue with the prediction; this info is really hard to find anywhere. Great videos, keep up the good work. I will definitely follow :)

  • @TylerMacClane
    @TylerMacClane a year ago

    Thanks a lot
    As always, great video
    👾

  • @vadimshatov9935
    @vadimshatov9935 a year ago +2

    You saved a huge amount of my time. Thank you so much

    • @robmulla
      @robmulla a year ago

      Thanks Vadim. Glad the video helped you out.

  • @suabsakulgururatana9151
    @suabsakulgururatana9151 a year ago +1

    Million thanks for your tutorials... Superb 👍👍

    • @robmulla
      @robmulla a year ago +1

      My pleasure 😊

  • @elahe7702
    @elahe7702 a year ago

    Thanks a lot for making the second part of the video.

    • @robmulla
      @robmulla a year ago

      Thanks for watching. Maybe there will be a part 3!

  • @deepakramani05
    @deepakramani05 a year ago +1

    Excellent video. Thank you so much.

    • @robmulla
      @robmulla a year ago

      Glad you found it helpful, Deepak!

  • @ZINALOUDJANI-uk9ft
    @ZINALOUDJANI-uk9ft 11 months ago

    Definitely, your tutorial is the best

  • @dagobertocifuentes6845
    @dagobertocifuentes6845 a year ago +1

    What an awesome explanation!

    • @robmulla
      @robmulla a year ago

      Thanks for watching! 🧐

  • @kennethodhiambo1803
    @kennethodhiambo1803 4 months ago +1

    Hi Rob. Thank you for this. You are very good! Regards from Nairobi, Kenya.

    • @robmulla
      @robmulla 4 months ago +1

      Glad you liked it b!

  • @Arkajyoti
    @Arkajyoti a year ago +6

    Very nicely done. Crisp explanations, optimum amount of details and plenty of crumbs to follow should one care to deep dive into specific areas. This is the second video on forecasting and I'd love to see more videos on the series. How about a tutorial on forecasting with RNN/LSTM?

    • @robmulla
      @robmulla a year ago +1

      Glad it was helpful! I try to make the videos less fluff, but sometimes I worry whether I actually strike the right balance. I definitely have plans for an RNN/LSTM forecasting video, although I am not a big supporter of using them in applications like this. Also thinking about making a video about something like Facebook's Prophet model.

    • @felixakwerh5189
      @felixakwerh5189 a year ago +1

      @@robmulla waiting for the fb prophet model tutorials

  • @THE8SFN
    @THE8SFN a year ago +1

    please continue making these great tutorials

    • @robmulla
      @robmulla a year ago

      I will if you keep watching them!

  • @sami3592
    @sami3592 a year ago +1

    Thanks a lot. Very fluent and Good explanation.

    • @robmulla
      @robmulla a year ago

      Glad you liked it! Thanks Sami.

  • @nurlannurmash4155
    @nurlannurmash4155 a year ago +1

    Awesome. Thanks for this tutorial.

    • @robmulla
      @robmulla a year ago

      Thanks for watching!

  • @Pedrommelos
    @Pedrommelos a year ago +8

    Parts 1 and 2 are such a tremendous job. I managed to do some forecasting and nowcasting for European inflation using your guidance. You should consider creating a course. You are excellent!!

    • @robmulla
      @robmulla a year ago +5

      Really appreciate that feedback! Still planning on a few more for this series.

    • @Pedrommelos
      @Pedrommelos a year ago

      @@robmulla I'm hungry for more! Thank you for sharing this knowledge with us!

  • @massoudkadivar8758
    @massoudkadivar8758 11 months ago +2

    Great job!

    • @robmulla
      @robmulla 11 months ago

      Thank you! Cheers!

  • @abdogassar9246
    @abdogassar9246 a year ago +1

    Great job, thank you so much.

    • @robmulla
      @robmulla a year ago +1

      You're very welcome! Thanks for watching.

  • @prometeo34
    @prometeo34 a year ago +2

    Rob, this video is the reason that I am learning python, I am a hardcore R person, but this is amazing. Great work!

    • @robmulla
      @robmulla a year ago +1

      I love this! Glad you are starting your python journey here!

  • @titiQd
    @titiQd a year ago +1

    please keep making video tutorials as your videos help beginners like me (career changers) to finish their projects or assignment. i already follow your kaggle and YT.

    • @robmulla
      @robmulla a year ago +1

      I will try my best! Thanks for the feedback and glad to hear you've found them helpful.

    • @titiQd
      @titiQd a year ago

      @@robmulla Can you please make a tutorial comparing this forecast (or any forecast) between XGBoost and ARIMA, as a follow-up in this time series project?

  • @rod653
    @rod653 11 months ago +1

    This video kicked ASS! thanks rob i loved it can't wait for more.

    • @robmulla
      @robmulla 11 months ago

      I appreciate that. Check out my other content and share it on any platform you think people might learn from. Have you already watched part 1?

    • @rod653
      @rod653 11 months ago

      @@robmulla I did, and I must say a second part was needed.
      There's still a lot I don't know.
      I'm learning how to use PyCharm to apply for the TensorFlow certification, just to flex on my homies. See you at the peak! 😎

  • @hasijasanskar
    @hasijasanskar a year ago +4

    Part 2!! Lets goo 🚀

  • @charlesnwevo2706
    @charlesnwevo2706 a year ago +1

    Your videos are always explained with such clarity, well done. I was wondering if we could predict the future using the method you implemented here in the model from the previous video? Thank you.

    • @robmulla
      @robmulla a year ago

      Really appreciate that feedback. We use a fairly similar model from the previous video in this one. Just the validation pipeline is different.

  • @poonsimon670
    @poonsimon670 a year ago +1

    Great Tutorial.

    • @robmulla
      @robmulla a year ago

      Glad it was helpful!

  • @business_central
    @business_central a year ago +31

    Great one!
    Can we have a part 3 with more model tuning, by adding weather data and other external factors impacting the energy consumption?

    • @robmulla
      @robmulla a year ago +24

      I am planning on making more. I have one about the Prophet model and am also working on one using LSTMs.

    • @qdupontulb
      @qdupontulb 6 months ago

      @@robmulla When could we expect to see it? I would Looove to continue to scale up my skills in Python thanks to your videos!

  • @vlplbl85
    @vlplbl85 a year ago +1

    Great video. Thanks

    • @robmulla
      @robmulla a year ago

      Glad you liked it! Share with a friend :D

  • @brandondavis9305
    @brandondavis9305 a year ago +1

    Solid work!

    • @robmulla
      @robmulla a year ago

      Glad you enjoyed it Brandon.

  • @key_advice
    @key_advice a year ago

    Great video. It would be really great if you did a continuation showing how to upload the model and connect it to a GUI so that it can be used by everyday users.

  • @code2compass
    @code2compass 3 months ago

    Interesting video, Rob. You're a hero.
    Here are my two cents on this video for people struggling with time series forecasting.
    1. Feature engineering is essential. Make sure to get the right features before training on your data.
    2. Instead of using base features, try using derived ones such as "mean, median, std, var, rolling mean, rolling std, rolling median etc."
    3. Use preprocessing to clean your data, and make sure to interpolate your missing values instead of dropping them.
    4. Never mix things. Forecast trends with trends and linear with linear.
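
The derived rolling features suggested in point 2 could look something like the sketch below. The toy `load` column, the 3-hour window, and the feature names are assumptions made for illustration; the one design choice worth noting is the `shift(1)` before rolling, so that each row only summarizes values strictly before its own timestamp.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly target; values 0..9 make the arithmetic easy to follow.
idx = pd.date_range("2020-01-01", periods=10, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"load": np.arange(10, dtype=float)}, index=idx)

# Shift by one step first so the window never includes the current row's target.
past = df["load"].shift(1)

df["roll_mean_3"] = past.rolling(3).mean()
df["roll_std_3"] = past.rolling(3).std()
df["roll_median_3"] = past.rolling(3).median()

print(df.head(6))
```

Without the shift, the rolling window at time t would contain the value being predicted at time t, which leaks the target into the features.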

  • @user-wc8ez2hg5o
    @user-wc8ez2hg5o a year ago +2

    Awesome and informative work with really creative visuals. The graphs and plots were impressive and visually appealing. As a suggestion, though, you may improve the structure by adding headers to separate each section.

    • @robmulla
      @robmulla a year ago

      Thanks. That’s great feedback and I’ll keep that in mind in the future.

  • @abdogassar9246
    @abdogassar9246 a year ago +2

    Please, please! If you have time, we would like to see another lecture comparing the performance of XGBoost with deep neural network (DNN) and multiple linear regression (MLR) models in time series forecasting for energy (on the same topic). We would be very grateful, because there are no other lectures as clear and useful as the ones you have provided.

    • @robmulla
      @robmulla a year ago +2

      Thanks! I get asked about this a lot so I’ll definitely put it in my list of future videos to make. Thanks for watching.

    • @abdogassar9246
      @abdogassar9246 a year ago

      @@robmulla Thank you so much for your efforts.

  • @agustinkashiraja401
    @agustinkashiraja401 a year ago +1

    Hi from Argentina. Excellent videos, 1 and 2. Congrats!

    • @robmulla
      @robmulla a year ago

      Hello there! Glad to have you watching from Argentina. Thanks for the congrats.

  • @fardinahmadpor1225
    @fardinahmadpor1225 9 months ago

    It was great thanks 😊

  • @pablobandeira5461
    @pablobandeira5461 8 months ago

    A genius, thank you!

  • @flel2514
    @flel2514 a year ago +1

    Very good!

    • @robmulla
      @robmulla a year ago +1

      Appreciate that!

  • @newdata
    @newdata a year ago +1

    Super helpful and inspiring. It would be great to see how the mysterious parameters are tuned.

    • @robmulla
      @robmulla a year ago

      Glad you found it inspiring. Parameter tuning isn't always worth spending a lot of time on, at least at first. You can use an auto-tuner like Optuna once you've got your model set up, and it will tune parameters automatically.

  • @a.h.s.3006
    @a.h.s.3006 a year ago +3

    Again. Amazing work. I want to use your tutorial at work for our interns in the future.
    Note that there is also a lag function you can use in pandas (shift), but it is based on "previous rows" rather than "previous unit of time", so it will not be as accurate as your mapping (when some rows are missing)
    Additionally, you can add any other feature that is available in your dataset (multivariate dataset), this example is purely timeseries, but if you have for example a column for "county", you could experiment with making several models per county or one big model with "county" as a feature.

    • @robmulla
      @robmulla a year ago +1

      Wow. That's a big honor that you want to use my tutorial to help train people. Please do it!
      Yes, I made the lag features in this way to ensure we didn't have any issues if certain rows were missing from the dataset, but you can also use something like `shift` in pandas. This dataset doesn't have detailed features but I agree it's good to add them if available.
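
The difference between the two approaches discussed in this thread can be seen on a tiny example. This is a sketch under assumptions: the daily toy series, the column names, and the 2-day lag are made up for illustration, and one day is deliberately dropped to simulate a missing row.

```python
import pandas as pd

# Hypothetical daily series with one missing day (2020-01-03).
idx = pd.date_range("2020-01-01", periods=6, freq="D")
df = pd.DataFrame({"y": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}, index=idx)
df = df.drop(pd.Timestamp("2020-01-03"))

# Time-based lag: look up the value at (timestamp - 2 days).
# Missing timestamps simply produce NaN.
target_map = df["y"].to_dict()
df["lag2_map"] = (df.index - pd.Timedelta(days=2)).map(target_map)

# Row-based lag: take the value two rows back, whatever its timestamp is.
df["lag2_shift"] = df["y"].shift(2)

print(df)
```

For 2020-01-04 the mapping returns the value from 2020-01-02, while the row-based shift returns the value from 2020-01-01 because a row is missing in between. If the series is first reindexed to a complete, regular time index, the two approaches agree.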

  • @Arkantosi
    @Arkantosi a year ago +1

    Damn. Wish I found this guy sooner, could have spared me so many days and nights of headache. Best python teacher ever.

    • @robmulla
      @robmulla a year ago +1

      Thanks so much!

    • @Arkantosi
      @Arkantosi a year ago +1

      @@robmulla Sorry for asking something like this, but a tutorial from you on time series forecasting with PyTorch's Temporal Fusion Transformer (TFT) would be awesome!

  • @TheThunder005
    @TheThunder005 a year ago +3

    Excellent work. Keep the content coming! Build a model to predict future topics people will ask about; there is an API to pull data from YT.
    But seriously, nice work, love the advanced deep dives. How I got this far: I found a YouTube short from you explaining a topic that popped up in my feed, then you had a 10-minute general knowledge video, then a deep dive series. I really liked that structure.

    • @robmulla
      @robmulla a year ago +1

      Thanks for the feedback, I really do appreciate it. Cool to hear that you first found me via the shorts. I've been meaning to make some more of those!

  • @eduardomanotas7403
    @eduardomanotas7403 11 months ago

    This is amazing! Very great video. Can you do one with other correlated features and feature importance, Or another one for pre post treatment.

  • @vzinko
    @vzinko 8 months ago +4

    When fitting the model in the walk forward validation, the 'eval_set' parameter should not include test data as this is data leakage
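
One way to act on this point is to carve the early-stopping set out of the end of each training fold, so the test fold is only ever used for scoring. The sketch below shows only the index handling; the helper name `split_for_early_stopping`, the 10% holdout fraction, and the fold sizes are hypothetical choices for illustration.

```python
import numpy as np

def split_for_early_stopping(train_idx, holdout_frac=0.1):
    """Split a time-ordered training fold into a fit part and a later
    early-stopping part. The test fold is not involved at all."""
    train_idx = np.asarray(train_idx)
    cut = int(len(train_idx) * (1 - holdout_frac))
    return train_idx[:cut], train_idx[cut:]

# Hypothetical fold: positions 0..799 for training, 800..999 for testing.
train_idx = np.arange(0, 800)
test_idx = np.arange(800, 1000)

fit_idx, es_idx = split_for_early_stopping(train_idx)

# With XGBoost this would translate to something like
#   reg.fit(X[fit_idx], y[fit_idx], eval_set=[(X[es_idx], y[es_idx])])
# and then scoring on X[test_idx] only after fitting has finished.
print(len(fit_idx), len(es_idx), len(test_idx))
```

If `eval_set` is used purely to print metrics and no early stopping or tuning depends on it, including the test fold does not change the fitted model, but keeping it out avoids the question entirely.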

  • @user-vl7oq5wy4f
    @user-vl7oq5wy4f 2 months ago

    Hey Rob, thank you so much for this great video! It's helped me a lot! If you could update the code file to include the holiday feature, that would be fantastic!

  • @user-im6pc9qe6e
    @user-im6pc9qe6e a year ago

    Thanks for your wonderful tutorials! I have some questions; 1) is it possible to put other features kind of building physical factors(because target feature is building energy use)? 2) if it is possible to add other features, is it possible to make PDPs(partial dependence plots) or SHAP? GBDT or XGBoost model can show the feature importance and PDPs to provide the relationship between target feature and input feature. But I don't know it is possible in this model.

  • @alexkychen
    @alexkychen a year ago +1

    Great tutorial! Learned a lot from these time series forecasting videos. Could you please also make a video on how to detect data anomaly with this time series data? Such as using the trained model to detect possible outliers of energy use in your example data set. Thanks!!

    • @robmulla
      @robmulla a year ago

      Thanks. Good idea. Usually abnormalities on data like this are assessed by checking the values outside some distance from the mean value. 3 standard deviations is typical for a true outlier. It depends what data you want to subset to (like a certain month or hour) to determine what mean and std to use.

    • @CharlesDibsdale
      @CharlesDibsdale a year ago

      @@robmulla Doesn't using the mean assume a symmetric, normal distribution? Where you have a non-symmetric, non-normal distribution, median and percentiles may be better?
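
Both rules mentioned in this thread can be compared side by side. This is a sketch on made-up data: the skewed lognormal sample, the injected value of 50, and the 1.5 × IQR fence (one common percentile-based convention, not something stated in the thread) are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed data with one injected extreme value.
rng = np.random.default_rng(42)
s = pd.Series(rng.lognormal(mean=0.0, sigma=0.5, size=1000))
s.iloc[0] = 50.0

# Rule 1: more than 3 standard deviations from the mean.
mean, std = s.mean(), s.std()
z_mask = (s - mean).abs() > 3 * std

# Rule 2: percentile-based fences around the interquartile range.
q1, q3 = s.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

print("flagged by 3-sigma rule:", int(z_mask.sum()))
print("flagged by IQR rule:", int(iqr_mask.sum()))
```

The mean and standard deviation are themselves pulled around by extreme values, while quartiles are not, so the two rules can flag different points on skewed data. Either rule can be applied per subset (for example per hour of day) as suggested in the reply above.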

  • @analfabetorockebens
    @analfabetorockebens a year ago

    top, thanks!!

  • @Aristocle
    @Aristocle 9 months ago

    11:22 The result of 364/7 is the number of weeks in a year. Ty for this video.
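
The arithmetic is quick to check: 364 = 52 × 7, so stepping back 364 days lands on the same day of the week, which 365 days does not. The timestamp below is an arbitrary example.

```python
import pandas as pd

ts = pd.Timestamp("2018-08-03 14:00")      # arbitrary example timestamp
lag_364 = ts - pd.Timedelta(days=364)
lag_365 = ts - pd.Timedelta(days=365)

weeks = 364 // 7                            # 52 whole weeks, no remainder
same_weekday_364 = ts.dayofweek == lag_364.dayofweek
same_weekday_365 = ts.dayofweek == lag_365.dayofweek

print(weeks, same_weekday_364, same_weekday_365)
```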

  • @blackbke
    @blackbke a year ago

    Absolutely great tutorial (thanks!), but I still have questions, although they are more general questions and not particularly related to XGBoost:
    1) How to cope with data that has repetitions, like repetitions of the same date (pivot tables, but I have difficulty coping with pivots when the dataframe already has multiple features)
    2) How does XGBoost (or any other model) cope with categories in data, for example (and applied to this tutorial): what if the electricity usage data would include regions, how can you integrate those in a forecasting model? This also implies repetitions: date1 -> region1 data; date1 -> region2 data; date1 -> region3 data, etc...
    I'll eventually find out how to work with this, but if you wanted some input for a follow up video, here it is :D

    • @user-sx7gr2jv3i
      @user-sx7gr2jv3i 11 months ago

      I have the same problems.

  • @mariodelgadillo7762
    @mariodelgadillo7762 a year ago +1

    Incredible work! Thanks so much for sharing! Do you have in your channel a video with time series using Random Forest algorithm??

    • @robmulla
      @robmulla a year ago +1

      I have a video that goes over time series with xgboost, which is essentially a smarter implementation of a multi-tree based model.

  • @ahmedtambal2560
    @ahmedtambal2560 a year ago +1

    Great tutorial and wonderful job , I wish you all the best + Thank you for your work
    I have a question ...
    what does lag1, lag2 and lag3 represent ?

    • @robmulla
      @robmulla a year ago +1

      Thanks for watching. The lag variables represent what the target value was in the past. I just called them lag1, lag2 etc. But if you watch at 12:19 when I create them I create them for a certain number of days in the past. Hope that helps.

  • @Depaxa
    @Depaxa 6 months ago

    Thanks

  • @XartakoNP
    @XartakoNP a year ago

    Great video. I think you should've explained that the purpose of doing cross-validation is to do hyper-parameter tuning and/or change your features to improve your results

    • @robmulla
      @robmulla a year ago

      Great point! I thought that was explained but maybe I wasn't clear enough about it.

  • @thibautsaah3379
    @thibautsaah3379 a year ago +1

    Amazing work, Master! Does XGBoost support missing values, or are the NaN values of the lag features replaced by zero?

    • @robmulla
      @robmulla a year ago

      XGBoost and most GBM models work fine with null values. Because it's a tree based algorithm it is able to split the null values out. However, I've found that imputing values can sometimes help.

  • @RL-ng2vp
    @RL-ng2vp a year ago

    Great vid. Regarding adding lags: doesn't it introduce data leakage into the test data, so it gives very optimistic results in testing that won't hold up in practice?

  • @Alexweno07
    @Alexweno07 a year ago +2

    This is great! Thank you! Question: Why didn't you use a feature scaling function? Isn't it better to normalize the features for gradient descent algorithms?

    • @robmulla
      @robmulla a year ago +3

      Great question. For many algorithms that is the case. However for tree based algorithms like xgboost it will just split a branch somewhere in the feature. So scaling isn’t necessary.
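
The claim that trees only care about where a split falls, not about the feature's scale, can be checked on a single decision tree. Note the substitution: this sketch uses scikit-learn's `DecisionTreeRegressor` on made-up data rather than XGBoost itself, so it illustrates the principle rather than XGBoost's exact behavior.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: two features on a 0..100 scale.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = np.sin(X[:, 0] / 10) + 0.1 * X[:, 1]

# Standardize the features (a monotonic, order-preserving transform).
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_scaled, y)

# Split thresholds differ, but the rows end up in the same leaves,
# so the predictions match.
same_predictions = np.allclose(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print(same_predictions)
```

Scaling matters for models whose fit depends on feature magnitudes, such as linear models trained by gradient descent or distance-based methods. Gradient boosting uses gradients of the loss with respect to predictions, not with respect to feature weights.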

  • @francoli2281
    @francoli2281 a year ago +1

    Great video! I'm just wondering: if your lag features were something like 1 hour, could you still predict multiple steps? Thanks!

    • @robmulla
      @robmulla a year ago

      Thanks for the feedback. If you have a lag feature of 1 hour then you would only be able to predict 1 hour into the future, because beyond that you would not be able to compute the lag feature (those lagged values would not have occurred yet).
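
The constraint described here shows up directly when you build the future frame. In this sketch the 48 hours of history, the 6-hour horizon, and the column names are assumptions; the lag lookup uses a timestamp-to-value mapping.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# Hypothetical history: 48 hourly observations with values 0..47.
hist_idx = pd.date_range("2020-01-01", periods=48, freq=hour)
history = pd.Series(range(48), index=hist_idx, dtype=float)
known = history.to_dict()

# Future frame: the 6 hours right after the last observation.
future_idx = pd.date_range(hist_idx[-1] + hour, periods=6, freq=hour)
future = pd.DataFrame(index=future_idx)

# A 1-hour lag is only known for the first future step;
# a 24-hour lag is known for every step within this 6-hour horizon.
future["lag_1h"] = (future.index - pd.Timedelta(hours=1)).map(known)
future["lag_24h"] = (future.index - pd.Timedelta(hours=24)).map(known)

print(future)
```

A lag at least as long as the forecast horizon can always be computed from observed data. Forecasting further ahead with a shorter lag means feeding the model's own predictions back in as lag values (recursive forecasting), which is a different setup from the one discussed in this thread.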

  • @SedaStepanyan
    @SedaStepanyan 6 months ago

    Thanks a lot for the video. Can you please talk about stationarity check as well as model accuracy?

  • @Asparuh.Emilov
    @Asparuh.Emilov 11 months ago

    Thanks again for the great tutorial. It is a stepping stone to understanding how to use XGBoost and some key techniques for dealing with time series data. However, I think there is one conceptual problem with using lagged versions of the data. Imagine you use just lag 1. During training you always have the actual previous value (lag 1), but when you forecast into the unknown future, even at the 2nd timestamp you no longer have actual data for lag 1. The trained relationship therefore cannot be applied at any of the later steps, so any lag shorter than the forecast horizon might be misleading in my opinion, since the model cannot apply the learned relationship for lags that don't exist at each step. Please correct me if I am wrong; I am developing a pipeline for time series forecasting with my own data and I really want to achieve the best possible outcome.

    • @OskarBienko
      @OskarBienko 10 months ago

      You misunderstood the whole concept of lagging. Try using the pandas shift() method in order to understand it.

  • @MMM-yc7xu
    @MMM-yc7xu a year ago +2

    I have never enjoyed time series forecasting, I failed learning it like 7 times past year, and now I’m starting again and not letting it go this time.
    Your 2 videos (40 mins) are so helpful and the way you explain things are amazing, I think if you continue the series, you would blast YT since there is no much content about it.
    You made my day tho 😂🎉

    • @kinduvabigdeal100
      @kinduvabigdeal100 7 months ago

      hey, curious to know how you've been doing with your forecasting?

    • @MMM-yc7xu
      @MMM-yc7xu 7 months ago

      Hi, @@kinduvabigdeal100
      Pretty good, I'm using time series to forecast insurance claims.

  • @michaeltownsend1459
    @michaeltownsend1459 a year ago +2

    I really enjoy your videos. Any chance you could do one exploring dimensionality reduction techniques for time series? Particularly non-linear techniques such as VAE

    • @robmulla
      @robmulla a year ago +1

      Great suggestion! I might include it in a future video but I'd need to learn a bit more about it first. Thanks for watching.

  • @ayuumi7926
    @ayuumi7926 a year ago +1

    Nice video. Can we have another follow-up video on incorporating an additional pool of time series features that we believe can be used as independent variables (e.g. weather series, price series), how to determine the lags to use for those features, and how to do feature selection from the feature sets?

    • @robmulla
      @robmulla a year ago

      Thanks for the suggestions. That would be a good future video. Using additional features is tricky because you need to be sure to only use the feature values you have on the day predicting. So for weather data you actually would need to use the forecast from X days prior to the predicting date. Otherwise you will introduce a data leak into the model training.

  • @costadekiko
    @costadekiko a year ago +1

    Great video! A simpler way to implement the lag features would be:
    df['lag1'] = df['PJME_MW'].shift(364).fillna(0)

    • @robmulla
      @robmulla a year ago +3

      Thanks for the feedback! Using the shift method can be really handy; however, you need to be careful and ensure that there are no gaps in your data. The reason I did the mapping the way it's done in this video is that some timestamps are missing, and shifting would then make the values not align correctly. Hopefully that makes sense.

    • @costadekiko
      @costadekiko a year ago

      That makes sense. Indeed, I did assume that there are no gaps in the timeseries, because that step is usually done in a preprocessing step beforehand, if necessary (so that there are no gaps in general, not only for the lag features). But of course that was still an assumption from my side. Thanks for the reply!

  • @stay-amazed1295
    @stay-amazed1295 a year ago +1

    Excellent video,👍👏 pls make videos like TS forecasting, xgb classification model

    • @robmulla
      @robmulla a year ago

      Thanks for the positive note. What do you mean by time series classification? Is there a data example you know of that I could look into?

    • @stay-amazed1295
      @stay-amazed1295 a year ago

      @@robmulla means next candle prediction based on historical cdl data upto pre.cdl whether next cdl is green then model give 1 if it's red 0. Like this forex data eurusd..... i'm newbie to py facing difficulty in building perfect model. I'm getting accuracy of 54-60 % Thanks in advance👍

  • @peterwang7774
    @peterwang7774 a year ago +1

    Hello, thank you for making such a great teaching video. I would like to ask if you will make a teaching video on how to use xgboost for multi-step time series prediction? Thank you very much!!!

    • @robmulla
      @robmulla a year ago

      Thanks for watching. I'm not sure what you mean by multi-step, but if I understand it correctly, you would just re-train and predict for each new time period when you are forecasting.
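
If "multi-step" is read as forecasting several periods ahead, the re-train-and-predict loop described in the reply might look roughly like this. `walk_forward` and `last_value_model` are hypothetical names, and the stand-in model simply repeats the last observed value; in practice it would be replaced by fitting an XGBoost regressor on the training slice and predicting the next block.

```python
import numpy as np
import pandas as pd

def walk_forward(series, initial_train, step, fit_predict):
    """Re-train on everything seen so far, then predict the next `step` points."""
    preds = []
    for start in range(initial_train, len(series), step):
        train = series.iloc[:start]
        test_index = series.index[start:start + step]
        preds.append(pd.Series(fit_predict(train, len(test_index)), index=test_index))
    return pd.concat(preds)

# Hypothetical stand-in for "fit a model, then predict": repeat the last value.
def last_value_model(train, n_ahead):
    return np.repeat(train.iloc[-1], n_ahead)

# Toy series: 0.0, 1.0, ..., 9.0 on consecutive days.
y = pd.Series(
    np.arange(10, dtype=float),
    index=pd.date_range("2020-01-01", periods=10, freq="D"),
)
forecast = walk_forward(y, initial_train=4, step=3, fit_predict=last_value_model)
print(forecast)
```

Each block of three days is predicted by a model that has only seen the data before that block.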

  • @aaltinozz
    @aaltinozz ปีที่แล้ว +1

    Thanks for sharing this amazing work. I just want to ask about the features created from the DatetimeIndex (year, month, dayofyear, etc.): shouldn't we change their type to category?

    • @robmulla
      @robmulla  ปีที่แล้ว

      Appreciate the feedback. You could try making these categorical, however they aren't completely ordinal, especially day of year. XGBoost should find splits for these features in the trees based on where it determines the best break points to be. But it's always worth trying and seeing what performs best using the validation setup.
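
Preparing the two variants to compare could look like the sketch below. It uses a toy daily index, and the `as_int` and `as_cat` names are illustrative.

```python
import pandas as pd

# Toy daily index; the same kind of features can be built from an hourly index.
df = pd.DataFrame(index=pd.date_range("2018-01-01", periods=60, freq="D"))
df["dayofweek"] = df.index.dayofweek
df["month"] = df.index.month

# Integer version: a tree splits on thresholds such as month <= 6.
as_int = df.copy()

# Categorical version: each value is treated as a separate label.
as_cat = df.astype("category")

print(as_int.dtypes)
print(as_cat.dtypes)
```

As far as I know, passing `category` columns to XGBoost needs a version with categorical support and `enable_categorical=True` on the estimator. Scoring both variants with the same time series cross-validation folds is the comparison suggested in the reply.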

  • @BravePrune
    @BravePrune ปีที่แล้ว +1

    well done, absolutely incredible
    can you do one of these with a Prophet model?
    it takes away so many headaches that you were dealing with in the feature creation and cross validation

    • @robmulla
      @robmulla  ปีที่แล้ว +1

      Thanks so much. Yes, I plan to do one on Prophet and LSTMs at some point.

  • @shahryar.s
    @shahryar.s หลายเดือนก่อน

    Great video! One question: if you're using a feature for which future values don't exist, let's say the stock price of a company, then how would you create the future data frame?

  • @zericardo182
    @zericardo182 7 หลายเดือนก่อน +1

    Hi, great video! I watched both parts and the Prophet one too.
    I have a question: if I understood it correctly, I shouldn't create lag features greater than my prediction horizon window. It's not so clear to me why I shouldn't do that. Could you please share more details about that part?

    • @ivanb.2914
      @ivanb.2914 4 หลายเดือนก่อน

      I'm struggling to grasp this concept because the example involves lag1, lag2, and lag3 even though he is forecasting 1yr into the future. For instance, on '01-01-2015,' you have information to calculate the energy consumption from three years prior (on '01-01-2012').
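
One reading of the rule being asked about is that a lag has to be at least as long as the forecast horizon, because a shorter lag points at timestamps that have not been observed yet. The sketch below illustrates that reading on made-up daily data with a 30-day horizon; the lag lengths and the `lag_7d`, `lag_30d` and `lag_364d` column names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy daily history standing in for PJME_MW (values are arbitrary).
hist_index = pd.date_range("2018-01-01", "2018-12-31", freq="D")
df = pd.DataFrame({"PJME_MW": np.arange(len(hist_index), dtype=float)}, index=hist_index)
df["isFuture"] = False

# Future frame: 30 days to forecast, target unknown.
future = pd.DataFrame(index=pd.date_range("2019-01-01", periods=30, freq="D"))
future["isFuture"] = True
both = pd.concat([df, future])

# Timestamp-based lags, looked up from the known history only.
target_map = df["PJME_MW"].to_dict()
for days in (7, 30, 364):
    both[f"lag_{days}d"] = (both.index - pd.Timedelta(days=days)).map(target_map)

# Share of future rows where each lag is missing.
missing = both.loc[both["isFuture"], ["lag_7d", "lag_30d", "lag_364d"]].isna().mean()
print(missing)
```

The 7-day lag is missing for most of the future rows, while the 30-day and 364-day lags are fully populated from history even though `PJME_MW` itself is empty there. Lags of one, two and three years therefore remain usable for a one-year forecast.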

  • @rizkiatthoriq5925
    @rizkiatthoriq5925 9 หลายเดือนก่อน

    Thank you for your complete tutorial!
    I have tried your code on Kaggle and it made a question pop up in my mind: how can the lag values in the future_w_features variable appear where "isFuture == True" while 'PJME_MW' is empty?

  • @mohamednedal
    @mohamednedal 7 หลายเดือนก่อน

    Hi Rob, Great tutorial! Could you please make a tutorial on how to use Shapley values to interpret LSTM models for timeseries forecasting?

  • @rayhanabyasa5013
    @rayhanabyasa5013 หลายเดือนก่อน

    Great video Rob! It helps me a lot! I want to ask something: my project uses a lot of features (x) whose values are still unknown in the future, unlike in your video, where the features already have values, like the month, day of week, etc. So, is it impossible for me to predict the future because the features are still unknown? I already tried it and the result shows a straight line with the same values :( Thanks in advance!

  • @fbrand
    @fbrand ปีที่แล้ว +1

    Subbed....great and thorough and practical explanation. I guess using LightGBM will be very much the same and only some parameters change?

    • @robmulla
      @robmulla  ปีที่แล้ว

      Thanks for the sub! Yes, LightGBM and XGBoost are very similar with just slightly different names for the parameters. If you use the sklearn API they are almost the same.

  • @paitosilva9459
    @paitosilva9459 ปีที่แล้ว +1

    Hi, thanks for sharing the information is very useful. I was wondering if it is possible to predict the future when the target is a categorical variable, and I don't have any quantitative variable in the dataset.

    • @robmulla
      @robmulla  ปีที่แล้ว

      That’s a great question. You can train xgboost with a categorical target variable. Just use XGBoost classifier instead of the regressor.

  • @danieleborgna9501
    @danieleborgna9501 หลายเดือนก่อน

    Hello, really helpful video but I have just a question. If I have features that are unknown in the future, what’s the best way to predict that and use that for the forecasting of the main target feature? For example, you mentioned in the last video the weather feature, but to generate the future_df you don’t know the values of this feature. Thank you!

  • @dieyu6374
    @dieyu6374 11 หลายเดือนก่อน +1

    But if the lag feature is too close to the prediction time, won't it produce cumulative errors? How should this be handled, for example when the lag is the previous point but the next 24 points are being predicted?
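
The situation raised in this comment, a lag of one step with a 24-step forecast, leads to recursive forecasting, where each prediction becomes the lag input of the next. A minimal sketch follows, using a least-squares line as a hypothetical stand-in for the trained model (an XGBoost regressor on a lag-1 feature would take its place) and random toy data.

```python
import numpy as np

# Toy series; a hypothetical stand-in for hourly load.
rng = np.random.default_rng(0)
y = 100 + np.cumsum(rng.normal(0, 1, size=200))

# Stand-in model: y_t ~ a * y_{t-1} + b, fitted by least squares.
a, b = np.polyfit(y[:-1], y[1:], deg=1)

# Recursive forecast: beyond the first step the "previous value" is itself
# a prediction, so any error is carried forward into later steps.
horizon = 24
preds = []
prev = y[-1]
for _ in range(horizon):
    prev = a * prev + b
    preds.append(prev)

print(preds[:3])
```

Only the first of the 24 predictions uses an observed value as its lag. Keeping every lag at least as long as the horizon avoids feeding predictions back in, at the cost of not using the most recent observations.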

  • @tobiasmuenchow9884
    @tobiasmuenchow9884 7 หลายเดือนก่อน

    Your videos really helped me understand ML. Perfect explanation.
    Is it possible to use a normalizing flow for PLF (probabilistic load forecasting), like a Bayesian model, in XGB?

  • @Boy90547
    @Boy90547 ปีที่แล้ว +1

    This is very impressive. Do you have a lecture on trading analysis?

    • @robmulla
      @robmulla  ปีที่แล้ว +1

      Glad you liked it. I have a few other videos on time series you can check out.