Time Series Forecasting with Xgboost

  • Published Oct 6, 2024

Comments • 58

  • @abhinavkhushraj5487 · 3 years ago · +10

    Great video. Hands on code but each step explained really well. You can be a good teacher!

  • @shivam_dxrk · 5 months ago

    The best creator for DS I've found. Thanks a lot!

  • @arianvc8239 · 3 years ago · +2

    Great video! Thank you!
    I look forward to seeing what you do with Prophet.

  • @DrJohnnyStalker · 2 years ago

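    You can use a custom objective function to optimize for MAE (with the scikit-learn wrapper, pass it as XGBRegressor(objective=mae_loss)):
    import numpy as np

    def mae_loss(y_true, y_pred):
        # Gradient of |y_pred - y_true| w.r.t. y_pred is sign(y_pred - y_true)
        grad = np.sign(y_pred - y_true)
        # The true Hessian is 0, which stops trees from splitting; use ones instead
        hess = np.ones_like(y_pred)
        return grad, hess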
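To see why the Hessian matters: XGBoost sets each leaf's value with a second-order step, w = -G / (H + λ), where G and H are the summed gradients and Hessians of the samples in that leaf and λ is the L2 penalty (`reg_lambda`). The stdlib sketch below evaluates that formula for the MAE gradient. The constant Hessian of 1 is a common workaround, not the true second derivative (which is 0 almost everywhere); with all-zero Hessians every child also fails the `min_child_weight` check, so trees cannot split at all. Recent XGBoost releases additionally include a built-in `reg:absoluteerror` objective, which may be simpler than a custom one.

```python
def mae_grad_hess(y_true, y_pred):
    """Per-sample gradient/Hessian of |y_pred - y_true| w.r.t. y_pred.

    The true Hessian is 0 almost everywhere; a constant 1 is used instead
    (a common workaround, assumed here) so the Newton step stays defined.
    """
    grad = [(p > t) - (p < t) for t, p in zip(y_true, y_pred)]  # sign(p - t)
    hess = [1.0] * len(y_true)
    return grad, hess


def leaf_weight(grad, hess, reg_lambda=1.0):
    """XGBoost-style optimal leaf value: w = -G / (H + lambda)."""
    return -sum(grad) / (sum(hess) + reg_lambda)


y_true = [3.0, 5.0, 10.0]
y_pred = [4.0, 4.0, 4.0]
g, h = mae_grad_hess(y_true, y_pred)
print(g, leaf_weight(g, h))  # [1, -1, -1] 0.25
```

The step is positive, nudging all three predictions up toward 5, the median of the targets, which is the constant that minimizes absolute error.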

  • @user-pt3gb8dt2p · 3 years ago · +2

    Great video, Ajay. Regarding your comment that shuffling the training set leads to data leakage, I don't think that would be an issue. XGBoost isn't regressing on the sequence of prior values, only on the 2 features based on past data that you are giving it. It would still predict the same values (assuming the random seed is fixed so XGBoost runs are deterministic) regardless of the order of the table. AFAIK this is only a problem in traditional TS models. Let me know if I'm way off base; I'm interested to hear your thoughts.

    • @CodeEmporium · 3 years ago · +2

      Great point. My main concern here is that the training set can still pick up on trends even though every sample is technically independent of the others. For example, say that in January 2020 there was a major policy change that affected the restaurant's order volume. The 2 features for any sample after January 2020 will be influenced by the policy change, but those before January 2020 would not have been. Shuffling can thus make our model perform better than it actually would (since it has seen the effects of the policy change in the form of those 2 features) when it probably shouldn't have been able to.
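The safeguard this reply argues for is a chronological split instead of a shuffled one. A minimal stdlib sketch, assuming rows are (date, features, label) tuples; the weekly values are made up for illustration:

```python
from datetime import date, timedelta


def chronological_split(rows, cutoff):
    """Split (date, features, label) rows so every training row precedes every test row.

    Unlike a shuffled split, no post-cutoff information (such as the effect of
    a policy change) can leak into training.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Hypothetical weekly rows: (week_start, [orders_7d, orders_30d], next_week_orders)
start = date(2019, 1, 7)
rows = [(start + timedelta(weeks=i), [100 + i, 400 + i], 101 + i) for i in range(10)]
train, test = chronological_split(rows, cutoff=date(2019, 2, 25))
print(len(train), len(test))  # 7 3
```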

    • @abhirama · 2 years ago

      @CodeEmporium Can't we just construct a new feature out of this, e.g. policy_change = 0/1?
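A sketch of the suggested indicator, assuming the change date is known in advance; the date and feature values below are hypothetical. The flag only lets the model separate the two regimes if the training data contains rows from both sides of the change.

```python
from datetime import date

# Hypothetical date of the policy change from the example above.
POLICY_CHANGE = date(2020, 1, 1)


def add_policy_flag(rows, change_date=POLICY_CHANGE):
    """Append a 0/1 'policy_change' indicator to each (date, features) row."""
    return [(d, feats + [int(d >= change_date)]) for d, feats in rows]


rows = [(date(2019, 12, 23), [120, 480]), (date(2020, 1, 6), [90, 410])]
print(add_policy_flag(rows))
```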

  • @VatsaVIPL · 2 months ago

    Very articulate and clear. Great

  • @0xsuperman · 2 years ago · +3

    Can you also add a feature such as the number of orders from last week (autoregressive)? And if you do, should you use the actual value from last week or the predicted values within the test set? If you use actual values, is that considered data leakage?
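A minimal sketch of autoregressive lag features; the weekly counts are hypothetical. When forecasting one week ahead, last week's actual count is already known at prediction time, so using it is not leakage; leakage would mean using values from after the forecast origin. For forecasts several weeks out those later actuals do not exist yet, so you would either feed predictions back in or train a direct multi-horizon model. Overlap between the windows of neighboring rows is normal for this kind of feature.

```python
def make_lag_rows(series, n_lags):
    """Turn a weekly series into (features, target) pairs.

    Features for week t are the actual values at t-n_lags .. t-1, so each row
    only uses data that would be known before week t.
    """
    rows = []
    for t in range(n_lags, len(series)):
        rows.append((series[t - n_lags:t], series[t]))
    return rows


weekly_orders = [120, 135, 128, 150, 160, 155]  # hypothetical counts
for features, target in make_lag_rows(weekly_orders, n_lags=2):
    print(features, "->", target)
```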

  • @kabeerjaffri4015 · 3 years ago · +1

    Just in time! Also, if you're up for it, please cover time-series databases.

  • @richarda1630 · 3 years ago · +3

    If I do this, I'm going to get hungry :P give me some chicken Tikka Masala or some mutton :D

  • @zakiakmal85 · 2 years ago

    Really loved your explanation, thanks. I can only find a few time-series videos, though; I would love to see more content on this topic.

    • @zakiakmal85 · 2 years ago

      One question, though: in the ML approach, don't we need to deal with stationarity like we do in traditional forecasting?
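XGBoost does not rely on the formal stationarity assumptions that ARIMA-style models do, but tree ensembles cannot predict values outside the range of targets seen in training, so a strong trend can still hurt. One common option, sketched here as an assumption rather than what the video does, is to model week-over-week differences and add them back afterwards:

```python
def difference(series):
    """Week-over-week changes: d[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(last_value, diffs):
    """Rebuild levels from a known starting value and (predicted) changes."""
    levels = []
    for d in diffs:
        last_value += d
        levels.append(last_value)
    return levels


y = [100, 110, 125, 130]  # hypothetical weekly orders
d = difference(y)
print(d, undifference(y[0], d))  # [10, 15, 5] [110, 125, 130]
```

In practice the model would be trained to predict the next difference, and its output added to the last observed level.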

  • @MrRugbyferdinand · 3 years ago

    Thanks for the video and the code, very intuitive.
    Regarding the general underprediction at the end of the time frame: instead of replacing your NaN values with zeros, you could replace them with the average order count of the preceding weeks, where the exact number of weeks would itself be a hyperparameter to tune. Maybe that solves the issue.
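A stdlib sketch of the suggested fill, assuming missing weeks are represented as None; `window` is the number of preceding weeks to average and would be tuned like any other hyperparameter. This version averages over already-filled values and falls back to 0 when there is no history, which is one design choice among several.

```python
def fill_with_trailing_mean(values, window):
    """Replace None with the mean of up to `window` preceding (already filled) values."""
    filled = []
    for v in values:
        if v is None:
            history = filled[-window:]
            # No history yet (series starts with a gap): fall back to 0.
            v = sum(history) / len(history) if history else 0.0
        filled.append(v)
    return filled


print(fill_with_trailing_mean([None, 10, 20, None, 30, None], window=2))
# [0.0, 10, 20, 15.0, 30, 22.5]
```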

  • @salaisivamal7465 · 2 years ago

    Wonderful explanation. Great detail on how boosting works and how it applies to time series.

    • @CodeEmporium · 2 years ago

      Super glad this is helpful

  • @kumarkushagra7054 · 2 years ago

    Great video!! How can we implement this to predict future values? In this case we need to predict the sales two weeks out based on the predicted sales for next week. I hope you understand my question. Thanks!!

  • @吴吉人 · 3 years ago · +3

    Can you explain how an XGBoost model makes predictions later?

  • @NierAutomata2B · 3 years ago · +4

    Thanks for the educational video! I'm new to time series forecasting, so I have a naive question: I see you labeled each row (weekly order count) with the next week's order count. And for each row, the features you used are [order_count_7_day, order_count_30_day]. So it seems that for each row, the model only has those two features to make a prediction. How can we leverage more past signals? I'm thinking that for time t, we could use the values all the way from [t-k, t-k+1, t-k+2, ..., t-1]. Is that a better way? But the features for each row will overlap a lot with the surrounding rows, so I'm not sure what a reasonable way of feature engineering for this is. Any suggestions?

    • @donnik7064 · 3 years ago · +1

      I have the exact same question! Looking forward to an answer :)

    • @MrDjRoKoLoKo · 2 years ago · +1

      More variables will not necessarily improve model performance. If past values of the target (autoregressive terms) are informative of its future values, it makes sense to add them; otherwise they can lead to overfitting your model on spurious correlations. EDA and iterating over different sets of variables can shed light on what yields the best results. ACF and PACF plots can help determine a starting point for the number of autoregressive terms.
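The quantity an ACF plot shows is the sample autocorrelation at each lag. A stdlib sketch of it, for illustration only; libraries such as statsmodels provide `plot_acf` and `plot_pacf` for the plots themselves, and PACF needs an extra recursion not shown here.

```python
def acf(series, max_lag):
    """Sample autocorrelation r_k for k = 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


print(acf([1, -1, 1, -1], max_lag=2))  # [-0.75, 0.5]
```

Lags whose autocorrelation stays clearly away from zero are natural candidates for lag features.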

  • @shwetakulkarni815 · 3 years ago

    Great video... How would you do global XGBoost time series forecasting for multiple time series?
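One common way to build a "global" model, sketched here as an assumption rather than a method from the video: stack the lag rows of every series into a single training table and keep a series identifier as a feature. The store names and values are hypothetical; series with very different scales are often normalized per series first.

```python
def stack_series(series_by_id, n_lags):
    """Build one training table from many series for a single global model.

    Each row is (series_id, lag features, target); series_id would be encoded
    as a categorical feature so the model can still tell the series apart.
    """
    table = []
    for sid, values in series_by_id.items():
        for t in range(n_lags, len(values)):
            table.append((sid, values[t - n_lags:t], values[t]))
    return table


stores = {"store_a": [10, 12, 11, 13], "store_b": [200, 190, 210]}  # hypothetical
for row in stack_series(stores, n_lags=2):
    print(row)
```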

  • @salaisivamal7465 · 2 years ago · +1

    clear and crisp.

  • @patite3103 · 3 years ago

    You're a ML guru! Great video!
    I don't understand the point of using the query function. Could you do a video on this topic?

    • @gazergaming1248 · 2 years ago

      This is a little late, but in case you're still wondering: the idea behind the query function is to write code that matches what a typical data scientist's data and environment look like. Most companies store their data in SQL (or other) databases, not as downloadable CSV or Excel files. Accessing individual files like that is inefficient and results in a messy notebook that breaks if any files are moved around on your personal hard drive. It also makes the code dependent on that hard drive, so your notebook won't work on other devices or for colleagues who might want to work on it or take it over from you. It's better practice overall.
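The video's own query helper is not reproduced here; as a stand-in, this stdlib sketch shows the same idea against an in-memory SQLite database with a hypothetical `orders` table. The database does the aggregation and only the summary reaches Python.

```python
import sqlite3

# Hypothetical orders table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "2019-08-01"), (2, "2019-08-01"), (3, "2019-08-02"), (4, "2019-08-05")],
)

# Aggregate in SQL: one row per day with its order count.
daily = conn.execute(
    "SELECT order_date, COUNT(*) AS n_orders "
    "FROM orders GROUP BY order_date ORDER BY order_date"
).fetchall()
print(daily)  # [('2019-08-01', 2), ('2019-08-02', 1), ('2019-08-05', 1)]
conn.close()
```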

  • @pratiknarkhede1287 · 2 years ago · +1

    We didn't predict next week's orders; you just tested your predictions on the test data.
    If I want to predict next week's orders, how can I do that?

    • @CodeEmporium · 2 years ago · +1

      Nice question. You compute the number of sales from 7 days ago until today and from 30 days ago until today, then pass those through the model to get next week's order count projection.
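A stdlib sketch of computing the "orders in the last 7/30 days" features as of today, so they can be fed to the trained model for next week's forecast. The window boundaries (inclusive of today) are an assumption and must match however the training features were computed; the order dates are hypothetical.

```python
from datetime import date, timedelta


def features_as_of(order_dates, today):
    """Count orders in the trailing 7 and 30 days ending today (inclusive)."""
    def count_since(days):
        start = today - timedelta(days=days - 1)
        return sum(start <= d <= today for d in order_dates)
    return [count_since(7), count_since(30)]


orders = [date(2019, 12, 1), date(2019, 11, 28), date(2019, 11, 10), date(2019, 10, 1)]
print(features_as_of(orders, today=date(2019, 12, 6)))  # [1, 3]
```

The resulting list is a single feature row; passing it to a fitted model's predict method (as one row) gives the projection for the following week.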

  • @nara.titan28 · 1 year ago

    Hello!!! Where is case 2 with the Prophet model? Can you share the video with me? :)

  • @joapen · 3 years ago

    very nice explanation, many thanks for the video!!

    • @CodeEmporium · 3 years ago

      Anytime. Thanks for watching

  • @philwebb59 · 2 years ago

    You need to increase your volume. The commercials are much, much louder than your video.

  • @sabinkhdk · 3 years ago · +1

    Great video. How do we use this model to forecast beyond 2019-12-06?

    • @shivangiraj9822 · 3 years ago

      Yes same question

    • @MrDjRoKoLoKo · 2 years ago

      One solution is to use your forecasted values as inputs (Xs) for the next batch of forecasts, and so on. That said, predictions have errors, so this will yield a noisy and likely inaccurate forecast if it is extrapolated over many data points into the future.
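The recursive strategy described in this reply can be sketched as follows. `predict_one` is a hypothetical callable standing in for a fitted model that maps the last `n_lags` values to the next one; with the video's rolling 7/30-day features you would instead recompute those features from the extended series at each step.

```python
def recursive_forecast(history, predict_one, n_lags, horizon):
    """Forecast `horizon` steps by feeding each prediction back in as a lag.

    Errors compound, because later steps are built on earlier predictions.
    """
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        nxt = predict_one(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window forward
    return forecasts


def toy_model(lags):
    """Toy stand-in for a trained model: mean of the lags plus 1."""
    return sum(lags) / len(lags) + 1


print(recursive_forecast([10, 12, 14], toy_model, n_lags=2, horizon=3))
# [14.0, 15.0, 15.5]
```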

  • @tobiasmuenchow9884 · 11 months ago

    How can I improve the accuracy? I want to add things like holidays, paydays, and other factors. Is this possible with XGBoost?
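Such factors are just extra columns for XGBoost. A minimal sketch of calendar features for one target date; the holiday set and the payday rule (1st and 15th) are hypothetical assumptions to replace with your own calendar. For weekly targets like the video's, you would aggregate these per week, e.g. count the holidays in the week.

```python
from datetime import date

# Hypothetical holiday list; in practice use a holiday calendar for your region.
HOLIDAYS = {date(2019, 12, 25), date(2019, 12, 26)}
PAYDAYS = {1, 15}  # assumption: salaries paid on the 1st and 15th


def calendar_features(d):
    """Extra features for one target date, appended to the lag features."""
    return {
        "is_holiday": int(d in HOLIDAYS),
        "is_payday": int(d.day in PAYDAYS),
        "day_of_week": d.weekday(),  # Monday = 0
    }


print(calendar_features(date(2019, 12, 25)))
```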

  • @ccuuttww · 3 years ago · +2

    Hey, can we do cross-validation in this case?

    • @MrDanituga · 3 years ago

      You can, but you need to be careful not to train on data from after the validation period. You should split the data with time in consideration.
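A stdlib sketch of expanding-window cross-validation for time-ordered rows: each fold trains on everything before a block and validates on that block. scikit-learn's `TimeSeriesSplit` implements a similar scheme; the fold sizes here are illustrative.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, val_idx) pairs for time-ordered cross-validation.

    Each fold validates on the block right after its training window, so the
    model is never trained on data from after the period it is scored on.
    """
    first_val = n_samples - n_splits * test_size
    if first_val <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        val_start = first_val + k * test_size
        yield list(range(val_start)), list(range(val_start, val_start + test_size))


for train_idx, val_idx in expanding_window_splits(10, n_splits=3, test_size=2):
    print(train_idx[-1], val_idx)
```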

  • @emmanuel3047 · 3 years ago

    Wouldn't it be easier to create new features by lagging the labels? Also, how do you predict future values?

  • @deepanshudashora5887 · 3 years ago

    It is clear mr. .....

  • @riosaputra2979 · 2 years ago

    Hi, I checked the restaurant order data and found that the last order is on 03/08/2019 (3rd of August 2019). I couldn't find any data beyond 03/08/2019, so how come the last date on the daily number-of-orders graph is 2019-12-07 (7th December 2019)?
    Also, on the weekly number-of-orders graph the last date is 2019-12-02 (2nd December 2019).
    Is there a mistake in the datetime or timeframe calculation in the pandas SQL? Thanks
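One plausible cause, not verified against the dataset: if the dates are stored day-first (as this comment reads 03/08/2019) but parsed month-first, a string such as 12/07/2019 (a hypothetical example in the same format) becomes 7 December 2019, which would match the last date on the graph. The stdlib sketch below shows the two readings; in pandas, `to_datetime` accepts a `dayfirst` flag or an explicit `format` to pin this down.

```python
from datetime import datetime

raw = "12/07/2019"  # hypothetical date string in the same slash format as 03/08/2019

day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # 12 July 2019
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # 7 December 2019
print(day_first, month_first)  # 2019-07-12 2019-12-07
```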

  • @adityarajora7219 · 2 years ago

    What do you do for a living?

    • @CodeEmporium · 2 years ago

      I'm just your friendly neighborhood Data Scientist :)

    • @adityarajora7219 · 2 years ago

      @CodeEmporium haha :), still I wanna know [-_-]

  • @sooryaprakash6390 · 3 years ago

    I don't think this can be used for multi-step forecasting. Am I right or is it possible?

    • @CodeEmporium · 3 years ago · +1

      You're gonna have to add a categorical variable that signifies the number of days out you want to forecast. So it doesn't do multistep forecasts in the traditional sense, but you can hack your way around it

    • @siddhant17khare · 2 years ago

      @CodeEmporium So does that mean there will be a 3rd feature in the training dataset: the number of days to forecast?
      Can you please elaborate a bit more on the hack you mentioned here?
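One reading of the "hack" described above, sketched as an assumption: build one training row per (forecast origin, horizon) pair and add the horizon as an extra feature, so a single model can forecast h steps ahead without feeding predictions back in. The horizon is numeric here; the reply calls it categorical, and either encoding is plausible. The alternative asked about further down, one model per horizon, would use the same rows split by `h`.

```python
def direct_multi_horizon_rows(series, n_lags, max_horizon):
    """One row per (origin, horizon): lag features + [horizon] -> target."""
    rows = []
    for t in range(n_lags, len(series)):
        lags = series[t - n_lags:t]  # values known at the forecast origin
        for h in range(1, max_horizon + 1):
            target_idx = t + h - 1  # h = 1 means the value at t itself
            if target_idx < len(series):
                rows.append((lags + [h], series[target_idx]))
    return rows


for features, target in direct_multi_horizon_rows([1, 2, 3, 4], n_lags=2, max_horizon=2):
    print(features, "->", target)
```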

  • @factsschoolofficial · 2 years ago

    Dude, how do you extrapolate to future data?

    • @CodeEmporium · 2 years ago · +1

      If you have features like in the video ("num orders last 7/30 days"), then compute the number of orders in the past 7 days, counting back from today. Also compute the number of orders in the last 30 days (also from today). These are the features you pass into the model to get results for the future.

    • @siddhant17khare · 2 years ago

      @CodeEmporium Thanks for the explanation. However, this will only help us predict orders for day t+1, right?
      How do we do it for t+2, t+3, ..., t+n days?
      Does that mean there will be one model for each time step?

    • @MD-uy5bo · 2 years ago

      @siddhant17khare We have to use walk-forward validation so that forecast values get added to the training data for further forecasts.

  • @midnight6371 · 3 years ago

    0:00 hi sentdex

  • @WonPeace94 · 1 year ago

    please talk next time without this strong accent /s