Kishan Manani - Feature Engineering for Time Series Forecasting | PyData London 2022

  • Published on 5 Feb 2025

Comments • 55

  • @lashlarue7924
    @lashlarue7924 1 year ago +10

    Thank you. This was 43 minutes very well spent.

  • @ninjaturtle205
    @ninjaturtle205 1 year ago +38

    Thank you, thank you. This information is skipped in most machine learning courses, and no one will teach you this. In practice, a lot of data has a temporal nature, while all along you only learned how to classify cats and dogs and regress house prices.

    • @TraininData
      @TraininData 4 months ago +1

      We do :)

  • @julien957
    @julien957 1 year ago +12

    Genius. He makes Python and time series almost easy to understand.

  • @anirudhsharma3879
    @anirudhsharma3879 1 year ago +1

    Amazing dump of knowledge. I have come back to this video multiple times.

  • @calebterrelorellana2478
    @calebterrelorellana2478 3 months ago

    Only a master explains complex topics simply and visually! Thank you! The best presentation on forecasting!

  • @HEYTHERE-ko6we
    @HEYTHERE-ko6we 2 years ago +10

    This is by far one of the best wholesome videos on time series forecasting!!! loved it

    • @hp5072
      @hp5072 1 year ago +2

      The word wholesome doesn't mean what you think it means :) Did you mean comprehensive or extensive?

    • @wexwexexort
      @wexwexexort 1 year ago

      @hp5072 What does it mean?

  • @olegkazanskyi9752
    @olegkazanskyi9752 1 year ago

    This is a truly useful session. Thank you for sharing the knowledge!

  • @youknowmyname12345
    @youknowmyname12345 2 years ago +4

    Very good talk. The presenter is a great teacher!

  • @ChandanNayak-i8s
    @ChandanNayak-i8s 1 year ago +2

    Excellent presentation. Great work Kishan

  • @zakkyang6476
    @zakkyang6476 1 year ago

    finally, someone can articulate this topic well...

  • @wolpumba4099
    @wolpumba4099 10 months ago +1

    *Abstract*
    This talk explores how to adapt machine learning models for time
    series forecasting by transforming time series data into tabular
    datasets with features and target variables. Kishan Manani discusses
    the advantages of using machine learning for forecasting, including
    its ability to handle complex data structures and incorporate
    exogenous variables. He then dives into the specifics of feature
    engineering for time series, covering topics like lag features, window
    features, and static features. The talk emphasizes the importance of
    avoiding data leakage and highlights the differences between machine
    learning workflows for classification/regression and forecasting
    tasks. Finally, Manani introduces useful libraries like Darts and
    sktime that facilitate time series forecasting with tabular data and
    provides practical examples.
    *Summary*
    *Why use machine learning for forecasting? (1:25)*
    - Machine learning models can learn across many related time series.
    - They can effectively incorporate exogenous variables.
    - They offer access to techniques like sample weights and custom loss functions.
    *Don't neglect simple baselines though! (3:45)*
    - Simple statistical models can be surprisingly effective.
    - Ensure the uplift from machine learning justifies the added complexity.
    *Forecasting with machine learning (4:15)*
    - Convert time series data into a table with features and a target variable.
    - Use past values of the target variable as features, ensuring no data leakage from the future.
    - Include features with known past and future values (e.g., marketing spend).
    - Handle features with only past values (e.g., weather) by using alternative forecasts or lagged versions.
    - Consider static features (metadata) to capture differences between groups of time series.
    *Multi-step forecasting (8:07)*
    - Direct forecasting: Train separate models for each forecast step.
    - Recursive forecasting: Train a one-step ahead model and use it repeatedly, plugging forecasts back into the target series.
    *Cross-validation: Tabular vs Time series (11:32)*
    - Randomly splitting data is inappropriate for time series due to temporal dependence.
    - Split data by time, replicating the forecasting process for accurate performance evaluation.
    *Machine learning workflow (13:00)*
    - Time series forecasting workflow differs significantly from classification/regression tasks.
    - Feature engineering and handling vary at predict time depending on the multi-step forecasting approach.
    *Feature engineering for time series forecasting (14:47)*
    - Lag features: Use past values of target and features, including seasonal lags.
    - Window features: Compute summary statistics (e.g., mean, standard deviation) over past windows.
    - Nested window features: Capture differences in various time scales.
    - Static features: Encode categorical metadata using target encoding, being mindful of potential target leakage.
    *Overview of some useful libraries (27:01)*
    - tsfresh: Creates numerous time series features from a data frame.
    - Darts and sktime: Facilitate forecasting with tabular data and offer functionalities like recursive forecasting and time series cross-validation.
    *Forecasting with tabular data using Darts (28:04)*
    - Example demonstrates forecasting with lag features and future known features on single and multiple time series.
    Disclaimer: I used Gemini 1.5 Pro to summarize the YouTube transcript.
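
The lag and window features described in the summary above can be sketched with pandas. This is a minimal illustration, not the code from the talk: the helper name `make_features`, the column names, and the toy series are all assumptions. The key point is that the rolling window is computed on the series shifted by one step, so the feature for time t never sees the value at time t.

```python
import pandas as pd


def make_features(y: pd.Series, lags=(1, 7), window=3) -> pd.DataFrame:
    """Turn one series into a feature table (hypothetical helper)."""
    df = pd.DataFrame({"y": y})

    # Lag features: the value of the target `lag` steps in the past.
    for lag in lags:
        df[f"lag_{lag}"] = y.shift(lag)

    # Window features: shift by one step first, so each window ends at
    # t-1 and the target at time t cannot leak into its own features.
    past = y.shift(1)
    df[f"roll_mean_{window}"] = past.rolling(window).mean()
    df[f"roll_std_{window}"] = past.rolling(window).std()

    # Rows at the start have no complete history, so drop them.
    return df.dropna()


# Toy daily series with values 10, 11, ..., 19 (illustrative only).
y = pd.Series(
    range(10, 20),
    index=pd.date_range("2022-01-01", periods=10, freq="D"),
    dtype="float64",
)
table = make_features(y, lags=(1, 2), window=3)
```

Each row of `table` can then be fed to any tabular regressor, with `y` as the target and the remaining columns as features.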

  • @蔡传泽
    @蔡传泽 1 year ago +1

    dude is a PhD for a reason, awesome stuff god damn

  • @duscio
    @duscio 2 years ago +2

    Great presentation! Interesting and clear.

  • @laizerLL572
    @laizerLL572 1 year ago +2

    Hi, I am so grateful for this tutorial.

  • @onuragmaji
    @onuragmaji 1 year ago

    Great talk. I hope we get more content like this on practical time series.

  • @蔡传泽
    @蔡传泽 1 year ago +1

    This is some seriously good stuff!

  • @5112vivek
    @5112vivek 2 years ago

    I will check out these libraries. Very informative, thanks.

  • @shivamgoel0897
    @shivamgoel0897 1 year ago

    Amazing! So easy to understand.

  • @14loosecannon
    @14loosecannon 2 years ago +2

    Really informative talk!

  • @bitzelcortez4011
    @bitzelcortez4011 2 years ago +1

    Excellent talk!

  • @wexwexexort
    @wexwexexort 2 years ago +1

    Fantastic!

  • @Xaphanius
    @Xaphanius 2 years ago +1

    Great presentation!

  • @georgiosmitrentsis6171
    @georgiosmitrentsis6171 2 years ago +1

    Great talk!

  • @yuh850321
    @yuh850321 1 year ago +1

    Great talk

  • @adityaghuse374
    @adityaghuse374 6 months ago

    Great work👍👍

  • @KauphusmanHuor
    @KauphusmanHuor 1 month ago

    very clear thank you

  • @DiscomongoEGE
    @DiscomongoEGE 2 years ago

    Thank you very much. Great talk

  • @mingilin1317
    @mingilin1317 2 years ago +3

    I have a question. Suppose I have time series data for a market, covering 2012 to 2022,
    and I need to forecast the number of customers that visit the store.
    From 2020 to 2022, because of COVID-19, the number of customers dropped a lot.
    In this case, if I use the last 30% of the data (from 2019 to 2022) for testing,
    the model sees no COVID-19-affected data during training (all of it is used for testing).
    Won't that make the forecast MAPE very high? What should I do in this case? (Sorry for my poor English.)
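
One way to look at this setup is an expanding-window (rolling-origin) evaluation instead of a single train/test cut, so that later folds train on data from the disrupted period. The sketch below is an illustration only, not the speaker's answer: the helper `expanding_window_splits`, the yearly granularity, and the fold sizes are all assumptions.

```python
def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs in time order (hypothetical helper).

    Every training set ends right before its test set begins, and each
    successive fold extends the training history by `test_size` points.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        train_idx = list(range(0, test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Toy example: one observation per year, 2012..2022 (illustrative only).
years = list(range(2012, 2023))
splits = list(expanding_window_splits(len(years), n_folds=3, test_size=2))

# Map index positions back to years for readability.
folds_in_years = [
    ([years[i] for i in train], [years[i] for i in test])
    for train, test in splits
]
```

With these toy settings the last fold trains on 2012 through 2020 and tests on 2021 and 2022, so at least part of the disrupted period is seen during training.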

  • @satyakiray8588
    @satyakiray8588 2 years ago +1

    Excellent and very informative presentation. Will definitely check out Darts and sktime.

  • @neo_otaku_gamer
    @neo_otaku_gamer 2 years ago +1

    Thoughts on using the TFT model for forecasting multiple time series?

  • @GandelfTheGrey
    @GandelfTheGrey 2 years ago +1

    Great talk thanks

  • @aliwaheed906
    @aliwaheed906 2 years ago +6

    Very informative and intriguing talk.
    I've been using SARIMAX and things like fbprophet for time series forecasting.
    I have a question about the value of the ML approach. Considering there is a host of things you need to account for while modeling a time-series problem as an ML problem, is it actually that significantly better than traditional algorithms? Is this production-grade stuff or is this in early experimental stages?
    I must admit the ML approach sounds way more interesting than what I've been doing for the past few years.

    • @umitkaanusta
      @umitkaanusta 2 years ago

      *By ML models, I mean the tree-based ML models here.

    • @TraininData
      @TraininData 4 months ago +1

      We basically use XGBoost and LightGBM for forecasting, or even linear regression, so these models are fit for production. ML models have the advantage that they let you enrich the features you extract from the time series with features from external sources. Hence they are in general more versatile than classical forecasting models like ARIMA, which make many assumptions about the data and do not incorporate features very well.

  • @kaidendubois
    @kaidendubois 2 years ago +2

    Super helpful presentation, thank you, will definitely be checking out your course!

    • @TraininData
      @TraininData 2 years ago

      Here is the link, just in case ;) www.trainindata.com/p/feature-engineering-for-forecasting

  • @onlineschoolofmath37
    @onlineschoolofmath37 1 year ago

    Awesome lecture! I just have one question. At 32:38, Kishan mentions that the time indexes for different groups can differ, which is fine. But the original consolidated data (all groups included) has continuous time stamps, whereas when we consider the groups separately, there may be gaps in the time stamps. Would you still consider them time series? Will the rest of the process work normally under these circumstances?

  • @aakashnandrajog7035
    @aakashnandrajog7035 2 years ago +1

    Amazing

  • @5112vivek
    @5112vivek 2 years ago +1

    How is y_train_all defined in the last example?

  • @Neilstube356
    @Neilstube356 2 years ago +2

    Great talk! How would you account for availability in your model? For example, let's say a SKU was out of stock for a portion of the training period. This could result in the sales lag feature being low for the out-of-stock SKU and high for substitute SKUs that were in stock.

    • @hurfable
      @hurfable 2 years ago +2

      You can create a dummy boolean variable feature.
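
A minimal sketch of that suggestion, assuming an inventory column is available: the column names (`sku`, `units_sold`, `stock_on_hand`) and the data are hypothetical. The flag is also lagged, because a low sales lag is only interpretable if the model knows whether the SKU was in stock at that lagged time.

```python
import pandas as pd

# Toy data for a single SKU (illustrative values only).
sales = pd.DataFrame(
    {
        "sku": ["A", "A", "A", "A"],
        "units_sold": [5, 0, 0, 6],
        "stock_on_hand": [20, 0, 0, 15],  # hypothetical inventory column
    }
)

# Dummy variable: 1 when the SKU was available, 0 when out of stock.
sales["in_stock"] = (sales["stock_on_hand"] > 0).astype(int)

# Lag the sales and the flag together, per SKU, so each lagged sales
# value is paired with the availability at that same past time step.
sales["units_sold_lag_1"] = sales.groupby("sku")["units_sold"].shift(1)
sales["in_stock_lag_1"] = sales.groupby("sku")["in_stock"].shift(1)
```

A tree-based model can then learn that a lagged sale of zero means something different when `in_stock_lag_1` is 0 than when it is 1.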

  • @py.master
    @py.master 1 year ago

    If you impute the training-set mean in place of a missing data point, does that mean the imputed data point does not change your model estimate, since the fitted model passes through the mean of the variables anyway? I don't think this is information leakage; it is just saying to ignore this data point.

  • @solvem_probler
    @solvem_probler 10 months ago

    Nice talk

  • @AhmedThahir2002
    @AhmedThahir2002 1 year ago

    Hi, does anyone know how to implement the recursive forecasting that he did in Darts using sktime? I couldn't really find an intuitive explanation online.
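
Independent of any particular library, the recursive strategy can be sketched in plain NumPy: fit a one-step-ahead model on lagged values, then call it repeatedly and append each forecast to the history. This is not sktime's or Darts' API. The helpers `fit_one_step` and `recursive_forecast` and the linear autoregressive model are assumptions made for illustration. In sktime, the `make_reduction` helper with `strategy="recursive"` appears to be the usual starting point for this.

```python
import numpy as np


def fit_one_step(y, n_lags):
    """Least-squares fit of y[t] on [1, y[t-1], ..., y[t-n_lags]]."""
    y = np.asarray(y, dtype=float)
    # Each row holds the n_lags values before time t, most recent first.
    rows = [y[t - n_lags:t][::-1] for t in range(n_lags, len(y))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(X, y[n_lags:], rcond=None)
    return coef


def recursive_forecast(y, coef, horizon):
    """Apply the one-step model `horizon` times, feeding forecasts back in."""
    n_lags = len(coef) - 1
    history = list(np.asarray(y, dtype=float))
    forecasts = []
    for _ in range(horizon):
        lags = history[-n_lags:][::-1]  # most recent value first
        y_hat = coef[0] + float(np.dot(coef[1:], lags))
        forecasts.append(y_hat)
        history.append(y_hat)  # the forecast becomes a lag for the next step
    return forecasts


# Toy series with a pure linear trend: 1, 2, ..., 20 (illustrative only).
y = np.arange(1.0, 21.0)
coef = fit_one_step(y, n_lags=2)
preds = recursive_forecast(y, coef, horizon=3)
```

Any regressor with a fit/predict interface could replace the least-squares fit; the loop that feeds forecasts back into the lag vector is what makes the strategy recursive.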

  • @yogiekusumah1148
    @yogiekusumah1148 2 years ago +1

    Has anybody ever compared model results using the same dataset and the same parameters in sktime and Darts? For example, the ARIMA model from both packages.
    I've tried it, and the two models gave different MAPE results. I hope I have made a mistake in my code.

  • @pranavkhatri9564
    @pranavkhatri9564 1 year ago

    Can we do this with stock data, using models such as linear regression?

    • @TraininData
      @TraininData 4 months ago

      Yes you can!

  • @BrentMalice
    @BrentMalice 6 months ago

    this dude could be a voice actor

  • @D4nte-RN
    @D4nte-RN 10 months ago

    Has anyone tried to apply this Darts model to real-world data? My MAPE score is 26% ;-(

  • @marciamarquene5753
    @marciamarquene5753 1 year ago

    1:41 people see us, people see us

  • @manthanrathod1046
    @manthanrathod1046 4 months ago

    23:40 I don't really understand how it would cause data to leak into the training set. Can anybody please explain with an example?

  • @masaeed44
    @masaeed44 1 year ago

    Have you ever used Darts? From Darts I got "ValueError: `lags` must be strictly positive. Given: -1."