An advanced missing-value imputation technique to supercharge your training data.

  • Published on 3 Sep 2023
  • Get the most out of your data for machine learning by adopting this advanced data preprocessing trick.
    verstack package documentation - verstack.readthedocs.io/en/la...

Comments • 18

  • @soccerdadsg · 10 months ago

    Absolutely love this library!

  • @AlexErdem-lo5rz · 2 months ago

    Thank you!

    • @lifecrunch · 2 months ago

      Welcome!

  • @likhithp9934 · 3 months ago

    Nice Work man

    • @lifecrunch · 3 months ago

      Thanks 🔥

  • @tnwu4350 · 3 months ago

    Hi there, this is an awesome approach for imputation. How would you go about validating this, though? It would be helpful to demonstrate that it's more accurate than methods like the simple or iterative imputer.

    • @lifecrunch · 3 months ago

      I have benchmarked this approach against the iterative imputer along with all the statistical methods. Every time verstack.NaNImputer gave better results, especially compared to the statistical methods. And there's really no magic - a sophisticated model like LightGBM is the gold standard when it comes to tabular data.
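
A minimal usage sketch of verstack.NaNImputer as discussed in this thread (not code from the video; the NaNImputer class and its impute() method follow the verstack docs linked in the description - check them for the current constructor options):

```python
import numpy as np
import pandas as pd
from verstack import NaNImputer  # pip install verstack

# Toy frame with missing values in numeric and categorical columns.
df = pd.DataFrame({
    "age": [34, np.nan, 52, 41, np.nan],
    "income": [58000, 72000, np.nan, 64000, 51000],
    "city": ["NY", "LA", np.nan, "NY", "SF"],
})

imputer = NaNImputer()           # trains a LightGBM model per column that has NaNs
df_filled = imputer.impute(df)   # returns a copy with the missing values predicted
print(df_filled)
```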

  • @akmalmir8531 · 10 months ago

    Danil, thank you for sharing - interesting library. One idea: it would be great if next time we could compare, say:
    1) mean imputation
    2) dropping
    3) ML imputation
    and then fit and predict any model on the data; at the end we can see which imputation gives the minimum RMSE.

    • @lifecrunch · 10 months ago

      I've done such comparisons many times. It is very much dependent on the data, but on average the ML missing-value imputation yields better results (a sketch of such a benchmark follows this thread).

    • @akmalmir8531 · 10 months ago

      @@lifecrunch Yes, agreed - that's why I'm writing: to show your viewers that your idea works better than simple imputation. You're giving them gold; it would be better to include the comparison at the end.

    • @lifecrunch · 10 months ago

      Agree, this would be a great illustration of the concept.
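
A hedged sketch of the benchmark discussed in this thread: punch holes into a complete dataset, repair them with mean imputation, row dropping, and ML imputation, then compare the downstream RMSE of one fixed model. The verstack.NaNImputer calls follow the package docs; everything else is plain scikit-learn, and the dataset, missingness rate, and hyperparameters are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from verstack import NaNImputer

rng = np.random.default_rng(42)
X, y = fetch_california_housing(return_X_y=True, as_frame=True)

# Knock out ~20% of the values in every feature column at random.
X_missing = X.mask(rng.random(X.shape) < 0.2)

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.25, random_state=42
)

def downstream_rmse(X_tr, y_tr, X_te, y_te):
    """Fit one fixed model on the repaired data and report test RMSE."""
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))

results = {}

# 1) Mean imputation.
mean_imp = SimpleImputer(strategy="mean")
results["mean"] = downstream_rmse(
    mean_imp.fit_transform(X_train), y_train,
    mean_imp.transform(X_test), y_test,
)

# 2) Dropping incomplete rows (train only; the test rows still need predictions,
#    so they are filled with the train means here).
keep = X_train.notna().all(axis=1)
results["drop"] = downstream_rmse(
    X_train[keep].to_numpy(), y_train[keep],
    mean_imp.transform(X_test), y_test,
)

# 3) ML imputation with verstack.NaNImputer (it imputes each frame it is given;
#    a stricter protocol would avoid imputing train and test independently).
results["ml"] = downstream_rmse(
    NaNImputer().impute(X_train), y_train,
    NaNImputer().impute(X_test), y_test,
)

print(results)  # lower RMSE = better imputation for this model / data / split
```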

  • @yolomc2 · 4 months ago

    Is it possible to get a copy of the code to study, sir? Thanks in advance 👌👍

    • @lifecrunch · 3 months ago · +1

      Unfortunately I didn't save the code from this video... You can code along; the script is not very complicated.

    • @yolomc2 · 3 months ago

      @@lifecrunch 👍

  • @nawaz_haider · 7 months ago

    I'm learning Data Science, and most tutorials just use the mean value. This didn't make any sense to me. I was wondering how on earth their model works in the real world with all these wrong values that have been used during training. Now I see what pros do.

    • @lifecrunch · 7 months ago

      Yeah, the naive (mean) approach just works technically. It's used to fill in the blanks so that models which can't handle NaN can train. But the volume of incorrectly filled missing values will be directly reflected in the model's generalization.
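
A small illustration of the point above (nothing verstack-specific; just numpy/pandas on synthetic numbers): mean filling keeps a NaN-intolerant model trainable, but the filled values carry no signal, so the feature's relationship with the target weakens as the share of imputed values grows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(scale=0.5, size=10_000)   # target strongly tied to x

# Hide 40% of x, then "repair" it the naive way with the column mean.
x_missing = pd.Series(x).mask(rng.random(10_000) < 0.4)
x_mean_filled = x_missing.fillna(x_missing.mean())

print(np.corrcoef(x, y)[0, 1])              # ~0.97 with the true values
print(np.corrcoef(x_mean_filled, y)[0, 1])  # noticeably lower after mean fill
```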

  • @kalyanchatterjee8624 · 28 days ago

    Great, but I am not the right audience. Too fast.

    • @lifecrunch · 23 days ago

      You’ll get there…