Seven ways to select columns using ColumnTransformer

  • Published on 4 Dec 2024

Comments • 34

  • @dataschool
    @dataschool  4 years ago +1

    If you're not already familiar with ColumnTransformer, go back and watch tip #1 here: th-cam.com/video/NGq8wnH5VSo/w-d-xo.html
    Let me know if you have any questions, and thanks as always for watching! 🙌

    • @alenjose3903
      @alenjose3903 4 years ago +2

      Yes that is helpful ❤️

    • @JoaoVitorBRgomes
      @JoaoVitorBRgomes 3 years ago

      @data school, OK, but what if I want to select 3 columns: apply a transformation to 2 of them (e.g. Embarked and Sex), keep Fare as it is, and exclude Age from the transformation?

    • @JoaoVitorBRgomes
      @JoaoVitorBRgomes 3 years ago

      What I did was use passthrough with a slice of x_train, but isn't there a way built into sklearn?

    • @dataschool
      @dataschool  3 years ago +1

      @JoaoVitorBRgomes You only pass the columns you want to use to the fit_transform method. So, you define your X as Embarked, Sex, and Fare. Then, you create a ColumnTransformer with a single tuple specifying the transformation for Embarked and Sex, and you set remainder='passthrough' so that Fare is passed through unmodified. Hope that helps!
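A minimal sketch of the approach described in this reply. The sample rows are invented for illustration, OneHotEncoder is only an assumed choice of transformation, and sparse_threshold=0 is added here just to force a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical Titanic-style rows, invented for illustration
df = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S"],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 8.05],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

# Age is excluded simply by leaving it out of X
X = df[["Embarked", "Sex", "Fare"]]

ct = ColumnTransformer(
    # A single tuple: one-hot encode Embarked and Sex
    [("ohe", OneHotEncoder(), ["Embarked", "Sex"])],
    remainder="passthrough",  # Fare is passed through unmodified
    sparse_threshold=0,       # assumption: always return a dense array
)

X_t = ct.fit_transform(X)
print(X_t.shape)  # 3 Embarked columns + 2 Sex columns + Fare
```

The passed-through Fare column ends up after the encoded columns, since remainder columns are appended at the end of the output.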

  • @Airborne_Insight
    @Airborne_Insight 3 years ago +2

    you are a real life saver.

  • @satyatej8280
    @satyatej8280 3 years ago +2

    Thank you so much, Kevin. You are making everything easy.

    • @dataschool
      @dataschool  3 years ago

      Happy to hear that!

  • @adamyatripathi2743
    @adamyatripathi2743 4 years ago +3

    Absolutely Beautiful! I was working on the same problem.

    • @dataschool
      @dataschool  4 years ago

      Thank you so much for your kind words! 🙏

  • @jaysoni7812
    @jaysoni7812 4 years ago +2

    Finally you made an explanation video on scikit-learn, thank you so much 👍

    • @dataschool
      @dataschool  4 years ago

      You're very welcome! I also have more scikit-learn videos here: th-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html

    • @alenjose3903
      @alenjose3903 4 years ago +1

      @dataschool Will you be adding more to this playlist? I'm planning on starting ML now and I'm really looking forward to your lessons. Big fan ❤️ Keep posting ⚡️

    • @dataschool
      @dataschool  4 years ago

      I'm not currently planning to add to that playlist, though I have many more hours of content in my ML courses: www.dataschool.io/ml-courses/ Hope that helps!

  • @jaikishank
    @jaikishank 4 years ago +2

    Thank you. It is quite handy and useful.

    • @dataschool
      @dataschool  4 years ago

      Glad it was helpful!

  • @manarma7536
    @manarma7536 1 year ago

    slice() worked for me, thanks

    • @dataschool
      @dataschool  11 months ago

      Great to hear!

  • @DrJohnnyStalker
    @DrJohnnyStalker 4 years ago +2

    You can also pass a callable (e.g. a self-defined helper function that expects the column series as input and returns True/False) to the columns parameter. If the callable returns False the column will not be processed; if it returns True the column will be processed. Example: pasteboard.co/JC5zccR.png
    This allows very granular column selection and distinction, e.g. separating high-cardinality columns into a different pipeline.
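A hedged sketch of callable-based selection. It is not a reproduction of the linked screenshot. As documented in scikit-learn, the callable is passed the whole input X (not one column at a time) and may return column names, indices, or a boolean mask. The helper name low_cardinality, the cutoff of 3 unique values, and the sample rows are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: two low-cardinality columns and one high-cardinality column
X = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "Q", "S", "C"],
    "Ticket": ["A1", "A2", "A3", "A4", "A5"],
})

def low_cardinality(X):
    # Hypothetical helper: receives the full DataFrame and returns
    # the names of columns with at most 3 distinct values
    return [col for col in X.columns if X[col].nunique() <= 3]

ct = ColumnTransformer(
    [("ohe", OneHotEncoder(), low_cardinality)],  # callable instead of a column list
    remainder="drop",      # the high-cardinality Ticket column is left out
    sparse_threshold=0,    # assumption: always return a dense array
)

X_t = ct.fit_transform(X)
print(low_cardinality(X))
print(X_t.shape)
```

A second tuple with the opposite condition could route the high-cardinality columns to a different transformer instead of dropping them.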

    • @dataschool
      @dataschool  4 years ago +1

      Wow, thank you so much for sharing, I look forward to checking it out!

  • @MOHAMEDSaid-tz2jx
    @MOHAMEDSaid-tz2jx 3 years ago +1

    Hi Kevin, I have a question about ColumnTransformer: how do you avoid redundancy?
    For example, when you perform one-hot encoding on the Sex column, normally we keep only one column instead of two.
    How do you manage that? Thanks and great job.

    • @ramasai1475
      @ramasai1475 3 years ago +1

      ohe = OneHotEncoder(drop='first') will drop the first column of each encoded feature.

    • @dataschool
      @dataschool  3 years ago +1

      You can drop the first column using the drop='first' parameter of OneHotEncoder, but in general, I recommend avoiding that. Here's why:
      1. Multicollinearity is rarely an issue with scikit-learn models
      2. drop='first' is incompatible with handle_unknown='ignore'
      3. May be problematic if you standardize all features or use a regularized model
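For reference, a small sketch of what drop='first' does to a two-category column, using invented sample values. Note that point 2 above reflects the scikit-learn release that was current when the comment was written; later releases may accept that combination, so it is worth checking against your installed version.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample values, invented for illustration
X = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Default: one output column per category (female, male)
full = OneHotEncoder().fit_transform(X).toarray()

# drop='first': the first category (female) is dropped,
# leaving a single "is male" indicator column
reduced = OneHotEncoder(drop="first").fit_transform(X).toarray()

print(full.shape, reduced.shape)
```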

  • @hogobi
    @hogobi 4 years ago +1

    Thank you, you explain how to work with models in sklearn and everything is fine; the next step (production)

    • @dataschool
      @dataschool  4 years ago

      You're welcome! One of my courses might be useful to you: www.dataschool.io/ml-courses/

  • @jaikishank
    @jaikishank 4 years ago +1

    Is it possible to inverse transform the ct object to see the original-to-encoded transition? In the scikit-learn docs I could find it for single data type transformers (one-hot, label) but not for the heterogeneous ColumnTransformer. Can you help, please?

    • @dataschool
      @dataschool  4 years ago +1

      Unfortunately, inverse_transform is not currently available for ColumnTransformer. You can see the open issue here: github.com/scikit-learn/scikit-learn/issues/11463

    • @jaikishank
      @jaikishank 4 years ago +1

      @dataschool Thanks for your feedback and update

  • @deepakvyas3424
    @deepakvyas3424 4 years ago +1

    Sir, please make a video on time series (full topic covered in a single tutorial)

    • @dataschool
      @dataschool  4 years ago +1

      Thanks for your suggestion!

    • @deepakvyas3424
      @deepakvyas3424 4 years ago

      @dataschool I will be looking forward to the video 😇
      Actually I am stuck in the time series project, I have the last 8 years of stock data and have to predict for the next 4 years

  • @asraramostofa5846
    @asraramostofa5846 4 years ago

    Problem :
    > install.packages("dplyr")
    Installing package into ‘C:/Users/Nobody/Documents/R/win-library/4.0’
    (as ‘lib’ is unspecified)
    trying URL 'cran.rstudio.com/bin/windows/contrib/4.0/dplyr_1.0.2.zip'
    Content type 'application/zip' length 1299904 bytes (1.2 MB)
    downloaded 1.2 MB
    Error in install.packages : cannot open file 'C:/Users/Nobody/Documents/R/win-library/4.0/file2f04521f5516/dplyr/help/figures/logo.png': Permission denied

    • @dataschool
      @dataschool  4 years ago

      I won't be able to help you troubleshoot, I'm sorry!