Multiclass Learning for Scikit Learn

  • Published Oct 5, 2024
  • We learn how to deal with multi-class classification, multi-label classification, and multi-output classification and regression.
    Associated Github Commit:
    github.com/kna...
    Associated Scikit Links:
    scikit-learn.or...

Comments • 22

  • @ejmurray72
    @ejmurray72 6 years ago +1

    YES. I was missing some of the info that you mentioned when learning to do this type of classification. You've just filled in the gaps.

  • @isayiyasnigatu564
    @isayiyasnigatu564 5 years ago

    Thanks for the nice presentation, sir. You said there is a next episode about fit_transform(), but I couldn't find it. Actually my question is about inverse_transform(yt). In your example, y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]; after the multi-label encoding, to get back y we apply inverse_transform(yt), right? I did that and got y_t = [(2, 3, 4), (2,), (0, 1, 3), (0, 1, 2, 3, 4)]. It looks fine, but the repeated classes are missing. Note that the first row was [2, 3, 4, 2] while the inverse is [2, 3, 4]. My question is: how do we get back the list of classes we encoded, like y in your case?

    • @DataTalks
      @DataTalks 5 years ago

      Multi-label encoding is probably not what you are looking for. A duplicated label really would not make sense as anything but a data error (think labeling an animal as: dog, spotted, dog; that extra dog does not make sense). Your labels probably represent occurrences (like a picture containing: dog, cat, and dog). In that case you need a model that takes a variable number of inputs (or summary stats of those inputs), or produces a variable number of outputs, like a bounding-box NN or an LSTM. You should probably be looking at NN architectures if that is the case. Hope that helps!
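
The behavior described in the question can be reproduced with scikit-learn's MultiLabelBinarizer; this minimal sketch shows why a duplicated label cannot survive the encode/decode round trip:

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]

mlb = MultiLabelBinarizer()
yt = mlb.fit_transform(y)  # binary indicator matrix: one column per class

# The indicator matrix only records presence/absence, so a row like
# [2, 3, 4, 2] encodes identically to [2, 3, 4] and the duplicate is lost.
recovered = mlb.inverse_transform(yt)
print(recovered)
# [(2, 3, 4), (2,), (0, 1, 3), (0, 1, 2, 3, 4), (0, 1, 2)]
```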

    • @isayiyasnigatu564
      @isayiyasnigatu564 5 years ago

      @@DataTalks Thanks for pointing that out. You are right; I realized it late. I will use other methods of encoding.

  • @tareknahool
    @tareknahool 6 years ago

    Thanks a lot for the great lesson, but I didn't quite get the multi-output regression problem. I have the same problem and need a clearer example.

  • @Ken-lr6pn
    @Ken-lr6pn 6 years ago

    Thank you for the informative video. As for setting up the label and training sets (X and y), where can I find how to do that? I think it was skipped because you already covered it before.

    • @DataTalks
      @DataTalks 6 years ago

      We generally split our data into train and test sets using the train_test_split function from sklearn (though it depends on the dataset). If you are interested in the basics, definitely check out my introduction to data science series! Thanks for watching :)
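
A minimal sketch of the split the reply describes, on toy arrays (the shapes, test_size, and random_state here are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features (toy data)
y = np.arange(10)                 # one label per sample

# Hold out 30% of the rows for evaluation; random_state makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```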

  • @SSNU706
    @SSNU706 5 years ago

    Hi, can you please clarify when exactly we should use MLPRegressor versus MultiOutputRegressor, and how to score them using R^2? By default R^2 gives the 'uniform_average' score, which may not be correct. I have a multi-target regression problem of predicting y1, y2, and y3 based on x1, x2, ..., x20. I would appreciate it if you could post a video covering this multi-output regression problem in detail with the best algorithm as soon as possible. Thanks a ton!

    • @DataTalks
      @DataTalks 5 years ago

      No problem. MLPRegressor is a multilayer perceptron, aka a neural network. MultiOutputRegressor is an sklearn meta-estimator: it fits a single model per target.
      There are a few considerations that go into choosing between the two. Perhaps the most important are: how many data points you have, and how related y1, y2, and y3 are. Data points are simple: if you have a lot of them, I'd start with a neural network. You are already dealing with an exotic problem (multi-output), so the complexity of going from tree-based to neural-network-based models is counterbalanced by having one model instead of three (which is what you'd get in the multi-output case). The second question is less important, but if the ys are not related, then having the same model predict all three might not be the best approach. This advice is more intuition than hard evidence, though. Oftentimes people will add side outputs to neural networks to facilitate training, but only if those outputs are related.
      Hope this helps!
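
A sketch of the two options discussed in the reply, on synthetic data shaped like the question (the base estimator and hyperparameters are illustrative choices, not a recommendation):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.neural_network import MLPRegressor

# Toy problem shaped like the question: 20 inputs, 3 targets.
X, y = make_regression(n_samples=300, n_features=20, n_targets=3,
                       random_state=0)

# Option 1: meta-estimator that fits one gradient-boosted model per target.
per_target = MultiOutputRegressor(
    GradientBoostingRegressor(random_state=0)).fit(X, y)

# Option 2: one neural network with three output units sharing hidden layers.
single_net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                          random_state=0).fit(X, y)

# .score() returns R^2; for multi-output it is averaged uniformly over targets.
print(per_target.score(X, y), single_net.score(X, y))
```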

    • @SSNU706
      @SSNU706 5 years ago

      @@DataTalks Wow, thanks a lot for your quick reply. Really appreciate it. I have a few questions and would appreciate it if you could clarify:
      1) MultiOutputRegressor: If this algorithm builds models separately for each target variable, does it combine them internally and come up with one best model, or do we have to take those individual models and figure out a way to combine them ourselves? The latter is really difficult and may not be the best solution. I really need one final model, not individual models; otherwise I think we can go back to regular simple regression, building a model for each target variable separately.
      2) MLPRegressor: a) First of all, do we have to compute all the weights and biases and supply them, or does the algorithm take care of it? What parameters are required when instantiating the model?
      b) Does this algorithm automatically check all possible cases internally and come up with one best final model? For example, suppose we build models individually for each target variable, and there is one input variable x3 with a positive linear relationship to target y1 (so we need to keep it in model 1), but x3 has no linear relationship with y2 (so we would ideally drop it from model 2). How does the algorithm handle cases like that? Does it drop or keep the x3 variable when it builds one final model?
      3) Lastly, could you please share code for implementing both algorithms?
      Sorry for asking so many questions, but I couldn't find these two algorithms covered in detail anywhere. May I request a video clarifying them? It would be a great favor to the online community, as there are many like me who are looking for this and would benefit.
      Thanks again, and I appreciate your help.

    • @DataTalks
      @DataTalks 5 years ago

      (Feel free to email me about this too, as we are getting into long-comment territory.)
      1) The MultiOutputRegressor just trains multiple models under the hood and presents them to you as if they were a single model. They are never really combined; they are just packaged.
      2) a) The algorithm takes care of all of it.
      b) It trains a single NN with three outputs, so it can capture non-linear relationships. The loss of the network is the loss from those three outputs added up. In your example, the weights in the network going from x3 to y1 would be stronger than those leading to y2.
      3) These are great questions, but they might be a little too specific for a YT video. Is there an open-source dataset similar to this that you think would be interesting to see me work on? I think that could be a good format.
      I'm happy to chat more, but I'm thinking it might be better in an email chain or over Skype, so feel free to reach out!
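
Point (1) in the reply, one model per target packaged behind a single interface, can be seen directly via the fitted meta-estimator's estimators_ attribute (toy random data; the LinearRegression base estimator is an illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 4)  # 50 samples, 4 features
y = rng.rand(50, 3)  # 3 targets -> 3 independent models under the hood

model = MultiOutputRegressor(LinearRegression()).fit(X, y)

# One fitted LinearRegression per target column; predict() stitches their
# outputs back together so the whole thing looks like a single model.
print(len(model.estimators_))      # 3
print(model.predict(X[:2]).shape)  # (2, 3)
```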

    • @SSNU706
      @SSNU706 5 years ago

      @@DataTalks Hi, thanks a lot for your replies and your willingness to help me. I sent you an email with the details of my problem, with the subject "Reg Multi Output Regression problem chat on your youtube channel". Could you please check it out and let me know your thoughts when you have a moment? Thanks again!

  • @pial2461
    @pial2461 5 years ago +1

    Is that your kitchen?!

    • @pial2461
      @pial2461 5 years ago

      Oh, by the way, your talks are just GOLD, to be honest. So charming! (Your Keras tuts are just great.)

  • @qqaadir
    @qqaadir 7 years ago

    How do you deal with a larger dataset, for example 40K samples? The fit function takes forever. Would you please suggest a strategy for that?

    • @DataTalks
      @DataTalks 7 years ago +2

      Great question! If your machine has multiple cores, the easiest thing to do is to set n_jobs on the estimator to the number of cores (or -1 for all of them). The next step is to move to cloud computing with more RAM (40K samples is not too much, so this should probably do). The final step is to move to either distributed compute or massive machines (e.g. Spark).
      Hope that helps, and I might do some data engineering tutorials in the future!
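
A sketch of the n_jobs advice, assuming a forest-style estimator that parallelizes across cores (the dataset is scaled down from 40K so the example runs quickly; bump n_samples up in practice):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for a larger dataset; raise n_samples toward 40_000 in practice.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

# n_jobs=-1 uses every available core; the individual trees are fit in parallel.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```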

  • @NattapongPUN
    @NattapongPUN 7 years ago

    Thanks

  • @张开顺-n8u
    @张开顺-n8u 6 years ago

    Hello, can you tell me what tool you are using in the browser?

    • @samdavepollard
      @samdavepollard 6 years ago

      Jupyter Notebook
      jupyter.org/

    • @张开顺-n8u
      @张开顺-n8u 6 years ago

      OK, thank you very much.

  • @rafambarrancos
    @rafambarrancos 5 years ago

    You didn't talk about multiclass.

    • @DataTalks
      @DataTalks 5 years ago +1

      Thanks for your comment, Rafael, and sorry the video was not clear. The two strategies I mentioned for classification in a multi-class setting are one-vs-rest (also called one-vs-all) and one-vs-one (you can read more here: en.wikipedia.org/wiki/Multiclass_classification#Transformation_to_binary). I'll keep working to improve, and thanks for the comment!
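
A sketch of the transformation-to-binary strategies the reply refers to, on the 3-class iris dataset (the LogisticRegression base estimator is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes

# One-vs-rest (one-vs-all): one binary model per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-one: one binary model per pair of classes, k*(k-1)/2 in total.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# With k=3 both happen to train 3 binary models (3 classes, 3 pairs).
print(len(ovr.estimators_), len(ovo.estimators_))
```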