Tutorial 28-MultiCollinearity In Linear Regression- Part 2

  • Published on 4 Nov 2024

Comments • 93

  • @karangupta6402 · 3 years ago +18

    Thanks, Krish. One way you suggested is to drop the feature.
    One more thing we can do is combine the features into what is also called an interaction term (for example, multiply Age and Years of Experience and then run the OLS model; see the sketch below). This also seems to control collinearity to some extent.
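    A minimal sketch of the interaction-term idea (assuming a pandas DataFrame with hypothetical columns "Age", "YearsExperience" and target "Salary"; not the exact code from the video):

        import pandas as pd
        import statsmodels.api as sm

        df = pd.read_csv("salary.csv")                    # hypothetical file name
        X = df[["Age", "YearsExperience"]].copy()
        X["Age_x_Exp"] = X["Age"] * X["YearsExperience"]  # interaction term
        X = sm.add_constant(X)                            # add the intercept column
        print(sm.OLS(df["Salary"], X).fit().summary())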

  • @ibrahimmondal9104 · 2 years ago +1

    Now the concept of multicollinearity is totally clear... thanks, brother 🙂👍✌️

  • @prijwalrajavat5811 · 4 years ago +4

    I was searching for the theory behind backward elimination... everywhere only the procedure was given, but you explained the why. Thank you, sir.

  • @bonnyphilip8022 · 2 years ago

    You are simply awesome... Nothing else to say.. Thank you for your contribution and service...

  • @bhagwatchate7511 · 3 years ago +6

    Hello Krish,
    Thanks for the amazing explanation.
    I have one question: is it necessary to check the summary before concluding that there is multicollinearity, or can we proceed with just the corr() function and conclude from that?
    Thanks in advance, :)
    #multicollinearity #Krish Naik #Regression

  • @gargisingh9279 · 3 years ago +8

    @KrishNaik Sir!! What about another technique called VIF (Variance Inflation Factor)? That technique is also widely used to detect multicollinearity.

  • @StyleTrick · 4 years ago +8

    Krish, could you explain why Age has the high p-value and not Years of Experience? If you remove Years of Experience from the table, then Age has a p-value of 0, which makes sense because in the data, as Age increases, Salary increases accordingly.

  • @rankerhill2335 · 4 years ago +1

    Unless you calculate the Variance Inflation Factor, you cannot be certain that multicollinearity is not present. Ideally you should calculate the partial correlation coefficient, compute 1/(1 - partial correlation coefficient), and see if this is greater than 2 (a VIF sketch follows below).
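    A minimal VIF sketch on toy data, using statsmodels' variance_inflation_factor, which computes VIF_j = 1/(1 - R_j^2) from regressing feature j on the other features:

        import pandas as pd
        import statsmodels.api as sm
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        # toy features: Age and YearsExperience are deliberately near-collinear
        X = pd.DataFrame({
            "Age":             [25, 30, 35, 40, 45, 50],
            "YearsExperience": [ 2,  6, 11, 15, 21, 26],
        })
        X = sm.add_constant(X)   # VIF is usually computed with an intercept column
        vif = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        print(vif)               # rough rule of thumb: VIF above ~5-10 flags collinearity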

  • @ganeshdevare7360 · 3 years ago

    Thank you very much sir for explanation at 00:00

  • @DataInsights2001 · 3 years ago +1

    Nice presentation! Dropping a feature is one solution, but sometimes you can't drop it and you need that feature in the analysis. What would you do then: combine features by multiplication or division, or use factor analysis? In cases like Marketing Mix Modeling, similar issues arise often. Please try videos on Marketing Mix Modeling, optimization, and forecasting: estimating ROI, iROI, response curves...

  • @muhammadiqbalbazmi9275 · 3 years ago +1

    I think, instead of removing any feature, we can combine the two correlated features (in the context you talked about):
    Age + YearsOfExperience => seniority level.
    Problem solved (domain knowledge is required for this kind of feature engineering).

  • @likithliki1160 · 3 years ago +1

    Amazing explanation!

  • @ashwininaidu2302 · 4 years ago +3

    Hi Krish, nice explanation of the concept. If you could make a video on the assumptions of linear regression (theory and practical), that would be of great help to us.

  • @shashankkhare1023 · 4 years ago +6

    Hey Krish, do we really need to look at the std. error? I think looking at the p-value is enough, since the p-value is calculated from the t-statistic, which in turn is calculated from the coefficient and the std. error. Correct me if I am wrong, thanks.

  • @ruturajjadhav8905 · 3 years ago +1

    1. Linearity
    2. Homoscedasticity
    3. Multivariate normality
    4. Independence of errors
    5. Lack of multicollinearity
    Please do a video on these; I am not getting these concepts.

  • @sahilgaikwad98 · 4 years ago +1

    Thank you for such a wonderful explanation, sir.

  • @aifarmerokay · 4 years ago

    Thanks sir your explanation is really good

  • @Katarahul · 4 years ago +8

    Plotting a correlation matrix and analyzing it is a faster option, right? (A small sketch follows below.)
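    A quick sketch of that approach (assuming an advertising-style dataset with only numeric columns; the file name is a placeholder):

        import pandas as pd
        import seaborn as sns
        import matplotlib.pyplot as plt

        df = pd.read_csv("Advertising.csv")   # placeholder file name
        corr = df.corr()                      # pairwise Pearson correlations
        sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
        plt.show()                            # pairs near +/-1 are collinearity suspects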

  • @abdullahshafi8865 · 4 years ago

    Intuition King is back.

  • @sachinborgave8094 · 4 years ago +1

    Thanks Krish,
    Please upload further Deep Learning videos.

  • @peeyushyadav4991 · 3 years ago +1

    Wait, if the newspaper column/feature is not correlated, then how do we come to the conclusion that it can be dropped because its p-value is high? Are we making that conclusion based on the t-test values here as well?

  • @0SIGMA · 3 years ago

    Wow! Beautiful, sir.

  • @unezkazi4349 · 3 years ago +2

    So that means we should now have 0 expenditure on newspaper? And after removing the newspaper feature, if we do OLS again, can there be any other feature with p-value less than 0.5 in the new OLS model?

  • @jatin7836 · 4 years ago

    An important question, if anyone knows, please answer: why do we give OLS the X matrix returned by add_constant (which contains the constant) rather than only the x values (our independent features)? If we are not giving the independent values to OLS, how is it showing the table in the output (with X and y)? (See the sketch below.)
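    A small sketch that may clear this up: sm.add_constant() does not replace the features, it just prepends a column of 1s for the intercept, so OLS still receives every independent variable (toy data below, not the video's dataset):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        x1 = rng.normal(size=50)
        x2 = rng.normal(size=50)
        y = 3 + 2 * x1 - x2 + rng.normal(size=50)

        X = sm.add_constant(np.column_stack([x1, x2]))
        print(X[:3])                          # each row is [1, x1, x2]: intercept + features
        print(sm.OLS(y, X).fit().summary())   # the summary table reports const, x1, x2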

  • @shubhamkundu2228 · 3 years ago

    So to sum up: to identify multicollinearity, we use the OLS (ordinary least squares) method and check the model summary for the std error (values should be low, not high), the R-squared value (should tend towards 1), and the p-value (should stay below the significance level, 0.05). A sketch of reading these values off the fitted model follows below.
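    A minimal sketch of pulling those numbers out of a fitted model instead of eyeballing summary() (toy data with two deliberately collinear features):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        x1 = rng.normal(size=100)
        x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly a copy of x1
        y = 1 + 2 * x1 + rng.normal(size=100)

        results = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
        print(results.bse)        # standard errors (inflated under collinearity)
        print(results.tvalues)    # t = coef / std err
        print(results.pvalues)    # compare against the usual 0.05 cut-off
        print(results.rsquared, results.rsquared_adj)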

  • @kakranChitra · 3 years ago

    Thanks, good elaboration!!

  • @madhavilathamandaleeka5953 · 3 years ago

    What is the difference between the OLS model and linear regression with MSE? Do both give the same result? Please clear this up 🙏🙏

  • @VishalPatel-cd1hq · 4 years ago +1

    Hi Sir,
    Can we directly compute the covariance matrix of the feature data and then its determinant? If the determinant is zero, the covariance matrix is singular, and if it is singular then there are linearly dependent features, so there is multicollinearity in our data... (a check along these lines is sketched below).
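    A small sketch of that check (toy data with an exactly dependent column; in practice the determinant is rarely exactly zero with noisy data, so the condition number or VIF is usually inspected instead):

        import numpy as np

        rng = np.random.default_rng(2)
        x1 = rng.normal(size=100)
        x2 = rng.normal(size=100)
        x3 = 2 * x1 + 3 * x2                  # exact linear combination of x1 and x2
        X = np.column_stack([x1, x2, x3])

        cov = np.cov(X, rowvar=False)         # 3x3 covariance matrix of the features
        print(np.linalg.det(cov))             # ~0 -> singular -> perfect multicollinearity
        print(np.linalg.cond(cov))            # a huge condition number tells the same story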

  • @ajaykushwaha4233 · 2 years ago

    Will this approach work on data with huge features?

  • @raedalshehri7969 · 2 years ago

    Thank you so much

  • @sumeersaifi6354 · 3 years ago

    At the end, you said a one-unit decrease in newspaper expenditure will result in a one-unit increase in sales.
    Can you please explain this? According to me it would result in a change in profit, but not in sales.

  • @shreyasb.s3819 · 4 years ago

    Well explained. Thanks a lot.

  • @maskman9630 · 2 years ago

    can we use the same process for logistic regression brother....?

  • @alexandremaillot803 · 4 years ago +4

    Please: after making a model with Python, training, testing and saving it, how can I put new data through it for predictions? Thanks for the content.

    • @amansrivastava3081 · 3 years ago +1

      For example, if this is how you built the regressor:
          from sklearn.linear_model import LinearRegression
          reg = LinearRegression()
          reg.fit(x_train, y_train)
      then you can predict on new data like this:
          reg.predict([[100]])   # returns the predicted value
      If you trained on multiple features, pass one value per feature, e.g. reg.predict([[100, 200]]), in the same column order as the dataframe you trained on.

  • @saumyagupta2606 · 2 years ago

    Is it a good practice to find multicollinearity for other ml models also apart from linear? Is it necessary for other models as well?

  • @akramhossain9576 · 4 years ago +1

    I was looking at the multicollinearity problem with interaction terms. How do I solve it? One suggestion is centering the variables, but I am still confused: one of my interaction variables is dichotomous and the 0 coding is important to me, so if I center it, the 0 coding will not be there anymore. How do I solve this? Can you please clarify it for me? Thanks. (One commonly suggested workaround is sketched below.)
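    One commonly suggested workaround, shown only as a hedged sketch with made-up names: center just the continuous variable before forming the interaction, and leave the 0/1 dichotomous variable untouched so its coding keeps its meaning.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        df = pd.DataFrame({
            "x_cont": rng.normal(50, 10, size=200),
            "group":  rng.integers(0, 2, size=200),   # dichotomous, keep the 0/1 coding as-is
        })
        df["y"] = 2 + 0.3 * df["x_cont"] + 1.5 * df["group"] + rng.normal(size=200)

        df["x_centered"] = df["x_cont"] - df["x_cont"].mean()   # center the continuous term only
        df["interaction"] = df["x_centered"] * df["group"]

        X = sm.add_constant(df[["x_centered", "group", "interaction"]])
        print(sm.OLS(df["y"], X).fit().summary())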

  • @adhvaithstudio6412 · 4 years ago

    Can you please explain why the p-value is high for the Age variable and not for Years of Experience?

  • @Datacrunch777 · 3 years ago

    Sir, kindly upload a video on ridge regression: how the lambda value is chosen and how to simulate it with respect to the basic estimators or formulas.

  • @avinashmishra6783 · 3 years ago

    Why does multicollinearity arise?
    What happens that reduces the adjusted R-squared?

  • @K-mk6pc · 2 years ago

    Can anyone explain why we are adding constant 1 in the predictor variables?

  • @RitikSingh-ub7kc · 4 years ago

    Can we just use principal component analysis to see whether all the variance is covered while reducing the number of features here? (A PCA sketch follows below.)
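    PCA is one option, at the cost of interpretability of the new components. A hedged sketch (toy correlated features standing in for the real design matrix):

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(4)
        x1 = rng.normal(size=200)
        X = np.column_stack([x1, x1 + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

        X_scaled = StandardScaler().fit_transform(X)     # PCA is scale-sensitive
        print(PCA().fit(X_scaled).explained_variance_ratio_)
        X_reduced = PCA(n_components=0.95).fit_transform(X_scaled)   # keep ~95% of the variance
        print(X_reduced.shape)   # the resulting components are uncorrelated by construction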

  • @shubhamchoudhary5461 · 3 years ago

    Is it like hypothesis testing??

  • @slowhanduchiha · 4 years ago

    In the 2nd example wasn't scaling important??

  • @РайымбекТолеугазиев · 4 years ago

    Cool video, but what about heteroskedasticity?

  • @galymzhankenesbekov2924 · 4 years ago

    very good

  • @JP-fi1bz · 4 years ago

    Isn't P value used for null hypothesis??

  • @adityakumar-sp4ki · 3 years ago

    While importing this library -> import statsmodels.api as sm, I'm getting this error -> module 'pandas' has no attribute 'Panel'

  • @abhishekverma549 · 4 years ago +1

    Sir, what is P>|t|?

  • @nagarjunakanneganti5953 · 3 years ago

    If I remove years of experience instead of age, just based on the correlation matrix, it will work, right? Do I still have to check the p-values, given that building an LR model with just age could change the p-value? Any thoughts @channel

  • @guneet556 · 4 years ago

    Hi Krish, kindly answer this question of mine:
    for under- and over-sampling, the techniques you have mentioned are applied to numeric datatypes.
    For categorical datatypes, do we first have to encode them as 0/1 and then apply SMOTETomek and NearMiss? Is that the proper way to deal with it?
    And if we use a tree-based approach, will it straight away handle the imbalance and the encoding itself?
    Please reply to this question, Krish, it will be a big help!

    • @krishnaik06 · 4 years ago

      Yes tree approach will solve that problem. But understand tree techniques usually require lot of time

    • @guneet556 · 4 years ago

      @@krishnaik06 Sir, thanks a lot. I am just a beginner and follow each of your videos thoroughly. Sir, can you make a video on how to build a resume for people looking to transition into data science?

  • @chitranshaagarwal4676 · 1 year ago

    Please arrange the topics in order; I am getting confused.

  • @amitkhandelwal8030 · 3 years ago

    Sir, why can't we choose TV as B0?

  • @abhi9raj776 · 4 years ago

    thanx a lot sir !

  • @deepaktripathi892 · 4 years ago

    As years of experience and age are related, how do we decide which one to drop?

  • @Mohit-im1rp · 4 years ago

    What is the p-value in the above explanation?

  • @harshstrum · 4 years ago

    Hi brother, I didn't get why you chose to drop the age feature rather than years of experience.

    • @krishnaik06 · 4 years ago +1

      Because age and experience are highly correlated, and we could see that the p-value of age was greater than 0.05.

  • @syedhamzajamil4490 · 4 years ago

    Sir, can I use PCA to get around multicollinearity?

  • @abhishekchanda4002 · 4 years ago

    Hi Krish,
    I have a doubt: if I ignore multicollinearity in the second example, how will it affect the prediction?

  • @SayantanSenBony · 4 years ago

    Hi Krish, I have one question which I face on a daily basis in real time: how do I detect heteroscedasticity, and what is the method to rectify it in Python?

    • @simonelgarrad · 3 years ago

      There are many tests available. You can research Bartlett's test; it assumes that the samples come from populations with the same variance. There is another one called the Goldfeld-Quandt test. As for rectifying it in Python, what I believe is that when working with MLR, if the normality assumption is satisfied you shouldn't get heteroscedasticity. So if we have variables that aren't normally distributed, they can be transformed (refer to Krish's video on that too), and then I don't think heteroscedasticity should be present. (A small detection sketch follows below.)
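      A minimal detection sketch using two residual-based tests that ship with statsmodels (the Breusch-Pagan test is added here alongside the Goldfeld-Quandt test mentioned above; toy data made heteroscedastic on purpose):

          import numpy as np
          import statsmodels.api as sm
          from statsmodels.stats.diagnostic import het_breuschpagan, het_goldfeldquandt

          rng = np.random.default_rng(5)
          x = np.linspace(1, 10, 200)
          y = 2 * x + rng.normal(scale=x)             # error spread grows with x
          X = sm.add_constant(x)

          results = sm.OLS(y, X).fit()
          print(het_breuschpagan(results.resid, results.model.exog))  # (LM, LM p, F, F p)
          print(het_goldfeldquandt(y, X))                             # (F stat, p-value, ordering)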

  • @nithinmamidala · 4 years ago

    The correlation part 1 video is missing; please upload it and rearrange the sequence.

  • @adhvaithstudio6412 · 4 years ago

    Why are you ignoring only age? Why can't we ignore years of experience instead?

  • @dhirajbaruah9888 · 4 years ago

    What does the p value mean??

  • @dikshagupta3276 · 2 years ago

    Please add the link to the collinearity video.

  • @sejalchandra2114 · 4 years ago +1

    Hi sir, I have a doubt. Could we use this if we have one hot encoded features in the dataset?

  • @nidhipandey8004 · 3 years ago

    What is the difference between OLS and gradient descent?

    • @simonelgarrad · 3 years ago

      What I have read and understood so far is that OLS is a good method for simple linear regression, while gradient descent is the better method when working with many independent variables. You can learn more about this from Josh Starmer's gradient descent video. (A small comparison sketch follows below.)
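      A tiny comparison sketch on toy data: OLS solves the normal equations in one shot, while gradient descent iterates towards the same minimum of the MSE cost, so on well-behaved data both land on essentially the same coefficients.

          import numpy as np

          rng = np.random.default_rng(6)
          X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + one feature
          y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.3, size=200)

          # OLS: closed-form solution of the normal equations (X'X) b = X'y
          beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

          # Gradient descent: iterative minimisation of the same mean squared error
          beta_gd = np.zeros(2)
          lr = 0.1
          for _ in range(2000):
              grad = (2 / len(y)) * X.T @ (X @ beta_gd - y)
              beta_gd -= lr * grad

          print(beta_ols, beta_gd)   # both should be close to [1.0, 2.0]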

  • @SatnamSingh-cm2vt · 4 years ago

    What does the p-value signify?

  • @shivanshsingh5555 · 4 years ago

    Why do you keep saying 0.5 again and again, sir? This is not clear. Please send a link for reference, because I am watching this playlist step by step but I am still not getting this 0.5 criterion.

  • @jyothinkjayan6508 · 4 years ago

    When will the next deep learning batch begin?

  • @Vignesh0206 · 4 years ago

    May I know if there is a part one of this video?

  • @jatashukla6891 · 4 years ago

    Hi Krish, I bought your Rs 300 package on YouTube just to get in contact with you. I need some of your time to answer a few of my doubts regarding switching my career into data science. Please let me know a way to connect with you directly.

  • @raghavchhabra4783 · 4 years ago

    I watched it, but I did not understand what to do if there are 150+ columns!!

    • @sahilgaikwad98 · 4 years ago

      Use iloc to select the independent variables and store them in X, store the target in y, and then use the OLS model.

  • @namratarajput7092 · 4 years ago

    Hi, I am getting an error after running this line: "import statsmodels.api as sm"
    ImportError: cannot import name 'factorial'
    I am a beginner, so please help.

    • @tanweerkhan3020 · 4 years ago

      You need to downgrade your scipy or install statsmodels from master. You can check this on Stack Overflow.

  • @sushilchauhan2586 · 4 years ago

    What if our features number in the 1000s? Please reply; anyone who knows can answer.

    • @krishnaik06 · 4 years ago +2

      For that we will apply dimensionality reduction

    • @sushilchauhan2586 · 4 years ago

      @@krishnaik06 I never thought that you would reply yourself, brother!

    • @sushilchauhan2586 · 4 years ago

      @@krishnaik06 Sorry, we can't apply dimensionality reduction, as it only deals with high variance and has no relation to our class output... we would use randomized lasso regression or randomized logistic regression instead. Thank you, Krish brother.

  • @guneet556 · 4 years ago

    Sorry, but I have to ask again: can someone correct me on my previous comment?

  • @a_wise_person · 4 years ago +1

    How is a Dell Inspiron with an 8th-gen i5 and a 2 TB HDD for data science and programming work?

    • @srujohn652 · 4 years ago +1

      It's good enough, though you can also use Google Colab for your ML projects.

  • @adityapathania3618 · 3 years ago

    Brother, please share the dataset too.

  • @pushkarasharma3746 · 4 years ago

    Everything is fine, Krish sir, but please try not to say "particular" in every sentence... it is very annoying.

  • @ayushasati2110 · 3 years ago

    Your voice is not that clear.