Gradient descent simple explanation | gradient descent machine learning | gradient descent algorithm

  • Published Sep 28, 2024
  • Gradient descent simple explanation | gradient descent machine learning | gradient descent algorithm
    #gradientdescent #unfolddatascience
    Hello All,
    My name is Aman and I am a data scientist. In this video I explain gradient descent piece by piece, with the intention of making it extremely simple to understand. Gradient descent is a very important algorithm in machine learning and deep learning, so it is a must-know topic for every data scientist. The following questions are answered in this video:
    1. What is gradient descent?
    2. How does gradient descent work?
    3. What is the gradient descent algorithm?
    4. What is gradient descent in machine learning?
    5. What is gradient descent in deep learning?
    6. How does the gradient descent algorithm work?
    About Unfold Data Science: This channel helps people understand the basics of data science through simple examples, explained in an easy way. Anybody without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos are not very technical in nature, so viewers from different backgrounds can grasp them easily.
    Join Facebook group:
    www.facebook.c...
    Follow on Medium: / amanrai77
    Follow on Quora: www.quora.com/...
    Follow on Twitter: @unfoldds
    Get connected on LinkedIn: / aman-kumar-b4881440
    Follow on Instagram: unfolddatascience
    Watch Introduction to Data Science full playlist here: • Data Science In 15 Min...
    Watch Python for data science playlist here:
    • Python Basics For Data...
    Watch statistics and mathematics playlist here:
    • Measures of Central Te...
    Watch End to End Implementation of a simple machine learning model in Python here:
    • How Does Machine Learn...
    Learn Ensemble Model, Bagging and Boosting here:
    • Introduction to Ensemb...
    Access all my codes here:
    drive.google.c...
    Have a question for me? Ask me here: docs.google.co...
    My Music: www.bensound.c...

Comments • 481

  • @DS_AIML
    @DS_AIML 4 years ago +34

    My question is: when we calculate the partial derivative with respect to 'c' and 'm', we should consider one as constant. For example, to calculate the partial derivative of the cost function J with respect to c, ∂J/∂c, we should consider 'm' as constant. So the above calculation should be like this: -2[2 - (c+m)] + (-2)[4 - (c+3m)] => -2[2-(c)] + (-2)[4-(c)] => -2[2] - 2[4] => -4 - 8 => -12.
    Please confirm

    • @diobrando1253
      @diobrando1253 4 years ago +1

      Yep, when we calculate w.r.t. c, m is constant, and vice versa.

    • @brajeshanand
      @brajeshanand 3 years ago +10

      Hi Anjani... why is the derivative of [2-(c+m·1)]^2 equal to -2[2-(c+m)]? Don't you think it should be 2[2-(c+m)] by the differentiation rule?

    • @soumyapatil4991
      @soumyapatil4991 3 years ago +16

      Why -2? I still didn't get it... it should be 2[2-(c)] + 2[4-(c)], right?

    • @manisharvinds895
      @manisharvinds895 3 years ago +1

      Could you please elaborate on your derivative method, @Anjani Kumar? I guess the value -4 in the video is correct.

    • @sajjadabdulmalik4265
      @sajjadabdulmalik4265 3 years ago +1

      Why -2?
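
    A quick symbolic check of the question in this thread (my own sketch using sympy, assuming the video's example points (1, 2) and (3, 4) and the starting values m = 1, c = 0 mentioned in other comments): holding 'm' constant while differentiating with respect to 'c' does not mean setting m = 0; m simply keeps its current value.

      import sympy as sp

      c, m = sp.symbols('c m')
      J = (2 - (c + m))**2 + (4 - (c + 3*m))**2   # cost for the points (1, 2) and (3, 4)

      dJ_dc = sp.diff(J, c)            # chain rule: -2[2-(c+m)] - 2[4-(c+3m)]
      print(sp.expand(dJ_dc))          # 4*c + 8*m - 12
      print(dJ_dc.subs({c: 0, m: 1}))  # -4, matching the value in the video

    So with m = 1 and c = 0 the slope is -4, not -12: the -2 factors above are right, but m should stay at 1 rather than being dropped.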

  • @pankajgoikar4158
    @pankajgoikar4158 11 months ago +5

    This is the first time I'm learning about Gradient Descent, and I understood how the algorithm works. This video is amazing. Thank you so much.

  • @aparnasingh4096
    @aparnasingh4096 3 years ago +26

    Went through lots of articles but didn't understand the core. But your video made it clear within 15 minutes :) Just awesome, keep up the good work :)

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago +2

      Thanks Aparna, your comments are very valuable for me.

  • @paonsgraphics9750
    @paonsgraphics9750 2 years ago +2

    You're just amazing! Anyone can understand gradient descent by watching this video. Thanks!

  • @ManiKandan-lx9zg
    @ManiKandan-lx9zg 3 years ago +2

    The greatest lecture on gradient descent I have ever seen... thank you so much for sharing your knowledge, sir ❤️

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      So nice of you, Mani. Your comments mean a lot.

  • @DURGASREECHOWDHARYKOMMINI
    @DURGASREECHOWDHARYKOMMINI 9 months ago

    I have gone through lots of explanations and did not understand. But through this video, I got the confidence to continue my learning. Super, sir, thank you.

  • @himanshirawat1806
    @himanshirawat1806 2 years ago

    Simply wow. After a month, today I finally understood Gradient Descent. Thank you so much for the video 😊

  • @elcheapo9444
    @elcheapo9444 1 year ago +1

    Your teaching method is masterful! None of the books I read go to such depths. Thanks!!

  • @GayatriDuwarah
    @GayatriDuwarah 3 years ago +1

    Well explained. Tomorrow I have my project.

  • @surajitlahiri
    @surajitlahiri 8 months ago

    One of the best explanations of gradient descent. Thank you so much. Very informative.

  • @Elementiah
    @Elementiah 3 months ago +1

    Amazing video! How did you work out the slope value to be -4?

  • @UsmanKhan-tc4sk
    @UsmanKhan-tc4sk 3 years ago +1

    Very beneficial video, thank you so much. Love from Pakistan.

  • @suresh9031
    @suresh9031 5 months ago +1

    Great... simple and the best. Thank you.

  • @bahkeith7357
    @bahkeith7357 2 years ago

    I don't know what this world would be without Indian YouTubers. Thank you very much, at least I got something.

  • @yoyomovieclips8813
    @yoyomovieclips8813 4 years ago +1

    Great work sir, I finally understood from your video. Thanks a lot.

  • @buragohainmadhurima
    @buragohainmadhurima 3 years ago +1

    Awesome video. This is the best explanation. Please make more videos.

  • @RAJI11000
    @RAJI11000 2 years ago

    Sir, please post videos on deep learning. You do a great job, sir. Amazing videos.

    • @UnfoldDataScience
      @UnfoldDataScience  2 years ago

      Thanks for your positive feedback. Please share with others as well who could benefit from such content.

  • @Shailendrakumar-cv4yv
    @Shailendrakumar-cv4yv 16 days ago

    Amazing explanation!

  • @chibuzorobiefuna4090
    @chibuzorobiefuna4090 2 years ago

    I don't know if I have bitten off more than I can chew by deciding to learn machine learning; this gradient descent is giving me a hard time. I am learning it on Coursera, same issue. I will keep reading; hopefully I will understand it one day.

  • @omeshamisuanigala4635
    @omeshamisuanigala4635 1 year ago +1

    When computing the partial derivative, where did you get the negative 2 from? The exponent 2 is positive, so how does it become negative when differentiating?

  • @jagadisheslavath4578
    @jagadisheslavath4578 3 years ago +1

    Thank you for the detailed explanation with simple example :)

  • @sayansen11
    @sayansen11 4 years ago +1

    Awesome ... very simple explanation

  • @gnanaselvannallathambi9019
    @gnanaselvannallathambi9019 3 years ago

    I like your explanation, it is very smart.

  • @143balug
    @143balug 4 years ago +11

    This is one of the best explanation videos about Gradient Descent; I like your detailed explanation. Looking forward to more videos on various optimizers.
    Thank you

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Thanks a lot, Bala. Yes, I will create videos on those topics as well.

  • @thisiswhy793
    @thisiswhy793 3 years ago +1

    Sir, a video on gradient checking please. By the way, amazing explanation, please keep it up.

  • @ashwinikumar6461
    @ashwinikumar6461 1 year ago

    Great job, my dear Aman... nice and crisp lecture, understood everything except one dark area. Could you please elaborate: why -2 in the derivative with respect to c? Rather, it should be 2[2-(c)] + 2[4-(c)], right? My apologies in advance if this is due to my ignorance... please enlighten me.

    • @UnfoldDataScience
      @UnfoldDataScience  1 year ago

      Hi Ashwini, thank you. This -2 question has been discussed before. Please see the pinned comments at the top.

  • @vcjayan8206
    @vcjayan8206 3 years ago +1

    Hi, simply explained, thanks

  • @pravinpoojari007
    @pravinpoojari007 1 year ago

    Thank you for such an informative video.
    I am from a B.Com background, so all of this is new to me.
    I have a doubt:
    I have seen other videos as well to understand how we calculate derivatives,
    but in your example, why are we multiplying by -2 rather than 2?

  • @9700784176
    @9700784176 3 years ago

    5:36
    What is the last video that you mentioned about derivatives? Providing that link would be a great help. I need to mention, your way of teaching is top notch.
    Keep up the good work, sir.

    • @UnfoldDataScience
      @UnfoldDataScience  2 years ago

      Thanks.
      Please watch this video:
      th-cam.com/video/WCp1D-wSolo/w-d-xo.html

  • @SHIVAMGUPTA-wb5mw
    @SHIVAMGUPTA-wb5mw 1 month ago

    Brother, I really enjoyed it.
    🫡🫡

  • @goundosidibe9964
    @goundosidibe9964 3 years ago

    Amazing. Thank you so much for this video. You included everything and it's very well explained.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      You're very welcome, Goundo. Please share the link within data science groups. Thank you.

  • @FarhanAhmed-xq3zx
    @FarhanAhmed-xq3zx 3 years ago +1

    Thanks a lot... Awesome explanation
    💥💥👌👌👌

  • @mandarmore.9635
    @mandarmore.9635 1 year ago +1

    Thank you so much for making this video, you are amazing.

  • @bslnada9248
    @bslnada9248 1 year ago

    Thank you so much!!!! You are a great teacher.

  • @jayendramanikumar9211
    @jayendramanikumar9211 1 year ago

    The best explanation in such a short time.

  • @shahriaralom4547
    @shahriaralom4547 3 years ago +1

    Thank you so much, sir.

  • @ShaidaMuhammad
    @ShaidaMuhammad 4 years ago +1

    I already know Gradient Descent, but I'm still going to watch the whole video for some new insights on GD.

  • @arunmehta8234
    @arunmehta8234 3 years ago +1

    Thanks, sir! Very helpful video.

  • @ajaykushwaha4233
    @ajaykushwaha4233 3 years ago +1

    Awesome explanation.

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Thanks Ajay, hope you are doing well and staying safe.

  • @mohitkumarprajapati1914
    @mohitkumarprajapati1914 1 year ago

    The (1/2) factor is missing, e.g., while discussing the cost function of Linear Regression. Please correct me if I am wrong...

  • @kalyanchakri8469
    @kalyanchakri8469 3 years ago

    Very nice video. Please make a video on the other optimization techniques and compare them; that will be very helpful.

  • @yatinarora9650
    @yatinarora9650 2 years ago +1

    Super

  • @SumanDas-fx5vu
    @SumanDas-fx5vu 1 year ago

    Great video

  • @ChaitanyaKadari
    @ChaitanyaKadari 10 months ago

    Hello everyone, you are asking about the -2, am I right? Here we calculate the partial derivative with respect to c. The outer rule gives 2[2-(c+m)], and because we differentiate with respect to c, the derivative of the inner term is d/dc(-c) = -1. Multiplying -1 by 2[2-(c+m)] gives -2[2-(c+m)]. Is it clear?
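
    Written out, the chain-rule step described in the comment above (a restatement of that reply, not taken from the video) is:

      \frac{\partial}{\partial c}[2-(c+m)]^2 = 2[2-(c+m)] \cdot \frac{\partial}{\partial c}[2-(c+m)] = 2[2-(c+m)] \cdot (-1) = -2[2-(c+m)]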

  • @dr.himanimittal9880
    @dr.himanimittal9880 3 years ago +1

    Brilliant. Thank you

  • @samsricatjaidee405
    @samsricatjaidee405 5 months ago

    Thank you. You made me understand this.

  • @araeneangela
    @araeneangela 3 years ago

    Your videos are gold. Thank you for all your efforts

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Your comments motivate me. Much appreciated!

  • @ridoychandraray2413
    @ridoychandraray2413 1 year ago

    Bro, next time what will be the value of m? How will we get the value?

  • @NikhilPodShorts
    @NikhilPodShorts 3 years ago

    Sir, in the last example of partial differentiation, why do we take the -ve sign? Can you please tell me?

  • @tejaspatil3978
    @tejaspatil3978 3 years ago

    Thank you very much, sir...

  • @raghuprasadkonandur-ekashalya
    @raghuprasadkonandur-ekashalya 4 years ago

    Very good video. Clear explanation

  • @BDKMITAnilkumar
    @BDKMITAnilkumar 2 years ago +1

    Good explanation, right to the point 😄

  • @zakikhurshid3843
    @zakikhurshid3843 1 year ago

    Great explanation. Thank you for this.

  • @Person-hb3dv
    @Person-hb3dv 2 years ago

    Well explained sir. Thank you

  • @ToshioKhan
    @ToshioKhan 1 year ago

    Thanks

  • @sainathnaik3933
    @sainathnaik3933 4 years ago

    One of the best videos on GD. Thank you very much.

    • @EagleYin
      @EagleYin 4 years ago

      I really enjoyed this video, but it lacks code. I found a great video here implementing SGD using Python!!! Feel free to check it out!! th-cam.com/video/uXuBUkW_0tA/w-d-xo.html

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Glad it was helpful Sainath.

  • @edeabgetachew6054
    @edeabgetachew6054 2 years ago

    Good job

  • @bhavya2301
    @bhavya2301 3 years ago +1

    The Best Video. 😀

  • @sadisticnepal9567
    @sadisticnepal9567 2 years ago

    Can you make a video on Nelder-Mead downhill simplex for local minimization?

  • @uzairsiyal-b9p
    @uzairsiyal-b9p 1 year ago +1

    I have a question. I am just a beginner; anyone's answer will be highly appreciated. Here is what I need to know: if we have a cost function for simple linear regression, what is the need for gradient descent? What I think is that simple linear regression doesn't give an output close to the local minimum, but then what is the use of the cost function?

  • @jeromemthembisi8812
    @jeromemthembisi8812 11 months ago

    Hi sir, did you also assume the learning rate??

  • @vinayvinnu7608
    @vinayvinnu7608 2 years ago

    Love you bro.
    Thank you so much.

  • @hemanthvokkaliga
    @hemanthvokkaliga 2 years ago

    Hi Aman, I have a doubt regarding Gradient Descent: if we have local minima, how does Gradient Descent handle them to find the global minimum? Could you please explain it in depth?

    • @UnfoldDataScience
      @UnfoldDataScience  1 year ago +1

      Yes Hemanth, it uses different concepts; I will cover them in a separate video.

  • @umairnazir5579
    @umairnazir5579 4 months ago

    Great sir 👍

  • @iapplepro668
    @iapplepro668 1 year ago +1

    4:29 The graph you have drawn is a parabola, and you have taken x^2 to plot it. If we take x = -1 and then square it, we get +1, so the graph will never move in the negative direction.

    • @UnfoldDataScience
      @UnfoldDataScience  1 year ago +1

      Yes... there will be no negative y. How is that an issue?

    • @iapplepro668
      @iapplepro668 1 year ago

      @@UnfoldDataScience Yes sir, that's what I am saying, there will be no negatives... but you drew the graph on the board with negative values 😊 Correct me if I am wrong 😊

  • @sunzarora
    @sunzarora 3 years ago

    Nice video. What will be the new m? Initially you assumed m=1, c=0.

  • @sahilgarg7284
    @sahilgarg7284 3 years ago +1

    Brilliant!

  • @harshmashru
    @harshmashru 3 years ago +1

    My question is: how did you get the learning rate value?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Hi Harsh, the recommended value in industry is suggested as a range.

  • @avinashbhad
    @avinashbhad 8 months ago

    Brother, you didn't take the 1/N factor before the summation in the MSE formula.

  • @vijayalaxmiyalavigi6232
    @vijayalaxmiyalavigi6232 4 years ago

    I understood sir, thank you very much

  • @nirupamsuraj1634
    @nirupamsuraj1634 1 year ago

    Thank you for the explanation. I have a doubt: shouldn't we multiply by 1/2n in the cost function equation, where n is the number of data points (2 in this case)???

    • @UnfoldDataScience
      @UnfoldDataScience  1 year ago

      I can check that. Thanks for pointing it out.
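
    For readers with the same doubt, a short note (standard linear regression convention, not worked out in the video): with n data points, including a 1/2n factor only rescales the cost by a constant,

      \frac{\partial}{\partial c}\left[\frac{1}{2n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2\right] = \frac{1}{2n} \cdot \frac{\partial}{\partial c}\left[\sum_{i=1}^{n}(y_i-\hat{y}_i)^2\right]

    so every gradient shrinks by the same factor, which can be absorbed into the learning rate; the m and c that minimize the cost are unchanged either way.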

  • @yaminikommi5406
    @yaminikommi5406 2 years ago

    Sir, can we take any numbers as the initial assumptions, or do we have to take 0 and 1?

    • @UnfoldDataScience
      @UnfoldDataScience  1 year ago

      Not sure which part of the video or which parameter you are asking about.

  • @kuppuswamyr4360
    @kuppuswamyr4360 4 years ago

    Awesome, sir. Thank you for the valuable content 😀

  • @mmovie5035
    @mmovie5035 2 years ago

    Thank you

  • @gowithgrade
    @gowithgrade 1 year ago

    Great, brother, thanks.

  • @BharathKumar-vs8fm
    @BharathKumar-vs8fm 3 years ago

    This explanation is amazing!

  • @ideas4951
    @ideas4951 1 year ago

    Sir you are great!!

  • @satyanarayanbarik1701
    @satyanarayanbarik1701 2 months ago

    Thank you sir

  • @srivinaykatana4323
    @srivinaykatana4323 2 years ago

    Good explanation. I have a doubt: can we use the gradient descent method to maximize a function as well? If so, I think the formula at 3:54 might not hold; it would go in the wrong direction. Please clarify whether it can be used for maximizing a function or not.

    • @UnfoldDataScience
      @UnfoldDataScience  1 year ago

      We never maximize the cost function; why would we need to do so?

    • @srivinaykatana4323
      @srivinaykatana4323 1 year ago

      @@UnfoldDataScience In some scenarios we do need to maximize the objective function... then we need to go in the direction of the slope, unlike travelling against the slope as given in the video (so we would add rather than subtract at 3:54).

  • @naruto5437
    @naruto5437 1 year ago

    This is amazing stuff.

  • @minakshimishra4213
    @minakshimishra4213 2 months ago

    Lol, while trying to follow along I reached the "who was Ram?" stage, lol. I even forgot what we were trying to find out 🤔 because the x, y table never got connected to the whole long story :( But good initial explanation of gradient descent and the derivative.

  • @princysujesh8212
    @princysujesh8212 1 year ago

    Great explanation

  • @brendawilliams8062
    @brendawilliams8062 4 years ago

    Thank you

  • @akshaypai2096
    @akshaypai2096 3 years ago

    Hi, amazing intuition, but can you please explain how the derivative indicates the direction to go in?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      Thanks Akshay, very good question. This video is a must-watch for you:
      th-cam.com/video/WCp1D-wSolo/w-d-xo.html

  • @prachishewale9457
    @prachishewale9457 2 months ago

    Can we always take an LR of 0.001?

  • @brendawilliams8062
    @brendawilliams8062 4 years ago

    Thank you.

  • @joelbharatmonis
    @joelbharatmonis 3 years ago

    Hi, in gradient descent, since we subtract LR*slope to reach the minimum solution, would we, in gradient ascent, add LR*slope to the old value?

    • @UnfoldDataScience
      @UnfoldDataScience  3 years ago

      No, just the sign of the slope will reverse automatically (negative slope). Watch this video to understand this better:
      th-cam.com/video/WCp1D-wSolo/w-d-xo.html
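
    For reference, the two textbook update rules (a general note, not from the video) differ only in the sign in front of the step:

      x_new = x_old - LR * slope   (gradient descent: move against the slope, toward a minimum)
      x_new = x_old + LR * slope   (gradient ascent: move with the slope, toward a maximum)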

  • @dsa_telugu
    @dsa_telugu 4 years ago

    Nice video... keep going, sir.

  • @HasanKarakus
    @HasanKarakus 1 year ago

    Great job

  • @nazmussaquibkhan
    @nazmussaquibkhan 4 months ago

    Thank you.

  • @r1tw1ck46
    @r1tw1ck46 1 year ago

    sir love se tane 🥰🥰

  • @iamonkara
    @iamonkara 4 years ago

    What's the theory/reasoning behind minimizing the function? And can someone please elaborate on where the learning rate originated, from a pure mathematics rather than a machine learning point of view?

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      For the reasoning behind minimizing the loss function, watch the videos below:
      th-cam.com/video/2-Cg_1FtHk8/w-d-xo.html
      th-cam.com/video/hSAQkeMOdiI/w-d-xo.html
      About the learning rate:
      In plain English, it decides at what speed you want to change your assumptions. For example, let's say you started at x=15 and gradient descent says x needs to be increased; do you want to make it 17 or 20?
      This "shift", or technically the "step size", is decided by the learning rate.

  • @goelnikhils
    @goelnikhils 1 year ago

    Amazing Content

  • @suchitranair683
    @suchitranair683 2 years ago

    Hello Sir,
    My question is: since we have the freedom to choose the LR as per our needs, can't I always keep it at "1"? Or will a change in the value of the LR lead to different answers that may be wrong? I want to know what the impact of changing the LR would be. Would we get incorrect/less accurate results? If yes, then can you suggest a value that I can choose for the LR?

    • @UnfoldDataScience
      @UnfoldDataScience  2 years ago +2

      Good question. There are a few values that are suggested by experimental outcomes, for example the p-value boundary of 0.05. Similarly, the LR is suggested as a range of decimal numbers: we should start with a larger value like 0.1, then try exponentially lower values: 0.01, 0.001, and so on.

  • @mishisareen4925
    @mishisareen4925 3 years ago

    Awesome!

  • @hse13166
    @hse13166 6 months ago

    Are the cost function and the loss function the same?

    • @UnfoldDataScience
      @UnfoldDataScience  6 months ago +1

      No - search for "Unfold data science cost function vs loss function" in TH-cam to know the difference.

    • @hse13166
      @hse13166 6 months ago

      @@UnfoldDataScience OK, thank you very much. I saw the video and got the difference between the two: it is the level at which we calculate and optimize these errors. The cost function is aimed at the model level, whereas the loss function is for a data point or an observation, as you say.

  • @CampfireCrucifix
    @CampfireCrucifix 1 year ago

    My head hurts but I learned a lot.

  • @sadhnarai8757
    @sadhnarai8757 4 years ago

    Very good Aman

  • @pramod3469
    @pramod3469 4 years ago

    Sir, why can't we directly set the partial derivatives equal to 0 and then calculate the values of c and m?

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago

      Hi Pramod, we need to take care of both "m" and "c", hence we take this approach of calculating them individually, one derivative for "m" and the other for "c".
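
    A side note on this question (general background, not from the video): for plain linear regression, setting both partial derivatives to zero does work. The conditions ∂J/∂c = 0 and ∂J/∂m = 0 give two linear equations in m and c (the normal equations) that can be solved directly. Gradient descent is taught here because the same iterative recipe still works for models where "set the derivative to zero" has no closed-form solution, such as neural networks.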

  • @vyshnavi4751
    @vyshnavi4751 4 years ago

    Hello sir!! On what basis have you taken the learning rate as 0.04? Can you please make it clear for me...

    • @UnfoldDataScience
      @UnfoldDataScience  4 years ago +1

      Hello, the learning rate is usually taken in the range 0.001 to 0.9 depending on how aggressively we want to converge. With a lower learning rate, convergence is slower but it can converge better; on the other hand, with a higher learning rate, let's say 0.8, convergence might be fast but we run the risk of missing the global minimum. Here 0.04 is taken just as an example. Hope that answers it. Happy learning. Take care.
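
    A small illustration of that trade-off (my own sketch, not from the video), minimizing f(x) = x^2 from a starting point of x = 15:

      # Each step moves against the slope of f(x) = x**2, which is 2*x.
      def minimize(lr, steps=100):
          x = 15.0
          for _ in range(steps):
              x -= lr * 2 * x
          return x

      for lr in (0.001, 0.1, 0.8):
          print(lr, minimize(lr))

      # lr = 0.001 barely moves from 15 (safe but very slow),
      # lr = 0.1 converges smoothly toward the minimum at x = 0,
      # lr = 0.8 overshoots the minimum on every step and oscillates around it.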