Bayesian Curve Fitting - Your First Baby Steps!

  • Published 24 Dec 2024

Comments •

  • @TallminJ · 5 months ago +1

    Your videos are amazing, very clear and concise explanations!

  • @riadhbennessib3961 · 2 years ago +3

    Thank you so much for the video lessons; they encourage me to revisit Bishop's difficult book!

  • @wsylovezx · 1 year ago +1

    Greatly appreciate your super clear video. I have a question at 5:48: by Bayes' formula, p(w|x,t) ∝ p(x,t|w)*p(w), which is different from the expression p(w|x,t) ∝ p(t,x|w)*p(w) on your slide.

    • @lakex24 · 1 year ago +1

      That should read: it is different from the expression p(w|x,t) ∝ p(t|x,w)*p(w) on your slide.
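
      Spelling out the decomposition the slide uses (assuming, as is standard in this curve-fitting setup, that the inputs x are always conditioned on and the prior on w does not depend on them):

        p(w | x, t) = p(t | x, w) p(w | x) / p(t | x) ∝ p(t | x, w) p(w),

      since p(w | x) = p(w) when w is independent of the inputs; no term of the form p(x, t | w) appears because x is never modeled.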

  • @bodwiser100 · 6 months ago

    Thank you! One request: can you explain the reason behind the equivalence between assuming that the target variable is normally distributed and assuming that the errors are normally distributed? While I understand that the two are simply two sides of the same coin, the mathematical equivalence between them appeared to me to be implicitly assumed in moving from the part 2 video to the part 3 video.
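
    As a worked step (assuming Bishop's notation, where y(x, w) is the deterministic model output and beta the noise precision): write t = y(x, w) + eps with eps ~ N(0, beta^{-1}). Once x and w are given, y(x, w) is a constant, and shifting a Gaussian by a constant only moves its mean, so

      p(t | x, w, beta) = N(t | y(x, w), beta^{-1}).

    Read in either direction, this is the equivalence: a Gaussian target centered on the model output is exactly a Gaussian error around it.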

  • @vi5hnupradeep · 3 years ago +4

    Thank you so much for your videos! They are really good at explaining the concepts.

  • @yeo2octave27 · 3 years ago

    Thank you for the video! I am currently reading up on manifold regularization and am curious about applying Bayesian methods to it.
    12:11 For elastic net/manifold regularization we introduce a second regularization term into our analytical solution. Could we simply express the prior as being conditioned on the two hyperparameters, i.e. p(w | \alpha, \gamma), by applying Bayes' theorem? How could we then arrive at an expression for the distribution of w, i.e. w ~ N(0, \alpha^{-1} I)?
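
    A hedged sketch of how such a prior could look (the quadratic manifold penalty w^T M w, with M a positive semi-definite matrix, is my assumption here; the video does not cover it): take

      p(w | \alpha, \gamma) ∝ exp( -(\alpha/2) w^T w - (\gamma/2) w^T M w ),

    which is a zero-mean Gaussian, w ~ N(0, (\alpha I + \gamma M)^{-1}), so the negative log-prior contributes both regularization terms to the MAP objective. Setting \gamma = 0 recovers the isotropic prior w ~ N(0, \alpha^{-1} I) from the video.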

  • @adesiph.d.journal461 · 3 years ago

    Sorry for spamming you with questions. In terms of programming, when we say p(w) is a prior, is this equivalent to initializing the weights with a random Gaussian, as in PyTorch's "torch.nn.init.xavier_uniform(m.weight)"?

    • @KapilSachdeva · 3 years ago +1

      Please do not hesitate to ask questions.
      A prior is "your belief" about the value of a random variable (w in this case).
      "Your belief" is your (the data analyst's/scientist's) domain knowledge about w, expressed as a random variable.
      Let's take a concrete example. Suppose you were modeling the distribution of heights of adult males in India. Even before you go and collect the dataset, you would have a belief about the height of adult males in India. Based on your experience, you might say that it could be anything between 5' and 6'.
      If you think that all values between 5' and 6' are equally likely, then you would say your prior is the uniform distribution with support from 5' to 6'.
      Now, coming to your PyTorch expression: it fills the tensor with values drawn uniformly from a range determined by the layer's fan-in and fan-out (Xavier/Glorot initialization), not from a Gaussian. In neural networks you typically fill in "random" values to initialize your weights; you do not typically express your domain knowledge (i.e., the prior in the Bayesian sense).
      Based on the above, philosophically the answer is no, a prior is not equivalent to your expression; however, implicitly it is your belief (albeit a completely random one) about the "initial" values of the weights.
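
      A minimal PyTorch sketch of that distinction (the zero-mean Gaussian prior with precision alpha follows the video; everything else here is illustrative):

        import torch

        # Weight initialization: fills a tensor once; no distribution object survives.
        w = torch.empty(3)
        torch.nn.init.normal_(w, mean=0.0, std=0.1)   # used once, then forgotten

        # A prior in the Bayesian sense: a distribution you keep and query.
        alpha = 2.0                                   # precision, as in w ~ N(0, alpha^{-1} I)
        prior = torch.distributions.Normal(torch.zeros(3), (1.0 / alpha) ** 0.5)

        w_draw = prior.sample()                       # one draw from your belief about w
        log_p_w = prior.log_prob(w_draw).sum()        # log p(w), reusable in a MAP objective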

    • @adesiph.d.journal461 · 3 years ago +1

      @@KapilSachdeva Thank you so much! This makes total sense. I went on to watch the videos a few times to make sure the concepts sink in completely before I advance, and with every iteration of the video things become clearer and I am able to connect them. Thanks!

    • @KapilSachdeva · 3 years ago

      @@adesiph.d.journal461 🙏

  • @sbarrios93 · 3 years ago +2

    This video is pure gold. Thank you!!

  • @YT-yt-yt-3 · 4 months ago

    p(w|x): what does x mean here exactly? The probability of the weights given different data within the training set, a different training set, or something else?

  • @adesiph.d.journal461 · 3 years ago

    I came to this part from the book and you nailed it! Thanks.
    A quick question: how are you differentiating, in terms of notation, between conditional probability and likelihood? I find it confusing in PRML. To my understanding, a conditional probability is a scalar value that indicates the chance of an event (the one in the "numerator", loosely speaking) given that the events in the "denominator" have occurred, while the likelihood is about finding the best values of the mean and standard deviation to maximize the occurrence of a particular value. I might be wrong; happy to be corrected :) The confusion mainly arises because in the previous part we had p(t|x,w,beta) and we wanted to find the optimal w, beta to "maximize the likelihood of t", while here p(w|alpha) becomes a conditional probability, and even p(w|x,t) is also a conditional probability. These may be naive questions! Sorry!

    • @KapilSachdeva · 3 years ago +2

      No, not a naive question. The difference between probability and likelihood has bothered many people. Your confusion stems from the overloaded and inconsistent usage of notation and terminology, which is one of the root causes of why learning maths & science is difficult.
      Unfortunately, the notation for likelihood is the same as that for conditional probability. The "given" is indicated using the "|" operator, and both likelihoods and conditional probabilities have "given" operators. In some literature and contexts, the symbol "L" is used (with parameters and data flipped).
      > While the Likelihood is trying to find the best values of mean, standard deviation to maximize the occurrence of a particular value.
      > While here p(w|alpha) becomes conditional probability or even p(w|x,t) also as conditional probability.
      Here is one way to see all this and make sense of the terminology. In MLE, your objective is to find the values of the parameters (mu, etc.) keeping the data fixed. The outcome is what we call the likelihood. This likelihood is a kind of relative plausibility, proportional to a probability.
      When we instead treat the parameters as random variables, we seek their probability distributions. A parameter (an RV) can depend on another parameter (an RV or a scalar), and hence these probability distributions take the form of conditional probability distributions.
      Hope this makes sense.
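
      A small numerical sketch of that contrast (the toy data and values are my own):

        import numpy as np
        from scipy.stats import norm

        data = np.array([1.2, 0.8, 1.5, 1.1])   # the data, held fixed

        # Conditional probability view: parameters fixed, evaluate the density of an outcome.
        density = norm.pdf(1.2, loc=1.0, scale=0.5)

        # Likelihood view: data fixed, scan candidate parameter values.
        mus = np.linspace(0.0, 2.0, 201)
        log_lik = [norm.logpdf(data, loc=mu, scale=0.5).sum() for mu in mus]
        mu_mle = mus[np.argmax(log_lik)]         # maximizer; equals data.mean() = 1.15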

    • @adesiph.d.journal461 · 3 years ago +1

      @@KapilSachdeva Thank you so much for such a detailed response. Yes, my confusion did come from the fact that my previous exposure to likelihood used L as the notation, with the parameters and data reversed. This makes sense, thank you!

    • @KapilSachdeva · 3 years ago +1

      🙏

  • @yogeshdhingra4070 · 2 years ago +1

    Your lectures are gems; there is so much to learn here! Thanks for such a great explanation.

  • @zgbjnnw9306 · 3 years ago

    Why does the posterior p(w|x,t) use the likelihood p(t|x,w,beta) instead of p(x,t|w)? And why is beta in the likelihood?

    • @KapilSachdeva · 3 years ago

      This is the inconsistency of notation that I talk about. Normally we would think that whatever goes after "|" (given) is a random variable with a probability distribution, but the notation allows scalars/hyperparameters/point estimates as well.
      Logically it is fine: even though in this exercise we are not treating beta as a probability distribution, the likelihood still depends on it, hence it is okay to include it in the notation.
      This is what makes me sad: the inconsistency of notation in the literature and books.
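
      Concretely, the proportionality being used (with alpha and beta treated as fixed, known quantities, as in the video):

        p(w | x, t) ∝ p(t | x, w, beta) * p(w | alpha).

      beta appears in the likelihood only because the Gaussian noise model depends on its precision, not because beta itself is being inferred.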

    • @zgbjnnw9306 · 3 years ago

      @KapilSachdeva Thanks for your help! So beta and x are both considered 'constants', like alpha?

    • @KapilSachdeva · 3 years ago

      @@zgbjnnw9306 You can see it like that; nothing wrong with it. However, a better way of saying it would be:
      beta is either a hyperparameter (something you guess or set based on your domain expertise) or a point estimate that you obtain using frequentist methods.

  • @goedel. · 2 years ago +1

    Thank you!

  • @zgbjnnw9306 · 3 years ago

    At 12:38, if you set the two equations equal, lambda is not calculated as the ratio alpha/beta... the equation for lambda includes the sum of deviations and w^T w...

    • @KapilSachdeva · 3 years ago

      The value of lambda is not obtained by equating the two equations. Its purpose is to show that the hyperparameter (lambda) in ridge regression can be seen as a ratio of alpha to beta. In other words, the MAP equation is scaled by 1/beta.
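
      Spelled out (following the very last lines of Bishop, section 1.2.5): maximizing the posterior is equivalent to minimizing the negative log-posterior,

        (beta/2) sum_n ( y(x_n, w) - t_n )^2 + (alpha/2) w^T w,

      and dividing through by beta, which does not change the minimizer, gives

        (1/2) sum_n ( y(x_n, w) - t_n )^2 + (lambda/2) w^T w,   with lambda = alpha/beta.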

    • @zgbjnnw9306 · 3 years ago

      @@KapilSachdeva Thanks! Where can I see the derivation of lambda written as alpha/beta? Can I find it in Bishop's book?

    • @KapilSachdeva · 3 years ago

      @@zgbjnnw9306 Section 1.2.5 of Bishop, the very last lines.

  • @SANJUKUMARI-vr5nz · 2 years ago +1

    Very nice video.

  • @pythonerdhanabanshi4554 · 2 years ago +1

    I would give multiple likes if that were possible... so satisfying...

  • @stkyriakoulisdr · 3 years ago +1

    The only mistake in this video is that "a posteriori" is Latin, not French. Cheers!

    • @KapilSachdeva · 3 years ago +1

      You are absolutely correct. Many thanks for spotting it and informing me.

    • @stkyriakoulisdr · 3 years ago +1

      @@KapilSachdeva I meant it as a compliment, since the rest of the video was so well explained.

    • @KapilSachdeva · 3 years ago

      I understood it :) ... but I am genuinely thankful for this correction because until now I had thought it was French. Your feedback will help me avoid this mistake in the future.

  • @sujathaontheweb3740 · 23 days ago

    @kapil How did you think of formulating the problem as p(w|x,t)?