The contextual bandit is a simple concept, but I get confused by the mathematical abstraction, subscripts, etc.; at points the indices get tangled up and become inconsistent. In both the Bernoulli and linear models you could use one toy example, like the coin example, to illustrate how the algorithm works end to end.
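To make that end-to-end flow concrete, here is a minimal sketch (my own illustration, not taken from the lecture) of Thompson sampling for a two-coin Bernoulli bandit: each arm is a coin with an unknown bias, a Beta posterior is kept per coin, and the biases 0.4 and 0.6 are assumed purely for the toy example.

```python
# Minimal Thompson sampling sketch for a Bernoulli (coin-flip) bandit.
# Assumptions for the toy: two coins with hidden biases 0.4 and 0.6,
# Beta(1, 1) priors on each bias, 1000 rounds of play.
import numpy as np

rng = np.random.default_rng(0)

true_bias = [0.4, 0.6]     # hidden coin biases (assumed for the toy example)
alpha = np.ones(2)         # Beta posterior parameters, start with Beta(1, 1)
beta = np.ones(2)

for t in range(1000):
    # 1. Sample a plausible bias for each coin from its current posterior.
    theta = rng.beta(alpha, beta)
    # 2. Play the coin whose sampled bias is highest.
    a = int(np.argmax(theta))
    # 3. Observe a Bernoulli reward (heads = 1) from the chosen coin.
    r = rng.binomial(1, true_bias[a])
    # 4. Update that coin's Beta posterior with the observed outcome.
    alpha[a] += r
    beta[a] += 1 - r

print("posterior means:", alpha / (alpha + beta))
```

Running it, the posterior means concentrate near the true biases and the better coin ends up being played most of the time.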
I think there is an error in the posterior predictive distribution at 59:52. First, I suppose it is the posterior predictive of the mean reward rather than of the reward, because an additional σ^2 is missing from the covariance of the posterior predictive. But my main concern is the mean of the posterior predictive distribution: I think it should be x * μ instead of σ^2 * x * μ. Any insights?
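For reference, under the usual Bayesian linear-regression setup this comment seems to assume (parameter posterior θ | D ~ N(μ, Σ), reward r = xᵀθ + ε with ε ~ N(0, σ²)), the standard posterior predictive for the reward and for the mean reward are:

```latex
% Posterior predictive under the standard linear-Gaussian model
% (assumptions: \theta \mid \mathcal{D} \sim \mathcal{N}(\mu, \Sigma),
%  r = x^{\top}\theta + \varepsilon, \ \varepsilon \sim \mathcal{N}(0, \sigma^2)).
r \mid x, \mathcal{D} \sim \mathcal{N}\!\left(x^{\top}\mu,\; x^{\top}\Sigma\,x + \sigma^{2}\right),
\qquad
x^{\top}\theta \mid x, \mathcal{D} \sim \mathcal{N}\!\left(x^{\top}\mu,\; x^{\top}\Sigma\,x\right).
```

In this standard form the predictive mean is xᵀμ in both cases, and the σ² term appears only in the predictive variance of the reward itself, which is consistent with the point raised in the comment (whether the lecture uses a different parametrization at 59:52 is a separate question).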
Beautiful! Thank you so much!!
The math is very clearly explained; a really good lecture.
Thank you so much, explained very well
Thank you, professor, great lecture.
great professor
He is really good.
How do you do contextual exploration in the case of neural-network function approximation?
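One common recipe (offered here only as an illustration, not as something covered in the lecture) is the "neural-linear" approach: a network provides a feature map and Thompson sampling is done with a Bayesian linear model on the last-layer features. Below is a rough numpy sketch where the feature map is just a fixed random one-hidden-layer network, and the dimensions, arm count, and noise level are all assumptions made for the toy; other options include bootstrapped ensembles or dropout-based approximate posteriors.

```python
# Rough sketch of "neural-linear" Thompson sampling: a (here fixed, random)
# network gives features phi(x), and a Bayesian linear model over phi(x)
# supplies the posterior sampling used for exploration. All constants below
# are assumptions made for this toy, not values from the lecture.
import numpy as np

rng = np.random.default_rng(0)
d, h, n_arms, sigma2 = 5, 32, 3, 0.25      # context dim, hidden width, arms, noise var

W = rng.normal(size=(h, d)) / np.sqrt(d)   # fixed random hidden layer
def feat(x):                               # feature map phi(x)
    return np.tanh(W @ x)

# Per-arm Bayesian linear regression on phi(x), prior theta ~ N(0, I).
A = [np.eye(h) for _ in range(n_arms)]     # posterior precision matrices
b = [np.zeros(h) for _ in range(n_arms)]

true_theta = rng.normal(size=(n_arms, d))  # hidden reward parameters (toy)

for t in range(500):
    x = rng.normal(size=d)
    phi = feat(x)

    # Thompson step: sample a weight vector from each arm's posterior and
    # play the arm whose sampled expected reward is largest.
    scores = []
    for arm in range(n_arms):
        cov = np.linalg.inv(A[arm])
        cov = 0.5 * (cov + cov.T)          # symmetrize against round-off
        mu = cov @ b[arm]
        theta = rng.multivariate_normal(mu, cov)
        scores.append(phi @ theta)
    a = int(np.argmax(scores))

    # Observe a noisy reward and update the played arm's posterior.
    r = true_theta[a] @ x + rng.normal(scale=np.sqrt(sigma2))
    A[a] += np.outer(phi, phi) / sigma2
    b[a] += phi * r / sigma2
```

In practice the feature map would be retrained periodically on the logged data rather than kept fixed, but the exploration mechanism (sampling from the last-layer posterior) stays the same.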