I have always struggled with the motivation for choosing the performance metric as the deviation between the estimator (estimate) and the unknown parameter \theta, in both the Bayesian and now this classical setting. Why not use an error formulation that directly evaluates the error in the observations themselves (as in regression), since you do not know \theta's value? What is the point of minimizing the deviation from a \theta that you do not know? Minimizing the probability of error via MAP or maximum likelihood estimation makes sense to me, since it directly uses the observations X. But the mean squared error and mean error formulations between the estimate and the unknown parameter are hard for me to justify, and they do not directly relate to the error in the observations.
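One way to see why deviation from an unknown \theta still makes sense: the mean squared error is a property of the estimator's sampling distribution, so it can be evaluated (analytically, or by simulation as a function of each candidate \theta) without ever observing the true value. A minimal sketch of that idea, with a "true" \theta fixed only for the purpose of the simulation and all numbers made up for illustration:

```python
import numpy as np

# Compare two estimators of the mean theta of a Normal(theta, 1) sample
# by their frequentist MSE, estimated by averaging over many simulated
# datasets at a chosen "true" theta. In practice theta is unknown, but
# the MSE can be computed like this for every candidate value of theta.
rng = np.random.default_rng(0)
theta_true = 2.0            # ground truth, assumed only for the simulation
n, trials = 20, 100_000

x = rng.normal(theta_true, 1.0, size=(trials, n))
mean_est = x.mean(axis=1)   # sample-mean estimator
shrunk_est = 0.9 * mean_est # a shrinkage estimator, for contrast

mse_mean = np.mean((mean_est - theta_true) ** 2)
mse_shrunk = np.mean((shrunk_est - theta_true) ** 2)

print(mse_mean)    # analytically 1/n = 0.05
print(mse_shrunk)  # bias-variance tradeoff: ~ 0.81/n + (0.1 * theta_true)**2
```

The point is that the MSE never touches the observation error directly; it is a statement about how the estimator behaves across repeated experiments, whatever \theta happens to be.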
That is interesting. I never noticed that in the Bayesian framework for parameter estimation there is a "single best" way to do estimation, while the frequentist way allows multiple approaches. Curious why this ill-posedness difference exists.
My hunch is that although the frequentist approach is non-unique, the Bayesian way also ends up impractical/non-unique, because of the subjectivity of choosing priors and how messy the mathematics of computing the integrals gets (numerically and in practice).
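For what it's worth, the "single best" Bayesian answer is concrete in conjugate cases, where the integrals collapse to closed form: with a Normal prior on \theta and Normal observations, the posterior is Normal, and under squared-error loss the unique Bayes estimate is its mean. A hedged sketch (prior values are an arbitrary subjective choice, which is exactly the point being debated above):

```python
import numpy as np

# Conjugate Normal-Normal sketch: prior Normal(prior_mean, prior_var),
# observations Normal(theta, noise_var). The posterior is Normal with
# precision equal to the sum of precisions, and the posterior mean is
# the unique estimate minimizing posterior expected squared error.
rng = np.random.default_rng(1)
prior_mean, prior_var = 0.0, 4.0   # subjective prior choice
noise_var = 1.0
theta = rng.normal(prior_mean, np.sqrt(prior_var))
x = rng.normal(theta, np.sqrt(noise_var), size=25)

post_prec = 1.0 / prior_var + len(x) / noise_var
post_mean = (prior_mean / prior_var + x.sum() / noise_var) / post_prec

print(post_mean)  # shrinks the sample mean toward the prior mean
```

Outside conjugate families the posterior has no closed form and the normalizing integral must be done numerically, which is where the practical difficulty (and the prior-sensitivity) enters.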
I watched 100+ videos in 2 days; you are my savior, John. God bless you.
best lectures on probability ever created
Thank you for sharing this marvellous lecture
crystal clear!