To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/VeryNormal. You’ll also get 20% off an annual premium subscription.
Hello, Sir Christian. I am a fresh high school graduate, and I will start a BS in Stat this August.
I am not a professional statistician, but I have worked in fields related to Stat because I have an interest in it.
I successfully proved that the sampling distribution of the mean is N(mean, variance/n). I used convolutions from the algebra of random variables. What I think is the reason so many people have little intuition about Stat is that we are taught wrong from the start.
The main object in the study of Stat is the random variable. However, we are introduced to random variables through the Kolmogorov axioms. This is backward, and I hate those axioms. ZFC is not great at modelling randomness, since logic is deterministic.
Here's how I think of random variables: all random variables are just transformations of a uniform random variable. This transformation is the function called the inverse transform.
Watch this...
Let X be a normal variable sampled from N(0,1). Find the probability that X is less than 1.
The CDF is Phi(x)
Consider the inverse of the CDF, invPhi(x).
We can create an equality
X = invPhi(u), where u is a uniform random variable.
P(X < 1) = P(invPhi(u) < 1) = P(u < Phi(1)) = Phi(1).
It just became easier once every random variable is just a transformed uniform r.v. The reason I think this is often encountered late is that many people think the algebra of random variables is a higher-level course than the set theory used in the Kolmogorov axioms. The algebra of random variables only needs calculus and a knowledge of functions.
The Kolmogorov axioms are very confusing for me. I can understand them, like the measure space, but I don't agree with them. They are not concrete, and they are not a great model for random variables. When your most basic fundamental, the random variable, is not fully understood, how can students understand higher concepts like the chi distribution?
Also, I hate the notation X ~ N(0,1). I hate the symbol ~ because it is not concrete. I hate how floaty statistics is in most lectures. They are not rigorous. I hate it so much that I created my own model of how to think of random variables.
I posted some of my own notes about the subject in my fb page under my real name, Christian Jake S. Austria.
I am trying to make Stat more rigorous step by step. As of now, I have proved, using just convolutions, that the sum of two independent normal random variables is also normal, with the means and variances added. I then used this fact to build a series of convolutions, which let me prove that the sampling distribution of the mean is N(mean, variance/n). I think this is how Stats should be introduced to us students. We should be taught through the inverse transform, not through the Kolmogorov axioms. We should also not use any Stat formulas, like test statistics or the sum of normals, without proving them ourselves.
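The inverse-transform demonstration in the comment above is easy to check numerically. Here is a minimal Python sketch; the sample size and seed are my own choices, not the commenter's:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Inverse transform sampling: X = invPhi(u), with u ~ Uniform(0, 1)
u = rng.uniform(0.0, 1.0, size=100_000)
x = norm.ppf(u)  # invPhi, the standard normal quantile function

# P(X < 1) should match Phi(1), since invPhi(u) < 1 iff u < Phi(1)
empirical = np.mean(x < 1)
print(empirical)      # close to Phi(1)
print(norm.cdf(1.0))  # Phi(1) ≈ 0.8413
```

The same trick works for any distribution with a computable quantile function, which is why it is such a useful mental model for simulation.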
1. Use existing schemas or analogies: relate new material to old
2. Develop a generalization or abstraction to relate materials: look for patterns to overcome ambiguity
3. Learn a programming language to speed up calculations & visualizations, tinker with concepts
4. Ask and answer your own questions, the challenge and process help troubleshooting and learning
5. Most assumptions cant be verified, but they can be shared
Justin Sung covers some of the more general learning tips here, I highly recommend his channel!
Justin Sung is my hero
@@very-normal I should've known as soon as I saw those Anki cards lmao Thanks for all the great material dude
Baby, stop the Monte Carlo. Very Normal got a new video!
Hilarious! For whom the Bell tolls...
my tip is when you’re doing textbook exercises don’t be afraid to start with the answer then work out the method/how it’s obtained. in all the stats classes i’ve taken, knowing HOW to get the correct answer is far more important than whether or not your math is mistake free.
and you will love your cheat sheet a lot more if you use it for the homework as well. my stats 1 professor said it should be your best friend for the entire course and that advice has served me well
Great tips! I’ve always been an advocate for focusing on the process of the answer, rather than the answer itself
I’m studying psychology, and lots of statistics is involved, so I’m happy I found your channel.
At first, conceptual learning was very difficult for me because I wasn’t used to it. But over time, playing around with comparing things and seeing differences and similarities, it starts to make sense. So over time one gets better.
The tricky part in my case is that I also have to know lots of details and facts, but having a structure, a visual map of the subject, reduces the facts to memorize.
loved the tips! Will definitely follow them
I remember really struggling with the idea of skewed distributions: it seemed to me that a skewed distribution should have the greater portion of the CDF on its skewed half, but it was always the other way around. Eventually I accepted that my intuition was wrong, and to this day I remember it as whatever isn’t intuitive.
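One concrete way to see which half holds more probability: for a right-skewed distribution the median sits below the mean, so more than half the mass lies below the mean, on the short side. A quick Python check using the exponential distribution (my own example, not from the comment):

```python
from scipy.stats import expon

# Standard exponential: mean = 1, but the distribution is right-skewed
mass_below_mean = expon.cdf(1.0)  # P(X < mean) = 1 - e^{-1}
print(mass_below_mean)  # about 0.632, well over half the mass
```

So the long tail pulls the mean past the median, and the bulk of the CDF ends up on the opposite side from the skew, matching the counterintuitive fact described above.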
I felt similarly during my probability course for my masters in engineering. Instead of memorizing each type of random variable, I simply learned the 2 important processes: Bernoulli and Poisson. The discrete distributions (binomial, geometric, Pascal) stem from the Bernoulli process, and the exponential, Erlang, and Poisson counting distributions stem from a Poisson process.
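That organizing idea can be demonstrated in a few lines: simulate a Bernoulli process and both the binomial and geometric distributions fall out of it. A rough Python sketch (the parameters are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 20, 50_000

# A Bernoulli process: reps independent runs of n trials with success prob p
flips = rng.random((reps, n)) < p

# Binomial: the count of successes in n trials of the process
counts = flips.sum(axis=1)
print(counts.mean())  # near n*p = 6

# Geometric: the (1-based) index of the first success in each run
first_success = flips.argmax(axis=1) + 1
has_success = flips.any(axis=1)  # drop runs with no success in n trials
print(first_success[has_success].mean())  # near 1/p ≈ 3.33
```

The continuous side works the same way: thin the time axis finely enough and the Poisson process's waiting times (exponential, Erlang) emerge as limits of the geometric and Pascal.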
Fish
Obsidian is really good for interconnecting ideas!
Tip from me: understand linear algebra first
this tip is so true it hurts
@@very-normal You got a PhD in what, exactly? You got a PhD in Statistics? Or are you a scientific researcher who learned a lot of Statistics? You weren't very clear on that.
@mathematicaleconomist4943 I’m currently a Ph.D candidate in Biostatistics
@@very-normal OH. OK. Thanks! I am still watching...
I appreciate you giving me some of your time 🙌🏽
Thank you for the breakdown on how multicollinearity harms the model. That ended up being a question on my final
I used to do all of these things but kind of got out of habit with them since I've started working. Gonna watch this video again tomorrow and try to implement them a bit more again. Thanks, very normal!
The only class I took was a mathematical statistics class. I liked and could understand the theory itself, but I found the application really hard. I took the class as a third-year math major, and I was a little aware of probability theory and measure theory, so understanding the theory was not bad at all, but the homework problems were real-world application questions rather than proofs, so I had a really hard time with them. I find statistics really interesting, but very difficult compared to classes like analysis, topology, and algebra.
One of the most helpful and enjoyable videos I've ever seen! Thanks!
Your videos are saving the stats section on my maths A level lmao. Thanks so much!
Great video! I just took exam MAS-I (arguably the hardest general statistics exam out there). Wish I had seen this before I began studying😂. No cheat sheet allowed, so it's really important to find these patterns and understand what's really going on (understanding is a lot easier than memorizing hundreds of formulas). Keep up the great work!
I'm currently studying the goodness of fit test and love how you explained it, it really helped me understand it that bit more.
Such a good video man, thank you so much for your time
I honestly didn’t even touch the null hypothesis until about a year into stats, with ANOVA. I started with distribution types, probability and random variables, and summary mathematical statistics, and then finally ANOVA. Honestly, with a solid foundation in math and stats to begin with, I would recommend the strategy to anyone.
13:06 - "You need to know the exact limits of your understanding, and you get that the fastest by just struggling with the ideas." ✊
Good points! One thing I did in my masters that I didn't do in my undergrad was to sit down with the textbooks and actually read the formulas step-by-step instead of glossing over them and assuming I understood how they got from step 1 to step 7. Even found one or two errors this way! Which just goes to show how often people actually 'read' the maths!
Multicollinearity, to my understanding, renders X'X approximately non-invertible, since the design matrix approximately has linearly dependent columns (correlation between variables = an almost linear relation between their respective column vectors inside the design matrix).
Yeah that’s another good way of looking at it! Then you can connect it back to eigenvalues and having even more ways for sniffing out multicollinearity, giving you more options
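The eigenvalue view can be made concrete: as predictors grow more correlated, the smallest eigenvalue of X'X collapses toward zero and the condition number blows up. A small Python sketch of the idea (the correlation levels and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
conds = []

for rho in (0.0, 0.9, 0.99):
    # Two predictors with population correlation rho
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    XtX = X.T @ X
    # Condition number = ratio of largest to smallest eigenvalue of X'X
    conds.append(np.linalg.cond(XtX))
    print(rho, np.linalg.eigvalsh(XtX).min(), conds[-1])
```

As rho approaches 1, the condition number climbs by orders of magnitude, which is exactly the near-singularity described in the comment and gives you another diagnostic for sniffing out multicollinearity.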
@@very-normal I don't understand what the heck you've said but the video motivated me to learn how to learn.
I could say the same about your “fish” comment elsewhere, but hell yeah go gettem
A very encouraging video! I've always struggled with stats because I've always been taught "use this method in X situation" but never really understood how the equation is derived or why it includes certain values. I'm a very bottom-up thinker, so I really need to see how everything is connected. Your example at 6:28 really helped connect the dots for me!
I hope that you get more subs bro! I like your content.
Such a helpful vid, thank u!!
Oh, Very Normal channel, you're surely in the upper quartile of the channel quality distribution
The way you’re putting it is the way my statistical modules are organised. Monday we do pure theory and rigorous statistical proofs and learn the abstract concepts. Tuesday is the theory behind the application of the concepts, applying them on paper. Wednesday it’s coding the concepts from Tuesday. Thursday is theory and coding on just multivariate analysis, and Friday we work on research 🤣 I guess my department really did well in organising the modules so we have a solid understanding of the theory and practical applications
Really appreciate your videos, keep up the great work my guy!
Good summary
Good comment
What a time to be alive!
As a final-year student in statistics, I have already passed the courses in mathematical statistics (estimation and testing), but I still don't have a clear grasp of them.
I watch many YouTube videos (especially MIT's full Introduction to Mathematical Statistics course by Professor Rigollet), but I feel the science of constructing tests and estimation algorithms is very deep and goes over my head. I am trying to learn Bayesianly (updating my understanding by rewatching these videos...), but I know my time is limited, as I have to be job-ready within a year.
I hope you will also cover these topics so I can build a solid intuition and get motivated to dive deeper.
Best of luck with your job search! I have to search soon too
Ran your code myself (just a few small differences, but mostly identical); the std error plots look very similar, but my coefficient estimate plots are very spiky, not smooth like yours. Are you just taking the mean estimate across all 1000 replicates for each level of rho, or are you doing something else? I took a peek at your github, but it doesn't look like you've committed anything public there in a while.
Oh sorry, I forgot to update the GitHub with this code. I’ll get back to this later, but I can give you my first impression of where the spiky plot is coming from. I am taking the mean estimates across the 1000 simulations for each rho, like you described.
Did you group your plot by parameter? It’s possible that your plot is plotting lines of all three parameters simultaneously. Maybe not, but this is the most common reason I get spiky plots
@@very-normal Here's my code:
I'm grouping by rho and term (the estimates produced by the tidy function in tidyr) so I should not be mixing terms I don't think.
results
mutate(
sim = map(rho, function(p) {
set.seed(runif(1,1,10000) |> ceiling())
R
clean_names() |>
dplyr::select(-c(statistic, p_value)) |>
filter(term != "(Intercept)") |>
group_by(rho, term) |>
summarise(
mean_est = mean(estimate),
mean_std = mean(std_error)
)
agg_data |>
ggplot(aes(x=rho, y=mean_est, colour=term)) +
geom_line()
agg_data |>
ggplot(aes(x=rho, y=mean_std)) +
geom_line() +
facet_wrap(~term)
@@very-normal If you run that code, I bet you'll see the spikiness I'm talking about. The std err has a little bit of it, but not much and looks much more like your plots.
Wow. Amazing video.
Great video, and good motivation for those of us who don't have a pure maths background. Nevertheless, do you know why you rely so heavily on a single theorem like the CLT? That's right, it's the pure mathematics you were thinking of... you were using their way of thinking, and that's why it's normally so hard for us. Because we were never taught like that.
Love ur vids man!
Thank you!
why not julia? afaik it's way faster than both python and r, it has more things out of the box than python (for things like linear algebra and statistics), and i like its syntax more
I might pick it up later, I remember someone else mentioning it would be good. but statistician positions don’t usually list Julia as a qualification so it’ll have to wait lol
Ah I remember seeing chi squared and being really confused it wasn’t x^2
GREAT VIDEO LETS GO
l e t s g o o o o o o o
What softwares do you use to produce these videos?
Final Cut Pro for editing, manim for some math animations, and figma+midjourney for design
@@very-normal Awesome! Thank you for the quick reply!
I haven’t even started watching the video, but the like button is already pressed.
no takebacks sir
@@very-normal Definitely not, sir 🫡
Time to open the "Very Normal" vault and take some notes.
Okay nice as someone not in the field, I get almost 4 on the list 😂
Mutually orthogonal principal components. Mom tells child to clean room, the i.i.d. condition is only very weakly valid. Mom tells teen the same thing, it goes the other way. Excellent real-world application, Christian.
I constantly lost the meaning of all sorts of conditional distributions in an AI model, after being tumbled in the Bayes formula ...
multicollinearity does not bias your results
Let X be your design matrix and y the vector of the dependent variable. Then (X'X)^{-1}(X'y) is the OLSE; call it bhat. Taking the expectation yields E[bhat] = E[E[(X'X)^{-1}(X'y)|X]] = E[(X'X)^{-1}X'E[y|X]] = E[(X'X)^{-1}X'(Xb + E[u|X])] = b + 0 = b, provided that your model is correct (i.e., y = Xb + u) and conditional mean independence holds (i.e., E[u|X] = 0). Note that I did not make any assumption about multicollinearity. Of course, perfect multicollinearity is ruled out, as otherwise (X'X)^{-1} does not exist.
Right, you’re saying that the OLS estimators are unbiased for the true parameters, given the model is correctly specified. But unbiasedness doesn’t mean you still get good estimates when you actually run it on a dataset.
Multicollinearity explodes your standard errors, which can greatly influence your estimated values.
@@very-normal that's true. However, this is only due to the fact that numerical solvers have a hard time inverting (X'X) if multicollinearity is present. In the near future, this may no longer be a problem given the advances in numerical linear algebra :)
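Both sides of this exchange can be checked by simulation: with highly correlated predictors, the OLS estimates stay unbiased on average, but their spread across repeated datasets balloons. A rough Python sketch (the sample size, rho, and true coefficients are my own choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, rho = 200, 2000, 0.99
b = np.array([1.0, 2.0])  # true coefficients
cov = np.array([[1.0, rho], [rho, 1.0]])

estimates = np.empty((reps, 2))
for i in range(reps):
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y = X @ b + rng.normal(size=n)
    # OLS via least squares (equivalent to (X'X)^{-1} X'y, but numerically safer)
    estimates[i] = np.linalg.lstsq(X, y, rcond=None)[0]

print(estimates.mean(axis=0))  # near (1, 2): unbiased despite multicollinearity
print(estimates.std(axis=0))   # large spread: inflated standard errors
```

For comparison, with uncorrelated predictors the per-coefficient spread here would be roughly sqrt(1/n) ≈ 0.07, several times smaller, so the variance inflation is a property of the sampling distribution itself, not just of the numerical solver.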
putting pepes in your video is an easy way to get me to close the tab. edit: I made it to the first wojack.
to each their own
What does harm mean 😂
Nice Video, as a German I do feel uncomfortable with the "Office Hours" icon though 😅 Please tell me I'm not the only one who sees this...
oooh okay I kinda see what you’re seeing, sorry about that. It’s supposed to be someone in front of a chalkboard. I won’t use the logo from here on out
Estimator, moments, hypothesis, power, parametric, covariance, statistic, kurtosis, inference, probability, regression... Just too much jargon and theory. :(