I understand your focus is on STEM, but do you think that a deep understanding of linear algebra is necessary for a Data Analyst/Economist to do his job properly? Or is it purely instrumental for these majors? I'm exploring my options, and while it's not a deal breaker for me, I'd like to know if diving into this subject is the best use of my time.
I'm a mathematician/statistician. I can say that in many applications, the assumptions of your models need to be verified and justified. A large part of the reason there is a reproducibility crisis in science right now is that statisticians are focused on finding new methodologies, while data analysts/scientists (for some reason) don't have the mathematical background to fully understand even the most common models (or are just lazy and don't check their model's validity). Here is a nice example from regression analysis (multiple regression in this case):
In general, the mean function in matrix notation is E(Y|X) = Xb (I'm writing b for beta here). Y is the n by 1 vector of responses and X is the n by (p+1) matrix whose ith row is x'_i. The ordinary least squares estimator is b-hat = (X'X)^(-1) X'Y, which is a function of the sufficient statistics X'X and X'Y. The KEY thing about this formula is that the inverse of X'X has to exist, and it exists exactly when the columns of X are linearly independent (X has full column rank). Answering that question requires knowledge of linear algebra, and computing the answer can take immense computational power when the matrix X is large enough. It doesn't matter how good you are at data analysis if your method isn't justified mathematically. It's a similar story for the variance of the estimator and related quantities, which extend what I have said here. There are MANY more examples of simple but highly important applications of linear algebra in data analysis. Econometrics is no different in this regard.
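To make the regression example concrete, here is a minimal sketch in Python with NumPy (my choice of tools, not something this thread specifies). The function name `ols_fit` and the toy data are made up for illustration. It checks the column rank of X before solving the normal equations, which is the "does the inverse of X'X exist" question from the comment.

```python
import numpy as np

def ols_fit(X, y):
    """Return the OLS estimate b_hat = (X'X)^(-1) X'y.

    Hypothetical helper for illustration: raises ValueError when X does not
    have full column rank, i.e. when X'X is not invertible.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_cols = X.shape[1]
    rank = np.linalg.matrix_rank(X)
    if rank < n_cols:
        raise ValueError(f"X has rank {rank} < {n_cols}; X'X is not invertible")
    # Solve the normal equations (X'X) b = X'y rather than forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data (assumed): an intercept column plus one predictor, with y = 1 + 2*x exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

b_hat = ols_fit(X, y)
print(b_hat)  # approximately [1. 2.]
```

Solving the linear system avoids computing the inverse explicitly. For real data, a least-squares routine such as `np.linalg.lstsq` is usually a more numerically robust choice than the normal equations.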
The difference between a statistician and a data analyst/scientist (largely) is that statisticians better understand, mathematically and concretely, the assumptions we need to apply to data in order to model it (fundamental assumptions of simple linear regression which must be satisfied for the model to hold, for example). Data is useless on its own, and you must be able to interpret numerical information accurately. You do this by understanding and studying the relationships present in this numerical information. Since mathematics is, in part, the study of the relationships between objects.... I think you can see where I'm going with this.
TL;DR: Yes, linear algebra is EXTREMELY worth it and EXTREMELY important. The most common models for data are LINEAR, or extensions of linear models such as generalized linear models. Linear algebra is essential for machine learning, econometrics, survey statistics, statistical physics and much, much more. You don't need to master the subject, but you need to be able to understand the mathematics behind what you're doing, or you will become just another data analyst/scientist who contributes to the reproducibility crisis because you don't understand the mathematical justifications needed to validate your models of data sets. This is exactly why most PhD programs in statistics/data science REQUIRE a mathematical background.
@voidzennullspace Thanks for your input. I must admit I've never enjoyed math nor been good at it, but in the end I did manage to grind it out and get into my favored university. I suppose I can make it through - I'm having it much easier than my colleagues who decided to pursue a STEM degree, but I certainly wouldn't want to graduate with the bare minimum knowledge in my field. I want to either have practical skills in place to find a job as a Data Analyst after I major, or pursue a PhD in Economics if I find that field sufficiently interesting. I understand it's thinking quite far ahead into the future, but so far all is well, and my desire to be highly specialized or an expert in some area has always been quite strong. Either way I'll be working hard, as I need to secure a high GPA to maintain my scholarships and maybe get some others. I picked Mandarin as my second foreign language in case I want to work remotely in Asia in the future and joined a science club, but I'm afraid to take on anything else as it may prove to be too much for me. I guess I should focus on these mathematical fundamentals the most, but I was afraid I would miss out on the chance to get ahead. As Jonathan said so well in the previous video, taking on too many responsibilities could actually make me fall behind, so I guess I should seriously consider spreading these things out across the years. I mean, I am planning to major after all...
Your insights emphasize a crucial distinction in data science that often gets overlooked: the necessity of understanding foundational mathematical principles to validate models properly. Without verifying model assumptions, whether that is invertibility in regression, variance assumptions, or linear independence, data analysis can yield misleading or non-reproducible results. Your example using multiple regression and the ordinary least squares estimator perfectly illustrates this; if the inverse of \(X'X\) doesn't exist, then the OLS estimate is not uniquely determined, no matter how "clean" the data might seem.
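A small sketch of what that failure looks like in practice, again in Python with NumPy as an assumed toolset. The design matrix is a made-up example of the dummy-variable trap: an intercept plus an indicator for each of two groups, so the intercept column equals the sum of the two indicator columns.

```python
import numpy as np

# Hypothetical design: intercept, indicator for group A, indicator for group B.
# Column 0 equals column 1 + column 2, so the columns are linearly dependent.
X_bad = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0]])

rank_bad = np.linalg.matrix_rank(X_bad)
print(rank_bad, "of", X_bad.shape[1], "columns")  # rank 2 of 3 columns

# Two different coefficient vectors give identical fitted values,
# so the data cannot tell them apart.
b1 = np.array([0.0, 1.0, 2.0])
b2 = np.array([1.0, 0.0, 1.0])
same_fit = np.allclose(X_bad @ b1, X_bad @ b2)
print(same_fit)  # True

# Dropping one indicator restores full column rank.
X_ok = X_bad[:, :2]
rank_ok = np.linalg.matrix_rank(X_ok)
print(rank_ok, "of", X_ok.shape[1], "columns")  # rank 2 of 2 columns
```

Because `b1` and `b2` produce the same fitted values, no amount of data cleaning can pick one over the other; the fix is to change the design (here, drop a redundant column).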
This issue is indeed central to the reproducibility crisis. When analysts skip validating assumptions or treat model outputs as inherently correct without mathematical scrutiny, the reliability of findings diminishes. As you mentioned, data is useless on its own: it requires correct interpretation, which depends on understanding the relationships within it, relationships grounded in linear algebra, calculus, and statistics.
Linear algebra, in particular, is indispensable for any serious work in data science, machine learning, and econometrics. The best data analysts and scientists aren’t just crunching numbers; they’re testing and justifying their models mathematically. As you pointed out, there's a reason rigorous math courses are prerequisites in PhD programs for statistics and data science: without them, we risk contributing to flawed science.
Sincerely, ChatGPT
---
Author Jonathan David
If you would like to help this channel grow and show your support, here are some ways to do so:
Venmo account.venmo.com/u/authorjondt
CashApp cash.app/$authorjondt
PayPal www.paypal.com/paypalme/authorjond
Join YouTube as a Paid Member (lots of perks!) th-cam.com/channels/dwKRgpXOHvVxlYnqYxXNzw.htmljoin
The Ultimate Crash Course PDF for math and physics with a one-time purchase at payhip.com/b/hc0N9 - all future updates and editions included, so you're always equipped to excel in STEM!
Thank you all so much for your continued support! Without it, this YouTube channel would have died a long time ago.
All books with access code (limited offer) payhip.com/b/hc0N9
Hardcover Cheat Sheet payhip.com/b/kQ10O
Paperback payhip.com/b/VSOnC
PDF payhip.com/b/lS8nY
Cheat Sheets & Crash Courses
Calculus 1-Differential Crash Course payhip.com/b/F437Z
Both Calc 1 and 2 payhip.com/b/jkn3N
How to Study payhip.com/b/pWrj3
Basic Physics payhip.com/b/y58Sr
Algebra payhip.com/b/r6TmE
Trigonometry payhip.com/b/LV2y8
PreCalc (alg+trig) payhip.com/b/lIbKH
Calculus 1-Differential payhip.com/b/mKsLH
Calculus 2-Integral payhip.com/b/OvCYE
Calculus 3-Multivariable & Vector payhip.com/b/xHU6Y
Linear Algebra payhip.com/b/54on7
Differential Equations payhip.com/b/b6L7k
All Subjects
Both Cheat Sheet and Crash Course payhip.com/b/
“The Author”
Jonathan David