Check out the full Data Analysis Learning Playlist: th-cam.com/play/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba.html
Great video! This training, validation and testing is relevant for modeling and simulation in general, and you would be surprised how many scientists and practitioners get this wrong.
What a fantastic series! Will definitely rewatch it. I would love a video about image classification and validating results, confusion matrices, etc.
You explained so many machine learning concepts clearly within 15 minutes of this video. But this video isn't as popular as your cryptography and cybersecurity stuff, which says something about what the general audience likes.
Love this series!
Love the videos! He's really a good teacher - thanks for all the good explanations. But when I see the paper he draws on, it reminds me of 80s printer paper... is it still in use, or what is it for?
I don't know if it's true or not, but I've heard that some universities bought a quinjabillion metric clucktonnes of that paper way back when it was expected to be used massively for a long time, so they gladly hand it out to whoever has a use for it now.
That's exactly what it is, and it's the standard Computerphile paper in all of the videos
Yeah, nice touch, isn't it? Makes me feel like it's the 80s again 😂
You are absolutely handsome and brilliant! I'm so happy to learn from such a smart and kind soul. Thank you for sharing your talent with the world.
I loved the series, but I got a bit lost with this video. How does the content of video #8 relate to what was explained up to now? Does video #8 continue where video #7 left off, or does it take its output as an input in some way?
The use of 'precision' here sounds more like 'accuracy' in a truly scientific sense, that being how well it reflects a 'true' or correct outcome. In this vein 'precision' would be more like the ability of the system to repeatedly classify similar data, or the same sets, to the same outcome.
In classification, the definitions for precision and accuracy differ from those commonly used in science. Precision is defined as the proportion of instances correctly classified as positive (true positives) among all the instances classified as positive (true positives + false positives). Accuracy, on the other hand, is defined as the proportion of instances classified correctly (true positives + true negatives) among all instances. So, for example, imagine 100 people take a medical test. 20 are diagnosed with a disease, and among those, 15 do have the disease. Furthermore, of the 80 people not diagnosed with the disease, 5 do have the disease, so 75 people are correctly classified as not having the disease. As a result, the precision of the test is 15/20 = 75%, while the accuracy of the test is (15+75)/100 = 90%.
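(If it helps, here is that arithmetic as a tiny Python sketch; the counts are just the hypothetical numbers from the example above.)

```python
# Hypothetical counts from the medical-test example above.
tp = 15  # diagnosed and actually sick (true positives)
fp = 5   # diagnosed but healthy (false positives)
fn = 5   # not diagnosed but actually sick (false negatives)
tn = 75  # not diagnosed and healthy (true negatives)

precision = tp / (tp + fp)                   # 15 / 20  = 0.75
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 90 / 100 = 0.90
print(f"precision = {precision:.0%}, accuracy = {accuracy:.0%}")
```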
That's pretty wild that you can automatically create a reasonable decision tree to classify arbitrary data towards an arbitrary target attribute.
Likewise, one could imagine targeting the decision tree at gender, or income; it sounds like the algorithm doesn't care, it just splits the data on whichever attributes best predict the target. (See the sketch below.)
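(As a rough sketch of what I mean, assuming scikit-learn and some made-up data, the target really is just whichever column you choose to pass in:)

```python
# Sketch only: tiny made-up dataset; any column can serve as the target.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "age":      [25, 40, 33, 58, 22, 47],
    "income":   [30, 80, 55, 95, 28, 70],  # in thousands, say
    "approved": [0, 1, 1, 1, 0, 1],        # loan approved yes/no
})

target = "approved"  # could just as well be any other categorical column
X, y = df.drop(columns=[target]), df[target]

# The tree-building algorithm has no opinion about what the target means;
# it just picks splits on the remaining attributes that best predict it.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict(X))
```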
I really want a video just on Support Vector Machines! (Example: why would a traditional neural network outperform it?)
Computer says no 😁
How is "validation" different from "testing"?
Ramix Nudles, here is how I imagine it.
The training data was used for training your model (obviously), so running the model on the training data will typically show inflated, near-perfect accuracy.
The testing data is used by the model developer to analyze the performance. The developer can look into the results, see any obvious mistakes, and try to correct for them.
The validation data would remain invisible to the developer and would represent 'new' data points that the model would see in the real world after it has been developed and deployed. The model should also perform well on this, with zero developer interaction or knowledge of the data.
Also, nice profile pic 👌🏻
@@MusicBent :-D
@@MusicBent Pretty much, but you mixed up test and validation data. Validation data is used to evaluate the model after training, or even during training, to see if it needs tweaking to improve its performance. But to make sure we ourselves don't overfit the model to the validation data, we give a final unbiased assessment of its performance by evaluating it on data it has never seen: the test data.
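(A quick sketch of that three-way split, assuming scikit-learn; the dataset here is synthetic and the split fractions are arbitrary.)

```python
# Sketch: carve a dataset into training, validation, and test sets.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # synthetic data

# Hold out a final test set that is never used for tuning...
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# ...then split the remainder into training and validation data.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))   # used while tweaking
print("test accuracy:      ", model.score(X_test, y_test)) # final unbiased check
```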
So if the data set contains attributes such as gender, race, religion, languages spoken, etc., the machine learning model could make its decisions, on loan approvals for instance, heavily based on such factors. Interesting.
Yes. That's precisely why ethics in AI is such a growing concern. Many organizations are working to ensure that these kinds of biases do not inadvertently (or intentionally) make their way into ML-driven decision engines.
Only if those attributes are positively correlated with, say, debt default.
I'm not sure what the majority of medical doctors would say, but I do hear apprehension about the use of AI to aid in diagnosing patients. Which is interesting, because wouldn't it just be another useful tool at their disposal, like a stethoscope?
Sir, could you please make a video explaining the resources you use to learn or enhance your programming skills?
Have a look at reddit.com/r/learnprogramming
@@heyandy889 thanks a lot
I'm passing this exam thanks to you lol
Neural network is gonna beat KNN, Tree, and SVM. But, no, I don't watch Siraj Raval anymore.
So data classifiers are a new way of building uncompromising bureaucratic rules that escape peer-review and public oversight and not even their creators understand.
Got it.
And that can be demonstrably (statistically) fairer (more likely to predict if you'll pay back your debt or not) than any human who decides based on emotion.
@@4.0.4 What a wonderfully naive response.
@@4.0.4 who says the training data isn't biased?
I love this comment.
lol ok