As a junior data analyst, I felt hurt when you "assumed you already did data collection" 😂 It's basically the most daunting part of the job. Deploying models is fun! Building Docker images that don't work is not!
This is a great page
Thank you for the great R content, James!
I’m glad you like it!
Thanks, James, this was an excellent tutorial.
Thank you very much. This is fantastic.
Suggestions: (1) work on a dataset that one can monitor with real data, (2) deploy the model API to AWS and/or Posit Connect, (3) showcase drift when it happens and show ways to handle it. I would learn a lot from these items. Thanks for the video!
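On point (3), vetiver's monitoring helpers are one way to track drift over time. A minimal sketch, assuming scored production data in a hypothetical `new_data` frame with a `date` column, the observed `outcome`, and the model's `.pred`:

```r
library(vetiver)

# Compute rolling performance metrics by week on scored production data.
# `new_data`, `date`, `outcome`, and `.pred` are all hypothetical names here.
metrics <- vetiver_compute_metrics(
  new_data,
  date_var = date,
  period = "week",
  truth = outcome,
  estimate = .pred
)

# Plot the metrics over time to spot drift as it happens
vetiver_plot_metrics(metrics)
```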
Thank you for the suggestions. Those are great ideas for future videos.
@james-h-wade Seconding the Posit Connect point; it would be really great to get a view into how it's done there. I've had issues deploying, as there are hardly any good walkthroughs!
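For what it's worth, the vetiver + pins route is the most documented path to Posit Connect that I know of. A minimal sketch, assuming a fitted tidymodels workflow in a hypothetical `final_fit` and a Connect account named `user.name`:

```r
library(vetiver)
library(pins)

# Wrap the fitted model with the metadata vetiver needs to serve it
v <- vetiver_model(final_fit, "user.name/model-name")

# Authenticates via the CONNECT_SERVER and CONNECT_API_KEY env vars
board <- board_connect()

# Pin the model to Connect, then deploy a Plumber API that serves it
vetiver_pin_write(board, v)
vetiver_deploy_rsconnect(board, "user.name/model-name")
```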
Which Quarto feature did you use to create the code chunk with the numbers on the right that show the explanation?
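Those numbered callouts look like Quarto's code annotations rather than a function; a minimal sketch, assuming that's the feature in the video. End a line in the chunk with `# <1>`, then explain each number in an ordered list right below the chunk:

````qmd
```{r}
library(dplyr)
mtcars |>
  filter(cyl == 4) |>            # <1>
  summarise(mpg = mean(mpg))     # <2>
```

1. Keep only the four-cylinder cars.
2. Average their fuel economy.
````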
Hello James, I have a question.
I see that you did EDA first, then split the data into train and test sets. Shouldn't I do EDA after the split to avoid data leakage?
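The common tidymodels advice is yes: split first, then explore only the training set, so test-set patterns never inform your modeling decisions. A minimal sketch with rsample, using `mtcars` as a stand-in dataset:

```r
library(rsample)

set.seed(123)
# Split before any exploration so the test set stays untouched
split <- initial_split(mtcars, prop = 0.8)
train <- training(split)
test  <- testing(split)

# Do EDA on the training data only
summary(train)
```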
Great series! The model deployment part was a bit confusing: why Hugging Face, what Hugging Face is, what the other options are, etc.
Thanks for sharing that. I'm thinking that should be a topic for a future video. There are many to choose from, and it's hard to understand the differences. My advice is to use the one that works. Posit Connect is the easiest to use in my experience, but it's a pro product.
@james-h-wade Yes, easy, but with a price, even for amateurs. I don't yet know of anything easier than HF, even though the Docker part might scare people off.
Great video, now I can deploy my model as an API!
Can you make a video like this on deploying a Plumber API to a Vercel project? It would be helpful, since if I use Hugging Face the Space must be public and people can access my R code files.
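For anyone landing here, the Plumber side is small regardless of where it's hosted; a minimal sketch of an endpoint file, with the model path and input format as assumptions:

```r
# plumber.R — minimal prediction API (model.rds is a hypothetical saved model)
library(plumber)

model <- readRDS("model.rds")

#* Return predictions for JSON rows posted to the endpoint
#* @post /predict
function(req) {
  newdata <- jsonlite::fromJSON(req$postBody)
  predict(model, newdata)
}
```

Run it locally with `plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)`.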
AUC is an unreliable metric if classes are imbalanced; prediction probabilities need to be adjusted to "undo" the stratified sampling. You should keep a hold-out set (randomly sampled) to verify the performance.
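A minimal sketch of that hold-out check with tidymodels, using the `two_class_dat` example data from modeldata in place of a real imbalanced set:

```r
library(tidymodels)

data(two_class_dat, package = "modeldata")

set.seed(42)
# Random (unstratified) split; the hold-out set stays untouched until the end
split   <- initial_split(two_class_dat, prop = 0.8)
train   <- training(split)
holdout <- testing(split)

fit <- logistic_reg() |> fit(Class ~ A + B, data = train)

# augment() adds .pred_class and class-probability columns
preds <- augment(fit, holdout)

# Compare ranking metrics; PR AUC is often more telling under class imbalance
roc_auc(preds, truth = Class, .pred_Class1)
pr_auc(preds, truth = Class, .pred_Class1)
```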
🎉🎉🎉
The MLOps workflow should become a tool that hides the code and only exposes options.