I'm just starting to get acquainted with MLOps at work, and I must say, this entire series of videos is SUPER useful! I'll be recommending this channel to everyone I know in my circles. Thanks a ton for everything you do, Elle! :D
Thanks Prashanth, that's so great to hear! Let me know if I can be of any help in your own MLOps work :)
-Elle
Best course on DVC on the whole internet
Thanks Ankit! Keep your eyes out for a thorough online course free on our website coming at the end of the year!
Thank you so much for this tutorial. I was struggling to understand how exactly DVC connected to cloud services, and your Google Drive example was extremely clear and simple.
Even I do understand DVC after seeing this explanation. Thank you very much.
11:10 "It's very difficult to name these things" 🤣. Great tutorial!
Glad you liked it!
Great explanation of the basics, thanks a lot Elle!
The -d flag used at 3:33 does not work in v3.42.0; the --default flag worked perfectly. The authentication has also changed a little bit.
This is for those who are watching in 2024.
Yes! Thank you @CRCE__Harkik_Prajapati! 🙏 Please see notes in the description about these updated docs:
✨ 𝗥𝗲𝗺𝗼𝘁𝗲 𝗠𝗼𝗱𝗶𝗳𝘆 𝗗𝗼𝗰𝘀: dvc.org/doc/command-reference/remote/modify
⚙️ 𝗛𝗼𝘄 𝘁𝗼 𝗦𝗲𝘁𝘂𝗽 𝗮 𝗚𝗼𝗼𝗴𝗹𝗲 𝗗𝗿𝗶𝘃𝗲 𝗗𝗩𝗖 𝗥𝗲𝗺𝗼𝘁𝗲: dvc.org/doc/user-guide/data-management/remote-storage/google-drive
Elle, thanks for this tutorial! Easy to follow, clear examples and super video effects! )))
This series was super useful, and very easy to understand. Thanks!
This is awesome! I love it! Nice tutorial too!
extremely helpful hands-on, thank you!
Can we do versioning for Image and Video datasets using DVC? If so, please point out or make a tutorial video for the same. Thanks
This is very useful! Thanks for this video!
Really useful tutorial 💚
waiting for more DVC stuff 😍
Nicely Explained
Awesome tutorial!🎉
Thanks for the intro! This was super helpful. Now I'm trying to use DVC for a computer vision project but I'm running into issues. My dataset is not too large ~40GB, I ran `dvc add` and it's been running for almost half an hour. Is this normal? I'm trying to find tutorials going through how to manage CV datasets but no luck. Do you know of any tutorials/documentation that could help me?
@Sabrina Pereira We see you found Alex's video Machine Learning Experimentation with DVC and VS Code. We are currently working on a new tool that will work with MUCH larger datasets, but in the meantime you may want to check out how DeepXHub uses DVC and CML in their CV projects: th-cam.com/video/GEpmbgR9dLo/w-d-xo.html
A very clear instruction, thanks!
Nice tutorial!!
Thanks Ricardo :)
Thanks for your informative video! It helped a lot. Keep up the good work!
Glad it helped!
Really helpful, thank you!
I enjoyed the video, but I have a question: isn't Git LFS accomplishing the same goal? My understanding is that we can already use Git LFS to store large files outside of our repositories, but still track their versioning. What would be the advantage of using DVC instead of Git LFS in this case?
@douglasmsantos Thanks for the question! Here's a great blog post from one of our Community members that addresses the issue and why they switched: mlops.systems/tools/redactionmodel/computervision/mlops/2022/05/24/data-versioning-dvc.html
And you can check out our docs around the issue here: dvc.org/doc/user-guide#comparison-with-related-technologies
@dvcorg8370 thank you for clarifying it!
Thanks for the tutorial. How is it different from Git LFS?
Thanks for the question Asad! This doc should help you out! dvc.org/doc/user-guide/related-technologies#git-lfs-large-file-storage
I want to use a Linux server as a remote server for storing and managing my datasets and models using DVC. How can I set this up?
facing this error dvc push
ERROR: unexpected error - Failed to authenticate GDrive: Access token refresh failed: invalid_grant: Token has been expired or revoked.
Hi @pratikmore_14! Thanks for the question!
We noted in the description that this is one of our older videos and Google has adjusted how things need to be set up. So sorry for the confusion! Please take a look at these two docs for the most up-to-date info and let us know if you are having any more issues!
✨ 𝗥𝗲𝗺𝗼𝘁𝗲 𝗠𝗼𝗱𝗶𝗳𝘆 𝗗𝗼𝗰𝘀: dvc.org/doc/command-reference/remote/modify
⚙️ 𝗛𝗼𝘄 𝘁𝗼 𝗦𝗲𝘁𝘂𝗽 𝗮 𝗚𝗼𝗼𝗴𝗹𝗲 𝗗𝗿𝗶𝘃𝗲 𝗗𝗩𝗖 𝗥𝗲𝗺𝗼𝘁𝗲: dvc.org/doc/user-guide/data-management/remote-storage/google-drive
This is awesome! :)
Thanks! 😄
You're so cool! Thank you!
Great product
Thx for your tutorial, crystal clear and very useful !
Also your terminal theme is pretty cool, what is it ?
What is the advantage of DVC over Git LFS?
Git LFS does the same thing, but transparently and automatically. This raises the question of why DVC is needed.
Here is an article from a community member that was using git lfs and how and why they migrated to DVC: mlops.systems/posts/2022-05-24-data-versioning-dvc.html
How does this work together with the Visual Studio Code extension for git?
Great tutorial! Thanks
Great stuff!
Hi Elle! How can I share the Google Drive folder so my teammates can pull the data?
Wondering how this works. The data is in Google Drive and Git only has a 'pointer', so once we move HEAD back to a previous version, how is it able to quickly grab the data from Drive? Is it cached?
You should run `dvc pull` or `dvc checkout` to grab data from your drive. Cached - yes, DVC caches data locally after `dvc pull`.
Hint: git hooks can automate this dvc.org/doc/command-reference/install#installed-git-hooks
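To make the pointer-and-cache idea in this reply concrete, here is a minimal sketch of a content-addressed cache in Python. It is a simplified model and not DVC's actual implementation: the function names `add`, `checkout`, and `demo`, the cache layout, and the use of a bare MD5 string as the "pointer" are all assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def add(path, cache_dir):
    """Copy a file into the cache under its hash and return the hash.

    The returned hash plays the role of the small 'pointer' that Git tracks.
    (Hypothetical helper, not a DVC function.)
    """
    digest = md5_of(path)
    target = Path(cache_dir) / digest[:2] / digest[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return digest


def checkout(digest, path, cache_dir):
    """Restore a file from the local cache using its pointer.

    Returns False when the content is not cached locally, which is the
    situation where a download from the remote would be needed first.
    """
    source = Path(cache_dir) / digest[:2] / digest[2:]
    if not source.exists():
        return False
    shutil.copy2(source, path)
    return True


def demo():
    """Track two versions of a file, then restore the first from the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "data.csv"
        cache_dir = Path(tmp) / "cache"
        workspace.write_text("version 1")
        old_pointer = add(workspace, cache_dir)
        workspace.write_text("version 2")
        add(workspace, cache_dir)
        checkout(old_pointer, workspace, cache_dir)
        return workspace.read_text()
```

In this model, switching to an older pointer is fast whenever that content is already in the local cache, and only needs the remote when `checkout` reports a miss.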
Can someone help me? I sent a JSON file with the credentials to GCS and I would like to remove it from there.
hi! thank you for the video!
But I have a problem: when I run `dvc push`, the Google account link returns 400: invalid_requests. Do you have any idea?
Sorry for the issue. See the description for some updated docs links to help you with this as this is one of our older videos. You may also be interested in our course at learn.iterative.ai
Hi Elle. Is there a way to add a new file to DVC from a Python script? Currently I run `dvc add` using subprocess in Python when I want to track new data with DVC.
For Python you could use:
from dvc.repo import Repo
repo = Repo(".")
repo.add("abc.csv")
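For comparison, the subprocess approach from the question can be wrapped in a small helper. This is only a sketch, assuming the `dvc` executable is on the PATH; the names `dvc_add_command` and `run_dvc_add` are hypothetical and not part of DVC.

```python
import shutil
import subprocess
from pathlib import Path


def dvc_add_command(path):
    """Build the argument list for `dvc add <path>` (hypothetical helper)."""
    return ["dvc", "add", str(Path(path))]


def run_dvc_add(path, cwd="."):
    """Run `dvc add` inside `cwd`.

    Raises FileNotFoundError if dvc is not installed, and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("dvc") is None:
        raise FileNotFoundError("dvc executable not found on PATH")
    # check=True turns a non-zero exit code into an exception.
    return subprocess.run(
        dvc_add_command(path),
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
```

Calling `run_dvc_add("abc.csv")` from the repository root would then do the same job as the `Repo` snippet above, at the cost of starting a separate process.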
Great tutorial. Usually see them at 1.5x speed. Not this one lol.
Is there anything equivalent for version controlling the data in an RDBMS? For example, on each push to Git, instead of pushing .dvc files and changes to Google Drive, could it run a DB commit, record the data versions in a DB log table, and have reverts done at the DB level?
Elle, you were the OG DevRel before that role got popular :D
So true!
Will it attempt to re-upload if a connection to the remote breaks and is then re-established? We have a NAS that can act funky sometimes. Will this work with folders and subfolders? Or is tracking done per file?
If I send DVC files from multiple github repos all to the same gdrive folder, will dvc list from a specific repo find all of the files in the google drive folder, or only the ones pointed to by that github repo?
Hi! Building a DVC on AWS S3, and got an error while dvc push: ERROR: failed to transfer 'md5: 08a15725d545c61787127e5558959561' - 405, message='Method Not Allowed', url=URL('.....
Any ideas why?...
Very good tutorial, because an amateur like me can follow it.
Thanks for the kind words :)
Pretty good. Well done, lady.
Nice.
GOOGLE blocks the files. Plus you forgot to mention pip install first.
Lehner Oval
epic
Can I follow you on Instagram or maybe connect you on Linkedin?
Definitely! On LinkedIn, you can find:
- Elle: www.linkedin.com/in/drelleobrien/
- DVC team: www.linkedin.com/company/iterative-ai
So you're telling me that you're programming while explaining and looking at 2 screens at the same time ? Nice tutorial but at least don't pretend by touching the keyboard
Haha, yeah, it's not DVC, a better name is probably MDVC, MetaData Version Control ;)
Or git-lfs api, since it seems under the hood it's using git-lfs ;)
@huamichaelchen If gitignore is used, Git LFS is not used.
how do you handle files that point to other files like images/audio?
Hi @Kevinsasso1405! Thanks for the question. In the same way! Please see the following videos for more info regarding computer vision projects
Becoming a Pokemon Master with DVC: Reproducible Machine Learning Experiments with Rob De Wit: th-cam.com/video/3-DG4WS5Ikk/w-d-xo.html
And
Best MLOps Practices for Building End-to-End Machine Learning Computer Vision Projects with Alex Kim: th-cam.com/video/3-DG4WS5Ikk/w-d-xo.html
We don't currently have an audio project to share, but it would work in much the same way!