Super slick! Been following you for a year now. Your content and skills are constantly evolving - it's hard to keep up with you :D
hey thanks a lot! really appreciate it
Loving the content John, keep em coming.
Hehe, finally getting that "devops" in to scrape the world. Good job as always. If I'd had this video a year ago it would have saved me many hours. Either way, it's a perfect guide for anyone who wants to run a scraper in the cloud.
Hah yeah I know, thanks
love your content!
Cool. I think you (or your viewers) may also want to explore the Functions product that D.O. offers. I see that you can also use cron-like triggers to run your Python (or other language) code.
Just like GCP (Cloud Functions) there is a free tier, so you can get many runs in before they start to charge you. But in this case, for the log file, I guess you would also have to buy D.O. storage to keep that file somewhere. I see it's $5/mo for 250GB, which would be worthwhile if all the functions one manages need much more than the basic droplet's 10GB.
I have to say the more manual method you are showing us does offer much more flexibility in what you can do and how to do it.
I am more of a GCP guy, but I will definitely also try your method, depending on the use case I have.
Thanks for the good info, as usual.
Cheers
Good video!! Thank you for sharing!! 😁👍
14:20 - about that 2>&1...
1 is the file descriptor for stdout (which is the default when you use some command >> outfile).
Descriptor 2 is stderr, where errors (usually) go - "usually" because some coders are too lazy and print everything to stdout with printf(...) instead of using fprintf(stderr, ...) for error output.
With 2>&1 you are redirecting stderr to stdout, so both messages AND errors go to your log.
You could do something like 2>> error.log (no space between the 2 and the >>) to get a separate error log file.
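A minimal concrete example (the script and log names are just placeholders, not the exact ones from the video):

```bash
# stdout and stderr both appended to one log
python3 scraper.py >> scraper.log 2>&1

# or keep errors in a separate file (note: no space between the 2 and the >>)
python3 scraper.py >> scraper.log 2>> errors.log
```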
thank you for clarifying - I knew I wasn't quite right with that; honestly I just did it because that's how I was shown!
Don't know about zsh, but usually you can just type "cd" to navigate to the user's home dir. Also, at apt install, if it asks for Y/n the capital letter is the default, so you can just press Enter if the Y is capitalized.
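For example (the package name is just illustrative):

```bash
cd                                # no argument: jump to the current user's home directory
sudo apt install python3-venv     # at "Do you want to continue? [Y/n]" just press Enter - capital Y is the default
sudo apt install -y python3-venv  # or pass -y to skip the prompt entirely
```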
Wow... I've learnt something new that I'll use with another project
Great video! The videos you make are so much more transparent than other coding channels who just assume familiarity with this and that technology. (I’ve been watching for years and this is the first time I’ve commented!) Been running very similar workflows to this and Digital Ocean seems so much easier and cheaper than EC2, etc. Ansible is also great for automating the command lines, installs, etc. and gives full reproducibility if you want to set up multiple similar instances. One question: where do you recommend keeping your object store and/or SQL database for storing the scraped data? On something backed up by Digital Ocean, or your local machine, or a server in your homelab, or…? Cheers!
thanks! for my own projects I have most stuff on my home server, but for other work I usually just use a managed DB on Digital Ocean. It's just easier not to have to think or worry about it.
Many thanks for an amazing tutorial! It would be great to see how to push logs to a remote service like Sentry.
Cool suggestion, I'll have a look and see if I can do a follow up
thanks for the video mate! one question after watching this though: why didn't you wrap all that up in a Docker Compose setup to spin up your environment quickly?
I wanted to keep it as simple as possible for those who haven't got this far yet
I've really learnt a lot from you...
Thank you
Thanks for watching
@@JohnWatsonRooney I'd love it if you could share your neovim setup...
There are a lot out there but I'm sure if you make one, it'll be different
Use "tail -f" for watching logs instead of "cat" - that way you get a live view of the log.
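For example (log name is illustrative):

```bash
tail -f scraper.log      # follow the log live as new lines are appended
tail -n 100 scraper.log  # or just print the last 100 lines
```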
Thanks, old habits die hard
Question: if you frequently work with similar tasks, why not make a Bash script?
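For anyone curious, a rough sketch of what such a script could look like - package names, paths, and the schedule are assumptions, not the exact steps from the video:

```bash
#!/usr/bin/env bash
set -e

# one-time droplet setup for a Python scraper (adjust to your project)
sudo apt update && sudo apt install -y python3-venv python3-pip
python3 -m venv venv
./venv/bin/pip install -r requirements.txt

# add a daily 06:00 cron entry, keeping any existing crontab lines
(crontab -l 2>/dev/null; echo "0 6 * * * cd $PWD && ./venv/bin/python scraper.py >> scraper.log 2>&1") | crontab -
```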
How do you combine Scrapy with Selenium?
There's a scrapy-selenium package on PyPI, and on the GitHub repo one of the issue reports shows how to update it so it works.
Good sir🎉🎉
What about Selenium bots?
I usually use a hosted Selenium Grid and connect to it remotely, but it's not something I do very often. You can run headless Chrome on a VPS too, via Docker or similar.
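A minimal sketch of the Docker route (standard Selenium image and port, but treat the exact setup as an assumption):

```bash
# run a standalone Selenium server with headless Chrome on the VPS
docker run -d -p 4444:4444 --shm-size=2g selenium/standalone-chrome

# the scraper then connects to it remotely, e.g.
#   webdriver.Remote(command_executor="http://<vps-ip>:4444/wd/hub", options=chrome_options)
```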
@@JohnWatsonRooney Thanks, there are many websites that I sometimes can't scrape with a headless approach. There are no APIs or hidden JSON endpoints to scrape them from either. An open browser seems to be the only solution.
Hey John, really nice video! I was wondering if I could help you with higher-quality editing of your videos, make more engaging thumbnails, and help with your overall YouTube strategy and growth. Please let me know what you think!
Hey as much as I’d like an editor etc my channel doesn’t earn enough to pay for that I’m afraid
I use screen to manage multiple instances
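For example (session name is arbitrary):

```bash
screen -S scraper    # start a named session and run the script inside it
# detach with Ctrl-A then d; the process keeps running
screen -ls           # list running sessions
screen -r scraper    # reattach later
```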
My man, cron jobs to run scripts have existed since the '80s, way before the cloud. I have never heard of anyone running daily scripts from their PC manually, wth?!?!?
lol... so you don't know that people convert their scripts to .bat files and run cron locally 😅
You never run locally? Just debug straight in prod? Now THAT is from the 80s 😂
@@nuel_d_dev I'm assuming you aren't running your product on your home laptop, of course I mean on whatever servers you have available.
@@personofnote1571 read again, he never runs them manually
Great ❤
Haha yes!
Hm. Linux.
“Every day”, not “everyday”.