Have you tried using the furrr future_* functions on a Windows computer? How did it go?
Thanks for your videos! I'm a little hesitant to use furrr if there are compatibility issues. Has that been worked out, and what is the specific issue with Windows/RStudio?
foreach/dopar seems to have a lot going for it despite some of the potential advantages of furrr. As I understand it, its compatibility is good across operating systems.
@@JamesBarnesIV Thanks for watching and asking your question! I haven't really noticed any big problems with furrr, and I think any that do come up can be easily navigated. I actually just became familiar with foreach/dopar in the last few weeks.
Hahaha of course you would do parallel processing in the next video! THANKS!
You bet - thanks for watching!
Quick question: how might I pull out the combined feature importance values after running in the loop?
Hi Brian - thanks for watching and for asking your question! Kelly from my group modified the code I presented in the video to show how to extract the feature importance values. You can see the changes in the code at github.com/riffomonas/mikropml_demo/commit/18bea7c396436917c54c775e182325a5b7566771. You can click on the individual files in that commit to see the two scripts she was working with.
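The linked commit is not reproduced in this thread, so the following is only a minimal sketch of the general idea, not Kelly's actual code. It assumes each model run returns a list with a `feature_importance` data frame, which, as I understand it, is what mikropml's run_ml() returns when find_feature_importance = TRUE. The function `fake_run()` and its column values are hypothetical stand-ins so the example runs without fitting any models.

```r
# Hypothetical stand-in for one model run. In a real pipeline this would be
# a call that returns a list containing a `feature_importance` data frame.
fake_run <- function(seed) {
  set.seed(seed)
  list(
    performance = data.frame(seed = seed, AUC = runif(1)),
    feature_importance = data.frame(
      feat = c("Otu1", "Otu2"),
      perf_metric_diff = rnorm(2),
      seed = seed
    )
  )
}

# Run the "loop" over several seeds and keep every result
results <- lapply(1:3, fake_run)

# Pull the feature_importance element out of each result and stack the rows
combined_importance <- do.call(
  rbind,
  lapply(results, function(r) r$feature_importance)
)

print(combined_importance)
```

Keeping the seed as a column makes it possible to summarize importance per feature across runs afterwards.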
It is a mistake to assume that using all the cores gives the best results. You want to run this application with 1, 2, 4, 6...16 cores and record the run time for each. At some number of cores you will hit the best performance. My Xeon X5650 machines are 6-core with triple-channel DDR3 memory (SMT off), and I get the best performance using 5 of the 6 cores.
Here is an example just run on my old Dell T3500: Ubuntu 20.04, 24 GB RAM + 2 GB swap, X5690 processor.
I am analysing a device with 35 columns and 48 rows of nodes. Each device has test results from each node laid out per column, with 20 rows of data covering the 20 products tested.
So there are 1,680 columns of integer data, 20 rows deep. I am using the qic() function from qicharts2.
Each node is, in the first instance, processed as a run chart.
for (nw in 1:12) {
plan(sequential)
cat("Workers = ", nw, "\nStage Select\n")
tic()
d1 % select(-Testdate, -UUT)
toc()
plan(multicore, workers = nw)
cat("Stage QIC\n")
tic()
dm
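The loop above is cut off in the comment, so here is a minimal, self-contained sketch of the same idea: time an identical workload under different worker counts and see which is fastest. It is not the commenter's code. `slow_task()` is a hypothetical stand-in workload, the worker counts are arbitrary, and it uses plan(multisession) rather than plan(multicore) so that it also runs on Windows and inside RStudio.

```r
library(future)
library(furrr)

# Hypothetical stand-in for the real per-node work
slow_task <- function(i) {
  Sys.sleep(0.05)
  i^2
}

# Time one pass over the tasks with a given number of workers
time_workers <- function(nw, n_tasks = 16) {
  plan(multisession, workers = nw)
  on.exit(plan(sequential), add = TRUE)  # reset the plan when done
  elapsed <- system.time(
    future_map_dbl(seq_len(n_tasks), slow_task)
  )[["elapsed"]]
  data.frame(workers = nw, seconds = elapsed)
}

# Try a few worker counts; extend this up to your core count
timings <- do.call(rbind, lapply(c(1, 2, 4), time_workers))
print(timings)

# The row with the shortest elapsed time
print(timings[which.min(timings$seconds), ])
```

With a real workload the fastest setting often depends on memory bandwidth and the cost of shipping data to the workers, so it is worth measuring on the actual data rather than on a toy task like this one.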
I don't know what my problem is: I see the new R processes after calling the furrr functions, but they're using 0 CPU. I don't know why. I'm using R 4.1.2 on a Windows PC, with RStudio and the multisession strategy. Do you have any idea what it could be?
Sorry, I'm not sure. I know that things on Windows can be a little screwy at times. Can you maybe try one of the other strategies?
@@Riffomonas I unveiled the mystery. The problem is that I was operating on a grouped tibble. Ungrouping it fixed the problem. I will check whether it's a known problem or a desired effect.
@@Riffomonas It is a known problem, and the developer simply suggests ungrouping the tibble before applying future maps.
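A minimal sketch of that workaround, assuming dplyr and furrr are installed. The tibble, its column names, and the squaring function are hypothetical. The only point is the ungroup() call before the future_map_*() step.

```r
library(dplyr)
library(furrr)

plan(multisession, workers = 2)

# Hypothetical grouped tibble, e.g. left over from an earlier group_by()
grouped <- tibble(g = c("a", "a", "b", "b"), x = 1:4) %>%
  group_by(g)

# Drop the grouping before calling the furrr function inside mutate()
result <- grouped %>%
  ungroup() %>%
  mutate(x_sq = future_map_dbl(x, function(v) v^2))

plan(sequential)  # shut the workers down again

print(result)
```

If the grouping is still needed downstream, it can be restored with group_by() after the parallel step.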
future.apply is also great.
Could you do a video using furrr to execute remotely on EC2 through AWS? I read this is a feature but haven't dug into it yet.
Hi Dereck - I’m not sure that it would be any different on AWS. I’ll have to look into it. Thanks for the suggestion.
Love the videos - using them to learn Slurm and HPC. Please have a coffee on me.
Thank you so much for your generosity!