Do not miss our Interview Question video series
30 Important C# Interview Questions : th-cam.com/video/BKynEBPqiIM/w-d-xo.html
25 Important ASP.NET Interview Questions : th-cam.com/video/pXmMdmJUC0g/w-d-xo.html
25 Angular Interview Questions : th-cam.com/video/-jeoyDJDsSM/w-d-xo.html
5 MSBI Interview Questions : th-cam.com/video/5E815aXAwYQ/w-d-xo.html
After inserting the Parallel.For call, the total number of iterations is 1,000,000 * 1,000,000. For inexperienced viewers, leaving the old for loop in the program could be confusing.
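The point above can be made concrete with a small counter demo. This is a hedged sketch: RunMillionIterations is the method from the video, but the counter-incrementing body here is a scaled-down stand-in so the difference is easy to verify.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class IterationCount
{
    // Wrapping the whole method in Parallel.For multiplies the work:
    // each of the n outer indices runs the full n-step inner loop,
    // for n * n total iterations (the video's 10^6 * 10^6 case).
    public static long CountWrapped(int n)
    {
        long counter = 0;
        Parallel.For(0, n, _ =>
        {
            for (int i = 0; i < n; i++)
                Interlocked.Increment(ref counter);
        });
        return counter; // n * n
    }

    // Parallelizing the loop itself keeps the work at n iterations,
    // now spread across cores.
    public static long CountParallelized(int n)
    {
        long counter = 0;
        Parallel.For(0, n, _ => Interlocked.Increment(ref counter));
        return counter; // n
    }

    public static void Main()
    {
        Console.WriteLine(IterationCount.CountWrapped(1000));      // 1000000
        Console.WriteLine(IterationCount.CountParallelized(1000)); // 1000
    }
}
```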
+Cedric Reinsch
Yeah! I was thinking the very same thing!
However, I don't think it would have affected how the load was spread across the processor cores, but it would have affected the core usage and also the total execution time.
Questpond helped me crack so many software interviews. Thank you Shiv Prasad sir.
Questpond has top-notch videos, and the explanations are among the best anywhere. Especially the design patterns series - it makes some complex subjects simple to grasp! Excellent job, Questpond! I'll likely sign up with them shortly.
You are so right. But then, the TPL example was super loaded
with a million times more work and still performed better, so it still conveys the same message: TPL is better.
So sorry, I was so engrossed in the demo... I should avoid night recordings.
I love your concise, to-the-point tutorials... Very helpful.
I think we do not need the for loop inside the method when we are using the Parallel class, in the example mentioned in the video.
i agree
Right
"P2" is typically used to refer to the "Pentium 2" processor. A processor with 2 cores is referred to as a "dual core" processor and a system with 2 processors is referred to as a "dual processor" system (a dual processor system can have more than 2 cores).
The examples shown are different, but they prove the point anyway: TPL has better performance than raw threads.
Nice video with ample information, demonstrating a comparison of legacy multithreading and the modern TPL.
Good video showing the concept of TPL, but in this example the diagram shows multiple cores running half a million iterations each, while the code shows multiple cores each running a full million iterations, which I don't quite follow. How do you know that the computing tasks are spread equally?
There seems to be a misconception here: when we spawn a new thread as shown in the first part of the video, it is not for completing the task faster; it is just for freeing the calling thread by assigning the task to another thread. So it's obvious that all cores of the CPU will not take the same load, because the spawned thread running on one core is processing the assigned task while the calling thread finishes with the main function.
To increase utilization, you should create multiple threads. As you have 4 cores, creating 4 threads, each running RunMillionIterations, would show you high CPU utilization.
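A minimal sketch of that suggestion. RunMillionIterations is the video's method; the busy-loop body here is just an illustrative CPU-bound stand-in, and ProcessorCount is used instead of hard-coding 4.

```csharp
using System;
using System.Threading;

public static class FourThreads
{
    // CPU-bound stand-in for the video's RunMillionIterations
    // (the name comes from the video; this body is illustrative).
    public static double BusyWork()
    {
        double x = 0;
        for (int i = 1; i <= 1_000_000; i++) x += Math.Sqrt(i);
        return x;
    }

    public static void Main()
    {
        // One thread per core: with 4 cores and 4 threads, each core
        // gets its own copy of the work, driving utilization high.
        var threads = new Thread[Environment.ProcessorCount];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() => BusyWork());
            threads[t].Start();
        }
        foreach (var t in threads) t.Join(); // wait for all workers
        Console.WriteLine("all threads finished");
    }
}
```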
Fast-forward to 8:31 if you already understand threading and perfmon and actually wish to get to TPL.
Nice video, thanks to the creator. However, it raises a few questions:
1. What if I wish to run a task forever according to its own logic
(in which case the 'For' of the Parallel.For is not required)?
Then I need a one-time launch... and to let the thread run as long as its own logic says so.
2. How do I query a core's load to correctly set task-core affinity?
3. Which API is used to set affinity?
4. I don't understand why Parallel.For is used... there is already a for () loop statement in the task's code.
5. It is unclear to me from the core load chart whether we are seeing 4 identical tasks running on 4 cores concurrently.
If this is a single task running on all 4 cores, then:
(a) What exactly controls splitting its work among the cores and then combining the results as if it were executed on a single core?
(b) How come the load on all the cores is similar to the load of executing on a single core? Where is the benefit?
In the length of the execution period?
In your video on Concurrency vs Parallelism, you said that when two threads run in parallel there is no context switching, and because of that the performance of the application improves. But in this video you are saying there is time slicing or context switching when two threads run a task. I don't understand how that happens.
Very nice video. Where is the next part of the TPL tutorial?
Quick question: I see that with Parallel.For you are manually creating 1,000,000 tasks which run in parallel across multiple cores; the same thing can be achieved with multithreading if you create 1,000,000 tasks and the CPU schedules them to run across different cores. Your earlier example, where you run a for loop, will also run on a single core even when we use Parallel.For.
TPL or Parallel.For does not distribute a single task across multiple cores. A single task runs on one core. To utilize multiple cores, you need to break the workload into multiple parallel tasks.
Can you correct me here if I am missing something?
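One way to see how Parallel.For actually divides work: it partitions the index range across a pool of worker threads, and each individual iteration runs on exactly one of them. A small sketch (the counting approach here is my own illustration, not from the video):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionDemo
{
    // Counts how many distinct worker threads Parallel.For used.
    public static int WorkerThreadCount(int iterations)
    {
        var seen = new ConcurrentDictionary<int, byte>();
        Parallel.For(0, iterations, _ =>
            seen[Thread.CurrentThread.ManagedThreadId] = 1);
        return seen.Count;
    }

    public static void Main()
    {
        // On a multi-core machine this is typically > 1, showing the
        // index range was split across threads; each single iteration
        // still ran on one thread only.
        Console.WriteLine(PartitionDemo.WorkerThreadCount(1_000_000));
    }
}
```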
Really good video. It's what I suspected was happening, but a very clever way of proving it.
I have to say that concatenating a string with s = s + "x" one million times is an absolute killer for memory and the GC.
We can use 'StringBuilder' instead of 'String'.
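A small sketch contrasting the two approaches (the iteration count is scaled down here; the s = s + "x" pattern is the one from the video):

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    public static string WithConcat(int n)
    {
        // s = s + "x" allocates a brand-new string every pass:
        // roughly O(n^2) copying plus heavy GC pressure for large n.
        string s = "";
        for (int i = 0; i < n; i++) s = s + "x";
        return s;
    }

    public static string WithBuilder(int n)
    {
        // StringBuilder appends into a growable buffer: amortized O(n).
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Append('x');
        return sb.ToString();
    }

    public static void Main()
    {
        // Same result, vastly different allocation behavior.
        Console.WriteLine(ConcatDemo.WithConcat(10_000) == ConcatDemo.WithBuilder(10_000)); // True
    }
}
```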
Correct me if I am wrong: Parallel.For runs from 0 to a million, and each time it calls the function RunMillionIterations(). This means that the function calls are divided among the processors/available threads. However, I see the explanation saying that the whole iteration range is divided, which is kind of confusing.
Trust your intuition, it never lies.
Hi, Great video, where are the other parts to it?
You are awesome, sir. You always explain complex things in a simple way that a layman can understand. Thank you for sharing.
The author seems to be under the impression that a PII processor means the processor has two cores. In fact, PI through PIV are all single-core processors; only after the arrival of the dual-core Pentiums did we have multiple cores, later followed by the Core 2, i3, i5, and i7.
Great explanation!
How do thread branches execute? If I spawn a thread from a thread, does it time-slice or does it parallelize? I guess I could experiment and find out.
Awesome. I love the explanations in all your videos.
Good video. Liked the perfmon demo.
Can we execute a long-running task on a single core using two threads?
nice explanation. Awesome work.
Summary -
Task is an encapsulation over Threads.
Task Parallel Library: helps achieve maximal utilization of all the cores present in the system, which increases the performance of an application.
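The summary above can be illustrated with the smallest possible Task example. This is my own sketch, not code from the video: Task.Run hands work to a thread-pool thread and returns a Task handle, which is what "encapsulation over threads" means in practice.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskDemo
{
    public static int Compute()
    {
        // Task.Run queues the lambda to a thread-pool thread and hands
        // back a Task<int>: a higher-level handle than managing a
        // Thread object yourself.
        Task<int> t = Task.Run(() => 21 + 21);
        return t.Result; // blocks until the pool thread finishes
    }

    public static void Main()
    {
        Console.WriteLine(TaskDemo.Compute()); // 42
    }
}
```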
Love the video, but where's the 2nd part? Can't find it...
If we execute multiple threads on a single core, then context switching will happen? Am I right?
Please provide a link for the 2nd part.
This video really helped me. Thank you !
very good video
What a nice tutorial!
good tutorial .. liked it !!
Great! Thanks
Try executing 10 million iterations and you'll see the performance actually decreases. On my Ryzen 3400G CPU, the 10M sequential loop takes 200 seconds and the 10M parallel loop takes 220 seconds. That's because, with the default settings, the Parallel.For() method creates 81 threads on my machine. That's a lot of overhead. But if I limit it to only 3 or 4 threads, it performs a lot better. You should not blindly apply parallelism everywhere you can. The performance of CPU-bound and I/O-bound workloads differs vastly when using these techniques (I/O-bound workloads usually show the better results with this).
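The thread cap the comment describes can be set with ParallelOptions.MaxDegreeOfParallelism. A minimal sketch (the thread-counting is my own illustration; note the option caps how many iterations run concurrently, rather than strictly capping the total distinct pool threads touched):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class DegreeDemo
{
    public static int RunCapped(int iterations, int maxThreads)
    {
        var seen = new ConcurrentDictionary<int, byte>();
        // At most maxThreads iterations execute at the same time.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.For(0, iterations, options, _ =>
            seen[Thread.CurrentThread.ManagedThreadId] = 1);
        return seen.Count;
    }

    public static void Main()
    {
        // Cap at the core count (or the 3-4 threads the comment found
        // optimal) instead of letting the scheduler spin up dozens.
        Console.WriteLine(DegreeDemo.RunCapped(1_000_000, Environment.ProcessorCount));
    }
}
```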
thank you for this explanation!!!!
Nice. Many thanks!
Confused! It may sound stupid, but if our laptops have a single processor, how does it make any difference whether we are using threading or tasks? Because all we have is a single processor, right?
thank you for this great video, very well done!
Excellent 👌👌
Where is the 2nd part of this video? I cannot seem to find a link for it. Thanks.
I think these are paid videos. This video might be only for advertisement purposes.
I used Parallel.ForEach for filling a model, but it is not filling correctly...
Good one 👍🏻
If you spawn two or more threads manually, it will parallelize, but you will need callbacks unless it's fire-and-forget.
+fordfiveohh For technical queries, mail us at questpond@questpond.com
great video,
You should have put the Parallel.For inside RunMillionIterations... technically you are showing two different examples.
The video for mutex, semaphore, and SemaphoreSlim is missing from questpond.com. Please put it back.
+Rashmi Purbey
Hello Ma'am,
The video is there; you can see the C# Threading Q & A Videos section on QuestPond.
+.NET Interview Preparation videos
Couldn't find it either, not on your TH-cam channel or questpond.com.
Good video
Not able to access this video. It says an error occurred.
P2 (Intel Pentium II) is single-core, btw.
Its a dope video awesome.
Isn't this code doing 1 million * RunMillionIterations()? I think you can drop one of the for loops... probably the one inside RunMillionIterations.
How do I implement the code below using TPL? Any ideas?
foreach (GridViewRow row in grdSearch.Rows)
{ //do smth
}
Using Parallel.ForEach.
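A sketch of that suggestion. GridViewRow is a WebForms type that isn't available in a console example, so a list of strings stands in for grdSearch.Rows here; also note that WebForms controls are not thread-safe, so only read row data inside the parallel lambda and touch the controls afterwards on one thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ForEachDemo
{
    public static int ProcessAll(IEnumerable<string> rows)
    {
        // Collect results in a thread-safe bag; don't mutate UI
        // controls from inside the parallel body.
        var lengths = new ConcurrentBag<int>();
        Parallel.ForEach(rows, row => lengths.Add(row.Length)); // do smth per row
        return lengths.Count;
    }

    public static void Main()
    {
        // Stand-in data for grdSearch.Rows.
        var rows = Enumerable.Range(0, 100).Select(i => $"row {i}").ToList();
        Console.WriteLine(ForEachDemo.ProcessAll(rows)); // 100
    }
}
```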
The explanation here is not correct. He took a single-threaded for loop and placed it on a single thread. Well, that is in fact multithreading: your first thread is the thread of the application itself, and the second thread is the for loop, for a total of 2 threads. What you are really talking about is parallelizing work that is "normally" single-threaded, which is distinctly different.
In the first example you run a single thread, and in the second you run MANY threads.
Where is the second part of TPL video?
I think you need to purchase his subscription for 2nd part ;-)
From questpond.com
awesome..
It plays for me. Must have been just a TH-cam error.
Please post your second video here.
Can someone clear up a confusion for me?
Isn't he running the million iterations in only one thread? I thought he would create a thread for each iteration.
An interviewer asked me: can you create 50 threads using TPL? What would my answer be?
I think this demonstration is not a good example; it is confusing and partially fundamentally wrong. You can only parallelize a loop when the next result does not depend on the previous result, for example when you process the lines of an image.
This program does not parallelize the job; it runs the same job many times over, which makes no sense. Of course it shows that the workload is well distributed, but programmatically this makes no sense.
The first example will of course not accelerate the calculation. It will, however, free the main thread from the workload, so that the user interface stays fluid and responsive.
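The distinction this comment draws, dependent versus independent iterations, can be sketched as follows (my own minimal example; the per-element squares stand in for the comment's image-line processing):

```csharp
using System;
using System.Threading.Tasks;

public static class DependencyDemo
{
    // Parallelizable: each element is computed independently of the
    // others, like processing separate lines of an image.
    public static long[] SquaresParallel(int n)
    {
        var squares = new long[n];
        Parallel.For(0, n, i => squares[i] = (long)i * i);
        return squares;
    }

    // NOT parallelizable as written: each step needs the previous
    // result, so the iterations must run in order.
    public static string ConcatSequential(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s = s + "x";
        return s;
    }

    public static void Main()
    {
        Console.WriteLine(DependencyDemo.SquaresParallel(1000)[999]); // 998001
        Console.WriteLine(DependencyDemo.ConcatSequential(5));        // xxxxx
    }
}
```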
My dude, you just ran 250,000,000,000 iterations per core.
This is an extremely poorly made example.
First, the for loop you start with CANNOT be parallelized as written, because the result of each iteration depends on the result of the previous iteration. It must run sequentially.
Second, your "rewrite" is not splitting up the work of the original function and running half of it on each core. Instead, you are running the entire function 1,000,000 times (with the full loop done each time), with half of the calls running on each processor. You have not parallelized the original problem at all; you've just multiplied it.
P8, hehehe :p
Another excellent explanation, thanks for sharing.