A very important question I had.
1. We know that usually, once the run() method of a thread has executed, the thread dies automatically after its work is done.
2. Then why is it said that we put the same thread back into the pool, and how does ThreadPool do that (i.e. reuse the same thread)? I think we are creating a new thread again and putting that into the pool, not reusing the original thread (which did the earlier task), right?
It would be great if you could guide me here. Thanks for sharing the knowledge!!
You are correct in saying that a thread dies after returning from the run() method. The key here is to never return from the run() method. This is an implementation detail of the ThreadPool class (in the Java context). A common pattern is to maintain an array of threads (worker threads) that read tasks from a queue. When a new task arrives, a thread from the pool is assigned to it if one is available; otherwise the task waits. As soon as a thread is done executing its task, it goes back to the queue to get a new task, or waits for one to arrive.
This way, the lifecycle of the worker threads is managed by the owning ThreadPool object, which runs in the main thread. Worker threads stay alive as long as the ThreadPool object is alive.
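The "never return from run()" idea above can be sketched roughly as follows. This is a minimal illustration in Go, with goroutines standing in for worker threads; the names runPool, queue, and handled are made up for this sketch and are not from the video or from any ThreadPool class.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool starts `workers` long-lived goroutines that keep pulling tasks
// from a shared queue until the queue is closed. It returns how many tasks
// each worker handled, to show that the same workers are reused.
// (Hypothetical helper, for illustration only.)
func runPool(workers, tasks int) []int {
	queue := make(chan int)
	handled := make([]int, workers)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// The worker body is a loop, so it does not return after one task.
			// It blocks while the queue is empty and exits only once the
			// queue has been closed and drained.
			for range queue {
				handled[id]++ // each worker writes only to its own slot
			}
		}(w)
	}

	for t := 0; t < tasks; t++ {
		queue <- t // hand the task to whichever worker is free
	}
	close(queue) // shutting the pool down ends the worker loops
	wg.Wait()
	return handled
}

func main() {
	handled := runPool(3, 10)
	total := 0
	for _, n := range handled {
		total += n
	}
	fmt.Println("workers:", len(handled), "tasks handled:", total)
}
```

Three workers serve all ten tasks here, so no new worker is created per task. How the tasks are split between the workers depends on scheduling and can differ from run to run.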
What all topics do you want me to cover? Do add them as a reply to this comment ⚡
OAuth
Great explanation, sir!! Please make the next video on the internals of the Redis database. I'm very curious to dive deep into this topic with you.
Implementing an SSTable-based data storage layer from scratch.
Sir, if you can, please make a video series on building real-world projects from scratch (like the ones in the BUILD YOUR OWN X repo).
E.g.:
* creating our own compiler
* Git
* HTTP server
* Network application without libraries
Can you bring more such content on threads, covering more bottleneck problems? Thanks in advance!
At one of the FAANG companies, there was an interview question: one method (class method, REST call, etc.) is being called by 1 million users, and all users get the response with no delay. What is happening behind the scenes? How is that method getting executed a million times concurrently?
There must be more than one server handling the requests that are distributed by a load balancer
@SampathKumarKamati I gave the same answer, and mentioned thread pools on the web server too. The interviewer was not satisfied.
Response with no delay? That's impossible. Did you ask if the response was valid? They may all be getting errors.
@freddyv5353 How? Suppose thousands of users are ordering different products at the same time. I have an order-create API that is handling the requests concurrently.
Resource is cached 😂
This concept is the same as connection pooling: it saves the time needed to make connections and puts a limit on the number of concurrent DB connections that can be made, thus handling the load effectively.
Another main reason for a thread pool is that "creating a new thread and deleting it is itself very costly in terms of CPU".
Also, what is the maximum number of threads for an n-core CPU?
In my experience, especially when your workload is running in a VM, I would say the thread pool size should be 2 * n, because each core supports 2 threads thanks to hyper-threading in modern chips. So ideally, across your application, the number of threads should not exceed 4 * n, where n is the number of cores. Take this with a pinch of salt!!!
What is costly?
Just a nitpick: you demonstrated this with an example of goroutines, which are not actually threads (even using plain goroutines without a worker pool would perform better than threads).
But on the positive side, this was more about the generic pooling concept (pooling of threads, HTTP connections, DB connections, or anything whose creation/bookkeeping is expensive).
Yes. I tried to keep it slightly simple. If I had covered the specifics, it would have repelled a lot of folks 😅
Hey, awesome video, but here is a tip on "thread pool in Golang".
In Go, a thread pool is an anti-pattern and you should generally use semaphores instead. One can implement a basic semaphore using a buffered channel. There is a weighted semaphore implementation in the golang.org/x packages (golang.org/x/sync/semaphore) if you want to check it out, or you could make one yourself (I generally do this).
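A rough sketch of the buffered-channel semaphore mentioned above, using only the standard library. The function name runLimited and the peak-tracking code are assumptions added for illustration; whether this is preferable to a worker pool is the commenter's opinion, not something this sketch settles.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runLimited starts one goroutine per task but lets at most `limit` of them
// run the guarded section at the same time. It returns the highest number of
// goroutines observed inside the guarded section at once.
// (Hypothetical helper, for illustration only.)
func runLimited(tasks, limit int) int64 {
	sem := make(chan struct{}, limit) // buffer capacity = number of permits
	var wg sync.WaitGroup
	var running, peak int64

	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while `limit` permits are taken
			defer func() { <-sem }() // release the permit on exit

			n := atomic.AddInt64(&running, 1)
			// Record the peak concurrency seen so far.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			atomic.AddInt64(&running, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	peak := runLimited(100, 5)
	fmt.Println("peak concurrency within limit:", peak <= 5)
}
```

Unlike a fixed worker pool, this creates a goroutine per task and only bounds how many of them do work at once.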
Are you saying his current implementation is an anti-pattern? If so could you elaborate? I'm new to Go and just now starting to read up on concurrency :p
@JacoBluezz th-cam.com/video/5zXAHh5tJqQ/w-d-xo.htmlsi=yb-1QXpJ5W77LVoQ
You mentioned that for network-bound workloads we can have more threads, and for CPU-heavy workloads we should have fewer threads... Shouldn't it be the opposite?
1. For network I/O workloads: because you spend more time waiting for network packets to be sent or to arrive, due to their millisecond latency, we can have fewer threads, so that each thread has work to do rather than doing short bursts of work and then waiting.
2. For CPU-heavy workloads: because you have more work, we can have more threads. (Here I guess that with hyper-threading each core has 2 threads, hence the maximum we can go to is approximately 2 * number of cores.)
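For reference, the video's point as quoted in this comment (more threads for network-bound work, fewer for CPU-bound work) lines up with a commonly cited sizing heuristic: threads ≈ cores × (1 + wait time / compute time). Below is a small sketch of that rule of thumb. The function name suggestedPoolSize and the sample timings are assumptions for illustration; the formula is not from the video, and it is only a starting point for measurement.

```go
package main

import (
	"fmt"
	"runtime"
)

// suggestedPoolSize applies the rule of thumb cores * (1 + wait/compute).
// waitMs is the time a task spends blocked (e.g. on the network) and
// computeMs is the time it spends on the CPU.
// (Hypothetical helper, for illustration only.)
func suggestedPoolSize(cores int, waitMs, computeMs float64) int {
	if computeMs <= 0 {
		return cores // no compute time given: fall back to one thread per core
	}
	n := int(float64(cores) * (1 + waitMs/computeMs))
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	cores := runtime.NumCPU()
	// CPU-bound: tasks almost never wait, so roughly one thread per core.
	fmt.Println("CPU-bound:", suggestedPoolSize(cores, 0, 10))
	// I/O-bound: tasks wait 9x longer than they compute, so roughly 10 threads per core.
	fmt.Println("I/O-bound:", suggestedPoolSize(cores, 90, 10))
}
```

The intuition is that a thread blocked on the network is not using its core, so more threads are needed to keep the cores busy, while extra threads for CPU-bound work mostly add context switching.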
For future topics: I wish you could explain the internals of goroutines in more depth (why they're lightweight, etc.). Thank you for the video.
Understood, thanks to your video.
Best explanation I came across, thank you
Hi Arpit, thank you for the knowledge. Just one question here. The implementation mentioned in the video uses goroutines. In this case, what does the thread pool signify in terms of operating system threads? In short, if I create a thread pool of size 5 (similar to the one in the video), will it reserve 5 goroutines or 5 platform threads? AFAIK we can have more than one goroutine running on one platform thread, since the Go scheduler itself takes care of that. And a follow-up: if I have to implement this in Java, do I use the normal thread pool or virtual threads? Thank you, cheers
A thread pool is all about keeping a set of threads ready to be pulled out and used for whatever you need.
Virtual threads in Java were inspired by goroutines. The concept of pools is independent of threads or virtual threads. The idea is to save on the cost of creating new threads by reusing the existing ones.
that's a damn good handwriting
Isn't it something similar to batch processing? (Correct me if I am wrong.) For example: let's say you have a slice that contains 40 elements and you want to process 20 of them at a time. You fetch the first 20, iterate over them, add each to the wait group, run the task using goroutines, and call wg.Wait() outside the loop. This way you're creating 20 threads and running them for a batch, isn't it?
You are assuming all tasks will take equal time to complete, which might not be the case.
If the thread pool size is 5 and the number of tasks is > 5, doesn't it execute in parallel (not concurrently)?
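The difference pointed out in this reply can be seen with a small calculation. The sketch below is an idealized model (no scheduling overhead, made-up task durations); batchTime and poolTime are hypothetical helpers, not code from the video.

```go
package main

import "fmt"

// batchTime models batch processing: tasks run in groups of `size` (size > 0),
// and the next group starts only when the slowest task of the current one ends.
func batchTime(durations []int, size int) int {
	total := 0
	for start := 0; start < len(durations); start += size {
		end := start + size
		if end > len(durations) {
			end = len(durations)
		}
		slowest := 0
		for _, d := range durations[start:end] {
			if d > slowest {
				slowest = d
			}
		}
		total += slowest
	}
	return total
}

// poolTime models a worker pool: each task goes to whichever worker
// becomes free first, so a slow task does not hold the others back.
func poolTime(durations []int, workers int) int {
	freeAt := make([]int, workers) // time at which each worker is next free
	for _, d := range durations {
		earliest := 0
		for w := range freeAt {
			if freeAt[w] < freeAt[earliest] {
				earliest = w
			}
		}
		freeAt[earliest] += d
	}
	finish := 0
	for _, t := range freeAt {
		if t > finish {
			finish = t
		}
	}
	return finish
}

func main() {
	durations := []int{5, 1, 1, 1, 5, 1, 1, 1} // time units per task
	fmt.Println("batches of 2:", batchTime(durations, 2)) // 12
	fmt.Println("pool of 2:   ", poolTime(durations, 2))  // 8
}
```

When all tasks take the same time the two approaches finish together; the gap appears only when durations are uneven.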
Hey Arpit, I know it's an old video, but if you do see this and have a few seconds to spare, I have a doubt.
Why do we have an unbuffered channel here? If we are adding jobs to a workQueue, shouldn't our channel have some buffer? In the above code/scenario, are we blocking when putting a job on the workQueue because it is unbuffered, so that when I add a job I have to wait for some thread/goroutine to pick it up before adding new jobs to the queue?
I am new to Go, and I am loving what you have to teach. Thanks
You can use a buffered channel here, but that would not change the concept or the approach. That still remains the same.
I just kept things simpler for the demo :)
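To make the blocking behaviour asked about above concrete, here is a small sketch. The helper trySend is an assumption for illustration; it uses a select with a default case to check whether a send could complete immediately instead of blocking.

```go
package main

import "fmt"

// trySend reports whether a send on ch can complete right away,
// i.e. without waiting for a receiver. (Hypothetical helper.)
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // a plain `ch <- v` would have blocked here
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 2)

	// No goroutine is receiving, so an unbuffered send cannot proceed.
	fmt.Println(trySend(unbuffered, 1)) // false

	// A buffered channel accepts sends until its buffer is full.
	fmt.Println(trySend(buffered, 1)) // true
	fmt.Println(trySend(buffered, 2)) // true
	fmt.Println(trySend(buffered, 3)) // false: buffer of 2 is full
}
```

So with an unbuffered workQueue the producer does wait for a worker to pick up each job, while a buffer lets it run ahead by up to the buffer size. The worker side stays the same in both cases.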
@AsliEngineering Awesome. Thanks for reverting back.
What font have you used in VS Code? It looks nice.
Your videos are Netflix series to me❤
Thank you for this insightful explanation, Mr. Arpit
Thank you for explaining the concept of a thread pool. How can we implement a similar concept in Python, for example in serverless functions on the cloud? Python inherently doesn't have support for threads and thread pools due to the GIL.
great video, thanks!
If there are 4 CPU cores and 10 threads, is it worth it? Or can we have at most as many threads as the number of cores?
For our server example there is also another way to handle clients, epoll/kqueue, so my question is: what is the trade-off of using a thread pool over epoll, or vice versa, especially for I/O-bound tasks?
One of the best content 👌
Thank you Malay!
We have so many server-side languages, so how do we decide which language to choose? What are the factors that can help make the decision? Mainly for Golang, Java, JavaScript, Python, C#.
It’s a decision you make based on many factors: comfort level of the developers, performance expectations, task domain, etc.
Personally, I think Go gives a good balance between the two.
I would choose Python if the work is more data-science focused.
JavaScript for server-side only if the developers are JavaScript lovers and can't touch anything else.
Great explanation! Just wanted to ask: will the worker threads created in the pool keep looking for a submitted task, or do they only become active when a task is submitted?
They become active when there is work to do; otherwise they sleep. This is managed either by the OS, if they are kernel threads, or by the language runtime, if they are user threads.
Programming language agnostic Concurrency control mechanism.
As Golang is your favourite language, why don't you upload Golang-related videos?
If by Golang videos you mean language tutorials, then I am not keen on doing it because I believe language learning can happen the quickest through books and blogs.
Hence I do use Go in most of my examples to explain a concept or showcase how some problems can be solved.
@AsliEngineering Thanks for the reply. Could you please suggest which books or blogs to follow to learn Golang better and faster? Thanks in advance :)
@Chakree45 I have the books I referred to for Go on my bookshelf: ArpitBhayani.me/bookshelf
Thanks Arpit, and all the best to you 🫂
@Chakree45 thank you 🙌
wow
Make golang tutorials
SPEAK ENGLISH
great video, thanks!