Breaking down technical topics like micro-electronics is not for everyone, but you've done a great job of translating something complex like a CPU pipeline into something simple and easy to grasp and understand. Just thank you!
Wow, thanks!
Best explanation ever! Never understood pipelining very well but your explanation made it so clear. Thanks!
WOOHOO!!!! Thank you so much!
I've been a programmer for a long time and never knew this.
Thanks.
Very cool! One of the great things about this career is that there's always something new to learn. I'm glad you found something valuable here!
Thank you for sharing your knowledge! Your videos help me apply best practices in C programming in my daily work. Going to set aside more time to watch and learn from your content.
Thank you for such a nice comment! I'm really glad you find them helpful!!!
I just wanted to learn about if and else and got the whole pipelining explanation for free! Thanks, very useful.
Bonus!!!!! I'm glad you found it useful!
I love how easy to understand this video is. Subscriber++
Woohoo! Thanks!!!!
Thank you, I never could before but this high level overview has helped me understand how conditional branching introduces inefficiency!
Woohoo! Victory is mine!!!!
Thank you for this. Structured, precise and to the point. I gained +5 intelligence from just this video.
Thanks! That's a fun comment and I'm glad it helped!
Superbly explained, thank you for taking the time to put this together! 😁
Thanks! I'm glad it is helpful!!!
Very nice explanation... I've been hearing about Pipelining CPUs, but until now I had no idea what they were talking about.
Glad it was helpful! Thanks!
You deserve way more subs, thx :)
Thanks! I'm working on that! :D
Omg so much quality content. Blessed to have found this channel
Tell me abt it
Quality information for free!
Thank you, both! I love it when my videos are helpful!
Hey Merlin, you did a fantastic job !!
Thanks!!!
Thank you for sharing knowledge selflessly!!!
It makes me really happy when people find them useful! Thank you!
Thanks for making this! This was very helpful. You also have a great voice for narration and an excellent pace! May have to check out all your other videos :)
Please do!
I love her so much xxxxxx
Thank you for this great explanation. It's very helpful.
Thanks! I'm glad it helped
Thanks so much for this video. I have an interview and I'm sure they will ask me about pipelining!
Good luck!
@@wizardcraftcode Thanks I got the job!
@@THERaikami1 good bro..
@@THERaikami1 Woohoo!!!!
super helpful!
Really love how I understood it in no time
Yay! Thanks!
Thanks so much for this video! Very clear insight into CPU pipelining for the noob I am 😊
Glad it was helpful! That makes me happy
I'm learning scripting for now but I enjoyed this.
Thanks!
Great explanation and easy to understand!
Thanks!
Such a beautiful explanation
Thank you!!! That makes me very happy!
Thank u ma'am, you taught it so well. Please suggest a book for processor architecture (computer architecture in general).
I'm glad you liked it. For a book, I like Computer Organization and Design by Patterson and Hennessy. It's the standard textbook for lots of schools. Maybe other people can chime in with other suggestions!
What a gem! 💎 Instant subscribe!
Woohoo!!!! Thanks!
Great job ! Thank you for the clear explanation, it is very helpful !
Woohoo! Thanks!!!
Yo thanks, I have a very silly doubt. Here we saw a 4-stage instruction pipeline, and the first stage is the fetch cycle, right? Correct me if I am wrong... In the fetch cycle, the address of the instruction to be fetched is given to the address reg. Then the contents of the address reg are passed on the bus, and the instruction is fetched from memory, right? Does this whole thing happen only in the first stage? For a clearer view I will write it like this
Add reg.
The stages of the pipeline all have to take the same amount of time, so they all have to be as long as the store stage. So, there's time for the fetch of an instruction in the fetch stage. The adding to the PC is done in a separate path, in parallel. That's what the Addr in that phase is doing while the fetch from memory of the instruction is happening. So, the things you wrote happen, but not sequentially. I hope that helps!
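If it helps to see it in code, here's a rough sketch (my own made-up Cpu struct, not the exact hardware in the diagram) of what the fetch stage accomplishes in one cycle. In real hardware the memory read and the PC update run on separate paths at the same time; code can only show them one after the other.

// A minimal sketch of one fetch cycle. The two steps inside fetch() happen
// in parallel in hardware; they are sequential here only because software
// can't express that concurrency.
struct Cpu {
    pc: usize,        // program counter: address of the next instruction
    memory: Vec<u32>, // instruction memory
}

impl Cpu {
    fn fetch(&mut self) -> u32 {
        // The address register drives the memory read...
        let instruction = self.memory[self.pc];
        // ...while a separate adder computes the next PC during the same cycle.
        self.pc += 1;
        instruction
    }
}

fn main() {
    let mut cpu = Cpu { pc: 0, memory: vec![0xA1, 0xB2, 0xC3] };
    let instruction = cpu.fetch();
    println!("fetched {:#x}, next pc = {}", instruction, cpu.pc);
}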
@@wizardcraftcode Thanks for taking the time to respond to me 👍
Thank you a lot. Let's say that we keep all the stages the same but we add an extra execution stage. Would it not affect the speed of the CPU at all?
Just duplicating the existing execution stage doesn't help because you can't get two instructions there fast enough (without duplicating everything). However, all of the stages have to take the same amount of time for the pipeline to work. If you had a really long execution stage and split it into two shorter ones, then the "tick" of your pipeline (how fast things move from one stage to the next) could happen twice as fast. When your pipeline was completely full, you'd pump out finished instructions at twice the speed.
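To put rough numbers on it (mine, not from the video): the clock can only tick as fast as the slowest stage allows, so splitting one long stage into two shorter ones roughly doubles the tick rate and, once the pipeline is full, the throughput.

// Back-of-the-envelope sketch with hypothetical stage times in nanoseconds.
fn throughput(stage_times_ns: &[u64], instructions: u64) -> (u64, f64) {
    // The clock period has to cover the slowest stage.
    let cycle_ns = *stage_times_ns.iter().max().unwrap();
    // Fill the pipeline once, then one instruction completes per cycle.
    let total_ns = cycle_ns * (stage_times_ns.len() as u64 - 1 + instructions);
    (cycle_ns, instructions as f64 / total_ns as f64)
}

fn main() {
    // Hypothetical: Fetch, Decode, one long Execute, Store.
    let long_exec: [u64; 4] = [1, 1, 4, 1];
    // The same pipeline with Execute split into two 2 ns stages.
    let split_exec: [u64; 5] = [1, 1, 2, 2, 1];
    for stages in [&long_exec[..], &split_exec[..]] {
        let (cycle_ns, per_ns) = throughput(stages, 1_000_000);
        println!("{} stages, {} ns clock -> {:.3} instructions/ns", stages.len(), cycle_ns, per_ns);
    }
}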
Hi, what are stall, flush, and branch penalty in this concept, and what is their impact on the pipeline? Thank you.
I didn't cover stall in this video. Some pipelines are more complicated than the simple ones I talked about here and can recognize that two sequential instructions interfere with each other. Then they are smart enough to "stall" one until the data it needs is ready. Branch penalty is what I showed here - the cost of taking a conditional branch and how that empties the pipeline. That emptying of the pipeline is what "flush" means.
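If you want a feel for the cost, here's a tiny sketch with made-up numbers (not measurements from any real CPU): a stall adds a bubble or two, while a flush throws away everything fetched behind the branch and refills the pipeline.

// Hypothetical cycle counts for a 4-stage pipeline (Fetch, Decode, Execute, Store).
fn main() {
    let depth: u64 = 4;
    let instructions: u64 = 100;

    // Ideal: depth - 1 cycles to fill, then one instruction finishes per cycle.
    let ideal = (depth - 1) + instructions;

    let stall_bubbles: u64 = 10;   // say 10 data hazards, one bubble each
    let taken_branches: u64 = 5;   // say 5 taken branches
    let flush_penalty = depth - 1; // a flush discards the partly-done instructions

    let actual = ideal + stall_bubbles + taken_branches * flush_penalty;
    println!("ideal: {ideal} cycles, with stalls and flushes: {actual} cycles");
}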
This is helpful thank you
I'm so glad! Thank you!
Love your vids xx♥️♥️♥️♥️♥️
Thank you!!!
Apparently, I replied from the wrong account! Thanks from this one, too!
Thanks ❤
And what if I have 2 pipelines? Will the speed of the CPU double?
That would essentially be a two core CPU. If your compiler distributes things across those cores well, yes, the speed will double.
For compiled languages, wouldn't the compilers optimise the conditions for the conditional branching?
The compiler will, and the CPU will. My goal with this video was just to give a high level view of what is happening and why you should understand it
@@wizardcraftcode Valuable insight you're sharing. Would love to see how the CPU might be able to do this for itself. I hadn't realised that was an option
@@BlueNSour I'll put that on the list of topics I should make videos about. Thanks!
Thanks for the nice video and explanation. My question is: how many stages can a CPU do in one cycle at maximum? I mean, what is the limit? I don't think it is unlimited, right?
The stages of a pipeline run in parallel, meaning they all execute every cycle. In theory, you could build an N-stage pipeline for any N, but, in reality, there's a limit to how many are useful.
oooommmg fuck yes the little animation is so helpful and motivating for adhd its such a relief every time hahaha
I'm glad you like it!
Now I'm wondering if CPU makers could improve the performance even more with parallelism in a single-threaded application.
So for example, if there are two paths of instructions being executed in a single code function, and some instructions don't have much to do with others, maybe an instruction could be loaded into the CPU way before it is actually ready to be executed, and if the instruction after it doesn't rely on the result of that data, they could be processed in parallel by the same core.
something like
fn main() {
    let mut x = 0;
    let mut y = 0;
    for i in (0..1000).rev() {
        x *= 2;
        y *= 2;
    }
}
So for this code (Rust), x and y are separate and don't depend on each other. Would it be feasible to separate this into multiple threads purely using the CPU, without any kind of explicit code to make it parallel?
Actually, there are already optimizing compilers that will know to put those on different cores in your CPU. Essentially, the compiler can control the parallelism across the cores within a CPU without the programmer having to explicitly code the multi-threading.
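For the curious, here's roughly the split such a tool (or a programmer doing it by hand) is aiming for. This is my own sketch using explicit threads, not the output of any particular compiler.

use std::thread;

fn main() {
    // The x and y loops from the example are independent, so they can run on
    // different cores. Starting from 0 (as in the original snippet) keeps the
    // math trivial; the point is only the structure of the split.
    let x_thread = thread::spawn(|| {
        let mut x: u64 = 0;
        for _ in (0..1000).rev() {
            x *= 2;
        }
        x
    });
    let y_thread = thread::spawn(|| {
        let mut y: u64 = 0;
        for _ in (0..1000).rev() {
            y *= 2;
        }
        y
    });
    // join() waits for each thread and hands back its result.
    let (x, y) = (x_thread.join().unwrap(), y_thread.join().unwrap());
    println!("x = {x}, y = {y}");
}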
Even single cores have had multiple execution units and do batches. They're also smart about doing things in a new order so that things don't conflict, and they use unnamed (renamed) registers to start new operations before the registers they'd normally use have been freed up, since it's out of order. Pipelining gets you to scalar, or one operation max per clock cycle; since the Pentium 3 or something, we've done out-of-order super-scalar processing.
That's so cool! So much magic underlying everything we do!
Awesome
Thank you!!!!!
good work!
Thanks!
Thanks a lot ❤
You're welcome 😊
Thanks from India
Thanks!
It's a great explanation. Are you a professor?
I'm glad you like it. Yes! I am a Software Engineering professor at Shippensburg University. Don't hold that against me! :)
The ALU does not "execute" the instruction, it only does the math commanded by the CU. It is the CU which executes the instruction.
You are correct. It executes the math necessary for the instruction. The rest of what I talked about is the high level view of the rest of what you are talking about. This video is intentionally high level, but I could have been more precise with my language at that point.
robotic sound could improve
I'll try to work on that! Thanks for the feedback
Overall explanation is good. But that robotic voice part is irritating.
I'll have to work on being a bit more animated. Thanks for the feedback!
@@wizardcraftcode don't believe that, I love the animated part!
@wizardcraftcode your voice is fine and plenty animated for the topic and format. It sounds like a typical voice. I think people expect TH-cam presenters to sound like a TV presenter that works with a vocal coach.
Really did not need that other voice
Totally fair! Most of my students like her, but I use her less often than I used to. For me, it's easier to script these as a conversation because I'm used to talking to my students, but I'm getting better and better. Maybe you'd like my more recent videos better.