Perhaps you'd be interested to know that, in troubleshooting odd defects in chips, we do something quite similar to the software you have shown, but do it in reality. If the chip is in a package, the package is removed, as well as the upper parts of the chip, so as to expose the actual circuitry. The circuits that are suspected to be involved are "wired up" and powered with instructions to match the failure conditions. A camera is used to capture photon emissions and you can actually SEE the transistors turning on and off.
I always find that part of development fascinating as well. I'm not in the testing side of things but knowing that we can basically X-ray a chip to figure out what it's doing wrong is invaluable to me on the development and simulation side of things.
When I worked at Philips Semiconductors we had a FIB machine - a Focused Ion Beam. Using this it was possible to modify an IC - both cutting connections and adding new ones. Spare gates were usually dotted around the IC on early prototypes, so if a signal needed inverting to fix a bug it was possible.
@martinwhitaker5096 When I worked in product engineering bringing the Nintendo Wii out, I used a FIB to make a wafer with known array fails to check our testing software.
Would be nice to see that in action... I didn't know transistors would emit photons while working. You're far less likely to ever see videos of that kind of interesting but advanced machinery and processes than of people using the end product. It's all commercial secrets in most cases.
I'm a chip engineer and I have to say, it's sometimes hard for even us to answer that question. We all specialize so far into our specific parts of a CPU that we may not know how a whole thing works. I did my PhD on ways to optimize active interposer design, which is basically a chip you put other chips on top of to allow them to talk together. Think about a Vega64 or Radeon VII. Those have passive interposers, just a bunch of tiny wires through a hunk of silicon to connect memory to the GPU. My work was on effective ways to implement some logic into that chip, like moving the memory controller down there, or the PCIE bus, or maybe a huge chunk of SRAM for a cache. I actually work on CPUs, not GPUs, but most people are more familiar with the Vega64 or VII than they are with Intel's Lakefield, which is actually something I worked on. I do regret the awful performance of those chips. A single Sunny Cove core with no hyperthreading paired with 4 old atom cores was never destined for greatness. Up next for me was the bridge dies in Sapphire Rapids, and now the topic of my PhD itself becomes a product in Meteor Lake. The architects make it all work inside, I'm just a glorified electrician wiring shit together.
I’m a sparky of 35 years experience and an electrical inspector, and I can tell you sir that your work just blows me away! The engineering behind such IC’s is just extraordinary! 👍👍👍
I've been hand assembling machine code since 1980: Z-80A, 6502 and x86 - decoding the instructions and actually performing the movement of bits by hand - so this is a great question. I always thought the bit pattern in each instruction perhaps had something to do with the decoding. This is a great video. Thank you!
Might be worth mentioning, when you jump from 4004 to Z80. Those are related chips. The 4004 begot the 8008, the 8008 begot the 8080, the 8080 begot the Z80. All within a very very short period of time.
10:36 Nothing says "I am not an electrical engineer" like almost spilling some welding wire randomly onto a PC motherboard for no reason at all. =) That made me pretty anxious and then I just started laughing XD
@@SpeccyMan Thanks for the correction. I have read about and practiced with electronics in another language, so I sometimes mix up some terms in English.
This was absolutely fantastic. I'm currently trying to teach myself Z80 Assembly as a hobby, many, many years after playing about with Basic as a kid in the 80s. This video has really helped turn the abstract concept of what's going on into a real visual way of seeing it happen. Superb. 😎👍
I was until recently quite ignorant of how computer hardware really works. Videos like this really help in understanding this fascinating subject. Amazing content, keep it up.
Programmers that actually KNOW (I mean like an engineer) how a CPU works are rare, let alone programmers who have an idea what CPU instructions actually are. Many programmers don't even know how int-to-string conversion (and vice versa) is actually done! ...I WEEP FOR THE FUTURE!
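To make that concrete, here's roughly what an int-to-string conversion has to do under the hood - a minimal C sketch (real library code also handles sign, bases and INT_MIN more carefully):

#include <stdio.h>

/* Minimal int-to-string: repeatedly divide by 10, emit digits in reverse.
   Assumes a non-negative value; a real itoa also handles sign and bases. */
static void int_to_string(unsigned value, char *out)
{
    char tmp[12];
    int  i = 0;
    do {
        tmp[i++] = '0' + (value % 10);  /* least significant digit first */
        value /= 10;
    } while (value != 0);
    while (i > 0)                       /* reverse into the caller's buffer */
        *out++ = tmp[--i];
    *out = '\0';
}

int main(void)
{
    char buf[12];
    int_to_string(47, buf);
    printf("%s\n", buf);                /* prints 47 */
    return 0;
}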
When I studied this at uni, before we even looked at assembly and CPU architectures we built a finite state machine using an EEPROM - learning how you can use a computer or DIP switches to control the address bus and then control outputs attached to the data bus. Then, once you got more advanced and added clocking, you could take bits from the data bus and feed them back to the address bus, and you had a hardware finite state machine - which, as we found out later on, is how you build an instruction decoder. Of course it seemed pointless in first year. Anyway, I'm glad I learnt all of this. It helps me understand what the CPU vendors are talking about when they talk about branch prediction, out-of-order execution and superscalar design. It also makes me appreciate just how much work goes into a compiler.
Wow. Thank you for a masterclass into the inner workings of the Z80! Instant subscriber! I had so much fun back in the 80s writing a lot of Z80 assembler. This is the deepest I’ve ever looked into its inner workings though.
The best way to learn the answer to such a profound question is to make your own CPU. Not with a commercial microprocessor like the Z80, but with TTL gates or micro-programmable bit-slice chips like the Am2900 series. Or, in modern days, with an FPGA and a Verilog IC design tool. You can design a completely new instruction set for a very simple 8-bit CPU like the 6502 and simulate it, using only simple gates and macro blocks. No existing microprocessor is needed. Most tools are free and a hardware evaluation board is not expensive. Recommended even for hobbyists.
I've not watched your channel for a few months and I love the progress you've made with editing and sound and script. Really informative. Kids at school are very lucky to have you as their teacher.
This is the big problem with modern x86 computers, anyone can overclock a Core i7 and think they know how a computer works. I thought I was an expert after doing x86 stuff for 10 years, only to discover 8-bit homebrew computers and realise I didn't have a clue what an instruction is, or memory addresses, or really anything at the machine code level and below. Once I built my first Z80 computer that could be programmed in machine language I had pretty much no idea how to even use it.
I feel your pain. I've worked on ARM, RISC-V, and lots of x86-64 machines. I've designed chunks of chips, and yet I'm still often times looking up how to do something in machine code for the times I do have to use it, which are thankfully rarer and rarer now that they've stuck me away in the development department rather than in testing.
The inner workings of a Z80 CPU are registers, ROM and functional units (like the ALU (Arithmetic Logic Unit), memory unit etc). The CPU first loads an instruction into the instruction register, which connects to ROM. The ROM is also connected to a step counter, which is reset at the start of each instruction. Each step in the ROM is a list of signals to send to parts of the chip (e.g. use the memory unit to fetch the next byte of the instruction, then store the result in a register). The last step of each sequence in the ROM is to load the next instruction into the instruction register, and the cycle repeats.

All CPUs come down to the basic building blocks of logic (AND, OR, NOT), implemented in transistors, wired together and sequenced by the clock. Just like normal programming you break the task down into components, sub-components etc, and build each part from simpler parts connected together to achieve the required effect. A register, for example, is made from a bunch of flip-flop units, and a flip-flop (at its simplest) is made from a couple of NOR gates (that is, OR gates with their outputs NOT'ed). The difference with normal programming is that many things happen in parallel (so you can be fetching something from memory, moving the contents of one register to another and adding two registers together at the same time, for example).

If you're interested at that kind of level then take a look at FPGAs, which are basically a bunch of uncommitted logic that your program can wire up any way that you like. You can implement a Z80, 6502, RISC-V etc, or even your own CPU design on them by defining the logic for each unit and how they are wired together. There's also code available on the web for implementations of the above CPUs, so (once you learn to read the language used - typically Verilog or VHDL) you can see what is happening.
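To make the step counter / control ROM idea from the first paragraph concrete, here is a toy C sketch: the instruction register and a step counter index a little table of control signals. This is only an illustration of the scheme, not the Z80's actual decode logic, and the signal names are made up:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical control signals: each bit enables one part of the chip. */
enum {
    SIG_MEM_READ  = 1 << 0,   /* drive a memory read cycle            */
    SIG_LOAD_IR   = 1 << 1,   /* latch the fetched byte into the IR   */
    SIG_ALU_ADD   = 1 << 2,   /* tell the ALU to add                  */
    SIG_REG_WRITE = 1 << 3,   /* write the result back to a register  */
    SIG_END       = 1 << 7    /* last step: go fetch the next opcode  */
};

#define MAX_STEPS 4

/* Control "ROM" indexed by [opcode][step]; only two toy opcodes shown. */
static const uint8_t control_rom[2][MAX_STEPS] = {
    /* opcode 0: fetch the next instruction */
    { SIG_MEM_READ | SIG_LOAD_IR | SIG_END },
    /* opcode 1: add a register to the accumulator, then write it back */
    { SIG_ALU_ADD, SIG_REG_WRITE | SIG_END },
};

int main(void)
{
    uint8_t ir = 1;                       /* pretend an ADD was just fetched */
    for (int step = 0; step < MAX_STEPS; step++) {
        uint8_t signals = control_rom[ir][step];
        printf("step %d: signals = 0x%02X\n", step, signals);
        if (signals & SIG_END)            /* sequence done: reset the counter */
            break;
    }
    return 0;
}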
I spent four months last year successively designing microprocessor architectures, starting with the simplest and ending up with something very similar to the latest multi-chip mosaic Intel designed recently... that I didn't know about yet. I soon got rid of the step counter, substituting ROM-stored next-step numbers alongside the ROM-stored microcode steps. I added a four-gate circuit that decoded a two-bit microcode field to control step jumps inside an instruction's microcode: either not jumping, always jumping, or conditionally jumping on the set or clear state of a selected condition flag bit, which would otherwise have contributed to the microcode ROM addressing along with the instruction opcode and the step counter state.

Other developments were more complex and would require many pages to describe: pipelines, instruction caching, register RAM caching, interrupt control, handling addressing-space misalignment, a multiphase clock, task interleaving, and so on. I can't afford prototyping the final architectural design. I could prototype one of the not-so-complex designs, but what I was interested in was just the design: mine wouldn't be able to compete with commercial ones, so what would the point be? But I'm proud that my design evolution bridged me across 50 years of microprocessor design in just four months, especially since I've ignored what the industry accomplished in the last 30 years.
@@wafikiri_ Erm, what do you mean "can't afford prototyping the final architecture design"? I'm assuming you've at least synthesised and simulated it to prove that it works? If you have got that far then, unless you need more than 138K logic elements and 900K of internal memory (which is a LOT for an 8-bit design), you should be able to implement it in a sub-$200 FPGA.
FPGAs are rather more complicated than a simple case of Uncommitted Logic Arrays (ULA). They often have what are called LUTs (or Lookup Tables), which might actually just be memory containing input/output value mappings.
@@jnharton LUTs are exactly uncommitted logic. They use RAM cells to define a truth table so that they can act like AND/OR/XOR etc, but they are the basic building blocks of a design. LUTs are generally combined with flip-flops and carry chain logic, multiple to a logic block, but what really separates FPGAs from ULAs are dedicated function blocks that also form part of the fabric. These can be anything from routing resources through clock tiles and RAM blocks to DSP tiles. A moderate sized FPGA these days can implement an entire Z80 based computer within itself using those resources.
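To illustrate what "a truth table in RAM cells" means: a 4-input LUT is literally just 16 stored bits indexed by the inputs. A minimal C sketch (not any vendor's actual cell, just the idea):

#include <stdint.h>
#include <stdio.h>

/* A 4-input LUT is 16 configuration bits; the inputs form the index.
   Whatever truth table you program in is the "gate" the LUT becomes. */
static int lut4(uint16_t config, int a, int b, int c, int d)
{
    int index = (d << 3) | (c << 2) | (b << 1) | a;
    return (config >> index) & 1;
}

int main(void)
{
    /* Configure the LUT as a 2-input AND of a and b (c and d ignored):
       bits 3, 7, 11, 15 are set, i.e. wherever a = b = 1. */
    uint16_t and_ab = 0x8888;
    printf("%d %d %d %d\n",
           lut4(and_ab, 0, 0, 0, 0),
           lut4(and_ab, 1, 0, 0, 0),
           lut4(and_ab, 0, 1, 0, 0),
           lut4(and_ab, 1, 1, 0, 0));   /* prints 0 0 0 1 */
    return 0;
}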
Just finishing up writing a Gameboy emulator at the moment (Z80-ish) - fun exercise. The Z80 instruction set is fairly regular, with some odd exceptions - so 500 instructions crunch down to more like 50 with variations. Doable as roughly a 2-3 month evening exercise.
They're still separate instructions with different opcodes, though. And because the Z80 is a CISC CPU, it may well have entirely independent logic circuits/paths for each instruction. That's not to say there isn't sharing of some parts, just that other parts may be entirely dedicated to just one or two instructions.
To this day, I still love my assembly. It's so rewarding to watch stuff run so crazy fast. I often use DOSBOX and play with the old video ram to make neat moving displays, or show game concepts. Video ram starts at B000H (mono) and B800H (color). 80x25 character grid, each character occupies two bytes, value and attribute. I so miss the old, simple days. Oh, I also miss the TRS-80, that was fun to poke your code then x=usr(0). Yeah....
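If anyone wants to play with the same idea, the layout is simple enough to sketch: the cell for row r, column c sits at offset (r*80+c)*2, character byte first, then attribute byte. A hedged C sketch - on a real DOS box you'd point the buffer at segment B800h; here it's an ordinary array so it compiles anywhere:

#include <stdio.h>

#define COLS 80
#define ROWS 25

/* Stand-in for colour text video RAM (segment B800h on real DOS hardware).
   Each cell is two bytes: the character code, then the attribute byte. */
static unsigned char vram[ROWS * COLS * 2];

static void put_char(int row, int col, char ch, unsigned char attr)
{
    unsigned offset = (row * COLS + col) * 2;
    vram[offset]     = (unsigned char)ch;  /* character value  */
    vram[offset + 1] = attr;               /* colour attribute */
}

int main(void)
{
    put_char(0, 0, 'A', 0x1F);  /* white on blue, top-left corner */
    printf("cell 0: char=%c attr=%02X\n", vram[0], vram[1]);
    return 0;
}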
I usually explain it like this: you know what a register is? It's a "thing" into which you write bits, and they stay there. Now let's look at the 74xx standard logic chips - what do we see? One of those chips is a "register", and as the description shows, it works exactly as you'd expect a "register" to work from the "software" point of view. But now let's look at how that value is stored: there are actual voltages on the pins of that register which represent the bits you wrote into it. And what can you do with voltages? You can control things - make an LED light up, turn on a relay, and so on. So with those voltages you can drive an actual finite state machine which generates other sequences of voltages, like a paper tape with holes makes a mechanical piano play music. And that finite state machine decomposes each "CPU instruction" into a series of elementary micro-steps like "enable that circuit" or "switch that multiplexer". So that's how the CPU "knows" what to do with each instruction opcode.
So you mean that, unlike in analog circuits, in combinatorial digital circuits information flows in one direction? TTL gates are so complicated. LS? ECL is a bit easier; it's just that in reality all those current sources are expensive.
One of the most helpful things I found for understanding is that the CPU reads the instruction stream from memory. This clarifies that 1. the CPU is not following the instruction set literally, as it also performs its own memory lookups to read the instructions (and other stuff like stored jump pointers), and 2. the CPU does not store any data itself, and when data is fetched any one of several memories or peripherals could respond (the CPU has no control over that). Understanding it from this angle solidifies what the CPU's role is and lets you imagine in your head a continuous loop of operations.
@@ArneChristianRosenfeldt Right. Perhaps it is due to being a programmer, but I had always assumed that the CPU 'stores and runs the code' in a literal sense, while in reality it could be processing instructions from a video card's mapped I/O. Where the instructions are stored, or how they are arranged (other than being sequential in the no-branch case), is not its responsibility.
So like in a C64, code would run in ROM and still load data into RAM. Or code would run in RAM and jump into subroutines in ROM. Or code would look up Pi in the BASIC ROM. Yeah, indeed there is a trick for the C64 where the output of a timer chip is used as code (an absolute address) to jump to different places depending on the time. @@ta1bubba
When I am asked, “How does it work?” I usually answer, “Just well enough to get by.” That applies to most people, so we can assume machines do the same.
There is a YouTube series where a guy builds a simple CPU on a breadboard, with LEDs to show the status of everything. I recommend it for an explanation of how a CPU works internally. This video here is good too.
In the 1970's Radio Shack had a small book that explained the inner workings of the logic circuits in a handheld calculator. In about 1981, the Sinclair ZX-81 became available. It was a real computer, powered by a Z80 and had 1 or 2 kilobytes of RAM depending on the version you got. (I think the original English ZX-81 came with 1K and the U.S. Timex Sinclair ZX-81 version had 2K.). I put one together from a $100 kit. You could write small programs in Sinclair Basic, and if you were ambitious, you could escape from writing programs in Basic to programing directly in Z80 machine code. The Radio Shack book, and the Sinclair ZX-81 were a wonderful place to start to learn the basics of computing. I'm glad I had the experience of learning about computers on the ZX-81.
First of all, thanks for an informative video. I'd just like to add a few things to the subject:

Nowadays the programs (the opcodes) are stored in main memory, but it was not always the case. Originally the "computing machines" were thought of more as super calculators, because that is where the need arose first. Mechanical or electromechanical devices had data separated from the "code", which took the form of a hardware configuration. This was tedious and error prone. The idea of making the configuration virtual by storing it as digital values was a novelty that got the name "stored program computer". Some CPU architectures still use separate spaces for code and data (Harvard architecture).

All the "regular" machines that we use today come from the famous Von Neumann paper (the original sin of computing). This paper was a mathematician's view on how to use an automated device to perform calculus and only this, using that era's technology: how to decode a formula expressed in text form and compute (evaluate) its result. The issue is that this specific paper for a specific task was generalized. It went from "this is A way to build a computing device" to "this is THE (only) way to build a computing device". There are many other structures and means to perform computing tasks, many of which are not sequential. The first one that comes to mind is neural networks. But unfortunately the Von Neumann model (Y shape with arrows) has "polluted" the minds of all engineers. This is why it is often difficult to get away from the "sequential mind bias" (parallel programming).

The machine code of a CPU is the "grammar" of that CPU and gives it its "personality". This is carefully crafted because it is what the programmer will "see" of the CPU. This is evidenced by the acronym ISA (Instruction Set ARCHITECTURE). The coder is supposed NOT to know, and not need to know, how the instructions are actually implemented. That is "at the discretion of the implementer", a form of "API" to the CPU.

The only way to "construct complexity" out of thin air is to implement an FSM (Finite State Machine) in some way. This state machine is the link between the instruction opcode(s) and the internal logic components that actually perform the intended operation. This is why the internal logic can be completely different from what is seen from outside (the Z80 has a 4-bit ALU, see Ken Shirriff's work). The instruction decode is only the first stage of this FSM. An operation symbolised by an opcode is rarely if ever performed in a single step of the internal logic. Most 8-bit CPUs (8080, Z80, 68xx, 65xx) implement this in combinatory logic, with a special note for the 65xx family, which uses a ROM as a first stage.

Because the CISC CPUs (x86 etc.) have become increasingly complex, this has been replaced by microcode. The actual hardwired state machine is replaced by an internal, hidden CPU that runs that microcode. Because there has been an inflation of modes, flags, security conditions etc., the number of clock cycles to execute a single top-level assembler opcode has increased (sometimes 20 or 30). The RISC design paradigm does the exact opposite. Most RISC CPUs seek to increase execution speed by using as few cycles per instruction as possible. The only business of the CPU is to EXECUTE code, not make decisions it has no information about. All complex decisions are delegated to the compiler. This only covers the CPUs that are inspired by the Von Neumann model.
There are many "funny" CPUs even designs that seek to use analogue attributes of electronic components. These are usually, very application specific and require interfacing to the standard digital world.
What do you mean by “few cycles”? 6502 needs two cycles for a lot of instructions. The SH2 in the SEGA Saturn, the MIPS in The PlayStation need one cycle per instruction. Though these consoles cheat. You only have limited SRAM. DRAM access takes many cycles.
@@ArneChristianRosenfeldt By "as few cycles per instruction as possible" I was referring to a design goal of the ISAs. One cycle clearly qualifies as "as few cycles per instruction as possible". Basically RISC designs have so few clock cycles that the execution time is equal to the fetch time of the individual bytes from memory. There are no added cycles where the CPU is just churning inside.

The 6502 by its design resembles a RISC design, but Sophie Wilson (designer of the ARM) said it is in a category of its own. I am not sure why, but maybe because of the zero-page operations that can operate directly on memory and do not obey the "load/store" principle of RISC, where memory transfers are separate instructions from the compute operations. As opposed to RISC, CISC designs do not have simplifying or reducing the clock cycles of execution as a primary goal. The hope is that by having more "complete" instructions that more closely match the C language, for example, the instruction count for the same task would be reduced. The drawback is that this leads to complex decisions at runtime.

So, CPUs like the 6502 (even if it is not a RISC CPU per se) will tend to have an execution time that is exactly the byte fetch count when instructions are executed on registers. So an instruction with an implicit operand on the accumulator will only use 1 cycle. When using zero-page instructions or indexed addressing, the number of byte fetches will extend the execution of the single instruction because the instruction becomes memory bound. These memory-bound operations would not be allowed on a real load/store architecture. You would have to:
- load into a register
- compute
- store the register back to memory
as 3 distinct operations.

So to summarize, in the very specific case that you have a CPU that is both RISC and load/store (AVR, MIPS) you are correct: most of the instructions will be 1 cycle, unless they need to be extended by wait states for slow devices. This is only valid because direct operations on memory are not allowed.
I just meant that MIPS as invented at the university in 1980 aimed at 1 cycle per instruction. I just mean that they replaced the complicated "few" optimization problem with a KISS: one cycle or we don't do it. The 6502 actually is hardcoded to use two cycles to fetch; one cycle is wasted on single-byte instructions. The CMOS version corrects this. Probably it was never really needed, but it simplified things.

The zero page thing was seen in other microprocessors of the time. The RCA 1802 had this external register file, where you still had to load values into the ALU first. Then there was this "first 16-bit" CPU which actually just stored 16-bit values in a zero page like a 6502, but could already switch the page for multi-tasking. So the motivation was not to implement complex instructions, but to outsource as many transistors as possible to off-the-shelf memory. The Z80 has 16 bytes. It could really have supported a lot of reg-reg instructions and hence load/store may have made sense. Reg-mem saves you register transistors because the other operand is loaded into a temporary register, which is free again in the next instruction. The same register can be used first for the addressing mode and then for the reg-mem operation. CMP and test (zero register target) also save. RISC CPUs use the same addressing modes; only post-increment and pre-decrement or so are missing. I don't really know why. Yeah, it would need to write back two values into the register file at once. But that would also be useful for a two-register pushy shift. Source: bitfield, shift value; write-backs: left half and right half. I guess that MIPS did not want to carry around circuitry which is not used by most instructions. ARM has a dedicated stack pointer just like the 8-bit CPUs. @@danielleblanc5923
When I was digging through Vega BIOS files, I saw some strings that started with $. I thought "maybe these are commands, since even modern PC components have ancient guts that get things started and handle plenty of the lowest level functions". Here I am over a year later hearing a teacher confirm that thinking.
@@maxims.4882 So far as I recall you are correct. It's been a lot of years since I last used asm on a TRS-80. So thanks for the correction m8. I've probably forgotten more than I knew. 🤣
@@maxims.4882 That isn't entirely accurate, as the XOR r instruction specifies any of A, B, C, D, E, H and L to be r, so an XOR B is possible - but it should more accurately be read as XOR A,B, since the result will still end up in the accumulator and so won't give the result the OP suggested. The only way to zero the B register is LD B,0. There is no shortcut.
This is a good attempt, but if you want to dive much deeper into the workings of each part of a CPU, this channel www.youtube.com/@weirdboyjim/videos has made a quad-pumped CPU from scratch. Quad-pumped is more advanced than the 6502 or Z80, but it is still from the era of CPU development before clock doubling and the use of caches. You will need to watch many episodes to see the whole thing working, and sometimes you might need to google how a certain electronic component works or what its basic function is to understand the video, but this is as close as you can get to fully understanding it. The later-added technologies like branch prediction, clock doubling and caches were there to increase speed and efficiency. The quad-pumped part in this video series is an early example of such an improvement: quad pumping wastes fewer clock cycles between instructions.
I started with this CPU back then. I entered the operating system into the EPROM byte by byte using DIP switches, programming address after address with a button, so that something useful would happen after start-up. Output was in hexadecimal on seven-segment LEDs.
Nice CPU video. One little detail that you missed is what happens if the CPU gets sent an opcode that it isn't designed for. Older 8-bit CPUs left various opcodes undefined, which could cause problems if used. For the 6502 there are quite a few "undefined" opcodes that one shouldn't use, but people found that if they did use them they could carry out certain operations faster, as they combined operations together in the CPU - and others would cause everything to crash. It used to be a thing for people writing 8-bit demo code to use illegal opcodes to speed up their code using fewer cycles, at the risk of breaking the program if later versions of the chip used completely different microcode (the internal instructions inside the CPU that carry out the machine code fed to it). This has proved an issue with implementations of CPUs on FPGAs when they don't implement the undocumented instructions. Or you have an enhanced chip like the 65CE02 used in the Mega 65, which adds instructions in the "gaps" left in the original instruction set that were previously used as undocumented opcodes. There is a document called "The Undocumented Z80 Documented" by Sean Young which covers the unusual effects that can occur when you use these undocumented opcodes on the Z80 processor...
The Z80, 6502 etc used a technique called partial decoding, that is to say that to save space they didn’t bother to fully decode all possible instruction codes. Because of that (and sometimes because the designers had ideas that didn’t work out) you get these op codes that do odd, but sometimes useful things. Better FPGA cores can actually implement these undocumented instructions, and some are based on a decomposition of the original chip logic (known as a net list) so they can implement the chip behaviour precisely.
Early x86 chips (8086 and 8088) could try to execute undefined instructions and maybe have the chip do something. From the 80186 onwards, they'd just trigger an Illegal Opcode exception, which while boring, is useful for compatibility, as you could use that to jump into a software routine that implemented an instruction available on a later processor.
Some undocumented Z80 instructions were useful, especially if you didn't need one or both of the index registers but you did need more 8-bit registers. At the cost of a bit of speed, you could use the IX and IY registers as two 8-bit registers each. (The speed decrease was caused by the time required to load an extra byte during the instruction cycle in order to access the 16-bit registers in 8-bit mode)
I grew up in the 1970's and had the Radio Shack TRS-80, which utilized the Z8080 microprocessor. I will never forget the day I upgraded my memory from 4K to 16K. I had more memory than I knew what to do with. I had no choice but to learn machine code, as computers were so slow back then that using interpreters like Basic was not practical at all.

When learning machine code, there were 2 things I found most important that you left out, and they have a lot to do with the inner workings of the CPU. In fact, they give you extra insight into the inner workings of the CPU. The first is the PC, or Program Counter, and the other is the Zero flag.

The PC (Program Counter) is a pointer. It tells the CPU at which memory location the next instruction to be executed is located. So the Add command only needs one number to execute: the code that instructs the CPU to add register A to B. The Load A register command, LDA, is a 2-byte instruction: one byte that instructs the CPU to poke a value into the A register, followed by the byte representing the value you want poked into A, so 2 must be added to the PC. A Branch Not Equal to Zero command (BNE) is a 3-byte instruction, which requires the byte representing the instruction itself followed by the 2 bytes representing the memory location to branch to. If you wanted to BNE to memory location 4A00, it would be the instruction code followed by the second byte of the destination address, then the first byte: BNE 00,4A.

Any operation that results in a zero will set the Zero flag for comparison operations. If you want to code a loop that performs a task 10 times, you would load A with a value of 9. Remember, 0 counts as a number in computers. That is where the mysterious bit comes into play: if hex FF (binary 11111111) is equal to 255, then why are there 256 values? Because 0 counts as a number. After loading 9 into the register, you perform a chunk of code, then Decrement the register, which subtracts one from it. The first time through it will be reduced to 8. Since it is not equal to zero yet, the flag has not been set to 1. Now when we BNE (Branch Not Equal to zero), it will take you back to the top of the code, just past where you loaded the register with 09. If you branch to the very beginning, it will reload 9 into the register, you will never reach 0, and you will be caught in an endless loop - any coder's worst nightmare. If you set the branch destination to one past the LDA instead of two, it will see the 9 you entered as the next instruction code and you will likely crash. So if your code starts at 4A00, then the BNE would have to be to 4A02, as 4A00 contains the instruction for loading the register and 4A01 contains the 9 you want poked into it. The last time through our loop we decrement from 1 to 0, which sets the Zero flag. Now when the computer reaches your BNE instruction, since the register is zero, the Program Counter (PC) simply moves on past the branch instead of jumping back.

Memory  Code
4A00    LDA  - Load A register with
4A01    09   - value poked into the A register
4A02    Loop content ... coding in here
4A20    BNE  - Branch Not Equal to Zero
4A21    02   - second byte of destination memory location
4A22    4A   - first byte of destination memory location
4A23    New code starts here, so this address ends up in the Program Counter (PC) when the loop falls through.

In an assembler it would look something like:

Initiate:  LDA,9
MainLoop:  Loop content code here
           BNE MainLoop
           ; exit the loop and continue with your next section of code
A third important aspect to know about is the stack, which stores information for JSR-type commands. JSR is Jump Saving Return: it pushes the memory location your code will be returning to once the jump to the other code is executed and the return instruction is encountered. The stack mostly takes care of itself, but if you start pushing and pulling your own values onto and off the stack, then you had better keep track. Any loop you are in, if it does any pushing or pulling from the stack, had better be balanced by the time you are done. If you push and pull equally from the stack you will be fine. But if you push something onto the stack and don't pull it off before using the return instruction, the return address will be buried under the data you pushed, so the return will go to whatever memory location matches the values on top of the stack, which could be anywhere.
Love your definition of JSR as Jump Saving Return (address) but if you check the official documentation, and every other reference I’ve ever seen the actual meaning is Jump to Sub-Routine. Like all the other instructions the full details of how the operation is performed is a bit much to capture in 3 letters.
The 8080 had no Z in it as it was not a Zilog component! It was made by Intel. So you either had a Z80 (which I doubt based on the mnemonics you used) or an Intel 8080. No competent programmer would have such a nightmare because they would know how to calculate the correct byte for a relative jump or they would use an assembler and ensure the label for the jump destination is in the correct place. GIGO! Note: If you did indeed use the TRS-80 then you really ought to know it had a Z80 CPU and the instructions you used in your assembly language are not Z80 mnemonics!
@@SpeccyMan I was in the 5th grade when I started coding. I wasn't that distracted by the exact processor that was inside; I was enveloped in figuring out the coding process using a book my brother recommended. Don't remember the name of the book. I might have mentioned the 8080 with a Z as a reference to its being the predecessor, perhaps. I always thought of the Z80 and Z8080 as being part of the same thing; one did lead to the other. You are questioning the competence of someone who was in the fifth grade. I guarantee I have written more programs than you will ever imagine, tens of thousands of lines long. Once computers became fast enough I stopped using machine language and switched to Dark Basic in the 1990's, which is now AGK or App Game Kit, which lets you code in multiple languages but compile for anything from computers to phones with Android or iPhone. I didn't use an assembler; I wrote my code out on graph paper. So yeah, when I had to branch or jump or jump saving return (JSR), I had to insert all the code in between before I would know where the jump would go. I wrote each memory location on the graph paper and then used arrows that I would connect to the lower area on the graph paper where the jump would end up. I just left the 2 bytes for the memory location of the jump empty until I reached the end of the intermediate code, finished my arrow, and went back to the jump and inserted that memory location. I then used the crude assembler that came with the computer to input the data. I was using machine language but I never learned a damn thing about how to use an assembler, and I don't even remember if there were labels in that assembler. All I am saying, dude, is that was a long time ago, who fucking cares. Oh yeah, how about you use your real name, coward.
@@MikePerigo I don't care what you say, there was a JSR that stood for Jump Saving Return. A JSR pushes the next memory location onto the stack, to be retrieved later and poked into the PC (program counter). There is a jump to a subroutine that doesn't care about a return address, and that was just a jump statement. Have you actually programmed in machine language? Because it doesn't make sense that you don't know the difference between a JMP (jump) and a JSR (jump saving return). A jump requires no return-from-subroutine command.
Hey I just posted a video short of a Zilog I-box prototype from 1997. Really cool device my grandpa brought home for me. I had said I wished we could have internet on our TV and he smiled and said “we are working on that” and a few months later this thing came home. Lost it for decades.. just found. Let me know if you’d like more information or videos.
Research "bit-slice processors" and all will become clear. Ben Eater's CPU on a bread board is a bit-slice CPU and is very easy to understand. More complex machines such as the z80 and 6502 are microcode architecture where each op code runs a little program inside the chip which coordinates the various parts of the chip. Start with bit-slice. Beautifully simple and elegant.
The answer to the students question (a true answer, actually) about how a CPU executes binary machine code, is that the CPU is running a program written in micro-code that carries out the necessary operations to implement the machine code.
@@ArneChristianRosenfeldt - since RISC V is an architecture as opposed to an implementation of that architecture, it makes no sense to ask if RISC V is, has been, or will be based on microcode or not, since it can be all of the above.
The introduction into the instruction formats goes into the bare metal layout of the multiplexer. The ISA is tied tightly to single cycle RISC. Though indeed, the smallest implementation uses multiple cycles, but I have not looked into it. Probably the cache logic in Load/Store is replaced by microcode. Also the branch instruction could be split into two ALU cycles: One for compare and the other for the relative addressing of the jump. @@JanBruunAndersen
The array of dots is the instruction decoder (ROM). It's basically just a lookup which is hooked to the internal control lines of the ALU and other parts of the chip. This decoder ROM is basically the physical predecessor to microcode.
@@ArneChristianRosenfeldt No, I meant predecessor, because the microcode in these chips was not writable. Today's microcode ROMs are technically EEPROMs. But maybe that's a definition thing in my head :D
I would have taken a chronological approach, starting with an unpowered CPU. The power is applied to the circuit, and a circuit external to the CPU holds the CPU RESET* pin low for a while to allow the power to stabilize and possibly some other circuit(s) to be set to a known state. While the RESET* pin is low, the CPU is stopped. Then the RESET* pin is set high. The CPU then fetches an instruction from memory (ROM) and executes it. The location of this memory address is hard-coded into the CPU chip and is part of the CPU documentation. The CPU then proceeds in the way you describe, fetching instruction after instruction. There can be variations, such as the CPU loading other values from ROM before starting the instruction execution stream.
Don't know if the Z80 has any, but the 6502 has "illegal opcodes" which do strange things according to the decode matrix, as the bits of the opcode define what to do and what to do it to (with the later 65C02 locking out the functions and treating all the undefined as NOP of varying cycle time)
I think Intel 8050 fills the complete matrix in a regular way. No prefix allowed. No illegal opcode. SuperFx has 8 Bit instructions. All legal. So it is not thaaat difficult.
That would be true of almost any similarly designed hardware, especially with no intermediate layers. The 6502 and Z80 achieve all of their functionality in actual hardware. Or, in other words, there is a block of hardware for the ADD operation/instruction and a separate one for the SUBTRACT... Whereas most modern CPUs rely on building Instructions/Operations in microcode.
@@jnharton 6502 and z80 and even SAP-1 as presented by Ben Eater use a standard ALU just like processors did before. ADD and SUB happen in the same block of hardware. Logic also. The ALU does not have illegal control inputs. Likewise it is easy to avoid illegal register names: just have a power of two number of registers! I like the branch encoding in 6502: 2 bits name the flag. One bit has the value for which a branch should be taken. Nothing illegal. Just “always branch” is missing.
@@ArneChristianRosenfeldt It was simply an example, any two (or three) instructions will do. Besides the ALU (aka 'Arithmetic Logic Unit') isn't a magic box, but rather a complex subunit unto itself. There is almost certainly a binary adder in there and probably also some multiplexers. . An "illegal instruction" is just a valid instruction code that wasn't used and trying to execute it activates some part of the overall circuit. Unless it happens, by coincidence usually, to consistently do something useful, it is of no concern. P.S. We're talking about "microprocessors" specifically. There were processors before that, but they were massive by comparison.
The Z80 was a breakthrough device. However, it was soon revealed that it was indeed a slow processor. OK for simple equipment, but got easily bogged down with more complex instruction sets.
The box on the CPU circled the cache, not the registers. You can tell because they are organised in rank and file like a grid. The registers are thousands of times smaller in area. A reprise of Turing's tape 'machine' is still a good place to start.
The CPU die pictured is a Z80 which has no cache, so they are the registers. Register files are also laid out in a very regular grid like structure because there is a lot of commonality between inputs and outputs.
It's hard to give an exact one size fits all sort of explanation to a question like how does the CPU decode instructions. Different Instruction sets allow for different approaches and optimizations when decoding (for example by having categories of instruction types that share the same layout of bit fields), different hardware implementations can be done for those instruction sets depending on target performance, power usage etc. And different architecture types for a certain ISA can have different requirements for a decoder (for example parallel decoding in superscalar CPUs, predecoding, microcodes etc.). Looking at older hardware, which tends to be simpler is a good starting point however. I think working with the abstraction of logic gates and explaining a CPU using a visual logic simulator (like Logisim, DIgital by hneeman etc.) is much more approachable, as it is easier to see individual parts of the CPU and is easier to comprehend. By adding an explanation of how a universal logic gate (NOR or NAND) can be implemented using Transistors, all of the abstraction can be undestood.
@@Ignat99Ignatov I know how semiconductors work, and that would of course be necessary to understand if you wanted to know how modern CPUs are typically implemented in hardware, but I doubt that you'd want to know that in order to understand how a computer works at a lower level. After all, a modern computer could just as well be implemented with vacuum tubes, relays, mechanically, etc. and work exactly the same, so understanding the physics that makes semiconductors work for digital circuits isn't necessary. Computer architecture is mostly about the higher-level concepts and how they can be implemented with simpler building blocks that can easily be synthesized into hardware (e.g. designing an ALU based on gates and latches at the Register Transfer Level or in a Hardware Description Language). Looking at a computer without any abstractions makes it hard to get insight into why a certain p-n junction is required and what it is used for, since that is typically defined by the abstracted design, which is the reason why I am saying that starting with a layer of abstraction would be useful for understanding how computers work. Getting to see the actual hardware on a die is still interesting of course! It just isn't well suited for learning about computer architecture (which, as I understood it, was the goal of this video). Sorry for the long reply; your message seemed to criticise my suggestion of using abstractions, so I wanted to clarify why I suggested that in the first place and clear up any misunderstandings about my intention. Based on your YouTube subscriptions I assume you might be familiar with digital circuit design and that you get what I was trying to say.
@@Ignat99Ignatov So your argument is "I worked on semiconductor designs professionally, therefore you don't know how a transistor works and everything you are saying is wrong"? That's quite rude. I'll reiterate that my argument was that in order to understand computer architecture, abstraction is useful. Looking at a die and saying "that's where the zero flag is stored" is less useful than explaining the concepts of how everything works together. There is a reason why most books about computer architecture use abstraction and why HDL is popular. The physical implementation of a digital circuit and its conceptual design can be separated when learning about the fundamentals, which is especially useful when you don't know anything about either. I'm sure that you know your stuff well, but I learned computer architecture separately from semiconductors and can still understand the concepts well enough to design processors in HDL and simulation software, which is all I need as long as I don't do hardware design professionally. I'm just trying to help as a person who is interested in the topic and was once in a similar position to the student described in the video, which is why I thought that my comment might be relevant. If you were nice about it and provided proper arguments, that would be much more helpful to me and anyone else reading the comments. I sure hope that isn't how you treated your students.
Interesting question. I guess the CPU is built in such a way that flows of electrons can open a series of minuscule gates in certain kinds of transistors, and that these can be arranged and built up to do basic computation? Who knows, it's like magic lol
That's pretty much the basic idea, yes. Transistors control the flow of one current with another, so by wiring them together in specific ways you can create the basic logic gates: AND, OR, XOR, NOR, NAND, XNOR, and NOT. From these you build your circuits, such as an adder, a subtractor, a multiplier, and a divider, and now you have the basics of an ALU, which you can feed data into and get results back out of. Combining gates another way lets you store charge as pieces of data; this is how the registers are built. Everything in that chip is just logic gates, and sometimes lone transistors, turning on and off and connecting and disconnecting various wires so that the bits flowing in result in the desired bits flowing out. This is a gross oversimplification, but hopefully it conveys the idea of how you start building up a processor.
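For example, the adder mentioned above falls straight out of those gates; a minimal C sketch of a full adder chained into a 4-bit ripple-carry adder:

#include <stdio.h>

/* One full adder built only from gate-level operations. */
static void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    *sum  = (a ^ b) ^ cin;               /* two XORs           */
    *cout = (a & b) | (cin & (a ^ b));   /* ANDs feeding an OR */
}

/* Chain four of them to add two 4-bit numbers (ripple carry). */
static int add4(int a, int b)
{
    int carry = 0, result = 0;
    for (int i = 0; i < 4; i++) {
        int s;
        full_adder((a >> i) & 1, (b >> i) & 1, carry, &s, &carry);
        result |= s << i;
    }
    return result;                       /* final carry-out is dropped here */
}

int main(void)
{
    printf("%d\n", add4(5, 9));          /* prints 14 */
    return 0;
}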
For a complete understanding of how a CPU works, you could do worse than to watch the series of 45 videos by Ben Eater as he constructs the SAP-1 (Simple As Possible) computer on breadboards.
If that kid really wants to understand, they just need to go and implement an 8-bit CPU from scratch in Verilog on an FPGA. Then they'll understand the basics. I do understand why they would ask though; today everything CPU/computer-wise is stupidly complex and you'd struggle to understand how everything actually works. Those of us who started this journey as children back in the 70's and 80's had a much easier time - you really could understand exactly how a CPU and computer worked back then. The manuals even came with the full circuit diagram of the machine.
I would have shown him "Ben Eater", an amazing channel. I believe he built an 8-bit computer on breadboards with wires (many, many wires) lol. He also shows each part working, how and why, if I remember correctly.
I took a whole semester of Digital Electronics in college to find out how CPUs work, so yeah, not a great topic for a tossed-off answer. It was a pretty cool course, what with designing circuits from scratch with flow charts and voltage timing diagrams, but it also wasn't especially relevant to my programming focus. (Aside from the bit of the final project where I had to hand-assemble a small 8080 program into octal.)
I taught it to primary age as a set of pigeon holes each with a number in it (RAM) and a set of instructions on a piece of paper (if the number in hole 45 is less than 25 then xxx and so on). Then I let them play with it for a bit. Then you talk about encoding the instructions as numbers also in pigeon holes. They kind-of got it at that point.
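That pigeon-hole model translates almost directly into a toy fetch-decode-execute loop; here is a small C sketch of it (the instruction encoding is invented for the example):

#include <stdio.h>

/* Pigeon-hole machine: memory is just numbered holes, and instructions are
   numbers too. Made-up encoding: every instruction is 3 holes wide. */
enum { OP_HALT, OP_LOAD, OP_ADD, OP_PRINT };

int main(void)
{
    int mem[32] = {
        OP_LOAD, 20, 7,      /* put 7 into hole 20          */
        OP_LOAD, 21, 5,      /* put 5 into hole 21          */
        OP_ADD,  20, 21,     /* hole 20 = hole 20 + hole 21 */
        OP_PRINT, 20, 0,     /* show what's in hole 20      */
        OP_HALT, 0, 0
    };
    int pc = 0;              /* which hole holds the next instruction */
    for (;;) {
        int op = mem[pc], a = mem[pc + 1], b = mem[pc + 2];
        pc += 3;             /* step past the instruction just fetched */
        if      (op == OP_LOAD)  mem[a] = b;
        else if (op == OP_ADD)   mem[a] += mem[b];
        else if (op == OP_PRINT) printf("hole %d holds %d\n", a, mem[a]);
        else break;          /* OP_HALT */
    }
    return 0;
}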
I have one Jacquard loom programmer, dating from circa 1840. It is a real jumble of wood frames, wires, and springs. There were also a few cardboard punched cards with up to six or eight holes, tied to one another by two hewn string loops, but I donated those to a museum, along with another such loom programmer. I was 13 when I saw those fascinating pieces of machinery and was fortunate enough to have already learned some basics of computer science (Turing machines and binary codes), so I could immediately tell my parents those were programming devices of sorts, probably for the designs of textiles in the looms (the house they bought had been a textile manufactory in the 18th and 19th centuries). I am 68 years old now, with decades of experience in computer hardware design and programming. That I still have one of the Jacquard programmers that inspired computer hardware is a source of pride for me.
@SpeccyMan no, every game I ever played on it, I had to spend an hour typing it in and an hour of debugging typos, and then if I was lucky the game was worth playing for half an hour or so. When I turned it off, the game disappeared, no long-term storage. Later, I think we used a cable to connect it to a tape recorder, and you could save and load on cassette tapes, but it was slow, and nothing I ever typed in was really worth saving.
This is a long video that, to be honest, does not explain what your student actually asked. If someone wants to know how a computer really works, this is the best explanation ever: the @BenEater 8-bit computer.
Why so many AI-generated, melted and artificial graphics? Interesting content, but still - why? The "motherboard" has no chance of working and the spurious ceramic caps make no sense. It's like a bad dream or a hallucination...
This video by Technology Connections (and I see he’s posted Part 2) th-cam.com/video/ue-1JoJQaEg/w-d-xo.html shows how a technical pinball machine works. The long and the short of it is that it too is using electricity to move stuff about so that it can keep score, change the state of paddles, bumpers etc, detect input. After watching this video of yours, I kind of feel that the two are distantly related, even though they are very, very different beasts.
I would add that even though it LOOKS complicated, it is knowable and understandable! There is a path to unraveling and understanding the complexity of a CPU. Ben Eater has a nice video series where he builds a CPU using only simple logic gates: th-cam.com/video/HyznrdDSSGM/w-d-xo.html Having a degree in electrical engineering is not really a requirement. This is completely within reach for anyone who is curious and persistent.
So I get how a Z80 and a 6502 work, and I use a model that is some sort of unholy mix between the two when I'm programming... Now here is my question. I've been trying to update that mental model for many years but can't seem to find anything up to the task. In my mind a CPU is always loaded 100% and doing instructions as fast as it can each clock cycle (well, with a little instruction decoding as preparation before the actual thing happens). Here is the problem: how does a modern PC (or even the latest single-core ones like a Pentium IV) know when not to run at 100% load, or even to clock down its speed? Or, put differently, why does a loop waiting for input (not from an IRQ but from something that is polled, like USB) not peg the core at 100% load (using a non-blocking check inside a while(1){}, for instance)?
That is actually a good question. I don't know, but modern CPUs can probably monitor their power consumption and temperature and use that as a gauge of how hard they're working. Some of it will come from the OS though.
@@ncot_tech PS: a few years back I also would have said that your model for how stuff moves around at the beginning of the video is not right, as data cannot move on its own from storage to memory. Before executing a program, the currently running program (the kernel) has to move that binary from storage to memory, and it therefore has to pass through the CPU as well, so the double arrow between storage and memory is nonexistent, even if both are attached to the system bus. But with the new PS5 I think it might not be wrong anymore.
@@ncot_tech I've looked everywhere I knew to look, and even asked ChatGPT, and all it could come up with is that it has mechanisms that detect idle time... but it couldn't tell me how it could have idle time in the first place.
@@hoefkensj Some CPUs have a halt instruction, such as the x86 CPUs. Another approach is to lower the clock speed (and possibly the voltage) when the OS detects low load on a thread.
@@toby9999 I know, but when my CPU is running at a low clock speed its % usage is relative to that clock speed and not to its maximum clock speed. So (and even this is simplified), if a CPU does 1 instruction per clock cycle, doing 1 instruction every cycle at 800 MHz means 100% use, and in the same way doing 1 instruction per clock at 5 GHz also means 100% use. If you don't want a CPU to do any actual work you can have it do a number of NOPs, but those are still instructions, so CPU use would still be 100%. That is the case for a Z80; unfortunately I don't own a Z80 anymore (I used to have a uProfessor Z80 kit), but I think that was still valid back then. The oldest working one I currently have is an 8088. Having said that, I honest to god have no idea how to write a program that would only use a Z80 for 50%, in such a way that a separate program could, if it wanted, use the other 50% and run concurrently with my code to load the CPU to 100% - and even that would be easier than having it do something only 50% of the time. Note that in order for a modern CPU to lower its clock speed, the low use comes first, and that triggers the drop in CPU clock, as in "I'm not doing much at the moment, I can reduce my clock without much impact".
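To put the question in code terms, the difference is whether the waiting loop keeps issuing instructions or tells the OS it has nothing to do. A rough C sketch (POSIX nanosleep for the second variant; the OS idle task is where a HLT-style instruction and clock/voltage scaling eventually come in):

#include <stdio.h>
#include <time.h>

/* Variant 1: busy polling. The core executes the compare/branch flat out,
   so from the CPU's point of view it is 100% busy doing nothing useful. */
static int poll_busy(volatile int *flag)
{
    while (!*flag)
        ;                       /* spin: instructions retire every cycle */
    return *flag;
}

/* Variant 2: sleep between polls. The thread tells the OS "wake me later";
   the scheduler can run something else or drop into its idle task, which is
   where a halt instruction (and lowering the clock) becomes possible. */
static int poll_sleepy(volatile int *flag)
{
    struct timespec nap = { 0, 1000000 };   /* 1 ms */
    while (!*flag)
        nanosleep(&nap, NULL);
    return *flag;
}

int main(void)
{
    volatile int ready = 1;     /* pretend the device already answered */
    printf("%d %d\n", poll_busy(&ready), poll_sleepy(&ready));
    return 0;
}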
I was confused by the program to add two numbers. Register B can't be (conveniently) loaded from memory, so num1 is loaded into the accumulator and transferred to B. But B never needs to be loaded with 0, and the first ADD A,B instruction is redundant.
Hi @@Bob-1802, I had to check to be sure, but no - $xx in that instruction is what is known as an "absolute" value; its value is fixed within the encoded instruction, not elsewhere in memory. If you look at the hex compilation of the instruction it is 06 00: 06 means LD B and 00 is the absolute value to load. In this syntax, if the value is in memory then it is written in brackets, like LD A,(location). That is called 'direct' addressing, where 'location' is the address of the memory containing the value to be loaded. The Z80 can only read single bytes from memory into the A register, not the others. This makes its instructions more convoluted than, for example, 6502 instructions for adding numbers. The example here is further obscured, as B never needs to be loaded with 0 and the first ADD just adds 0 to (num1). That could serve a purpose in a different algorithm that this was perhaps clipped out of, as adding 0 will include a CARRY from any previous add instruction, but here it's totally meaningless and does confuse the lesson a bit. I came across a good essay on this here -> cowlark.com/2018-03-18-z80-arithmetic
@@Bob-1802 Not quite. That mnemonic loads the B register with a LITERAL integer value! If such an instruction as was suggested would exist it would be LD B,(xx) which is load B with the contents of memory address xx. An instruction that doesn't exist in the Z80 as the OP wrote.
Tell your students to build a bit adder in No Man's Sky. It is easier than one might think and will show them the electrics and logic of what is happening.
@@RetroRogersLab Autoswitches are [AND] gates. Trying to remember the name of the other... the Power Inverter? Is that an [OR] gate? Playing around will inform. The logic of an adder is well documented on YT. The hard part for me, a hobby electron pusher, was building it without having everything in one long line, because in NMS in 'wiring mode' all connections are visible and it gets hard to tell what you are connecting to.
You still skipped the explanation in this video too. You jumped directly to explaining machine code, how to program in machine code (assembly), and a mini routine. No! Your student actually asked "how the decoder works", and the answer is simple: it is just a mini ROM inside the microprocessor, in the Z80 and any other processor. The fetched instruction is placed on that ROM's address lines, and the data stored in the mini ROM activates multiple control lines: one line can be "enable register B plus READ", which dumps its value onto the internal data bus, another "enable register A plus WRITE", and so on - that makes LD A,B happen in one clock tick. For a multi-stage instruction, a counter can count up the stage, and that count is also fed into the decoder mini ROM as the lower address bits; the counter can be reset to stage zero, so one of the bits stored in the ROM must mark when the instruction ends its execution. Now, the next question the student will ask is how the ADDRESS DECODER of a ROM works. And the answer is: it is just a binary tree of AND/OR gates! You can teach the basics of address decoding to enable the correct byte/word to be output from that mini ROM.
Before the whole CPU was in one chip, there were CPUs made of PCB cards with lots of TTL logic chips, and the decoder was ACTUALLY a ROM IC: a 256-byte ROM, a 1K ROM, two ROMs in parallel so it could output up to 16 enable signals, etc. Registers were static RAM chips, and there were adder/ALU chips, count-up chips, count-down chips - it is all in the TTL family handbooks, the blue books. The Z80, 8080 and 6502 put all those huge PCB cards into one chip.
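As a toy sketch of what one entry per opcode in such a decode ROM might look like, written as assembler data bytes - the control-line assignments below are invented purely for illustration and are not the real Z80 internals:

; Invented control-line bit assignments (illustration only):
;   bit 0 = drive register B onto the internal bus    bit 1 = latch the bus into register A
;   bit 2 = route the bus through the ALU adder       bit 7 = last step of the instruction
DECODE_ROM:
        DEFB  $83        ; entry for LD A,B  : read B, write A, end of instruction
        DEFB  $87        ; entry for ADD A,B : read B, add to A via the ALU, write A, end
        DEFB  $80        ; entry for NOP     : no control lines active, end of instruction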
Microcode is the set of steps inside the processor that tells it how to execute the machine code. It takes the machine code and then switches the right hardware lines to move a number from the data bus to the relevant register. This is why some instructions take several clock cycles. That's what the "unknown" block in your video is doing. en.wikipedia.org/wiki/Microcode. Intel processors have complex instruction sets that can perform many steps in one instruction, whereas RISC processors have simpler, reduced instructions and smaller microcode.
RISC does not take steps. 32 bit instructions can drive a lot of control lines with the help of minimal combinatorial logic. Nobody looks up 32 bits in ROM.
perhaps you'd be interested to know that, in trouble shooting odd defects in chips, we do something quite similar to the software you have shown, but do it in reality. If the chip is in a package, the package is removed as well as the upper parts of the chip so as to expose the actual circuitry. the circuits that are suspected to be involved "wired up" and powered with instructions to match the failure conditions.A camera is used capture photon emissions and you can actually SEE the transistors turning on and off.
I'd call that a data leak. 😁
I always find that part of development fascinating as well. I'm not in the testing side of things but knowing that we can basically X-ray a chip to figure out what it's doing wrong is invaluable to me on the development and simulation side of things.
When i worked at Philips Semiconductors we had a FIB machine - a Focused Ion Beam.
Using this it was possible to modify an IC - both cutting connections and adding new ones.
Spare gates were usually dotted around the IC on early prototypes so if a signal needed inverting to fix a bug it was possible.
@martinwhitaker5096 when I worked in priduct engingineering bringing the Nindendo Wii out I used a FIB to make a wafer with known array fails to check our testing software
Would be nice to see that in action...I didn't know transistors would emit photons while working.
Far less likely to ever see videos emerging of interesting but advanced machinery and proceses than of people using the end product.
It's all commercial secret in most cases
I'm a chip engineer and I have to say, it's sometimes hard for even us to answer that question. We all specialize so far into our specific parts of a CPU that we may not know how a whole thing works. I did my PhD on ways to optimize active interposer design, which is basically a chip you put other chips on top of to allow them to talk together. Think about a Vega64 or Radeon VII. Those have passive interposers, just a bunch of tiny wires through a hunk of silicon to connect memory to the GPU. My work was on effective ways to implement some logic into that chip, like moving the memory controller down there, or the PCIE bus, or maybe a huge chunk of SRAM for a cache.
I actually work on CPUs, not GPUs, but most people are more familiar with the Vega64 or VII than they are with Intel's Lakefield, which is actually something I worked on. I do regret the awful performance of those chips. A single Sunny Cove core with no hyperthreading paired with 4 old atom cores was never destined for greatness. Up next for me was the bridge dies in Sapphire Rapids, and now the topic of my PhD itself becomes a product in Meteor Lake. The architects make it all work inside, I'm just a glorified electrician wiring shit together.
I’m a sparky of 35 years experience and an electrical inspector, and I can tell you sir that your work just blows me away!
The engineering behind such IC’s is just extraordinary!
👍👍👍
Well done. I asked the same question for 30 years. I finally understood when watching Ben Eater's excellent video series on the "SAP-1" breadboard computer.
I've been hand assembling machine since 1980: Z-80a, 6502 and x86. decoding the instructions and actually performing the movement of bits, this is a great question. I always thought the bit pattern in each instruction perhaps had something to do with the decoding. This is a great video. Thank You!
I was doing the same... in my case, Signetics 2650, 6502, 68000
Might be worth mentioning, when you jump from 4004 to Z80. Those are related chips. The 4004 begot the 8008, the 8008 begot the 8080, the 8080 begot the Z80. All within a very very short period of time.
begat
10:36 Nothing says "I am not an electrical engineeer." as almost spilling some welding wire randomly in a PC motherboard for no reason at all. =) That made me pretty anxious and then I just started laughing XD
solder
@@SpeccyMan Thanks for the correction. I have read about and practiced with electronics in another language, so I sometimes mix up some terms in English.
This was absolutely fantastic. I'm currently trying to teach myself Z80 Assembly as a hobby, many, many years after playing about with Basic as a kid in the 80s. This video has really helped turn the abstract concept of what's going on into a real visual way of seeing it happen. Superb. 😎👍
Now I know why I'm a subscriber. One of the best videos I've ever watched on YouTube. Bravo maestro.
I was until recently quite ignorant of how computer hardware really works. Videos like this really help with understanding this fascinating subject. Amazing content, keep it up.
Z80 is amazing! I had a MSX back in the day. Thanks for your videos!
Programmers that actually KNOW (I mean like an engineer) how a CPU works are rare; fewer still have any idea what CPU instructions actually are. Many programmers don't even know how int-to-string conversion and vice versa is actually done!!! ...I WEEP FOR THE FUTURE!!!
When I studied this at uni, before we even looked at assembly and CPU architectures we built a finite state machine using an EEPROM - learning how you can use a computer or DIP switches to control the address bus and then control outputs attached to the data bus. Then, if you got more advanced and added clocking, you could take bits from the data bus and feed them back to the address bus, and you've got a hardware finite state machine. Which, as we found out later on, is how you build an instruction decoder. Of course it seemed pointless in first year. Anyway, I'm glad I learnt all of this. It helps me understand what the CPU vendors are talking about when they talk about branch prediction, out-of-order execution and superscalar design. Also makes me appreciate just how much work goes into a compiler.
Wow. Thank you for a masterclass into the inner workings of the Z80! Instant subscriber!
I had so much fun back in the 80s writing a lot of Z80 assembler. This is the deepest I’ve ever looked into its inner workings though.
The best way to learn the answer to such a profound question is to make your own CPU. Not with a commercial microprocessor like the Z80, but with TTL gates or micro-programmable bit-slice chips like the Am2900 series. Or, these days, with an FPGA and a Verilog IC design tool. You can design a completely new instruction set for a very simple 8-bit CPU like the 6502 and simulate it, using only simple gates and macro blocks. No existing microprocessor is needed. Most tools are free and hardware evaluation boards are not expensive. Recommended even for hobbyists.
I've not watched your channel for a few months and I love the progress you've made with editing and sound and script. Really informative. Kids at school are very lucky to have you as their teacher.
Amazing! This is really amazing. I've been wanting to take a deep dive into the Z80 for quite some time now.
This is the big problem with modern x86 computers, anyone can overclock a Core i7 and think they know how a computer works. I thought I was an expert after doing x86 stuff for 10 years, only to discover 8-bit homebrew computers and realise I didn't have a clue what an instruction is, or memory addresses, or really anything at the machine code level and below. Once I built my first Z80 computer that could be programmed in machine language I had pretty much no idea how to even use it.
I feel your pain. I've worked on ARM, RISC-V, and lots of x86-64 machines. I've designed chunks of chips, and yet I'm still often times looking up how to do something in machine code for the times I do have to use it, which are thankfully rarer and rarer now that they've stuck me away in the development department rather than in testing.
I came from the other direction and segment registers just blew my mind moving from 8-bit CPUs to x86.
The inner workings of a Z80 CPU are registers, ROM and functional units (like the ALU (Arithmetic Logic Unit), memory unit etc). The CPU first loads an instruction into the instruction register, which connects to ROM. The ROM is also connected to a step counter, which is reset at the start of each instruction. Each step in the ROM is a list of signals to send to parts of the chip (e.g. use the memory unit to fetch the next byte of the instruction, then store the result in a register). The last step of each sequence in the ROM is to load the next instruction into the instruction register and the cycle repeats.
All CPUs come down to the basic building blocks of logic (AND, OR, NOT), implemented in transistors, wired together and sequenced by the clock. Just like normal programming you break the task down into components, sub components etc, and build each part from simpler parts connected together to achieve the required effect. A register for example is made from a bunch of Flip-Flop units, a Flip-Flop (at its simplest) is made from a couple of NOR gates (that is OR gates with their outputs NOT’ed). The difference with normal programming is that many things happen in parallel (so you can be fetching something from memory, moving the contents of one register to another and adding two registers together at the same time for example).
If you’re interested at that kind of level then take a look at FPGAs, which are basically a bunch of uncommitted logic that your program can wire up any way that you like. You can implement a Z80, 6502, RISC-V etc, or even your own CPU design on them by defining the logic for each unit and how they are wired together. There’s also code available on the web for implementations of the above CPUs, so (once you learn to read the language used - typically Verilog or VHDL) you can see what is happening.
I spent four months last year successively designing microprocessor architectures, starting with the simplest and ending up with something very similar to the latest multichip mosaic Intel designed recently... which I didn't know about yet. I soon got rid of the step counter, substituting ROM-stored next-step numbers alongside the ROM-stored microcode steps. I added a four-gate circuit that decoded a two-bit microcode field to control step jumps inside an instruction's microcode by either not jumping, always jumping, or conditionally jumping on the set or clear state of a selected condition flag bit, which would otherwise have contributed to the microcode ROM addressing along with the instruction opcode and the step counter state. Other developments were more complex and would require many pages to describe: pipelines, instruction caching, register RAM caching, interrupt control, handling addressing-space misalignment, multiphase clocks, task interleaving, and so on.
I can't afford to prototype the final architectural design. I could prototype one of the not-so-complex designs, but what I was interested in was just the design: mine wouldn't be able to compete with commercial ones, so what would it be worth? But I'm proud that my design evolution carried me across 50 years of microprocessor design in just four months, especially since I'd ignored what the industry accomplished over the last 30 years.
@@wafikiri_ erm, what do you mean "can't afford prototyping the final architecture design"? I'm assuming you've at least synthesised and simulated it to prove that it works? If you have got that far then, unless you need more than 138K of logic elements and 900K of internal memory (which is a LOT for an 8-bit design), you should be able to implement it in a sub-$200 FPGA.
FPGAs are rather more complicated than a simple case of Uncommitted Logic Arrays (ULA).
They often have what are called LUTs (or Lookup Tables), which might actually just be memory containing input/output value mappings.
@@jnharton LUTs are exactly uncommitted logic. They use RAM cells to define a truth table so that they can act like AND/OR/XOR etc, but they are the basic building blocks of a design. LUTs are generally combined with flip-flops and carry chain logic, multiple to a logic block, but what really separates FPGAs from ULAs are dedicated function blocks that also form part of the fabric. These can be anything from routing resources through clock tiles and RAM blocks to DSP tiles. A moderate sized FPGA these days can implement an entire Z80 based computer within itself using those resources.
Just finishing up writing a Game Boy emulator at the moment (Z80-ish) - fun exercise. The Z80 instruction set is fairly regular, with some odd exceptions - so 500 instructions crunch down to more like 50 with variations. Doable as roughly a 2-3 month evening exercise.
They're still separate instructions with different opcodes, though.
And because the Z80 is a CISC CPU, it may well have entirely independent logic circuits/paths for each instruction. That's not to say there isn't sharing of some parts, just that other parts may be entirely dedicated to just one or two instructions.
Love that. I never would have imagined I could watch a Z80 with the hood off.
To this day, I still love my assembly. It's so rewarding to watch stuff run so crazy fast. I often use DOSBOX and play with the old video ram to make neat moving displays, or show game concepts. Video ram starts at B000H (mono) and B800H (color). 80x25 character grid, each character occupies two bytes, value and attribute. I so miss the old, simple days. Oh, I also miss the TRS-80, that was fun to poke your code then x=usr(0). Yeah....
Love these informational CPU videos. More please :D
I usually explain it like this:
You know what a register is? It's a "thing" you write bits into, and they stay there. Now let's look at the 74xx standard-logic chips - what do we see? One of those chips is a "register", and as we can see from its description it works exactly as you'd expect a "register" to work from the "software" point of view. But now let's look at how that value is stored: there are actual voltages on the pins of that register which represent the bits you wrote into it. And what can you do with voltages? You can control things - make an LED light up, turn on a relay, and such. So with those voltages you can control an actual finite state machine which generates other sequences of voltages, the way a paper tape with holes makes a mechanical piano play music. And that finite state machine decomposes each "CPU instruction" into a series of elementary micro-steps like "enable that circuit" or "switch that multiplexer". So that's how the CPU "knows" what to do with each instruction opcode.
So you mean that, unlike in analog circuits, in combinational digital circuits information flows in one direction? TTL gates are so complicated. LS? ECL is a bit easier, it's just that in reality all those current sources are expensive.
One of the most helpful things I found for understanding it is that the CPU reads the instruction stream from memory. This clarifies that 1. the CPU is not following the instruction set literally, as it also performs its own memory lookups to read the instructions (and other stuff like stored jump pointers), and 2. the CPU does not store any data itself, and when data is fetched any one of several memories or peripherals could respond (the CPU has no control over that). Understanding it from this angle solidifies what its role is and lets you picture a continuous loop of operations in your head.
You talk about microcode vs RISC with dedicated code memory?
@@ArneChristianRosenfeldt Right. Perhaps it is due to being a programmer, but I had always assumed that the CPU 'stores and runs the code' in a literal sense, while in reality it could be processing instructions from a video card's mapped I/O. Where the instructions are stored, or how they are arranged (other than being sequential in the no-branch case), is not its responsibility.
So, like on a C64, code would run in ROM and still load data into RAM. Or code would run in RAM and jump into subroutines in ROM. Or code would look up Pi in the BASIC ROM. Yeah, indeed there is a trick on the C64 where the output of a timer chip is used as code (an absolute address) to jump to different places depending on the time. @@ta1bubba
When I am asked, “How does it work?” I usually answer, “Just well enough to get by.” That applies to most people, so we can assume machines do the same.
This was cool. Though I know how all this works, it is another thing to know how to explain it properly -- which you did a marvelous job of.
You are getting close. Watch the Ben Eater CPU creation videos for a marvelous series on the topic where he creates one on breadboards.
There is a YouTube series where a guy builds a simple CPU on breadboards with LEDs to show the status of everything. I recommend it for an explanation of how a CPU works internally. This video here is good too.
I believe you are referring to @BenEater
@@RetroRogersLab Yes.
In the 1970's Radio Shack had a small book that explained the inner workings of the logic circuits in a handheld calculator. In about 1981, the Sinclair ZX-81 became available. It was a real computer, powered by a Z80, and had 1 or 2 kilobytes of RAM depending on the version you got. (I think the original English ZX-81 came with 1K and the U.S. Timex Sinclair ZX-81 version had 2K.) I put one together from a $100 kit. You could write small programs in Sinclair Basic, and if you were ambitious, you could escape from writing programs in Basic to programming directly in Z80 machine code. The Radio Shack book and the Sinclair ZX-81 were a wonderful place to start to learn the basics of computing. I'm glad I had the experience of learning about computers on the ZX-81.
First of all, thanks for an informative video.
I'd just like to add a few things on the subject:
Nowadays programs (the opcodes) are stored in main memory, but it was not always the case. Originally the "computing machines" were thought of more as super calculators, because that is where the need arose first. Mechanical or electromechanical devices had data separated from the "code", which took the form of a hardware configuration. This was tedious and error-prone.
The idea of making the configuration virtual by storing the configuration as digital values was a novelty that got the name "stored program computer". Some CPU architectures still use separate spaces for code and data (Harvard architecture).
All the "regular" machines that we use today come from the famous Von Neumann paper (the original sin of computing). This paper was a mathematician's view on how to use an automated device to perform calculations, and only this, using that era's technology: how to decode a formula expressed in text form and compute (evaluate) its result. The issue is that that specific paper for a specific task was generalized.
It went from: This is A way to build a computing device
to: This is THE (only) way to build a computing device
There are many other structures and means to perform computing tasks, many of which are not sequential. The first one that comes to mind is neural networks. But unfortunately the Von Neumann model (Y shape with arrows) has "polluted" the minds of all engineers. This is why it is often difficult to get away from the "sequential mind bias" (parallel programming).
The machine code of a CPU is the "grammar" of that CPU and gives it its "personality". This is carefully crafted because it is what the programmer will "see" of the CPU. This is evidenced by the acronym ISA (Instruction Set ARCHITECTURE). The coder is supposed NOT to know, and not to need to know, how the instructions are actually implemented. That is "at the discretion of the implementer", a form of "API" to the CPU.
The only way to "construct complexity" out of thin air is to implement an FSM (Finite State Machine) in some way. This state machine is the link between the instruction opcode(s) and the internal logic components that actually perform the intended operation. This is why the internal logic can be completely different from what is seen from outside (the Z80 has a 4-bit ALU, see Ken Shirriff's work). The instruction decode is only the first stage of this FSM. An operation symbolised by an opcode is rarely, if ever, performed in a single step of the internal logic.
Most 8-bit CPUs (8080, Z80, 68xx, 65xx) implement this in combinatorial logic, with a special note for the 65xx family, which uses a ROM as the first stage.
Because the CISC (x86 etc...) CPUs have become increasingly complex this has been replaced by microcode.
The actual hardwired state machine is replaced by an internal, hidden CPU that runs that microcode. Because there has been an inflation of modes, flags, security conditions etc... the number of clock cycles to execute a single top level assembler opcode has increased (sometimes 20 or 30).
As opposed to this, the RISC design paradigm does the exact opposite. Most RISC CPUs seek to increase execution speed by using as few cycles per instruction as possible. The only business of the CPU is to EXECUTE code, not make decisions it has no information about. All complex decisions are delegated to the compiler.
This only covers the CPUs that are inspired by the Von Neumann model. There are many "funny" CPUs, even designs that seek to use analogue attributes of electronic components. These are usually very application-specific and require interfacing to the standard digital world.
What do you mean by “few cycles”? 6502 needs two cycles for a lot of instructions. The SH2 in the SEGA Saturn, the MIPS in The PlayStation need one cycle per instruction. Though these consoles cheat. You only have limited SRAM. DRAM access takes many cycles.
@@ArneChristianRosenfeldt By "as few cycles per instructions as possible" I was referring to a design goal of the ISAs. 1 cycle clearly qualifies as "as few cycles per instructions as possible".
Basically RISC designs have so few clock cycles that the execution time is equal to the fetch time of the individual bytes from memory. There are no added cycles where the CPU is just churning inside.
The 6502 by its design resembles a RISC design, but Sophie Wilson (designer of the ARM) said it is in a category of its own. I am not sure why, but maybe because of the zero-page operations that can operate directly on memory and do not obey the "load/store" principle of RISC, where the memory transfers are separate instructions from the compute operations.
As opposed to RISC, CISC designs do not have simplifying or reducing the clock cycles of execution as a primary goal. The hope is that by having more "complete" instructions that more closely match the C language, for example, the instruction count would be reduced for the same task. The drawback is that it leads to complex decisions at runtime.
So CPUs like the 6502 (even if it is not a RISC CPU per se) will tend to have an execution time that is exactly the byte fetch count when instructions are executed on registers. So an instruction with an implicit operand on the accumulator will only use 1 cycle. When using zero-page instructions or indexed addressing, the number of byte fetches will extend the execution of the single instruction because the instruction becomes memory bound.
These memory bound operations would not be allowed on a real load / store architecture. You would have to:
- load in a register
- compute
- store the register back in memory
as 3 distinct operations.
So to summarize, when you are in the very specific case that you have a CPU that is both RISC and load / store (AVR, MIPS) you are correct, most of the instructions will be 1 cycle, unless they need to be extended by wait states for slow devices. This is only valid because direct operations on memory are not allowed.
I just meant that MIPS, as invented at the university in 1980, aimed at 1 cycle per instruction. I just mean that they replaced the complicated "few" optimization problem with KISS: one cycle or we don't do it. The 6502 actually is hardcoded to use two cycles to fetch; one cycle is wasted on single-byte instructions. The CMOS version corrects this. Probably it was never really needed, but it simplified things. The zero-page thing was seen in other microprocessors of the time. The RCA 1802 had this external register file, where you still had to load values into the ALU first. Then there was this "first 16-bit" CPU which actually just stored 16-bit values in a zero page like a 6502, but could already switch the page for multi-tasking. So the motivation was not to implement complex instructions, but to outsource as many transistors as possible to off-the-shelf memory. The Z80 has 16 bytes. It could really have supported a lot of reg-reg instructions and hence load/store may have made sense. Reg-mem saves you register transistors because the other operand is loaded into a temporary register, which is free again in the next instruction. The same register can be used first for the addressing mode and then for the reg-mem operation. CMP and test (zero register target) also save. RISC CPUs use the same addressing modes. Only post-increment and pre-decrement or so are missing. I don't really know why. Yeah, you would need to write back two values into the register file at once. But that would also be useful for a two-register shift: source bitfield, shift value, write-backs of the left half and the right half. I guess that MIPS did not want to carry around circuitry which is not used by most instructions. ARM has a dedicated stack pointer just like the 8-bit CPUs. @@danielleblanc5923
When I was digging through Vega BIOS files, I saw some strings that started with $. I thought "maybe these are commands", since even modern PC components have ancient guts that get things started and handle plenty of the lowest-level functions. Here I am over a year later hearing a teacher confirm that thinking.
You brought back a lot of memories. LD B,0? XOR B,B! Probably the first "improvement" I learnt. Keep well m8. 😃 🖖 🇦🇺
On the Z80, XOR works with the "A" register only. The Z80 hasn't got an XOR B,B operation.
@@maxims.4882 So far as I recall you are correct. It's been a lot of years since I last used asm on a TRS-80. So thanks for the correction m8. I've probably forgotten more than I knew. 🤣
@@maxims.4882 That isn't entirely accurate as the XOR r instruction specifies any of A,B,C,D,E,H and L to be r so an XOR B is possible but it should more accurately be seen as meaning XOR A,B since the result will still end up in the accumulator and not provide the result the OP suggested. The only way to zero the B register is LD B,0. There is no shortcut.
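For anyone counting bytes, a quick sketch of the difference (sizes and T-states as given in the standard Z80 documentation, worth double-checking against your own manual):

        XOR  A       ; A = 0 : 1 byte, 4 T-states, but only works on the accumulator
        LD   B, 0    ; B = 0 : 2 bytes, 7 T-states - the direct way to zero B
        LD   B, A    ; or copy a freshly zeroed A into B : 1 byte, 4 T-states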
Doing some assembly coding on the Z80 teaches you a lot about all this, because the user manual for the CPU actually lists the encoding with each opcode.
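For example, a few encodings as listed in the Z80 manual show the register fields sitting right in the opcode bits (worth checking against your own copy of the documentation):

        LD   A, B    ; 78h = 01 111 000 : '01', destination A (111), source B (000)
        LD   B, A    ; 47h = 01 000 111 : same pattern with the registers swapped
        ADD  A, B    ; 80h = 10000 000  : ADD A,r with r = B (000)
        LD   B, 5    ; 06h 05h = 00 000 110, then the literal byte 05h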
Teacher (whispering into cuff): Yeah, we got another one. Yep, asking too many questions. Yeah, I'll keep him here for pickup.
Although this is a good attempt, if you want to dive much deeper into the working of each part of a CPU, this channel www.youtube.com/@weirdboyjim/videos has made a quad-pumped CPU from scratch. Quad-pumped is more advanced than the 6502 or Z80, but it is still early CPU development, before clock doubling and the use of caches. You will need to watch many episodes to see the whole thing working, and sometimes you might need to google how a certain electronic component works or what its basic function is to understand the video, but this is as close as you can get to fully understanding it. The later-added technologies like branch prediction, clock doubling and caches were there to increase speed and efficiency. The quad-pumped part in this video series is also an early example of such an improvement: quad-pumping wastes fewer clock cycles between instructions.
Good explanation right down to the registers and logic gates!
I started with this CPU back then. I added bytes to the EPROM for the operating system using a DIP switch and programmed address after address using a button, so that something useful happened after start-up. Output was in hexadecimal on seven-segment LEDs.
"the cpu. how's it work"
(cue the full contents of a bachelor's degree in computer engineering)
Nice CPU video. One little detail that you missed is what happens if the CPU gets sent an opcode that it isn't designed for. Older 8-bit CPUs had various opcodes undefined, which could cause problems if used. For the 6502 there are quite a few "undefined" opcodes that one shouldn't use, but people found that if they did use them they could carry out certain operations faster, as they combined operations together in the CPU. And others would cause everything to crash. It used to be a thing for people writing 8-bit demo code to use illegal opcodes to speed up their code, using fewer cycles, at the risk of breaking the program if later versions of the chip used completely different microcode (the internal instructions inside the CPU that carried out the machine code fed to it). This has proved an issue with implementations of CPUs on FPGAs when they don't implement the undocumented instructions. Or you have a chip that is enhanced, like the 65CE02 chip used in the MEGA65, which adds instructions in the "gaps" left in the original instruction set that were previously used as undocumented opcodes. There is a document called "The Undocumented Z80 Documented" by Sean Young which covers the unusual effects that can occur when you use these undocumented opcodes on the Z80 processor...
The Z80, 6502 etc used a technique called partial decoding, that is to say that to save space they didn’t bother to fully decode all possible instruction codes. Because of that (and sometimes because the designers had ideas that didn’t work out) you get these op codes that do odd, but sometimes useful things. Better FPGA cores can actually implement these undocumented instructions, and some are based on a decomposition of the original chip logic (known as a net list) so they can implement the chip behaviour precisely.
Early x86 chips (8086 and 8088) could try to execute undefined instructions and maybe have the chip do something. From the 80186 onwards, they'd just trigger an Illegal Opcode exception, which while boring, is useful for compatibility, as you could use that to jump into a software routine that implemented an instruction available on a later processor.
Some undocumented Z80 instructions were useful, especially if you didn't need one or both of the index registers but you did need more 8-bit registers. At the cost of a bit of speed, you could use the IX and IY registers as two 8-bit registers each. (The speed decrease was caused by the time required to load an extra byte during the instruction cycle in order to access the 16-bit registers in 8-bit mode)
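A rough sketch of that trick - these are undocumented opcodes, so the IXH/IXL names depend on your assembler supporting them (some need the DD-prefixed bytes written out by hand):

        LD   IXH, 10    ; DD 26 0A - use the high half of IX as an extra 8-bit counter
        LD   IXL, A     ; DD 6F    - stash A in the low half
        DEC  IXH        ; DD 25    - each DD prefix costs an extra 4 T-states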
Intel 8050 has all opcodes documented. A quarter of them are mov
I grew up in the 1970's and had the Radio Shack TRS-80 which utilized the Z8080 Microprocessor. I will never forget the day I upgraded my memory from 4K to 16K. I had more memory than I knew what to do with. I had no choice but to learn machine code, as computers were so slow back then that using interpreters like Basic was not practical at all. When learning machine code, there were 2 things I found most important that you left out, and they have a lot to do with the inner workings of the CPU. In fact, they give you extra insight into the inner workings of the CPU.
The first is the PC or Program Counter and the other is the Zero Flag. The PC (Program Counter) is a pointer. It tells the CPU at which memory location the next instruction to be executed is located. So the Add command only needs one byte to execute: the code that instructs the CPU to add register A to B. The Load-A-register command, LDA, is a 2-byte instruction: one byte that instructs the CPU to poke a value into the A register, followed by the byte representing the value you want poked into A, and thus 2 must be added to the PC. A Branch-Not-Equal-to-Zero command (BNE) is a 3-byte instruction, which requires the byte representing the instruction itself and the next 2 bytes representing the memory location to branch to. If you, say, wanted to BNE to memory location 4A00, then it would be the instruction code followed by the second byte of the destination address, then the first byte: BNE 00,4A.
Any operation that results in a zero will set the zero flag for comparison operations. If you want to code a loop that performs a task 10 times, you would load A with a value of 9. Remember, 0 counts as a number - that is where the mysterious bit comes into play: if hex FF or binary 11111111 is equal to 255, then why are there 256 values? Because 0 counts as a number. After loading 9 into the register, you perform a chunk of code, then decrement the register, which subtracts one from it. The first time through it will be reduced to 8; since it is not equal to zero yet, the flag has not been set to 1. Now when we BNE (Branch Not Equal to zero) it will take us back to the top of the code, just past where we loaded the register with 09. If you branch to the beginning, it will reload 9 into the register, you would never reach 0, and you would be caught in an endless loop - any coder's worst nightmare. If you set the memory location to one past the LDA instead of two, it will see the 9 you entered as the next instruction code and you will likely crash. So if your code starts at 4A00, then the BNE would have to be to 4A02, as 4A00 contains the load-register instruction and 4A01 contains the 9 you want poked into the register. The last time through our loop we decrement from 1 to 0, which sets the zero flag. Now when the computer reaches your BNE instruction, since the register is zero, the Program Counter (PC) will be incremented by 2, past the address you were branching to when not equal to zero.
Memory Code
4A00 LDA - Load A register with
4A01 09 - value poking into A register
4A02 Loop content
...Coding in here
4A20 BNE - Branch not equal to Zero
4A21 02 - Second byte of destination memory location
4A22 4A - First byte of destination memory location
4A23 New code starts here, so the value 4A23 is placed into the program counter (PC), or it is incremented by two to this address.
In an assembler it would look something like
Initiate: LDA,9
MainLoop: Loop content
Code here
BNE MainLoop
exit loop to continue your next section of code.
A third important aspect to know about is the stack, which stores information for JSR type commands. JSR is Jump Saving Return and pushes the memory location your code will be returning to after the jump to other code is executed and the return instruction is encountered. The stack kind of takes care of itself, but if you start pushing and pulling your own values onto and off the stack, then you'd better keep track. That loop you are in, if it does any pushing or pulling from the stack, had better be balanced by the time you are done. If you push and pull equally from the stack you will be fine. But if you push something onto the stack and you don't pull it from the stack before the return instruction is used, the return address will be buried under the data you pushed onto the stack, so it will go to whatever memory location is equal to the values on the stack, which could be anywhere.
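For comparison with the count-down loop above, on the Z80 (the chip in the video) the same kind of loop is usually written with DJNZ, which decrements B and keeps jumping back while it is non-zero - a rough sketch:

        LD   B, 10      ; loop counter
Loop:   ; ... loop body goes here ...
        DJNZ Loop       ; B = B - 1; jump back to Loop while B is not zero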
Love your definition of JSR as Jump Saving Return (address) but if you check the official documentation, and every other reference I’ve ever seen the actual meaning is Jump to Sub-Routine. Like all the other instructions the full details of how the operation is performed is a bit much to capture in 3 letters.
The 8080 had no Z in it as it was not a Zilog component! It was made by Intel. So you either had a Z80 (which I doubt based on the mnemonics you used) or an Intel 8080. No competent programmer would have such a nightmare because they would know how to calculate the correct byte for a relative jump or they would use an assembler and ensure the label for the jump destination is in the correct place. GIGO!
Note: If you did indeed use the TRS-80 then you really ought to know it had a Z80 CPU and the instructions you used in your assembly language are not Z80 mnemonics!
@@SpeccyMan I was in the 5th grade when I started coding. I wasn't that distracted by the exact processor that was inside; I was enveloped in figuring out the coding process using a book my brother recommended. I don't remember the name of the book. I might have mentioned the 8080 with a Z as a reference to its being the predecessor, perhaps. I always thought of the Z80 and Z8080 as being part of the same thing; one did lead to the other. You are questioning the competence of someone who was in the fifth grade. I guarantee I have written more programs than you will ever imagine, tens of thousands of lines long. Once computers became fast enough I stopped using machine language and switched to Dark Basic in the 1990's, which is now AGK or App Game Kit, which lets you code with multiple languages but compile for any device, from computers to phones, Android or iPhone. I didn't use an assembler; I wrote my code out on graph paper. So yeah, when I had to branch, or jump, or jump-saving-return (JSR), I had to insert all the code in between before I would know where the jump would be to. I wrote each memory location on the graph paper and then used arrows that I would connect to the lower area on the graph paper where the jump would end up. I just left the 2 bytes for the memory location of the jump empty until I reached the end of the intermediate code, finished my arrow, and went back to the jump and inserted that memory location. I then used the crude assembler that came with the computer to input the data. I was using machine language, but I never learned a damn thing about how to use an assembler, and I don't even remember if there were labels in that assembler. All I am saying, dude, is that was a long time ago; who fucking cares. Oh yeah, how about you use your real name, coward.
@@MikePerigo I don't care what you say, there was a JSR that stood for Jump Saving Return. A JSR pushes the next memory location onto the stack, to be retrieved later and poked into the PC (program counter). There is a jump to a subroutine that doesn't care about a return address, and that was just a jump statement. Have you actually programmed in machine language? 'Cause it doesn't make sense that you don't know the difference between a JMP (jump) and a JSR (jump saving return). A jump requires no return-from-subroutine command.
Typical answer to the kid would be 'don't worry, how computers work is not a topic in the exams'
Hey I just posted a video short of a Zilog I-box prototype from 1997. Really cool device my grandpa brought home for me. I had said I wished we could have internet on our TV and he smiled and said “we are working on that” and a few months later this thing came home. Lost it for decades.. just found. Let me know if you’d like more information or videos.
Research "bit-slice processors" and all will become clear. Ben Eater's CPU on a breadboard is a bit-slice CPU and is very easy to understand. More complex machines such as the Z80 and 6502 are microcode architectures, where each opcode runs a little program inside the chip which coordinates the various parts of the chip. Start with bit-slice. Beautifully simple and elegant.
This was very informative. Enjoyed muchly.
This is quite simply a superb treatment on the subject. Thank you.
The answer to the student's question (a true answer, actually) about how a CPU executes binary machine code is that the CPU is running a program written in microcode that carries out the necessary operations to implement the machine code.
And what if it is RISCV without microcode?
@@ArneChristianRosenfeldt - since RISC V is an architecture as opposed to an implementation of that architecture, it makes no sense to ask if RISC V is, has been, or will be based on microcode or not, since it can be all of the above.
The introduction into the instruction formats goes into the bare metal layout of the multiplexer. The ISA is tied tightly to single cycle RISC. Though indeed, the smallest implementation uses multiple cycles, but I have not looked into it. Probably the cache logic in Load/Store is replaced by microcode. Also the branch instruction could be split into two ALU cycles: One for compare and the other for the relative addressing of the jump. @@JanBruunAndersen
The array of dots is the instruction decoder (ROM). It's basicly just a lookup which is hooked to the internal control lines of the ALU and other parts of the chip. This decoder ROM is basically the physical predecessor to microcode.
You mean implementation?
Basically you cannot correctly spell basically.
@@SpeccyMan My god, the world will end because i misspelled a word :P
@@ArneChristianRosenfeldt No i meant predecessor, because the microcode in these chips was not writable. Today's microcode ROMs are technically EEPROMs. But maybe that's a definition thing in my head :D
@@KitsuneAlex I read that microcode was invented and the name coined in 1968 before the invention of microprocessor.
DrMattRegan did a fantastic set of videos on the details of most stages of CPU from a digital logic perspective.
I would have taken a chronological approach, starting with an unpowered CPU. The power is applied to the circuit, a circuit external to the CPU holds the CPU RESET* pin low for a while to allow the power to stabilize and possibly some other circuit(s) to be set to a known state. When the RESET* pin is low, the CPU is stopped. Then the RESET* pin is set high. The CPU then fetches an instruction from memory (ROM) and executes it. The location of this memory address is hard-coded into the CPU chip and is part of the CPU documentation. The CPU then proceeds in the way you describe, fetching instruction after instruction.
There can be variations, such as the CPU loading other values from ROM before starting the instruction execution stream.
Just perfect explanation - great job👍👍👍
Thank you 🙂
Don't know if the Z80 has any, but the 6502 has "illegal opcodes" which do strange things according to the decode matrix, as the bits of the opcode define what to do and what to do it to (with the later 65C02 locking out the functions and treating all the undefined as NOP of varying cycle time)
It does. 😊
I think Intel 8050 fills the complete matrix in a regular way. No prefix allowed. No illegal opcode. SuperFx has 8 Bit instructions. All legal. So it is not thaaat difficult.
That would be true of almost any similarly designed hardware, especially with no intermediate layers.
The 6502 and Z80 achieve all of their functionality in actual hardware. Or, in other words, there is a block of hardware for the ADD operation/instruction and a separate one for the SUBTRACT...
Whereas most modern CPUs rely on building Instructions/Operations in microcode.
@@jnharton 6502 and z80 and even SAP-1 as presented by Ben Eater use a standard ALU just like processors did before. ADD and SUB happen in the same block of hardware. Logic also. The ALU does not have illegal control inputs. Likewise it is easy to avoid illegal register names: just have a power of two number of registers! I like the branch encoding in 6502: 2 bits name the flag. One bit has the value for which a branch should be taken. Nothing illegal. Just “always branch” is missing.
@@ArneChristianRosenfeldt It was simply an example, any two (or three) instructions will do.
Besides the ALU (aka 'Arithmetic Logic Unit') isn't a magic box, but rather a complex subunit unto itself.
There is almost certainly a binary adder in there and probably also some multiplexers.
.
An "illegal instruction" is just a valid instruction code that wasn't used and trying to execute it activates some part of the overall circuit.
Unless it happens, by coincidence usually, to consistently do something useful, it is of no concern.
P.S.
We're talking about "microprocessors" specifically. There were processors before that, but they were massive by comparison.
The Z80 was a breakthrough device. However, it was soon revealed that it was indeed a slow processor. OK for simple equipment, but got easily bogged down with more complex instruction sets.
To be fair, this stuff is a bit above GCSE. I wasn't shown the inner cogs of a Z80 until I got into college.
I would recommend the book But How Do It Know to everyone for one of the best explanations on how a microprocessor works.
The box on the CPU circled the cache, not the registers. You can tell because they are organised in rank and file like a grid. The registers are thousands of times smaller in area. A reprise of Turing's tape 'machine' is still a good place to start.
The CPU die pictured is a Z80 which has no cache, so they are the registers.
Register files are also laid out in a very regular grid like structure because there is a lot of commonality between inputs and outputs.
New Sub! Detroit, Michigan, US
Well Done Sir, Excellent Video!
It's hard to give an exact one size fits all sort of explanation to a question like how does the CPU decode instructions.
Different Instruction sets allow for different approaches and optimizations when decoding (for example by having categories of instruction types that share the same layout of bit fields), different hardware implementations can be done for those instruction sets depending on target performance, power usage etc. And different architecture types for a certain ISA can have different requirements for a decoder (for example parallel decoding in superscalar CPUs, predecoding, microcodes etc.).
Looking at older hardware, which tends to be simpler is a good starting point however.
I think working with the abstraction of logic gates and explaining a CPU using a visual logic simulator (like Logisim, Digital by hneeman, etc.) is much more approachable, as it is easier to see individual parts of the CPU and easier to comprehend. By adding an explanation of how a universal logic gate (NOR or NAND) can be implemented using transistors, all of the abstraction can be understood.
@@Ignat99Ignatov
I know how semiconductors work, and that would of course be necessary to understand if you wanted to know how modern CPUs are typically implemented in hardware. But I doubt that you'd want to know that in order to understand how a computer works at a lower level; after all, a modern computer could just as well be implemented with vacuum tubes, relays, mechanically, etc. and work exactly the same, so understanding the physics that makes semiconductors work for digital circuits isn't necessary.
Computer architecture is mostly about the higher level concepts and how they can be implemented with simpler building blocks that can easily be synthesized into hardware (e.g. designing an ALU based on gates and latches in the Register Transfer Level or Hardware Description Language).
Looking at a Computer without any abstractions makes it hard to get insight into why a certain p-n junction is required and what it is used for, since that is typically defined by the abstracted design, which is the reason why I am saying that starting with a layer of abstraction would be useful for understanding how computers work.
Getting to see the actual hardware on a die is still interesting of course! It just isn't well suited for learning about computer architecture (which, as I understood it, was the goal of this video).
Sorry for the long reply; your message seemed to criticize my suggestion of using abstractions, so I wanted to clarify why I suggested that in the first place and clear up any misunderstandings about my intention.
Based on your youtube subscriptions I assume you might be familiar with digital circuit design and that you get what I was trying to say.
@@Ignat99Ignatov so your argument is "I worked on semiconductor designs professionally therefore you don't know how a transistor works and everything you are saying is wrong"? That's quite rude.
I'll reiterate that my argument was that in order to understand computer architecture abstraction is useful. Looking at a die and saying "that's where the zero flag is stored" is less useful than explaining the concepts of how everything works together.
There is a reason why most books about computer architecture use abstraction and why HDLs are popular.
Physical implementation of a digital circuit and conceptual design can be separated when learning about the fundamentals, which is especially useful when you don't know anything about either.
I'm sure that you know your stuff well, but I learned computer architecture separate from semiconductors and can still understand the concepts well enough to design processors in HDL and simulation software, which is all I need as long as I don't do hardware design professionally.
I'm just trying to help as a person who is interested in the topic and was once in a similar position as the student described in the video, which is why I thought that my comment might be relevant.
If you were nice about it and provided proper arguments that would be much more helpful to me and anyone else reading the comments. I sure hope that isn't how you treated your students.
Yeah, the famous Jump, Immediate, and Register instruction formats of MIPS. But ARM and POWER are so complicated.
It's an unsettling thought that I'm old enough to have actually programmed a Z80... but only as an Intel 8080...
Interesting question. I guess the CPU is built in such a way that flows of electrons can open a series of minuscule gates in certain kinds of transistors, and that these can be arranged and built up to do basic computation?
Who knows, it's like magic lol
That's pretty much the basic idea yes. Transistors control the flow of 1 current with another, so by wiring them together in specific ways you can create basic logic gates, which are AND, OR, XOR, NOR, NAND, XNOR, and NOT. From these you build your circuits, such as an adder, a subtractor, a multiplier, and a divider, and you now have the basics of an ALU, which you can feed data into and get results back out of.
Combining gates another way lets you store charges as pieces of data. This is how the registers are built. Everything in that chip is just logic gates and sometimes lone transistors turning on and off, and connecting and disconnecting, various wires together such that the bits flowing in result in the desired bits flowing out.
This is a gross oversimplification, but hopefully it conveys the idea of how you start building up a processor.
It is very instructive, thank you!
For a complete understanding of how a CPU works, you could do worse than to watch the series of 45 videos by Ben Eater as he constructs the SAP-1 (Simple As Possible) computer on breadboards.
If that kid really wants to understand, they just need to go and implement an 8-bit CPU from scratch in Verilog on an FPGA. Then they'll understand the basics. I do understand why they would ask though; today everything CPU/computer-wise is stupidly complex, and you'd struggle to understand how everything actually works. Those of us who started this journey as children back in the 70's and 80's had a much easier time - you really could understand exactly how a CPU and computer worked back then. The manuals even came with the full circuit diagram of the machine.
Better still implement an 8-bit CPU using 74 series TTL logic! That would be a real challenge.
@@SpeccyMan That is fun. Could make it even more fun and only allow them to use a single kind of gate such as the magical NAND. :D
I would have shown him "Ben Eater", an amazing channel. I believe he built an 8-bit computer on breadboards with wires (many, many wires) lol.
He also shows each part working - how and why - if I remember correctly.
Also, great video, keep up the great work.
I took a whole semester of Digital Electronics in college to find out how CPUs work, so yeah, not a great topic for a tossed-off answer. It was a pretty cool course, what with designing circuits from scratch with flow charts and voltage timing diagrams, but it also wasn't especially relevant to my programming focus. (Aside from the bit of the final project where I had to hand-assemble a small 8080 program into octal.)
What CPU should be the first implemented in Graphene? The venerable 6502? z80? cray 1?
I taught it to primary age as a set of pigeon holes each with a number in it (RAM) and a set of instructions on a piece of paper (if the number in hole 45 is less than 25 then xxx and so on). Then I let them play with it for a bit.
Then you talk about encoding the instructions as numbers also in pigeon holes. They kind-of got it at that point.
I have one Jacquard loom programmer, dating from circa 1840. It is a real jumble of wood frames, wires, and springs. There were also a few cardboard punched cards with up to six or eight holes, tied to one another by two hewn string loops; I donated those to a museum, along with another such loom programmer. I was 13 when I saw those fascinating pieces of machinery and was fortunate enough to have already learned some basics of computer science (Turing machines and binary codes), so I could immediately tell my parents those were programming devices of sorts, probably for the designs of textiles in the looms (the house they bought had been a textile manufactory in the 18th and 19th centuries). I am 68 years old now, with decades of experience in computer hardware design and programming. That I still have one of the Jacquard programmers that inspired computer hardware is a source of pride for me.
I've seen this done where each child roleplays the different components.
My first computer was a TRS-80 with a Z80 chip in it. It was like Star Trek in our house when we got that 😆
But did you ever play the classic "Star Trek" game on it?
@SpeccyMan no, every game I ever played on it, I had to spend an hour typing it in and an hour of debugging typos, and then if I was lucky the game was worth playing for half an hour or so. When I turned it off, the game disappeared, no long-term storage. Later, I think we used a cable to connect it to a tape recorder, and you could save and load on cassette tapes, but it was slow, and nothing I ever typed in was really worth saving.
This is a long video that, to be honest, does not explain what your student actually asked.
If someone wants to know how a computer really works, this is the best explanation ever! @BenEater 8-bit computer
Why so many AI-generated, melted, artificial graphics? Interesting content, but still - why? The "motherboard" has no chance of working and the spurious ceramic caps make no sense. It's like a bad dream or a hallucination...
This video by Technology Connections (and I see he’s posted Part 2) th-cam.com/video/ue-1JoJQaEg/w-d-xo.html shows how a technical pinball machine works. The long and the short of it is that it too is using electricity to move stuff about so that it can keep score, change the state of paddles, bumpers etc, detect input.
After watching this video of yours, I kind of feel that the two are distantly related, even though they are very, very different beasts.
I would add that even though it LOOKS complicated, it is knowable and understandable!
There is a path to unraveling and understanding the complexity of a CPU. Ben Eater has a nice video series where he builds a CPU using only simple logic gates: th-cam.com/video/HyznrdDSSGM/w-d-xo.html
Having a degree in electrical engineering is not really a requirement. This is completely within reach for anyone who is curious and persistent.
So I get how a Z80 and a 6502 work, and I use a mental model that is some unholy mix between the two when I'm programming. Now here is my question, one I've been trying to update that mental model with for many years but can't seem to find anything up to the task: in my mind a CPU is always loaded 100%, doing instructions as fast as it can each clock cycle (well, with a little instruction decoding as preparation before the actual thing happens). Here is the problem: how does a modern PC (or even an older single-core one like a Pentium 4) know when not to run at 100% load, or even clock down its speed? Put differently, why does a loop waiting for input (not from an IRQ but from something that is polled, like USB) not peg the core at 100% load (using a non-blocking check inside a while(1){}, for instance)?
That is actually a good question. I don't know, but modern CPUs can probably monitor their power consumption and temperature and use that as a gauge of how hard they're working. Some of it will come from the OS though.
@@ncot_tech P.S. A few years back I also would have said that your model for how stuff moves around at the beginning of the video is not right, as data cannot move on its own from storage to memory. Before executing a program, the currently running program (the kernel) has to move that binary from storage to memory, and so the data has to pass through the CPU as well; the double arrow between storage and memory is non-existent, even if both are attached to the system bus. But with the new PS5 I think it might not be wrong anymore.
@@ncot_tech I've looked everywhere I knew to look, and even asked ChatGPT, and all it could come up with is that there are mechanisms that detect idle time... but it couldn't tell me how there could be idle time in the first place.
@@hoefkensj Some CPUs have a halt instruction, such as the x86 CPUs. Another approach is to lower the clock speed (and possibly the voltage) when the OS detects low load on a thread.
@@toby9999 I know, but when my CPU is running at a low clock speed its % usage is relative to that clock speed, not to its maximum clock speed. So (and even this is simplified) if a CPU does 1 instruction per clock cycle, doing 1 instruction every cycle at 800 MHz means 100% use, and in the same way doing 1 instruction per clock at 5 GHz also means 100% use. If you don't want a CPU to do any actual work you can have it execute a bunch of NOPs, but those are still instructions, so CPU usage would still be 100%. That is the case for a Z80; unfortunately I don't own a Z80 anymore (I used to have a uProfessor Z80 kit), but I think that was still true back then. The oldest machine I currently have that works is an 8088. Having said that, I honestly have no idea how to write a program that would only use a Z80 50% of the time, in such a way that a separate program could, if it wanted, use the other 50% and run concurrently with my code to load the CPU to 100% - and even that would be easier than having it do something only 50% of the time. Note that for a modern CPU to lower its clock speed, the low usage comes first and that triggers the drop in clock: "I'm not doing much at the moment, so I can reduce my clock without much impact."
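On the polling question above: the usual answer is that on a modern OS the program doesn't actually spin. It asks the OS to wake it when something happens, the OS parks the thread and runs its idle loop (which on x86 ends in a HLT-style instruction), and the "usage %" you see is just the fraction of time the core was not sitting in that idle loop. Here's a rough Python sketch of the difference as seen from user space - run it and watch a task manager; exact numbers will vary by OS:

import threading
import time

event = threading.Event()

def busy_wait():
    # Spins flat out: the core executes instructions every cycle,
    # so the OS accounts this thread as roughly 100% of one core.
    while not event.is_set():
        pass

def blocking_wait():
    # Asks the OS to park the thread until the event is set.
    # While parked, the scheduler runs other work or its idle loop
    # (HLT on x86), so the accounted usage is near 0%.
    event.wait()

t = threading.Thread(target=blocking_wait)   # swap in busy_wait to compare
t.start()
time.sleep(10)       # observe CPU usage in a task manager during this window
event.set()
t.join()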
I was confused by the program to add two numbers. Register B can't be (conveniently) loaded from memory, so num1 is loaded into the accumulator and transferred to B. But B never needs to be loaded with 0, and the first ADD A,B instruction is redundant.
I wonder!
Doesn't the instruction *LD B,$xx* load a value from memory into the B register?
Hi @@Bob-1802 ,
I had to check to be sure, but no - $xx in that instruction is what is known as an "immediate" value: its value is fixed within the encoded instruction, not elsewhere in memory. If you look at the hex encoding of the instruction it is 06 00; 06 means LD B and 00 is the immediate value to load.
In this syntax, if the value is in memory it is written in brackets, like LD A,(location). That is called "direct" addressing, where "location" is the address of the memory containing the value to be loaded. The Z80 can only load a single byte from a direct memory address into the A register, not the other registers. This makes its instructions more convoluted than, for example, the 6502's instructions for adding numbers.
The example here is further obscured, as B never needs to be loaded with 0 and the first ADD just adds 0 to (num1). That could serve a purpose in a different algorithm this example was perhaps clipped from (an ADC there would fold in the CARRY from a previous add), but here it's totally meaningless and does confuse the lesson a bit.
I came across a good essay on this here ->
cowlark.com/2018-03-18-z80-arithmetic
@@Bob-1802 Not quite. That mnemonic loads the B register with a LITERAL integer value! If the instruction that was suggested existed, it would be LD B,(xx) - load B with the contents of memory address xx - an instruction that doesn't exist on the Z80, as the OP wrote.
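To make the two addressing modes concrete, here's a rough Python sketch of what the CPU does with the operand bytes in each case. The opcodes 06 (LD B,n) and 3A (LD A,(nn)) are the real Z80 encodings discussed above; everything else in the model is made up for illustration:

# Toy model: immediate vs direct addressing on a Z80-ish machine.
memory = [0] * 65536
memory[0x4000] = 5                       # the value we actually want

# LD B,$00     -> 06 00    : the operand byte IS the value (immediate/literal)
code_immediate = [0x06, 0x00]

# LD A,($4000) -> 3A 00 40 : the operand bytes are an ADDRESS (direct),
# and the value is fetched from memory at that address
code_direct = [0x3A, 0x00, 0x40]

def run(code):
    regs = {"A": 0, "B": 0}
    if code[0] == 0x06:                  # LD B,n
        regs["B"] = code[1]
    elif code[0] == 0x3A:                # LD A,(nn)
        addr = code[1] | (code[2] << 8)  # little-endian address bytes
        regs["A"] = memory[addr]
    return regs

print(run(code_immediate))   # {'A': 0, 'B': 0} - B gets the literal 0
print(run(code_direct))      # {'A': 5, 'B': 0} - A gets memory[0x4000]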
Tell your students to build a bit adder in No Man's Sky.
It is easier than one might think and will show them the electrics and logic of what is happening.
Minecraft, Factorio and Autonauts are some others that are Turing complete. I hadn't heard about No Man's Sky. Off to re-install now.
@@RetroRogersLab Autoswitches are [AND] gates. Trying to remember the name of the other... Power Inverter? is an [OR] gate?
Playing around will inform. The logic of an adder is well documented on YT. The hard part for me, a hobby electron pusher, was building it without having everything in one long line, because in NMS in 'wiring mode' all connections are visible and it gets hard to tell what you are connecting to.
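For anyone who wants to prototype the adder before wiring it up in-game, here's a rough Python sketch of a 1-bit full adder built only from simple gates, then chained into a 4-bit adder - the same structure you'd build with the in-game switches (the function names are just mine):

# A 1-bit full adder expressed only in terms of simple gates,
# the same logic you would wire up with in-game switches.
def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b
def XOR(a, b): return OR(AND(a, NOT(b)), AND(NOT(a), b))

def full_adder(a, b, carry_in):
    s1 = XOR(a, b)
    total = XOR(s1, carry_in)                      # sum bit
    carry_out = OR(AND(a, b), AND(s1, carry_in))
    return total, carry_out

# Chain full adders to add two 4-bit numbers, exactly as an ALU does.
def add4(x, y):
    carry, result = 0, 0
    for i in range(4):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

print(add4(0b0101, 0b0011))   # (8, 0) i.e. 5 + 3 = 8 with no carry out

In-game you chain the carry of one adder into the next, exactly as the loop does here.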
Send him to Ben Eater's excellent videos where he builds a CPU out of conventional logic gates on a prototype board.
He uses a ready-made ALU. The more I think about it, the more I believe in a good ALU. On the ARM2, the ALU covers half of the chip.
Your use of Seventy-Six implies a decimal number, not hexadecimal - imo "Seven Six" would avoid the confusion.
I say things like seventy-a, aaty-b and ceety-six for 7A, AB and C6 when reading hex.
Check Ben Eater's video on how the decode is executed.
You still skipped the explanation in this video too. You jumped directly to explaining machine code, how to program in machine code (assembly), and a mini routine. No! Your student actually asked how the decoder works, and the answer is simple: it is just a mini ROM inside the microprocessor, in the Z80 and any other processor. The fetched instruction is placed on that ROM's address bus, and the data stored at that location activates multiple control lines. One line can be "enable register B plus READ", which dumps its value onto the internal data bus, another "enable register A plus WRITE", and so on - that performs LD A,B in one clock tick. For a multi-stage instruction, a counter counts up the stage, and that count is also fed into the decoder ROM as the lower address bits; the counter can be reset to stage zero, so one of the bits stored in the ROM must mark when the instruction ends its execution. Now, the next question the student will ask is how the ADDRESS DECODER of a ROM works, and the answer is: it is just a binary tree of AND/OR gates! You can teach the basics of address decoding to show how the correct byte/word gets selected and output from that mini ROM.
Before the whole CPU fit on one chip, there were CPUs made of PCB cards full of TTL logic chips, and the decoder actually was a ROM IC - a 256-byte ROM, a 1K ROM, or two ROMs in parallel so it could output up to 16 enable signals, etc. The registers were static RAM chips, and there were adder/ALU chips, count-up counters and count-down counters - it is all in the TTL family handbooks, the blue books. The Z80, 8080 and 6502 put all those huge PCB cards into one chip.
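Here's a rough Python sketch of that "decoder as a mini ROM" idea: opcode plus stage counter form the ROM address, and the bits stored there are the control lines to assert. The control-line names and the table contents are invented for illustration (except LD A,B, whose real Z80 opcode is 78) - this is not anyone's actual microcode:

# Toy decoder ROM: (opcode, stage) -> set of control lines to assert.
# Real hardware stores these as bits in a small ROM whose address
# decoder is just a tree of gates selecting one word.
LD_A_B = 0x78            # Z80 opcode for LD A,B

decoder_rom = {
    (LD_A_B, 0): {"REG_B_READ", "REG_A_WRITE", "END_OF_INSTRUCTION"},
    # a made-up two-stage instruction, just to show the stage counter:
    (0xFF, 0): {"MEM_READ", "TEMP_WRITE"},
    (0xFF, 1): {"TEMP_READ", "REG_A_WRITE", "END_OF_INSTRUCTION"},
}

def execute(opcode):
    stage = 0
    while True:
        control_lines = decoder_rom[(opcode, stage)]
        print(f"opcode {opcode:02X} stage {stage}: assert {sorted(control_lines)}")
        if "END_OF_INSTRUCTION" in control_lines:
            break            # the ROM bit that resets the stage counter
        stage += 1

execute(LD_A_B)
execute(0xFF)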
a deep dive in 15 minutes eh....
Your audio is 15 dB down.
Unnecessary white-space is an indication of dreadful programming ability.
Or neatness. It depends. If it's used to make the different sections line up nicely, that can aid troubleshooting later.
Yes! Order and structure are necessary, if only for the eye. It also looks nice when printed out, as a bonus.
6:20 Who wrote this code? Why so many unnecessary operations?
ld a,(num1)
ld b,a
ld a,(num2)
add a,b
halt
Why the pointless additions with zero at the beginning?
LD A,(num 1)
LD HL,num2
ADD A,(HL)
HALT
😁
Microcode is the set of steps inside the processor that tells it how to execute the machine code. It takes the machine code and then switches the right hardware lines to move a number from the data bus to the relevant register. This is why some instructions take several clock cycles. That's what the "unknown" block in your video is doing: en.wikipedia.org/wiki/Microcode. Intel processors have a complex instruction set that can perform many steps in one instruction, whereas RISC processors have simpler, reduced instructions and smaller microcode.
RISC does not take steps. A 32-bit instruction can drive a lot of control lines with the help of minimal combinational logic. Nobody looks up 32 bits in a ROM.
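To illustrate that point: with a fixed-width instruction the fields come straight off the instruction bus, so "decode" is just bit-slicing plus a little combinational logic, no ROM lookup. A rough Python sketch, loosely following the RISC-V R-type layout as an example (not any particular chip's real decode path):

# Decoding a fixed 32-bit RISC-style instruction word by slicing bit fields.
# Each field is just a group of wires; each control signal is a handful of
# gates on those wires, evaluated in one cycle.
def decode_rtype(word: int) -> dict:
    return {
        "opcode": word & 0x7F,           # bits 6..0
        "rd":     (word >> 7)  & 0x1F,   # bits 11..7
        "funct3": (word >> 12) & 0x07,   # bits 14..12
        "rs1":    (word >> 15) & 0x1F,   # bits 19..15
        "rs2":    (word >> 20) & 0x1F,   # bits 24..20
        "funct7": (word >> 25) & 0x7F,   # bits 31..25
    }

# "add x3, x1, x2" encodes in RISC-V as 0x002081B3
fields = decode_rtype(0x002081B3)
print(fields)

# Control signals fall out of simple comparisons on the fields:
is_alu_reg_op = fields["opcode"] == 0b0110011
print("ALU register-register op:", is_alu_reg_op)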
Amazing video, thank you.
Your simple diagram should have included the actual first source of code, the BOOT ROM!!!