He taught me in my first year of compsci last year. He has been one of the best teachers of my degree so far. You can tell he's passionate, which is a massive plus when it comes to teaching. He's a lovely person IRL too.
As soon as I saw the title of this video, I had to put a like. I've been following the emulation scene since its very early beginnings, and it has always stunned me how people could write software precise enough to emulate not just the individual instructions of a processor, which is fairly easy, but the whole architecture of a physical electronic object. I think that, together with cryptography and data compression, emulation represents the peak of what programming has achieved.
Enjoyed the video. I started an Atari 2600 emulator about four years ago and it still consumes a lot of my time. There's always something new to think about, things you would never imagine touching when you start. For example, in addition to the CPU, the RIOT and the TIA, the display itself is an important component. The only displays available at that time were CRTs, so games were written with that in mind. So for a truly accurate emulation, you need to think about how the TV signal works. The cartridges too are full of mystery. Even when the 2600 was in its heyday, cartridges were becoming more complex: different bank switching methods, additional RAM, etc. Each of these has its own little quirks that need to be accounted for. And that's before we get into the cartridges of today, which can include ARM chips and WiFi connections. It's a fascinating hobby but it's a very deep rabbit hole.
I started making an SMS emulator about 20 years ago. I got almost all of the Z80 emulated before realizing it was going to be more of a task than I really wanted to do by myself. You have my respect.
The fact people still make games for it is so awesome. Someone actually managed to make a fully functional side-scrolling port of Super Mario Bros, which is incredibly impressive. I don't know how they got the 2600 to side scroll, it can't do that natively, right? I remember hearing the NES was a big deal because it had side scrolling built in, the first console to do that, which made it much easier to make those kinds of games, and it was all inspired by that Pac-Man side-scrolling platformer, I think it's called Pac-Land, which Miyamoto said was the biggest influence for Super Mario Bros. But there's a reason side-scrolling platformers all of a sudden became the predominant genre when the NES came out, when before, the predominant genre was space shoot 'em ups. The NES just made it so simple to do. So the fact they got it working on a 2600 is insane. Go search for it on YouTube and you'll see.
The breakdown of an instruction opcode into its component bits can be explained by looking at how the CPU hardware decodes and follows instructions. Ben Eater’s “8-bit computer build” series of videos does a great job of this.
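To make the "component bits" idea concrete for the 6502 mentioned in the video, here is a minimal Python sketch. It assumes the commonly described aaabbbcc layout of 6502 opcodes and only handles the cc = 01 group; the function and table names are made up for illustration, and a real decoder needs the other groups plus many exceptions.

```python
# Sketch only: split a 6502 opcode byte into its aaa / bbb / cc bit fields.
# The tables cover just the cc == 0b01 group and are illustrative, not complete.

GROUP_ONE_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP_ONE_MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]


def split_opcode(opcode):
    """Return the (aaa, bbb, cc) fields of an 8-bit opcode."""
    aaa = (opcode >> 5) & 0b111  # top three bits: which operation
    bbb = (opcode >> 2) & 0b111  # middle three bits: addressing mode
    cc = opcode & 0b11           # bottom two bits: instruction group
    return aaa, bbb, cc


def describe(opcode):
    """Name the operation and addressing mode, or None outside the cc == 01 group.

    Caveat: not every bit pattern is a documented instruction
    (0x89 would decode as "STA #imm", which does not exist).
    """
    aaa, bbb, cc = split_opcode(opcode)
    if cc != 0b01:
        return None
    return f"{GROUP_ONE_OPS[aaa]} {GROUP_ONE_MODES[bbb]}"


if __name__ == "__main__":
    for op in (0xA9, 0x8D, 0x69):
        print(f"${op:02X}", split_opcode(op), describe(op))
```

So $A9 splits into 101 / 010 / 01, which the tables read as LDA with an immediate operand.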
Since he mentioned the Apple M1’s capability to emulate x86 towards the end: Apple even built parts of this emulation into the hardware itself (I believe parts of the memory model stuff as well), in order to make this emulation a lot faster than with pure software and thus make the transition from x86 to ARM much smoother.
Dr. Steve Bagley is a brilliant Computer Scientist. Love the way he explains things simply and elegantly without overloading people at the same time. What a great video on this subject.
I haven't been coding or writing programs for a while, but I really enjoyed this conversation, Steve. It was brilliant! I see so many parallels to other things like music: understanding the times a piece was written in, the contemporary theories, and the techniques then and now. Anyhow, thank you again, it fired up my brain and that's what I love about Computerphile.
A long time ago, when Intel introduced the Pentium II, many Pascal programs crashed because the CPU was "too fast". The "workaround" was to run something in the background to slow the system down! Timing issues are truly nightmares.
That was because Pascal tested the CPU speed by counting in a loop until a timer timed out. If the loop counter overflowed, due to the CPU being too fast, the CPU threw an exception. I suppose if you could hack the exe to bypass the speed test you could avoid the crash, however the program would run at an unknown speed, usually way too fast if it was a game.
This takes me back many years to transferring _Knight Lore_ from tape to disc for my BBC Micro. The tape was in a weird nonstandard format that used a custom loader, which was obfuscated to resist reverse engineering by XORing with bytes in the ROM chips and with the timers in the 6522s, so that single stepping caused failure. What fun. The final emulator was considerably more interesting than the game, especially the switch from 2 MHz to 1 MHz when the 6522s were accessed and the innovative use of two unsupported opcodes that made weird address modes work on the Y register (though only on the chips that the Beeb used - it would fail on something like an M50747 that was used as a simple CPU). Happy days.
@@karlramberg Quite so, especially the funnies that additional hardware can introduce, that is what always impressed me with MAME. I believe the original Z80 had many that were used by game writers in the wild because they were handy but the 6502's were definitely more mundane. I never had much to do with Z80 until I became an embedded systems engineer, and then I mostly used HD64180 which would throw an illegal instruction trap if such an opcode was executed... if it was illegal that is: that chip vastly extended the Z80 instruction set to include such goodies as an 8x8 multiply. The 64180 still didn't have a clear carry and an ADD though, so you always had to OR A before the first ADC.
A further simplification of the 6502 emulator comes from most opcodes of the form $x2, for example $02, which will simply crash the CPU, hanging it until reset. And all that simplification goes down the drain when you have to implement "illegal opcodes", which are instructions that are not documented and were not even intended to be implemented by the creators of the CPU. But the CPU doesn't reject them and many of them actually do more or less useful things. Some are useful to make software a little bit smaller and faster. Others can be used to obscure software, such as copy protections, by making it impossible to disassemble the code with a standard disassembler.
One of the details that makes an Amiga emulator hard is a coprocessor named Copper, which is crucially important for video output. For the purposes of emulation this means there are now two processors to be emulated, the 680x0 and the Copper, and they need to run absolutely in sync. Additional complexity comes from today's video output working entirely differently at the hardware level, and from the emulator running on top of a giant layer of software: a modern GUI, graphics driver and OS.
Doing too much 6502 has strange effects on the brain. I saw the A9 written on your piece of paper and my brain still immediately said "LDA", like 35 years after learning to program the 6502.
Though the 6502 was my first CPU, I never got the hang of little endian. My paper notes of numbers are also big endian after all ☺
This is something that has always bugged me: is it still possible to execute native assembly language on a modern consumer desktop processor? I know it's possible with microcontrollers, but what about an Intel or AMD chip? I assume modern cross-platform OSes like Linux/Windows depend on the standardised EFI boot operation to get started, so should the assembly instructions be packed as an EFI image?
That's why it's hard to emulate a game like Duck Hunt, which used the timing of the electron beam in your CRT to know when you pulled the trigger on the light gun.
@@aravindpallippara1577 Oh, those modern systems are very backward compatible. You might boot from a CD or DVD image on a USB stick, and the CD/DVD image contains the image of a boot floppy, even though your system has no actual floppy drive or even an interface to hook one up. EFI may be the future, but the BIOS is still around. About the only things that ever got removed from the PC BIOS are tape drive support and ROM BASIC. Similarly, you can still write applications or parts of applications in assembler. Complex operating systems such as Linux still contain small parts of assembler code. Since the 80s I've been hearing it's not worth writing assembler because the compiler does a better job. Only if I'm completely plastered, undercaffeinated and tired :-) These days compilers have gotten dramatically better, so there is indeed rarely a reason to write assembler. Then there are things such as special registers or instructions which are not available through the compiler, and finally very special stuff such as bug workarounds or detailed control over instruction scheduling. Yes, EFI is different from BIOS and more complex, but take the existence of Linux and a whole bunch of pet OS projects as proof that it is still possible to run assembly code. I'm deliberately ignoring stuff like secure boot here.
@@aravindpallippara1577 Most modern desktops use the x86-64/AMD64 instruction set architecture. When you compile a C(++)/Rust program, the compiler will turn your program into x86 assembly, and then run an assembler to turn that assembly into machine language code. Nothing's stopping you from writing an assembly file and passing it to an assembler, or even writing machine code yourself!
I recall doing this around 1986 for a machine tool firm I worked for as a college student. I emulated the 6809, as it was the processor used in their tools. That way, code could be tested in the emulator before being burned to a ROM and loaded into the tools. This brings back some memories indeed.
Writing emulators is quite fun. Can definitely recommend. You just need to know a bit about how a CPU works (what is a register, how do bitwise operations work, what are flags, ...) and then you literally just take the manual of the CPU and implement every single instruction. My progression so far (including interpreters, which are similar):
- Brainfck (took 1 hour)
- CHIP-8 (took 1-2 days)
- Space Invaders / Intel 8080 (took about a week)
Next I want to try to make a Game Boy emulator, but now that I have to work I have very little free time/motivation.
I used to be involved in some emulation projects in the past, but I mostly did peripheral stuff (in both senses). I started writing an NES emulator which has a cpu based on the 6502, and I could never get all the instructions to work right. Eg. the tests kept saying my subtract was outputting the wrong borrow flag, but I couldn't find any issue in the logic when compared with the documentation. Despite that, I did manage to get a few very simple games to work somewhat! Although they weren't playable because I never implemented the graphics sprites in the PPU.
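A borrow flag that looks "wrong" is a classic stumbling block, because the 6502's SBC uses the carry flag as an inverted borrow: carry set means no borrow. Below is a minimal sketch of binary-mode SBC written as A + (M XOR $FF) + C. The function name `sbc` is invented for illustration, and decimal mode is deliberately left out (the NES CPU does not implement it anyway).

```python
# Sketch only: 6502 SBC in binary mode, modelled as an add of the inverted operand.
# carry_in == 1 means "no borrow pending"; carry_out == 1 means "no borrow happened".

def sbc(a, m, carry_in):
    """Return (result, carry_out, overflow) for A - M - (1 - carry_in)."""
    inverted = m ^ 0xFF                  # one's complement of the operand
    total = a + inverted + carry_in      # same adder path as ADC
    result = total & 0xFF
    carry_out = 1 if total > 0xFF else 0
    # Signed overflow: both inputs to the adder have a sign different from the result.
    overflow = 1 if ((a ^ result) & (inverted ^ result) & 0x80) else 0
    return result, carry_out, overflow


if __name__ == "__main__":
    print(sbc(0x50, 0x10, 1))  # no borrow needed, carry stays set
    print(sbc(0x10, 0x20, 1))  # result wraps, carry cleared to signal the borrow
```

If an emulator sets carry when a borrow occurs, as some other CPUs do, a test ROM will report exactly the kind of mismatch described above.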
Emulating a simple system like the Chip-8/Superchip, or emulating a more complex system in a simplified manner like Gameboy/NES, is a fun pastime if you're not being worked to death as a coder already.
The internals of the 4-switch Atari VCS/2600 look much more neatly organized and more "modern" than its original 6-switch variety, with the later Atari 2600 Jr (fully compatible) going a few steps further. Each was a year or two apart. It's interesting how quick some processes changed, such as more organized component layout and smaller traces.
Been a while but I thought they all had six switches but moved difficulty switches to the back? Was it power, B&W/color, select, reset, difficulty A and difficulty B?
@@mikechappell4156 The 4 switch versions had the difficulty switches moved from silver toggles on the front to black switches on the back. It also moved the channel switch from the bottom (which was inset so you had to use a screwdriver) to the back. The "six switch" versus "four switch" nomenclature strictly refers to how many silver switches were on the front.
For the timing stuff, I think a simple solution for a quick and dirty implementation would be a configuration file that declares the delay between sending a message for instruction X and continuing execution of the emulated CPU(s). Users of the emulator can then tune things for their game and submit the timings they found worked best. As the submissions come in, the default 0s that were originally used can be replaced with the average value for each, which is the most likely timing of how long each processor took. This in turn tells the developer that maybe they missed something, or that they emulated the target processor more efficiently than the actual processor runs. Since each processor is expected to last the lifetime of the emulation, one can also use some global variables for I/O to reduce delays that should not be there.
I chuckled at his remark about hating little-endian processors. I was team big-endian for a long time. I honestly could not think of a single good reason to represent values "backwards", but as I dove into the lower levels of computer science, I was slowly convinced that little-endian was superior.
Big Endian is nicer to the programmer, Little Endian is nicer to the hardware designer. Thankfully, we're mostly abstracted from the difference these days.
That realisation you've spent time looking at an atari schematic, specifically the TIA chip pins/connections, and never realised what TIA actually stood for...
Having spent quite a bit of time modifying WinUAE (an Amiga emulator), I'm amazed emulation works as well and as fast as it does on modern machines. Just for the heck of it, I added a 68030-style cache to the 68010, with some impressive results. I also tried adding FPU support, but quickly found out AmigaOS blows up when it finds FPU support but not 68020 addressing support. Fun stuff. I don't think it's necessary to learn assembly to be a good programmer, but everybody should at least understand what the hardware is actually doing.
An excellent introduction! But it opens up so many questions. For example, does an emulator need to be multi-threaded to emulate all the various chips and their timings? And how do you throttle the speed on a modern cpu to match the speed of the chip you're emulating? Or do you? Unless I dreamt it, I saw a presentation from somebody who emulated a C-64 in JavaScript (yes, with SID chip support, VIC-II display semantics and everything.). How would you even dream of getting the timings right?
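On the throttling question, one common approach (not the only one, and not necessarily what any particular emulator does) is to stay single-threaded: run the emulated CPU for one video frame's worth of cycles as fast as the host allows, then sleep until real time catches up. A minimal sketch, where `step` is a hypothetical callback that executes one emulated instruction and returns the cycles it took:

```python
# Sketch only: frame-based throttling of an emulated CPU.
# `step` is a hypothetical callback; the clock and sleep functions are injectable.
import time


def cycles_per_frame(clock_hz, frames_per_second):
    """How many emulated CPU cycles fit into one video frame."""
    return clock_hz // frames_per_second


def run_throttled(step, clock_hz, frames_per_second, frames,
                  now=time.perf_counter, sleep=time.sleep):
    """Run `frames` frames, sleeping after each so emulated time tracks real time.

    Returns the total number of emulated cycles executed.
    """
    budget = cycles_per_frame(clock_hz, frames_per_second)
    frame_time = 1.0 / frames_per_second
    deadline = now()
    total = 0
    for _ in range(frames):
        spent = 0
        while spent < budget:        # burn through this frame's cycle budget
            spent += step()
        total += spent
        deadline += frame_time       # when this frame should end in real time
        remaining = deadline - now()
        if remaining > 0:            # host was faster than the original machine
            sleep(remaining)
    return total
```

The other chips can then be advanced by the same cycle counts inside `step`, which keeps everything in lockstep without threads. A more careful version would carry any overshoot of the budget into the next frame.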
I loved this video. As if somebody was watching me when I was attempting to write 6502 emulator in 2017. :) I failed to implement proper timings. I vaguely remember there is even some sort of primitive pipelining happening - handful of instructions are sort of executed in parallel. I had nobody around me to explain how the ticks are happening. I don't know if I remember this correctly but it actually has two clock sources? I wanted to implement a tick in software (timer interrupt) that would emulate a tick of an actual processor. Sadly, as I said, I failed to implement it. It's still in my TODO list, maybe I'll revisit some day.
The 6502 has a single clock source but for electrical design reasons it is split up. You can treat it as a single clock source. If I understood you correctly. The "parallel execution" I believe is that a handful of instructions overlap the end of the previous instruction with fetching the next one. It's not smart out-of-order execution like modern CPUs have - all the timing is hand-designed and fully predictable.
Some years ago I made an emulator in C for the Z80 CPU which is probably comparable (in complexity) to the 6502. And exactly like Steve says it's simple enough to start out with. Memory is just a malloc'ed chunk of bytes and the CPU registers and flags are just a struct with a couple of members. I had it do some fun things but I never got around to build any I/O. Got a bit lost in the many possibilities and, ultimately, lost interest in favour of other projects. I may revisit the project at some point.
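The "memory is a chunk of bytes, registers are a struct" starting point looks roughly like this when transliterated to Python. This is only a sketch: the commenter's emulator was in C, and the class names and the small Z80-flavoured register subset here are assumptions for illustration.

```python
# Sketch only: the bare state of an 8-bit machine with a 16-bit address space.
# Register names are a small Z80-flavoured subset chosen for illustration.
from dataclasses import dataclass, field


@dataclass
class Registers:
    a: int = 0       # accumulator
    f: int = 0       # flags
    h: int = 0
    l: int = 0
    sp: int = 0xFFFF
    pc: int = 0

    @property
    def hl(self):
        """The H and L registers viewed as one 16-bit pair."""
        return (self.h << 8) | self.l


@dataclass
class Machine:
    regs: Registers = field(default_factory=Registers)
    # The Python counterpart of malloc(65536): one flat block of bytes.
    memory: bytearray = field(default_factory=lambda: bytearray(0x10000))

    def read(self, addr):
        return self.memory[addr & 0xFFFF]

    def write(self, addr, value):
        self.memory[addr & 0xFFFF] = value & 0xFF


if __name__ == "__main__":
    m = Machine()
    m.regs.h, m.regs.l = 0x12, 0x34
    m.write(m.regs.hl, 0xAB)
    print(hex(m.regs.hl), hex(m.read(0x1234)))
```

Everything else (I/O, interrupts, timing) hangs off `read` and `write`, which is usually where the simple part ends.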
You guys ever thought about doing a video on the SGI MIPS architecture? I've always been fascinated by these machines and how they work, and with so little info still available about them, it would be great to get your insights into how these things work on a lower level.
Regarding Apple's Rosetta 2: the cores in the M-series SoCs have a flag that puts them in TSO mode (total store order) like x86 CPUs. Also, they implemented a way to generate two legacy flags and a mode that emulates x86 floating point behavior.
The silver lining about machines with unpredictable instruction timing is that it was also unpredictable for the game developers, so they generally didn't rely on it.
@@fake12396 maybe but it's usually rare enough to throw in a hack, like pretend some time passed when accessing the disk. IIRC there was a game that would freeze if the disk was too fast. It's not an all-pervasive thing like the timing on 8/16-bit consoles.
Is it also a valid approach to brute force the emulation by going down one level and doing all the logic gates/wiring in software? This way you don't rely on the intended function, but on the actual effects of the logic flow. An extreme version would be physical simulation, which exists for arbitrary electronic circuitry with actual voltage/current levels simulated, but I would expect proper digital circuitry to be accurately simulatable at the higher binary logic level.
Right channel audio drops severely @3:28 making the audio incredibly unbalanced and borderline unlistenable on headphones and then returns to normal @5:21.
Someone is building a high-level emulator for iOS, and that will bring back some of my childhood games. I read their blog post about how they are essentially rewriting functions to rebuild the many libraries that Apple provided at the time, but in Rust and running on modern operating systems. I can't wait for them to get iPhone OS 3 running, which will allow me to continue playing Mirror's Edge mobile and speedrun it.
Video output programming was a bit easier on the Nintendo Entertainment system. It has a picture processing unit (PPU). An interrupt occurs at the beginning of the vertical blanking interval. Then the game program writes to the memory regions that control what appears on the screen for the next frame. Then the PPU starts outputting the next frame when the vertical blank is over. The game program just had to make sure all of the video memory writing is done during the vertical blank, whatever number of CPU cycles that is.
Some time ago I saw a list of architectures that a program supported, and I saw one called "mipsel". I thought, "huh, that's interesting. MIPS I know, but what's 'mipsel'?" And then I saw it's a computer architecture dad joke: MIPS Little-Endian.
aaaa why does everyone get the program counter wrong, it's not 16 bits long, it's two 8-bit values. They have functionally separate circuits and it's easier to implement them in an emulator this way
I'm guessing as a general rule the emulator must simulate the cpu and hardware to normal or increased clock cycles accurately as per to be finished before rendered on the host system.
6:59 nope, the C64 didn't have a 6502, it had a 6510. IIRC it ran at something like 985 kHz. crawling along by today's standards. BTW, did you know that tia is the spanish word for aunt?
Technically, the ST was a different Atari, Atari Corp. Though I'm actually not sure that Atari themselves actually produced a lot of software for the ST? Though of course they did include Jaguar software. Which is from Atari Corp. So gawd knows
If you mean the 6502, it didn't do port-mapped IO at all. You had to hardware-map all ports to memory locations. The Z80, however, did port IO so it is mentioned in the Z80 manual and often wired that way in hardware.
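In emulator terms, memory-mapped I/O on a 6502-style machine usually means that reads and writes to certain addresses get routed to a device instead of RAM. A minimal sketch of that dispatch; the `Bus` class, its method names and the example address are all hypothetical, not taken from any real emulator:

```python
# Sketch only: a bus that routes selected addresses to device handlers.
# Class name, method names and the mapped address are illustrative assumptions.

class Bus:
    def __init__(self):
        self.ram = bytearray(0x10000)
        self.read_handlers = {}    # address -> callable returning a byte
        self.write_handlers = {}   # address -> callable taking a byte

    def map_device(self, addr, read=None, write=None):
        """Attach device callbacks to one memory address."""
        if read is not None:
            self.read_handlers[addr] = read
        if write is not None:
            self.write_handlers[addr] = write

    def read(self, addr):
        handler = self.read_handlers.get(addr)
        if handler is not None:
            return handler() & 0xFF    # the device answers instead of RAM
        return self.ram[addr]

    def write(self, addr, value):
        handler = self.write_handlers.get(addr)
        if handler is not None:
            handler(value & 0xFF)      # the device sees the write, RAM does not
        else:
            self.ram[addr] = value & 0xFF


if __name__ == "__main__":
    bus = Bus()
    seen = []
    bus.map_device(0xD020, read=lambda: 0x42, write=seen.append)
    bus.write(0xD020, 7)
    print(seen, hex(bus.read(0xD020)))
```

A Z80 emulator would keep a second, separate table for the IN/OUT port space, since those instructions do not go through the memory map.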
No, little endian is the way to go, sure it's a little annoying to look at when debugging the memory but that's a minor inconvenience in exchange for that extra bit of speed, no matter how small the extra bit is
Endianness doesn't matter. I've been working with both kinds of systems on multiple hardware platforms, and there is no real advantage or disadvantage of one over the other.
@@ivanskyttejrgensen7464 I've always found I program better with little endian and my bignum projects always end up faster when I use little endian over big endian for the arrays, it's simply the time spent getting the last index vs just starting with 0 every time
For variable-size byte arrays that’s entirely a matter of how you’re implementing the software for it. At the hardware level all words (longs, shorts, etc) are the same width and endianness is simply a matter of mapping address lines, so the choice doesn’t impact the speed of access. What matters more is whether your accesses are aligned to the memory’s word size.
@@trevinbeattie4888 I went as far as working with just pointers and minimal information at the bottom level of my code. For example, the addition/subtraction functions absolutely needed to use bit X to Y instead of the bit 0 to N given in the top-level functions (since they don't take a starting bit parameter, just a bit count), and to get the last byte needed to start big endian I ended up having to use extra math. After too many if statements etc. I decided to just abandon endian compatibility with the native hardware and just do little endian (which my system was using anyway, so it made it more convenient to compare results in bulk tests). About the only support I gave after that was a function to convert between endians.
Little-endian makes sense for something like the 6502. When you need to read a byte-aligned 16-bit immediate address, then add the index register to it, it makes sense to read and add the low byte first to calculate the carry, then read the high byte. I doubt there's any real noticeable speed increase in a modern CPU.
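The low-byte-first argument can be sketched in a few lines. This is an illustration of the idea rather than a cycle-exact model of the 6502, and `absolute_indexed` is a made-up helper name:

```python
# Sketch only: effective address for absolute,X style addressing,
# processing the little-endian operand in the order the bytes arrive.

def absolute_indexed(memory, operand_addr, index):
    """Return (effective_address, page_crossed) for a 16-bit operand plus index."""
    low = memory[operand_addr]                  # low byte is stored first
    low_sum = low + index                       # can be added straight away
    carry = low_sum >> 8                        # 1 if the add spilled past 8 bits
    high = memory[(operand_addr + 1) & 0xFFFF]  # high byte arrives next
    effective = (((high + carry) & 0xFF) << 8) | (low_sum & 0xFF)
    return effective, carry == 1


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x10], mem[0x11] = 0xF0, 0x20           # operand $20F0, little-endian
    print(absolute_indexed(mem, 0x10, 0x05))
    print(absolute_indexed(mem, 0x10, 0x20))
```

With a big-endian operand the adder would have to wait for the second byte before it could start, which is the point being made above.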
With the timing, can't you just use the known documented clock speeds, memory data rates, and cycles per clock of the systems used to calculate it easily?
I don't know about the atari, but the 6502 can address 16 bit memory addresses. 128 bytes is way too little, that's half a page. Of course some of the address space is used for hardware addressing, some of it is used for ROM, probably the upper half of the memory since that's where the reset vectors are, which are probably on the cartridge. Page 1 and 2 are special anyway and need to be R/W, but you could use them of course. Even if you set aside half your address space for hardware, and half of what remains for the ROM, you'd still have 0x0000 to 0x3FFF free for general purpose RAM. I just can't imagine why they'd take a general purpose processor with the capability of a 16 bit address bus and limit it to half a page of memory. that's ridiculous.
The RAM was a cost reduction measure. Memory was STUPIDLY expensive in the late 1970s and Atari wanted to produce a console that was affordable to the average home. Given that the system was only supposed to run a handful of cartridges originally, 128 bytes was enough. Of course, the console became incredibly popular and the programmers figured out how to get the Atari to run games it wasn't designed to handle. I've had a few people suggest that a C compiler could be created for the Atari 2600. I always manage to disabuse them of the notion once I explain how little memory we're working with. It was not uncommon to restrict the stack to 4 or 5 positions just to have enough memory for the rest of the game! Not a chance you're fitting stack-based function calls in there.
There are "emulators" that emulate a machine that doesn't physically exist. Chip-8 and PICO-8 are virtual game consoles that only exist in emulation. You can also argue that things like the JVM is a kind of emulator for an instruction set without a hardware implementation.
@@angeldude101 I thought there was some actual hardware that implemented the JVM at some point. Probably from Sun Microsystems, given they were the ones behind Java in the first place.
DOSBox not only emulates the hardware of a 486-era PC, but it also emulates the BIOS and DOS (and by default tells DOS programs it's MS-DOS 5.0), so there's some software emulation in action. A more straightforward PC emulator would just do the hardware and require you to get a BIOS ROM dump and install MS-DOS for the software side of things. WINE is not an emulator (which is in fact what the name stands for), but it does do a similar task in that it translates Windows system calls into UNIX equivalents, allowing you to run Windows programs on x86 Linux or MacOS X, while leaving the rest of the program running natively.
@@Roxor128 So what makes Wine that different from an emulator? I'm under the impression that they don't call it an emulator because people expect an emulator to, well, emulate a CPU architecture (which it doesn't). Couldn't you say Wine 'emulates' the Windows API? It apparently didn't do much with Windows syscalls until the recent Syscall User Dispatch (if I recall the name correctly).
@@tuxecure You've pretty-much got it. It's not an emulator because it doesn't mimic any hardware. It assumes that you're using the same hardware for Linux as you would for Windows, so there's no need for hardware mimicry, just translating the API. You won't be running x86 Windows programs on a Raspberry Pi with it, for instance. You could do it with DOSBox and a suitable Windows installation inside it, though.
Personally I prefer HLE when possible. This makes it easier to hack in higher frame rates which is very important to me. Much more than unseen cycle accuracy.
It's damn cool when you stop to really think about it, it's essentially a living, breathing schematic of the original device, all the more insane when you think about the fact that a lot of the chips that people have somehow managed to emulate are actually closely guarded secrets
simulate a pure pre-calc rom im-memory compute machine (every calculation one/two parameter alu math and any code logic operation stored in rom, super simple total system complexity, just one core, then multiply those cores)
try RLE compression, but make it a dictionary, and have the first bit be switch to repeat/step-over modes, then n,1-15 bits per dictionary (256 entries at max or optimum), to indicate the step or rle repeat for that dictionary, creating a sparse representation of the file to be compressed, and you can also sum up all the counts of the dictionary numbers, to act as semi-crc check
no interrupts, just linear execution for the pre-rom in-memory single core computer, just state bits (all in same memory) that can be read by the main program, optionally
because its only rom-ram minimum transistors active per clock, you can run the pre-computer at very high clock frequency, like optical fiber dac speeds, 1PHz
the single core at 1PHz is the same as 1M x 1GHz cores, but power usage as a simple linear memory controller is very low and it's super simple, not complicated: 16-32 bit floats/integers in the ALU, and memory jumps, if statements and control instructions are also just ALU ROM op(s). Also, all ops take exactly the same time every time, the same number of cycles; only external memory modifications, like keyboard or PCIe compat layers, will have different timings
This dude taught my CS fundamentals in my first year of uni at Nottingham, great teacher
We’re all very excited for you and hope you didn’t miss a stroke when you typed the comment.
Very lucky!
@@custardtart1312 who hurt you?
Teachers like this make a difference
If an emulator ran a game and Nintendo wasn't around to hear it, would it make a sound?
Brilliant comment
Maybe the emulator wouldn't, but Nintendo would!
@@zlac on point!
Yes. 🙄
🤌🔥🔥🔥
I developed an interest in emulators as a teenager. The idea of playing old PS1 games for free on my PC back in the mid-2000s was mind blowing
Nice. Do you have any development logs around? I wrote an ATARI2600 emulator myself but apart from a handful of games it never worked correctly.
Glad to see Ben videos mentioned! His videos are awesome for prospective computer engineers and scientists.
When he said 6502, Ben was the first person that came to mind.
I can't get enough of the content from this channel. GREAT stuff!
A long time ago, when Intel introduced the Pentium II, many Pascal programs crashed because the CPU was "too fast". The "workaround" was to run something in the background to slow the system down! Timing issues are truly nightmares.
@@RangieNZ Thanks mate!
That was because Pascal tested the CPU speed by counting in a loop until a timer timed out. If the loop counter overflowed, due to the CPU being too fast, the CPU threw an exception. I suppose if you could hack the exe to bypass the speed test you could avoid the crash, however the program would run at an unknown speed, usually way too fast if it was a game.
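A rough Python sketch of the failure mode described above, under stated assumptions: the startup code counts loop iterations during one timer tick, scales that to iterations per millisecond, and the result has to fit in 16 bits. The function name and the 55 ms tick length are illustrative choices, not taken from the actual runtime library.

```python
def calibrate_delay(loops_per_tick, ms_per_tick=55):
    """Return busy-loop iterations per millisecond as a 16-bit value.

    Raises OverflowError where, in this sketch's model, the real CPU would
    raise an exception because the result does not fit in 16 bits.
    """
    per_ms = loops_per_tick // ms_per_tick
    if per_ms > 0xFFFF:
        raise OverflowError("calibration result does not fit in 16 bits")
    return per_ms
```

On a slow machine the count is small and everything works; once the CPU is fast enough to push the result past 65535, the calibration itself fails before the program proper starts.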
@@Mark_Bridges Interesting. I wish I knew this back then (some 25 years ago)!
@@jaffarbh I learned this way too late too. However, unless you know how to bypass the speed test, it doesn't really help you.
But if timers were available, why was there a need to check the CPU speed?
This takes me back many years to transferring _Knight Lore_ from tape to disc for my BBC Micro. The tape was in a weird nonstandard format that used a custom loader, obfuscated to resist reverse engineering by XORing with bytes in the ROM chips and with the timers in the 6522s, so that single-stepping caused failure. What fun. The final emulator was considerably more interesting than the game, especially the switch from 2 MHz to 1 MHz when the 6522s were accessed and the innovative use of two unsupported opcodes that made weird address modes work on the Y register (though only on the chips that the Beeb used - it would fail on something like an M50747 that was used as a simple CPU). Happy days.
Emulators get really tricky to implement when software exploits quirks, bugs and edge cases in original hardware.
@@karlramberg Quite so, especially the funnies that additional hardware can introduce, that is what always impressed me with MAME. I believe the original Z80 had many that were used by game writers in the wild because they were handy but the 6502's were definitely more mundane. I never had much to do with Z80 until I became an embedded systems engineer, and then I mostly used HD64180 which would throw an illegal instruction trap if such an opcode was executed... if it was illegal that is: that chip vastly extended the Z80 instruction set to include such goodies as an 8x8 multiply. The 64180 still didn't have a clear carry and an ADD though, so you always had to OR A before the first ADC.
A further simplification of the 6502 emulator comes from most opcodes of the form $x2, for example $02 which will simply crash the CPU, hanging it until reset. And all that simplification goes down the drain when having to implement "illegal opcodes" which are instructions that are not documented and were not even intended to be implemented by the creators of the CPU. But the CPU doesn't reject them and many of them actually do more or less useful things. Some are useful to make software a little bit smaller and faster. Others can be used to obscure software, such as copy protections, by making it impossible to disassemble the code with a standard disassembler.
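A small Python sketch of how an emulator's dispatch loop might treat those crash opcodes. The list is the set commonly documented as "JAM"/"KIL" on the NMOS 6502; the exception class, the `step` function and the `handlers` table are hypothetical names for this example. Note that not every $x2 opcode jams: $A2 is the documented LDX immediate, for instance.

```python
# Opcodes commonly documented as halting ("jamming") the NMOS 6502 until reset.
JAM_OPCODES = frozenset(
    {0x02, 0x12, 0x22, 0x32, 0x42, 0x52, 0x62, 0x72, 0x92, 0xB2, 0xD2, 0xF2}
)


class CpuJammed(Exception):
    """Raised when the emulated CPU locks up and needs a reset."""


def step(opcode, handlers):
    """Dispatch one opcode; `handlers` maps opcode -> callable returning cycles."""
    if opcode in JAM_OPCODES:
        raise CpuJammed(f"opcode ${opcode:02X} halts the CPU")
    handler = handlers.get(opcode)
    if handler is None:
        # Undocumented ("illegal") opcodes would need their own handlers here.
        raise NotImplementedError(f"opcode ${opcode:02X} not implemented")
    return handler()
```

The `NotImplementedError` branch is exactly where the simplification ends: supporting illegal opcodes means filling in handlers for the holes in the table.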
One of the details that makes an Amiga emulator hard is a coprocessor named Copper, which is crucially important for video output. For the purposes of emulation, this means there are now two processors, the 680x0 and the Copper, to be emulated, and they need to run absolutely in sync.
Additional complexity comes from today's video output working entirely differently at a hardware level, and from the emulator running on top of a giant software layer of modern GUI, graphics driver and OS.
Doing too much 6502 has strange effects on the brain. I saw the A9 written on your piece of paper and my brain still immediately said "LDA", like 35 years after learning to program the 6502.
Though the 6502 was my first CPU I never got the hang of little endian. My paper notes of numbers are also big endian after all ☺
This is something that has always bugged me - Is it still possible to execute native assembly language on a modern consumer desktop processor, i know it's possible with microcontrollers but what about an intel or amd chip?
I assume modern cross-platform OSes like Linux/Windows depend on a standardised EFI boot operation to get started, so should the assembly instructions be packed as an EFI image?
That's why it's hard to emulate a game like Duck Hunt, which used the timing of the electron beam in your CRT to know when you pulled the trigger on the light gun.
@@aravindpallippara1577 Oh those modern systems are very backward compatible. You might boot from a CD or DVD image on a USB stick and the CD/DVD image contains the image of a boot floppy, even though your system has no actual floppy drive or even just an interface to hook one up. EFI may be the future - but the BIOS is still around. About the only things that ever got removed from the PC BIOS are tape drive support and ROM BASIC. Similarly you can still write applications or parts of applications in assembler. Complex operating systems such as Linux still contain small parts of assembler code. Since the 80s I've been hearing it's not worth writing assembler, the compiler does a better job. Only if I'm completely plastered, undercaffeinated and tired :-) These days compilers have gotten dramatically better, so there is indeed more rarely a reason to write assembler. Then there are things such as special registers or instructions which are not available through the compiler and finally very special stuff such as bug workarounds or detailed control over instruction scheduling.
Yes, EFI is different from BIOS. More complex but at the same time take the existence of things such as Linux or a whole bunch of other pet OS projects as proof it still is possible to run assembly code.
I'm deliberately ignoring stuff like secure boot here.
@@aravindpallippara1577 Most modern desktops use the x86-64/AMD64 instruction set architecture. When you compile a C(++)/Rust program, the compiler will turn your program into x86 assembly, and then run an assembler to turn that assembly into machine language code. Nothing's stopping you from writing an assembly file and passing it to an assembler, or even writing machine code yourself!
@@aravindpallippara1577 kind of. You can run software written in assembly... but that's often interpreted by microcode etc
Timing, as they say, is everything...
Not sure why, but I could listen to this guy for hours.
I recall doing this around 1986 for a machine tool firm I worked as a college student. Emulated the 6809 as it was the processor used in their tools.
That way, code could be tested in the emulator before being burned to a ROM and loaded into the tools.
This brings back some memories indeed.
It still amazes me that they use the printer paper for illustrations that I used to load into mainframe line printers in the late 80's...
Writing emulators is quite fun. Can definitely recommend. You just need to know a bit about how a CPU works (what is a register, how do bitwise operations work, what are flags,...) and then you literally just take the manual of the CPU and implement every single instruction. My progression is (including interpreters which are similar):
Brainfck (took 1 hour)
CHIP-8 (took 1-2 days)
Space Invaders / Intel 8080 (took about a week)
next I want to try to make a gameboy emulator, but now that I have to work I have very little free time/motivation.
I used to be involved in some emulation projects in the past, but I mostly did peripheral stuff (in both senses). I started writing an NES emulator which has a cpu based on the 6502, and I could never get all the instructions to work right. Eg. the tests kept saying my subtract was outputting the wrong borrow flag, but I couldn't find any issue in the logic when compared with the documentation. Despite that, I did manage to get a few very simple games to work somewhat! Although they weren't playable because I never implemented the graphics sprites in the PPU.
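The borrow flag is a classic stumbling block, so here is a minimal Python sketch of binary-mode SBC. On the 6502 the carry flag acts as an inverted borrow: the operation computes A - M - (1 - C), and carry comes out set when no borrow was needed. One common way to get this right is to add the one's complement of the operand. The function name is made up for this example, and the N, Z and V flags are left out.

```python
def sbc_binary(a, operand, carry_in):
    """Binary-mode SBC. carry_in and the returned carry are 0 or 1.

    Computes A - M - (1 - C) by adding the one's complement of M plus C.
    Returns (result, carry_out); carry_out == 1 means no borrow occurred.
    """
    total = a + (operand ^ 0xFF) + carry_in
    return total & 0xFF, 1 if total > 0xFF else 0
```

So `sbc_binary(0x05, 0x03, 1)` gives `(0x02, 1)` (no borrow, carry stays set), while `sbc_binary(0x03, 0x05, 1)` gives `(0xFE, 0)` (borrow, carry cleared). Getting the sense of the flag backwards is an easy way to fail the test ROMs while the arithmetic looks fine.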
Emulating a simple system like the Chip-8/Superchip or emulating a more complex system in a simplified manner like Gameboy/NES is a fun pastime if you're not being worked to death as a coder already.
When "buttons" are one of the largest components on the board. Love it!
The internals of the 4-switch Atari VCS/2600 look much more neatly organized and more "modern" than its original 6-switch variety, with the later Atari 2600 Jr (fully compatible) going a few steps further. Each was a year or two apart. It's interesting how quick some processes changed, such as more organized component layout and smaller traces.
Been a while but I thought they all had six switches but moved difficulty switches to the back? Was it power, B&W/color, select, reset, difficulty A and difficulty B?
@@mikechappell4156 The 4 switch versions had the difficulty switches moved from silver toggles on the front to black switches on the back. It also moved the channel switch from the bottom (which was inset so you had to use a screwdriver) to the back.
The "six switch" versus "four switch" nomenclature strictly refers to how many silver switches were on the front.
I love this man, can never be bored when he looks 25, 47 and 57
For the timing stuff, I think a simple solution for a quick & dirty implementation would be to create a configuration file that declares the delay between sending a message for instruction X and continuing execution of the emulated CPU(s). Users of the emulator can then tune things for their game and submit the timings they found worked best. As the submissions come in, the default 0s that were originally used can be replaced with the average value for each, which is the most likely timing of how long each processor took. This in turn tells the developer that maybe they missed something, or that they implemented the emulated processor more efficiently than the actual processor. Since each processor is expected to last the lifetime of the emulation, one can also use some global variables for I/O to reduce delays that should not be there.
FYI, sound fully panned to the left at 03:29 or so, for some reason.
Audio issue - around 5:15 sound only comes out of the left channel
Already from 3:26 actually.
Nicola Salmoria used MAME as the basis for his laurea (equivalent to a master's degree) thesis in mathematics.
I chuckled at his remark about hating little-endian processors. I was team big-endian for a long time. I honestly could not think of a single good reason to represent values "backwards", but as I dove into the lower levels of computer science, I was slowly convinced that little-endian was superior.
Big Endian is nicer to the programmer, Little Endian is nicer to the hardware designer. Thankfully, we're mostly abstracted from the difference these days.
0:44 What is he saying? “We don’t do tutorials on computer [unintelligible to me]”
That realisation you've spent time looking at an atari schematic, specifically the TIA chip pins/connections, and never realised what TIA actually stood for...
Television Interface Adapter
Would love to see a follow up video about hardware-based emulation (for example, FPGA emulation)
Happy new year! Have fun emulating!
Having spent quite a bit of time modifying WinUAE (an Amiga emulator), I'm amazed emulation works as well and as fast as it does on modern machines. Just for the heck of it, I added a 68030-style cache to the 68010, with some impressive results. I also tried adding FPU support, but quickly found out AmigaOS blows up when it finds FPU support but not 68020 addressing support. Fun stuff.
I don't think it's necessary to learn assembly to be a good programmer, but everybody should at least understand what the hardware is actually doing.
An excellent introduction! But it opens up so many questions. For example, does an emulator need to be multi-threaded to emulate all the various chips and their timings? And how do you throttle the speed on a modern cpu to match the speed of the chip you're emulating? Or do you?
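On the throttling question, one common approach (assumed here, not necessarily what any particular emulator does) needs no threads at all: step everything in lockstep by counting emulated cycles, run one video frame's worth as fast as the host allows, then sleep until the frame's wall-clock deadline. A Python sketch, where `run_throttled` and all its parameters are hypothetical names and `step` stands in for "execute one instruction and return its cycle count":

```python
import time


def run_throttled(step, cycles_per_frame, frames, frame_seconds,
                  clock=time.monotonic, sleep=time.sleep):
    """Run `frames` frames, pacing each one to `frame_seconds` of real time."""
    deadline = clock()
    for _ in range(frames):
        spent = 0
        while spent < cycles_per_frame:
            spent += step()  # other chips would be advanced by the same cycles
        deadline += frame_seconds
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)  # host was faster than the original hardware
```

Tracking an absolute deadline, rather than sleeping a fixed amount per frame, keeps small timing errors from accumulating. The `clock` and `sleep` parameters exist only so the pacing can be tested without real waiting.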
Unless I dreamt it, I saw a presentation from somebody who emulated a C-64 in JavaScript (yes, with SID chip support, VIC-II display semantics and everything.). How would you even dream of getting the timings right?
I only looked at this guy for 1 second, then immediately surrendered to his intelligence
I loved this video. As if somebody was watching me when I was attempting to write 6502 emulator in 2017. :)
I failed to implement proper timings. I vaguely remember there is even some sort of primitive pipelining happening - a handful of instructions are sort of executed in parallel.
I had nobody around me to explain how the ticks are happening. I don't know if I remember this correctly but it actually has two clock sources? I wanted to implement a tick in software (timer interrupt) that would emulate a tick of an actual processor. Sadly, as I said, I failed to implement it. It's still on my TODO list, maybe I'll revisit some day.
The 6502 has a single clock source but for electrical design reasons it is split up. You can treat it as a single clock source. If I understood you correctly.
The "parallel execution" I believe is that a handful of instructions overlap the end of the previous instruction with fetching the next one. It's not smart out-of-order execution like modern CPUs have - all the timing is hand-designed and fully predictable.
this brought me back to playing pokemon yellow on a gameboy emulator
Some years ago I made an emulator in C for the Z80 CPU which is probably comparable (in complexity) to the 6502. And exactly like Steve says it's simple enough to start out with. Memory is just a malloc'ed chunk of bytes and the CPU registers and flags are just a struct with a couple of members. I had it do some fun things but I never got around to build any I/O. Got a bit lost in the many possibilities and, ultimately, lost interest in favour of other projects. I may revisit the project at some point.
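The same "memory is a chunk of bytes, registers are a struct" idea, sketched in Python rather than C. The class and function names are made up for this example, the alternate register set and the index registers are left out, and only one instruction is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Z80State:
    # Main 8-bit registers plus flags; values are kept in 0..255 by the handlers.
    a: int = 0
    f: int = 0
    b: int = 0
    c: int = 0
    d: int = 0
    e: int = 0
    h: int = 0
    l: int = 0
    sp: int = 0
    pc: int = 0
    # The full 64 KiB address space as one flat block, like a malloc'ed buffer.
    memory: bytearray = field(default_factory=lambda: bytearray(0x10000))

    @property
    def hl(self):
        """The H and L registers viewed as one 16-bit pair."""
        return (self.h << 8) | self.l


def ld_a_n(cpu):
    """LD A,n (opcode 0x3E): load the byte after the opcode into A."""
    cpu.a = cpu.memory[(cpu.pc + 1) & 0xFFFF]
    cpu.pc = (cpu.pc + 2) & 0xFFFF
    return 7  # T-states taken by this instruction
```

Every other instruction is a function of the same shape: read from the state, write to the state, return how long it took.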
Coming from learning simple python stuff this still seems pretty damn complicated
With C it's more straightforward. In C you learn more about memory.
Those guys that did WinUAE back in the late 90s were geniuses
And Ultra64
You guys ever thought about doing a video on SGI MIPS architecture? I've always been fascinated by these machines and how they work, and with so little info still available about them, it would be great to get your insights into how these things work on a lower level.
Regarding Apple's Rosetta 2: the cores in the M-Series SoCs have a flag that puts them in TSO-mode (total store order) like x86 CPUs. Also, they implemented a way to generate two legacy flags and a mode that emulates x86 floating point behavior.
The silver lining about machines with unpredictable instruction timing is that it was also unpredictable for the game developers, so they generally didn't rely on it.
Things get complicated when games unintentionally rely on timings...
@@fake12396 maybe but it's usually rare enough to throw in a hack, like pretend some time passed when accessing the disk. IIRC there was a game that would freeze if the disk was too fast. It's not an all-pervasive thing like the timing on 8/16-bit consoles.
Is it also a valid approach to brute force the emulation by going down one level and doing all the logic gates/wiring in software? This way you don't rely on the intended function, but the actual effects of the logic flow.
An extreme version would be physical simulation, which exists for arbitrary electronic circuitry with actual voltage/current levels simulated, but I would expect a proper digital circuitry being able to be accurately simulated on the high binary logic level.
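A toy Python illustration of the "go down one level" idea at the binary logic level: an 8-bit adder wired up purely from a NAND function, with no `+` on the data path. This is a sketch of the principle only; all names are invented here, and a real gate-level model of a CPU also has to deal with clocking, latches and propagation order.

```python
def nand(a, b):
    """The only primitive gate in this sketch; inputs and output are 0 or 1."""
    return 0 if (a and b) else 1


def xor(a, b):
    """XOR built from four NAND gates."""
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a, b, carry_in):
    """One-bit full adder made only of NAND-derived gates."""
    partial = xor(a, b)
    total = xor(partial, carry_in)
    carry_out = nand(nand(a, b), nand(partial, carry_in))
    return total, carry_out


def add8(x, y):
    """Ripple-carry 8-bit add; returns (result, carry_out)."""
    carry = 0
    result = 0
    for bit in range(8):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry
```

The appeal is that behaviour falls out of the wiring rather than out of the documentation; the cost is that every emulated operation becomes many host operations.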
I have a vague recollection that Little Endian is faster to retrieve data than Big Endian
Never written one, but always wanted to. Add to my bucket list.
I believe there's an excellent transistor-level emulator for the 6502 out there somewhere... can't seem to remember where though
You may be thinking of Visual 6502.
@@xotmatrix Maybe
This video spoke to me and found me right where I'm at! Thanks!
Right channel audio drops severely @3:28 making the audio incredibly unbalanced and borderline unlistenable on headphones and then returns to normal @5:21.
I love your Channel so much!
Ooo, what happened to the audio balance at 3:28?
Love ic-Berlin frames. Find it strange they're not better known or more popular.
4:41 "So now I know where things are in memory"
Okay, but how did you know those things?
Someone is building a high-level emulator for iOS, and that will bring back some of my childhood games.
I read their blogpost about how they are essentially rewriting functions to rebuild the many libraries that Apple provided at a time, but making it run in Rust and on modern Operating systems.
I can't wait for them to get iPhone OS 3 running, which will allow me to continue playing Mirror's Edge mobile and speedrun it.
Video output programming was a bit easier on the Nintendo Entertainment system. It has a picture processing unit (PPU). An interrupt occurs at the beginning of the vertical blanking interval. Then the game program writes to the memory regions that control what appears on the screen for the next frame. Then the PPU starts outputting the next frame when the vertical blank is over. The game program just had to make sure all of the video memory writing is done during the vertical blank, whatever number of CPU cycles that is.
Some time ago I saw a list of architectures that a program supported, and I saw one called "mipsel". I thought, "huh, that's interesting. MIPS I know, but what's 'mipsel'?"
And then I saw it's a computer architecture dad joke: MIPS Little-Endian.
Was there not a tripod laying around to use? The wobbly cam was quite distracting lol
aaaa why does everyone get the program counter wrong
it's not 16 bits long, it's two 8-bit values; they have functionally separate circuits and it's easier to implement them in an emulator this way
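For what it's worth, a Python sketch of the two-byte approach: the low byte (PCL) increments on its own and only its wrap-around bumps the high byte (PCH). The class name and layout are invented for this example, and whether this is actually easier than a single 16-bit integer is a matter of taste.

```python
class ProgramCounter:
    """Program counter held as two separate 8-bit halves."""

    def __init__(self, pcl=0, pch=0):
        self.pcl = pcl & 0xFF
        self.pch = pch & 0xFF

    def increment(self):
        self.pcl = (self.pcl + 1) & 0xFF
        if self.pcl == 0:  # low byte wrapped, so carry into the high byte
            self.pch = (self.pch + 1) & 0xFF

    @property
    def value(self):
        """The combined 16-bit address, for indexing memory."""
        return (self.pch << 8) | self.pcl
```

One practical upside is that pushing the return address on a JSR or an interrupt just stores the two halves as they are.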
Please, please, tell me why you always use ole dot matrix printer paper?
This is why my C64 emulator I was writing didn't work right. I didn't implement the other chips that call interrupts and what not.
I'm guessing that, as a general rule, the emulator must simulate the CPU and hardware cycle-accurately, at normal or increased clock speed, so that each frame is finished before it is rendered on the host system.
128 bytes? Not 128k?
"In order to implement the CPU registers, you need to mmmff bbfff and write grrfffbfff to fffbfff, innit?" - now I understand everything, thanks! :D
6:59 nope, the C64 didn't have a 6502, it had a 6510. IIRC it ran at something like 985 kHz. crawling along by today's standards.
BTW, did you know that tia is the Spanish word for aunt?
But for all intents and purposes from a programmer's point of view it is just a 6502, it's not like it has an incompatible instruction set
Ha, we had the Tia Juana restaurant back home (in northern Wisconsin!), someone told me it meant Aunt Joanne.
Excellent video 🙌
Technically, the ST was a different Atari, Atari Corp.
Though I'm actually not sure that Atari themselves actually produced a lot of software for the ST?
Though of course they did include Jaguar software. Which is from Atari Corp. So gawd knows
I have some questions that I can't find an answer to. Why is there no information about port-mapped I/O?
Why is everything based on memory-mapped I/O?
If you mean the 6502, it didn't do port-mapped IO at all. You had to hardware-map all ports to memory locations. The Z80, however, did port IO so it is mentioned in the Z80 manual and often wired that way in hardware.
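In emulator terms the difference looks roughly like this Python sketch: memory-mapped devices are intercepted inside the ordinary read/write path, while port-mapped devices live in a separate lookup reached only by dedicated IN/OUT instructions. The `Bus` class, its method names, the example addresses and the 0xFF value for unmapped ports are all assumptions made for illustration.

```python
class Bus:
    def __init__(self):
        self.ram = bytearray(0x10000)
        self.mmio = {}   # memory address -> (read_fn, write_fn)
        self.ports = {}  # port number    -> (read_fn, write_fn)

    # Memory-mapped I/O: every ordinary load/store goes through these.
    def read(self, addr):
        addr &= 0xFFFF
        if addr in self.mmio:
            return self.mmio[addr][0]() & 0xFF
        return self.ram[addr]

    def write(self, addr, value):
        addr &= 0xFFFF
        if addr in self.mmio:
            self.mmio[addr][1](value & 0xFF)
        else:
            self.ram[addr] = value & 0xFF

    # Port-mapped I/O: a separate address space, used only by IN/OUT.
    def port_in(self, port):
        if port in self.ports:
            return self.ports[port][0]() & 0xFF
        return 0xFF  # assumed value for an unmapped port

    def port_out(self, port, value):
        if port in self.ports:
            self.ports[port][1](value & 0xFF)
```

A 6502 core would only ever call `read` and `write`; a Z80 core would call all four.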
here is a question to break yo brain: is a 6502 or z80 cpu a 6502 or z80 emulator?
Excellent interesting video
My boi using dot matrix printer paper for making notes
No, little endian is the way to go, sure it's a little annoying to look at when debugging the memory but that's a minor inconvenience in exchange for that extra bit of speed, no matter how small the extra bit is
Endianess doesn't matter. I've been working with both systems on multiple hardwares and there are no real advantages or disadvantages of one over the other.
@@ivanskyttejrgensen7464 I've always found I program better with little endian and my bignum projects always end up faster when I use little endian over big endian for the arrays, it's simply the time spent getting the last index vs just starting with 0 every time
For variable-size byte arrays that’s entirely a matter of how you’re implementing the software for it. At the hardware level all words (longs, shorts, etc) are the same width and endianness is simply a matter of mapping address lines, so the choice doesn’t impact the speed of access. What matters more is whether your accesses are aligned to the memory’s word size.
@@trevinbeattie4888 I went as far as working with just pointers and minimal information at the bottom level of my code, for example with the addition/subtraction functions which absolutely needed to use bit X to Y instead of bit 0 to N which was given in the top level functions (since they don't take a starting bit parameter, just a bit count). To get the needed last byte to start big endian I ended up having to use extra math. After too many if statements etc. I decided to just abandon endian compatibility with the native hardware and just do little endian (which my system was using anyway, so that made it more convenient to compare results in bulk tests).
About the only support I gave after that was just a function to convert between endians
Little-endian makes sense for something like the 6502. When you need to read a byte-aligned 16-bit immediate address, then add the index register to it, it makes sense to read and add the low byte first to calculate the carry, then read the high byte. I doubt there's any real noticeable speed increase in a modern CPU.
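The low-byte-first idea, as a Python sketch of absolute,X address calculation. The function name is made up, `read` stands in for whatever the emulator uses to fetch a byte, and `pc` is assumed to point at the first operand byte.

```python
def absolute_x_address(read, pc, x):
    """Compute the effective address for absolute,X addressing.

    Returns (address, page_crossed).
    """
    lo = read(pc)                  # low byte arrives first (little-endian)
    low_sum = lo + x               # the add can start before the high byte is read
    carry = low_sum >> 8
    hi = read((pc + 1) & 0xFFFF)   # high byte arrives second
    address = (((hi + carry) & 0xFF) << 8) | (low_sum & 0xFF)
    return address, bool(carry)
```

With operand bytes F0 12 and X = 0x20 this returns `(0x1310, True)`: the low-byte add overflowed, so the high byte needed fixing up, which on the real chip is where the extra cycle for crossing a page comes from.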
Is there a consensus yet on simulation vs emulation?
With the timing, can't you just use the known documented clock speeds, memory data rates, and cycles per clock of the systems used to calculate it easily?
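The arithmetic itself is indeed easy. As a worked example for an NTSC Atari 2600, using the commonly quoted figures (a 3.579545 MHz colour clock, the CPU running at one third of that, 228 colour clocks per scanline, 262 scanlines per frame), a short Python sketch with made-up constant names:

```python
COLOR_CLOCK_HZ = 3_579_545          # NTSC colour clock
COLOR_CLOCKS_PER_CPU_CYCLE = 3      # CPU runs at one third of the colour clock
COLOR_CLOCKS_PER_SCANLINE = 228
SCANLINES_PER_FRAME = 262

cpu_hz = COLOR_CLOCK_HZ / COLOR_CLOCKS_PER_CPU_CYCLE
cycles_per_scanline = COLOR_CLOCKS_PER_SCANLINE // COLOR_CLOCKS_PER_CPU_CYCLE
cycles_per_frame = cycles_per_scanline * SCANLINES_PER_FRAME
frames_per_second = cpu_hz / cycles_per_frame

print(f"CPU clock:            {cpu_hz / 1e6:.3f} MHz")
print(f"CPU cycles/scanline:  {cycles_per_scanline}")
print(f"CPU cycles/frame:     {cycles_per_frame}")
print(f"Frames per second:    {frames_per_second:.2f}")
```

That gives 76 CPU cycles per scanline and 19912 per frame. The hard part is not these totals but making every chip observe the others at the right cycle within them.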
Steve's definition of "fairly easy" is different from mine. That's why he makes the video's and I'm confused.
Some of these professors are like the uncles I wish I had but never got
Fascinating! I need to rewatch this when I'm not trying to follow a crochet pattern 😂
From a self-taught C/PC Bible start in life, this episode is a dream to watch and reminisce. Thanks for putting this together!
so is Steve going to write an emulator, and if so what language is he going to use? C? C++? Rust? Javascript? Lisp? Haskell? COBOL!
Why did you put your channel name in the video title when it's already stated?
*Nintendo's lawyers have entered the chat*
Excellent video, thank you!!!
I don't know about the Atari, but the 6502 can address 16-bit memory addresses. 128 bytes is way too little, that's half a page. Of course some of the address space is used for hardware addressing, some of it is used for ROM, probably the upper half of the memory since that's where the reset vectors are, which are probably on the cartridge. Page 1 and 2 are special anyway and need to be R/W, but you could use them of course. Even if you set aside half your address space for hardware, and half of what remains for the ROM, you'd still have 0x0000 to 0x3FFF free for general purpose RAM. I just can't imagine why they'd take a general purpose processor with the capability of a 16-bit address bus and limit it to half a page of memory. That's ridiculous.
The RAM was a cost reduction measure. Memory was STUPIDLY expensive in the late 1970s and Atari wanted to produce a console that was affordable to the average home. Given that the system was only supposed to run a handful of cartridges originally, 128 bytes was enough.
Of course, the console became incredibly popular and the programmers figured out how to get the Atari to run games it wasn't designed to handle.
I've had a few people suggest that a C compiler could be created for the Atari 2600. I always manage to disabuse them of the notion once I explain how little memory we're working with. It was not uncommon to restrict the stack to 4 or 5 positions just to have enough memory for the rest of the game! Not a chance you're fitting stack-based function calls in there.
There will come a day, when older hardware will be accurately emulated by running a spice simulation.
There are several fpga implementations, check mister
Do all emulators pretend to be hardware?
Or can emulators also pretend to be other software/interface?
There are "emulators" that emulate a machine that doesn't physically exist. Chip-8 and PICO-8 are virtual game consoles that only exist in emulation. You can also argue that things like the JVM is a kind of emulator for an instruction set without a hardware implementation.
@@angeldude101 I thought there was some actual hardware that implemented the JVM at some point. Probably from Sun Microsystems, given they were the ones behind Java in the first place.
DOSBox not only emulates the hardware of a 486-era PC, but it also emulates the BIOS and DOS (and by default tells DOS programs it's MS-DOS 5.0), so there's some software emulation in action. A more straightforward PC emulator would just do the hardware and require you to get a BIOS ROM dump and install MS-DOS for the software side of things.
WINE is not an emulator (which is in fact what the name stands for), but it does do a similar task in that it translates Windows system calls into UNIX equivalents, allowing you to run Windows programs on x86 Linux or MacOS X, while leaving the rest of the program running natively.
@@Roxor128 so what makes Wine that different from an emulator?
I'm under the impression that they don't call it an emulator because people expect an emulator to, well, emulate a CPU architecture (which it doesn't).
Couldn't you say Wine 'emulates' the Windows API? It apparently didn't do much with Windows syscalls until the recent syscall user dispatch (if I recall the name correctly)
@@tuxecure You've pretty-much got it. It's not an emulator because it doesn't mimic any hardware. It assumes that you're using the same hardware for Linux as you would for Windows, so there's no need for hardware mimicry, just translating the API. You won't be running x86 Windows programs on a Raspberry Pi with it, for instance. You could do it with DOSBox and a suitable Windows installation inside it, though.
Do you play classical guitar, besides teaching CS?
Personally I prefer HLE when possible. This makes it easier to hack in higher frame rates which is very important to me. Much more than unseen cycle accuracy.
Team LittleEndian FTW!
The fact that he explains it by writing on dot matrix printer paper... 😀
Never knew that is what an emulator does. It's representing what the hardware would've been but in software format? Got it. I see....
It's damn cool when you stop to really think about it, it's essentially a living, breathing schematic of the original device, all the more insane when you think about the fact that a lot of the chips that people have somehow managed to emulate are actually closely guarded secrets
Coincidentally just started writing my fourth Apple II emulator. It's a sickness.
I’d love for someone of his background to weigh in on software emulation vs emulation via fpga.
Thanks for the great description, as well as the detail that has to be implemented.
I completely agree that big endian is better overall.
Press ffff to pay respects.
I created a CHIP-8 emulator in C and C++, and an NES emulator, which works with the 6502 CPU
Emulated on my Personal Universe, humans lost in australia of all mysterious things.
Emu 1 at I on (at at walker starwars) (long legs can't fly but can run) perhaps walking the dog on a yoyo, the yoyo being the HDD platter just a guess
My left ear found 3:37 to 5:21 fascinating.
Can’t wait to make a ps5 emulator now
Again they draw on that printer paper. Is that some running gag? Or does their lab sit on an enormous pile of paper from the 80s?
Just implement the hardware using Verilog or VHDL ;)
simulate a pure pre-calc rom in-memory compute machine (every calculation one/two parameter alu math and any code logic operation stored in rom, super simple total system complexity, just one core, then multiply those cores)
try RLE compression, but make it a dictionary, and have the first bit be switch to repeat/step-over modes, then n,1-15 bits per dictionary (256 entries at max or optimum), to indicate the step or rle repeat for that dictionary, creating a sparse representation of the file to be compressed, and you can also sum up all the counts of the dictionary numbers, to act as semi-crc check
say that is a STEP-RLE-DICT compression
no interrupts, just linear execution for the pre-rom in-memory single core computer, just state bits (all in same memory) that can be read by the main program, optionally
because its only rom-ram minimum transistors active per clock, you can run the pre-computer at very high clock frequency, like optical fiber dac speeds, 1PHz
the single core at 1PHz is same as 1M x 1GHz cores, but power usage as a simple linear memory controller is very low and super simple, not complicated, 16-32 bit floats/integers in the alu, and memory jumps, if statements, control instructions, are also just an alu rom-op(s), also all ops take exactly same time every time, same number of cycles, only external memory modifications, like keyboard or pci-e compat layers will have different timings