Genuinely delighted each time I see a new video from you pop up, you're an excellent presenter and it's always a fun journey! You certainly made the transputer sound fascinating and very deserving of its own video as well ❤
Very interesting video. Small history quibble: The Apple team built some proto boards with 88k chips and so had some ASICs built around the 88k bus before they canceled that project. When IBM and Apple talked about developing the POWER-derived RISC Single Chip for use in Macs, they convinced IBM to bring Motorola in and graft the RSC core onto the 88k bus to save Apple money on redesigning ASICs and logic boards (to the detriment of IBM's team who already had some designs based around a different bus). Thus, the 88k bus became the 60x bus when the RSC project became PowerPC.
I think I have found the source for the "compilers for stack machines are hard" statement in the section from 03:50: an article in The Chip Letter Substack. I think it's a misreading of this, with my emphasis added: "The first version of Aquarius wasn’t RISC at all, but a stack architecture design. Rather than use a conventional cache, the design *also* placed heavy reliance on software and the compiler to manage the transfer of memory into and out of the processor when needed. When the compiler team indicated that building the software to do this was impossible, the design was changed to a more conventional RISC design." So it used a user-managed on-chip memory like the Tightly Coupled Memory of an ARM microcontroller, the local/shared memory of a GPU in compute mode, or the local store of a Cell SPE, instead of a transparently mapped conventional cache. That was what made it hard to code gen for, not the fact that it was stack-based. I'd add a link but that might delete my comment.
That's basically it: with stack processors of the time, the stack the processor could access was held entirely on chip. Instructions (except load/store to and from the stack) could not access conventional RAM. They only operated on the stack. This leaves the compiler having to schedule all load/store operations to conventional RAM, a more or less impossible task for it to do optimally.
Other non-pure stack designs muddy that water a bit. The key advantage of a pure HW stack processor is that everything you need is in the stack on the chip; there are no slow calls to/from RAM. Which is great for things where all the code that will be run fits in the on-chip stack. Where that's not the case, these designs have an issue with scheduling stuff off/on the stack. One improvement on that type of design (in terms of making it more useful for regular computers) is the stack register, which allows multiple stacks to exist on chip with the stack register controlling which stack is in use. As the OS schedules different processes (or threads), the stack register is updated so it points to the stack related to the scheduled process. That can be combined with load/store operations for the non-active stack, so when the task related to that stack is running it's not waiting on load/store operations.
There was a joke that Apple bought a Cray to design the next Mac, and Seymour Cray bought a Mac to design the next Cray. Also, some ARM processors supported the Jazelle extension, which is basically hardware support for Java bytecode. It was pretty much mandatory for running Java on early phones.
@@RetroBytesUK There were two versions of Jazelle, originally called 1 and 2 but later renamed to DBX (direct bytecode execution) and RCT (runtime compilation target). The latter eventually merged with Thumb and is still available, while Jazelle 1 was completely killed off. There wasn't much information about it to start with, but ARM has put some effort into making what was there go away. Jazelle 1 could translate the simpler Java bytecodes directly to a corresponding ARM instruction, with a little bit of register renaming so a group of 4 (I think) registers cached the top of the stack. For the more complicated Java bytecodes it would jump to a small routine in ARM machine language to interpret. But you replaced the main interpreter loop with hardware.
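For anyone who wants to see what that top-of-stack caching buys you, here is a rough sketch in Python rather than hardware; the two opcodes and their handling are invented for illustration and say nothing about Jazelle's real implementation.

```python
# Top-of-stack caching, sketched in software: keep the topmost stack entry in
# a "register" (a local variable here) so common bytecodes touch the
# in-memory stack less often. Opcodes are a made-up, simplified subset.

def interpret(bytecode, memory_stack):
    tos = None                              # cached top of stack
    for op, *args in bytecode:
        if op == "iconst":                  # push a constant
            if tos is not None:
                memory_stack.append(tos)    # old top only now spills to memory
            tos = args[0]
        elif op == "iadd":                  # add the top two stack entries
            tos = memory_stack.pop() + tos  # one memory access instead of three
    return tos

print(interpret([("iconst", 2), ("iconst", 3), ("iadd",)], []))   # prints 5
```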
TI’s TMS9900 series CPUs were stack based, and were found in the TI99/4a. They had one register, which was a pointer to the current stack location in memory. The problem with stack machines was that they didn’t scale well with the improving speed of the CPU. RAM didn’t keep pace with the improvements in silicon (which is why cache memory, and different levels of cache were invented).
TMS9900 was not stack based. It was a simplistic design that used a 32 byte memory structure to hold all the architectural registers in what would otherwise have been a neat RISC design. If only they had loaded those registers into a CPU buffer, it could have been much faster. The 8087 was a stack processor without all the basic CPU functions, which were done by the 8086 CISC processor.
Using faster (local SRAM) memory for the registers and slower memory (external DRAM) for the stack is more a matter of tradition than a technical hardware decision. Modern ARM processors talk about register files (physical local SRAM blocks with individual addresses). That local SRAM block could very well hold a fast stack instead of registers. Architecturally, you need pipeline and flag registers all over the place in modern designs - so you might as well use the "register" as the base abstraction, and physically as a design block. That's why the stack becomes a second-order abstraction, even though it's crucial for all compilers with subroutines, and a great model for general computing. The control logic defines what abstraction you're implementing to make sense of a memory block. The ALU can work with either register concepts or stack concepts for input and output; it's all just latches at the end of a wire. C compilers want fast registers (because that's all of the fast memory they could fit into old hardware), so hardware designers implement local SRAM registers. That's why. If Forth were more popular than the Fortran and C languages, we'd have fast memory mapped as stacks.
@@PaulSpades Going way back, some early CPUs had the registers on mechanical drum memory, in the days when even DRAM was still wild sci-fi. I think the formal technical definition is that registers are in whatever is the fastest memory you can afford to pay for at the time.
My Dad had to deal with John Sculley in the Pepsi days. His very first impression of the man was that he was a fucking douchebag who didn't care at all what he was selling, or whether he even understood what he was selling at all. The poaching of John Sculley was Steve Jobs' worst business decision of all time, and I am not even talking about the later Jobs firing/betrayal. The entire decision was the worst thing for Apple.
He may have been a bastard, but I don't credit the claim that he didn't understand Pepsi. It's sugar water with a colorant and carbon dioxide. What's hard to understand about that?
@@paulie-g The problem was that he didn't care what the business was. Being hired by Jobs (someone who is overly passionate about products and their impact on society), the end result was Sculley just running a computer company like he did a soft drink company. In a PURELY money sense, maybe those differences don't matter, but they do very much matter. Sculley left the company in a state that did not recover (and almost insolvent) until Apple rehired Jobs after the NeXT acquisition. I don't idolize Jobs and I own no Apple products, but Sculley was always wrong for this company. He did do wonders for Pepsi in his time there, but I just don't see him ever caring about the difference in importance between soft drinks and technology. Jobs' famous quip while convincing Sculley to come on board (sugar water vs changing the world) was a good pitch, but it doesn't seem like Sculley took it to heart, and he helped ruin the company anyway.
@@MicrophonicFool So, long story short, dirtbag Jobs nearly ruined his own company and got himself the boot by foolishly hiring Sculley, a bigger douchebag than himself? LOL
It would be interesting to see Apple's *"White Magic"* project covered. It was a hardware 3D accelerator that was very close in concept to the PowerVR GPU. The QuickDraw 3D Accelerator was the only product that was released from this project, though. The follow-up design might have been available in 1997 if Steve Jobs had not fired every employee who was not directly involved with the Macintosh core development teams when he came back to Apple.
Stack architectures are trivial to write compilers for. The reason you don't see them now is that you can't load a value from memory long before you use it. Memory latency has been the dominant force in processor design since before RISC was a thing; if you go back to when memory was fast, you'll find high-end mainframes like the Burroughs B5000 series (from the early 60s) were stack machines.
So it's trivial to write a compiler if you don't mind the resultant code performing very badly on a modern hardware design. If you have to write a compiler so things get taken from memory to the stack so they're there ready when needed, it gets a lot more complex. That's what the compiler team were trying to get across: sure, they could make a compiler, but it would perform badly, as they couldn't get it to the point where it would be good at getting things on the stack in an optimal way.
@@RetroBytesUK compilers for stack machines are much simpler and initially performed as well as compiled code for register machines. That is why machines such as the Burroughs 5000, HP 3000, Transputer, iAPX 432, RISC I to III and SPARC (a mix of stack and registers) and so many others took that route. Improvements to compiler technology in the 1980s, in the form of new register allocation algorithms, changed this dynamic, making register machines more desirable since they were simpler to pipeline. I wonder if advanced out-of-order implementations with register renaming wouldn't make stack machines interesting again?
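For anyone wondering what those 1980s register allocation algorithms look like, the classic one is graph colouring (Chaitin and colleagues). Here is a toy Python sketch of the greedy colouring step, with a made-up interference graph, purely to illustrate the idea:

```python
# Toy graph-colouring register allocation: variables that are live at the
# same time "interfere" and must get different registers; anything that
# cannot be coloured with k registers is spilled to memory.

def colour(interference, k):
    assignment, spilled = {}, []
    # Handle the most-constrained (highest-degree) variables first.
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(k) if r not in used]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)          # no register left: lives in memory
    return assignment, spilled

# Edge = two values live at the same time (invented example).
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(colour(graph, k=2))   # ({'c': 0, 'a': 1, 'd': 1}, ['b']) - 'b' spills
```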
@@Bebtelovimab What makes you think CISC code is smaller/denser? The demand for small code hasn't gone away - the ability to fit code into L1 (especially) and L2 is still king. My theory of the problem with stack machines is that they create unnecessary dependency chains, which makes extracting instruction-level parallelism, superscalar execution, pipelining and OoO harder - all things that are core to the performance of modern processors. It's why the few modern stack implementations are very small microcontrollers focusing on realtime control.
@@paulie-g Literally textbooks to this day teach CISC instructions as being more dense than RISC. They use Chinese versus English as an example. And this was not that old for a college textbook (2018), so they knew of RISC. Although they mention Itanium's EPIC without mentioning the whole Itanic thing. I had a laugh when reading it before class (not in public). Although they also still cover optical media (i can understand LTO). For reference, this was a basic introductory computer course (in Information Systems, Comp Sci with less low-level programming and more business admin), that dealt with basic concepts.
@@alext3811 OK, so the answer is "someone said so in a textbook", with the textbook being generalist rather than specialist. Gotcha. Density of instruction set =/= density of code. The comparison is not RISC per se but load-store architectures. I bet they hadn't heard of Thumb. There's enough pointers for you to educate yourself if interested.
"The SI unit of failure": Excellent! Another +1 from me for that Transputer video. I once worked next to an office in Notting Dale that had a large poster of a Transputer on the wall, funnily enough above their single original model Mac, back when it was 'the next big thing' and would love to see you cover what happened.
Some thoughts: The loud background music and the way you pronounce words in a rather "slurry" way make it hard for a non-UK citizen to understand what is being said. That being said, the content and the enthusiastic way you portray it make for an outstandingly interesting video! I don't mind a bit of dialect either. This is meant to be a positive and constructive critique. Keep up the good work!
Commenting before watching further- my ears legitimately were ringing after that loud white noise at 3:23. Please duck it next time. (Otherwise a huge fan of your videos)
You claimed that it would be difficult to write a compiler for a stack-based machine, but actually it is the reverse: it is much easier to write a compiler for those machines. That is also the reason why a JVM is stack-based. It is a machine designed by compiler writers instead of hardware designers. In a stack-based machine the compiler does not have to worry about efficiently using the available processor registers, which can be quite difficult especially on machines where there are only a few of them and they have specialized purposes. The reason why stack-based machines are not popular is because they normally are slower.
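To make the "easier for compilers" point concrete, here is a minimal Python sketch (the instruction names and tree format are invented): a post-order walk of the expression tree emits stack code directly, with no register allocation step at all.

```python
# Code generation for a stack machine: walk the expression tree post-order,
# push operands, and let each operator consume the top of the stack.

def emit(node, out):
    if isinstance(node, str):       # a variable: just push it
        out.append(("PUSH", node))
    else:                           # (operator, left, right)
        op, left, right = node
        emit(left, out)
        emit(right, out)
        out.append((op,))           # operator pops two operands, pushes result

code = []
emit(("MUL", ("ADD", "a", "b"), "c"), code)   # (a + b) * c
print(code)  # [('PUSH','a'), ('PUSH','b'), ('ADD',), ('PUSH','c'), ('MUL',)]
```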
Indeed. I was also a bit surprised by that statement. Virtual machines (in the language area, not the whole computer like VMware) are an intermediate step during compilation of a program. It makes it (comparatively) easy to write the compiler from the language to the intermediate code. And the second compile phase (just in time) is when the code for the stack machine is translated into architecture-specific code. But you are right. Real-world applications would run incredibly slowly on a hardware stack-based machine because it would require lots of (slow) memory accesses. And it would probably also not work well with pipelines (where the processor executes several instructions at the same time by slicing them into stages like fetch, load, execute and store).
I probably should have been more clear: it's hard to write a performant compiler for HW-based stack designs, as working out when/what to get into the stack is hard, so it's there ready in the stack when needed. Making it the compiler's problem rather than the CPU's (prefetch, branch prediction etc) hugely increases the complexity for the compiler writer. For solutions like the JVM you can basically ignore that stuff, as the native CPU will do all the prefetch etc. They were targeting performance, so when the compiler team told them it was too complex to make a performant compiler, they moved on. Occam and the transputer cleverly avoided the big performance issue by using message passing (which everything was optimised for) where other systems used shared memory; this avoided the technique that was a big source of issues with parallel computing, and also the performance issues of having the compiler plan all transfers from RAM to stack.
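Not Occam, but here is a rough Python sketch of the message-passing style being described: the worker shares no state with the caller and communicates only over channels (queues here), so there is no shared memory to coordinate.

```python
# CSP-style message passing sketched with threads and queues: work items go
# in over one channel, results come back over another, nothing is shared.
import threading, queue

def worker(inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:           # sentinel: channel closed
            break
        outbox.put(item * item)    # do some work, send the result onward

inbox, outbox = queue.Queue(), queue.Queue()
threading.Thread(target=worker, args=(inbox, outbox)).start()
for n in range(5):
    inbox.put(n)
inbox.put(None)
print([outbox.get() for _ in range(5)])   # [0, 1, 4, 9, 16]
```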
@@RetroBytesUK In a stack-based machine, all data lives in the stack. It is either at the top of the stack or just below it, for temporary data used in expression evaluation, or in stack frames below that for local variables of the functions. That is why a stack architecture fits compilers for structured languages so well: there is a direct mapping between variables in the language and locations in the stack, without the detour via processor registers. This has other advantages as well: context switching becomes much simpler. To switch thread, you just reload the stack pointer and program counter. No need to save other registers because there aren't any. Process context switching of course is more complex, as memory management is also involved. But there are performance problems because everything is in memory. However, an optimized processor would have a cache to hold the most frequently accessed stack locations, at least the words at the top of the stack and a dynamically assigned cache for the locations beyond that. It is not the task of the compiler writer to manage that. Maybe at Apple they decided that (at that time) making the processor clever enough to manage the cache was too difficult, and that task was passed on to the compiler. Yes, that would make it complex. But it is also a bad design decision to begin with.
Great video! And, yes, I'd love to see one on the Transputer. In the late 90's, I wrote code for several network analyzer platforms, which all used transputer-based coprocessor cards in portable PCs (along with some FPGA-type programmable hardware).
Burroughs used a stack machine architecture in their mainframes and minicomputers. They were primarily aimed at COBOL applications, but there was also a C compiler which made porting applications from Unix systems possible. Possible, but by no means easy. BTW, the Cray computer at Manchester University in the '70s was purple too.
Burroughs is a more complicated story because it was specifically hardware and software co-designed (something we desperately need in the modern era) for what were then high-level languages. The system-level language was ALGOL (60, probably) with a really fast one-pass compiler: they departed from the standard by requiring definition before use, enabling one-pass compilation, which was a huge deal in the days of punch cards. Incidentally, none of those HLLs would be implemented with a stack machine today. A good example of a modern attempt at something similar is the Mill, at least conceptually.
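A toy illustration (Python, with an invented two-word mini-language, nothing to do with Burroughs ALGOL) of why define-before-use permits a single pass: every name can be resolved the moment it is seen, so no backpatching pass over the source is needed.

```python
# One-pass name resolution: legal only because definitions precede uses.

def one_pass(lines):
    defined = set()
    for n, line in enumerate(lines, 1):
        kind, name = line.split()
        if kind == "def":
            defined.add(name)
        elif name not in defined:   # a forward reference would force a second pass
            raise SyntaxError(f"line {n}: '{name}' used before definition")
    return "ok"

print(one_pass(["def f", "call f"]))   # ok
try:
    one_pass(["call g", "def g"])      # forward reference
except SyntaxError as e:
    print(e)                           # line 1: 'g' used before definition
```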
Damn, I love hearing Django Reinhardt during a video like this :) (gotta admit it was pretty distracting though, my brain just goes dancing when Minor Swing kicks in xD)
1:56. The PPC was a three-way development effort between Motorola, Apple, and IBM... so it's not exactly fair to say they opted for the PPC over their own design, as they had their fingers in the PPC design.
There was an Apple connection to ARM as well, since it was Apple and Acorn together who worked to spin off ARM from Acorn. Apple sold off ARM stock for years after Jobs returned and didn't start on their own ARM designs until well after they had divested their shares in ARM.
The MMX did have its own little jingle on top of the Intel one though. So that's a thing.... Fascinating story, thanks RetroBytes! Always lovely seeing a Cray.
In all fairness, the AMD Jaguar architecture found its way into the PS4 and Xbox One. Although low-power semi-custom chips probably aren't the biggest money maker, getting that much coverage in the console market was a bright spot on their balance sheet before Zen came around. So we somehow got one non-failing Jaguar.
It's actually fascinating how often in history something is invented, or almost invented, by someone years before it's re-invented by someone else and makes it into the mainstream much later on.
Great video! Would love more info on the 88K. I played with one at VCF this year running OpenBSD and it was actually snappy, so I wonder why no one wanted to move to it.
That stupid old-time music in the background was as annoying as it was distracting when I was trying to actually listen to the dialogue. @1:28 specifically is where I had to pause and write this.
Indeed, what if they had gone ARM in the 1st place & saved all that chip development money. I remember getting the specs for the instruction set in the late 80s and being quite excited, so I ended up buying an A440 (still works).
Same: I saw the ISA and hardware specs, and immediately realised its brilliance, so rushed out immediately and bought an A310, which is still in full working order, including the monitor, and still in very good condition with all the original manuals and packaging, and no, it is absolutely not for sale. I also had the same thought on RBUK doing a Transputer episode, about 15 seconds into this video.
I think the assumption that a chip that included a bunch of features that ended up being important a decade later would automatically be a great chip is probably misplaced. Multiple cores, SIMD, and the other advanced architectures are cool, but the main reason we see them in mainstream computing today is that we hit a wall in chip design where you couldn't just keep cranking the clock speed and making the execution pipeline more advanced. A hypothetical chip with multiple cores would have been de facto slower than a comparably priced single core. It may have theoretically been faster, but it also would have been much harder to actually get that performance out of it, especially with the development tools available at the time. It likely would have ended up being remembered a lot like the Cell chip in the PS3: a chip with a ton of potential that never really got utilized outside of a few genius devs.
The Cell chip in the PS3 had some serious design-level mistakes culminating in serious performance issues. There is certainly a way to make multi-core work fast by reducing synchronisation overhead, eg by loosening the coherence guarantees like Alpha did.
@@paulie-g right but the alpha and other multi-processor designs were going for performance beyond what could be practically achieved with a single core/chip using the mfg capabilities of the time and had costs to go along with that capability. There was still a lot of room to just make a faster single core chip at the consumer end of the market, and even if it was theoretically slower than a fully-utilized multi core design it still would have been much easier to program for and so most software would have run better.
Forget the CPU - Apple's far bigger problem during this period was on the software side. They had bad ideas, executed badly, leading to the clustershambles of Copland. (They also built a perfectly serviceable Unix fusion with Mac OS in A/UX and did nothing with it.) Those missteps, more than any of this, were what nearly killed them. They were what led to the reverse takeover by NeXT, who had the one thing Apple could never build - a great Unix-based operating system that was portable to any underlying CPU. (There aren't many good YouTube videos on this topic BTW.)
Excellent video. I've had an interest in retro for many years and have fond memories of reading about all the different RISC processors in BYTE magazine, but was completely unaware of Apple building their own CPU. Would definitely be interested in a video on the Transputer and love your presentation and delivery. A video on Pink and/or Taligent would also be very interesting.
The Sun workstations ran the code at the same speed as the Cray because the code was serial and didn't make use of the parallel vector processing in the Cray. The first of many reasons why this project failed, all stemming from the fact that the people involved in the project just didn't understand processors.
That RISC core was copied for its GPU, too. Problem is it was a new bespoke RISC architecture, rushed out with hardware bugs and immature dev tools. Makes sense to write code for the processor you already understand, rather than the one you don't, which was a nightmare to debug.
@@Toothily I think the difficulty for Atari was they put every last dollar they had into the Jag and it was still not enough. So they had no money left to do dev support properly, hence the lack of dev tools, which ultimately undermined the whole effort. It's a huge contrast to the effort Sony put into supporting devs for the PlayStation.
@@RetroBytesUK Yeah it was do or die, but I respect what the engineers were trying to do. It's still a clever design, even if it only got to ~80% of where it needed to be.
@@Toothily It was a small platform with a small install base. Devs didn't want to write exclusive games for it, so they either wrote games they could port to other platforms or ported existing games, all of which dictated the exclusive use of the 68k.
I was only a young lad back then, but you're right - it did seem like there was a gap in the story, where Apple was introducing '040 machines at 33 MHz for a long time rather than pushing anything faster. (They did have the Quadra 840 AV at 40 MHz though).
It would seem that the 68040 design just couldn't be run at frequencies beyond a certain point, unlike the 80486 which went into its DX2 and even DX4 phases. So, even though the 68040 kept Motorola competitive with the 80486 initially, it just wasn't able to keep up. Eventually, the 68060 did come along and seems to have been more scalable in terms of frequency, but that was only useful to those few platforms that had stuck with the 68000 family like the Amiga (although there were other accelerators for that, too).
Funny to think that that huge purple Cray supercomputer has lower computing power than an iPhone. The Japanese also poured lots of resources into parallel computing in the 80s without much to show for it.
HP did in fact create its own stack-based processor called 'Focus' in the early 1980s. It was actually the first commercial single-chip 32-bit CPU. It was used in UNIX (HP-UX) servers and workstations (HP 9000 Series 500). So they had a working C compiler, obviously. HP did have prior experience with 16-bit stack-based CPUs (HP 3000 series). Maybe Apple did get some inspiration from HP?
Interestingly, I've been thinking about the whole RISC/ARM project this week, which did of course involve Apple, and what you said in a previous video about how the first prototype chip had powered up before they even connected the power, because the tiny voltages from the test leads were enough...
To be fair, the voltages were not tiny, they were the regular operating voltage of the CPU and the CPU was operating from normal voltage minus one Schottky junction. It's uncommon to parasitically power a chip through abusing the I/O protection diodes but I've seen it done a number of times in circuit challenges and the like.
Excellent video. Small correction though: Apple has used their own silicon designs in their phones since the A4 in the iPhone 4. Before that they used Samsung-based ARM designs. They "just" scaled up their mobile designs for the desktop now. (Just in the biggest quotation marks you can think of, because scaling up a design is incredibly hard.)
Apple is a company that is best to its customers when it's struggling, and can't afford to force people into walled gardens and locked down hardware. So all I can say is bring on the struggle.
It did mean that; sadly, quite a while back now they changed it to mean Advanced RISC Machine. I guess they wanted to get away from their Acorn roots back then.
Where's the location at 3:50? After sleuthing for 30 minutes I can't seem to come up with it... anyway, sharp-looking building with the atrium in front.
Thumbs up for a Transputer video please! I was so enamoured with the Atari Abaq/ATW when it was announced, and follow ons like the Kuma Transputer add-on for the Atari ST. Later in life I was doing PC support and visiting one of my customers, he had an Inmos T400 encased in a block of acrylic, presented to him by Inmos before they imploded...
Scorpius would’ve been killer for creative applications, in theory. No way they could’ve fit 4 cores + SIMD onto a single affordable die back then though, what were they smoking? The skunkworks vibe is cool, still.
They really did seem to think, for quite a while and at the highest level of the company, that this would be the new CPU for the Mac. As far as I can tell they were still a way out from doing any layout work, so as you said it may have been far too expensive to be practical in the time frame they were planning on. It's the problem with a secret project: there are not many sources to go off, so you never can be sure.
I remember reading in Byte that Apple had a prototype machine running on an Am29000 RISC chip that had comparable performance to a 68LC040. So they had a Rosetta-style emulator JITting 68K code to run well on the 88K and Am29K, and maybe MIPS too, before PowerPC.
Apple was one of the three in the AIM Alliance, where AIM means Apple, IBM, and Motorola. Apple 🍎 came first in the acronym. It wasn't IAM or MIA or IMA or MAI. It was AIM. I daresay if IBM hadn't been involved, Apple would still be making those CPU chips. And the ARM AArch64 architecture... well, right? RISC again. 🍎 Apple already (once before, late last century) used and developed its operating system for a RISC architecture. This Apple silicon is the second time around for Apple with RISC.
I just love the idea of "we must keep the project a secret", and then one of the first things they do is buy the least subtle computer known to man, and what's more, they paint it purple.
Yep. Even some of the WDC 65C02 was designed in deference to Apple. One or more instructions were made slightly less efficient and retained certain MOS 6502 quirks to remain compatible with the Disk II controller firmware.
A processor like Scorpius would indeed have jumped technology forward, but you know it would have also been fantastically expensive, given how much less mature semiconductor processes were at that time.
It really is wild how, if just a few hallway conversations and meetings had gone a bit differently, there are dozens of significant ways in which the whole story could have turned out *completely* differently. If the first design had been a bit more grounded, they could have gotten to market with a first implementation very quickly and left the quad-core design for a Version 2.0. The "big" RISC CPUs of the 80s they would have been competing with were really all very simple. Designing a world-class CPU around 1987 really didn't take a huge investment. Designing what would be a world-class CPU to come out some years later in the 90s absolutely did, and they made a huge mistake by trying to boil the ocean as the first step in the process, while everybody else got entrenched and revised their designs with incremental complexity.
Writing a compiler for a stack machine is MUCH easier than for a register machine. The lexer and parser are, obviously, exactly the same. Most of the optimizations work the same. Local variables and function parameter passing require some load/store-like instructions. But otherwise, code generation is super easy (no register allocation, no register aliasing, no instructions that operate only on specific registers or clobber them, no register pressure, and expression evaluation is stack-like in nature). To the point that some compiler designs use stack machine-style code as their intermediate representation. The problem with stack CPUs is memory performance. In such a CPU, the L1 cache would be the fastest storage space, since there are no general purpose registers. And most of the instructions generate quite a few memory/cache cycles. This is the performance killer.
That was the key to Apple's problem: making a compiler that produced code that would run in a performant way is difficult. How does the compiler know when is a good time to move data from RAM to the stack and back? Especially when using a task-switching OS. That's the hard part, requiring a degree of prescience no compiler can have, hence the switch to RISC. A VM stack processor is an easier prospect, as the native code generated from the p-code runs on a CPU with cache prefetch, branch prediction etc. That's why stack is a popular design for virtual machines: the compiler is relatively simple (as you pointed out) and you don't have to worry about managing when things move from RAM to stack.
@@RetroBytesUK I've seen you use the RAM / stack distinction several times when answering comments which say some version of _compilers for stack machines are easy_. I think all our default assumptions are that the stack in a stack machine is just a region of RAM, with the processor holding a single stack-top address on-chip in its architectural state. Are you implying that this particular architecture used a small, fixed-size on-chip SRAM buffer to hold the stack? An SRAM stack accessible only via push-from-RAM and pop-to-RAM instructions, with ALU operations sourcing the top one or two stack entries and writing to the top of stack implicitly? Perhaps something too small to hold the full state of each thread's function call stack, so now the compiler's problem is shuffling data between the conceptually infinite stack in general RAM, which is the easy one to code gen for, and the fast on-chip stack, which imposes tricky constraints on code gen?
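Nobody in this thread has the real Aquarius ISA, but as a toy runtime model of the situation being described (depth, opcodes and behaviour all invented): arithmetic can only see a small on-chip stack, and data has to be spilled to RAM and filled back when the on-chip stack over- or under-flows. On the hardware the thread describes, the compiler would have to schedule those transfers as explicit instructions ahead of time, which is exactly the hard part.

```python
DEPTH = 4                   # on-chip stack entries (made-up number)

def run(program):
    chip, ram = [], []      # small on-chip stack, spill area in RAM
    for ins, *args in program:
        if ins == "PUSH":
            if len(chip) == DEPTH:         # on-chip stack full:
                ram.append(chip.pop(0))    # a spill must have been scheduled
            chip.append(args[0])
        elif ins == "ADD":
            while len(chip) < 2:           # operands not on chip:
                chip.insert(0, ram.pop())  # a fill must have been scheduled
            b, a = chip.pop(), chip.pop()
            chip.append(a + b)
    return chip, ram

# Sum 0..5: the early pushes spill out and have to come back for the last adds.
print(run([("PUSH", n) for n in range(6)] + [("ADD",)] * 5))   # ([15], [])
```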
It was indeed an X-MP, and yes, incredibly, an iPhone 11 would be able to outperform it in almost all regards. Apart from being a space heater; as a heater the X-MP still outperforms it.
You got your iPhone numbers way off. It is capable of hundreds of gigaflops, depending on what benchmark you are looking at, but even a $5 Raspberry Pi Zero from five years ago can do several gigaflops.
The thing is, if it had made it to manufacture, this thing could have been a massive basket case. Either it wouldn't perform because the complexity of the chip meant they just couldn't push the clock speed, or it would end up costing too much to go into a desktop machine. The thought of them jumping straight to ARM is fascinating. You could almost imagine a parallel universe where, instead of using NeXT as the basis for a reborn Mac, it was Acorn and RISC OS. Possibly with Olivetti somehow in the mix.
As soon as you mentioned stack processors, my mind immediately went to a slightly more successful company (that went belly up as soon as the second AI winter came in). If memory serves, Symbolics and their LISP processor kin worked on stack processors and eventually created a VLSI version... but MIPS, SGI and a certain version of CLISP outshone them in the end (there is to this day a Symbolics commercial emulator for the VLSI version that used to only work on the Digital Alpha RISC processor... seems the source was shared). I think there was also a stack processor created by Sun Microsystems to run Java natively... on a network computer? Anyway, they're all just entries in the annals of computer history...
'Own design' wrt ARM is a spectrum. You could just license the ISA and do everything yourself, but almost no one does that. The next step is doing some bits yourself and glueing it to bits an ARM partner fab like TSMC provides as a ready-made high-level design + process implementation. Then you can combine IP cores from various ARM partners, like a Qualcomm CPU and someone's GPU into a SoC. Or you can get a whole SoC either from ARM or one of the partners. This also interacts with fabs, where these partners have process implementations and possibly orders reserved in the pipeline.
And Forth - you see it in Forth, and it works EXTREMELY well if it's done with half a drop of sense. There really isn't a better way to bring up a new embedded design than with Forth.
They didn't start it in 2019/2020, they just knew it had got to an inflection point of useful power and speed. (see A7 customised chips as laying a groundwork for Apple to improve their processors more and more)
@@RetroBytesUK For me it is your voice being too low compared to the music that is the problem. I need to concentrate very hard to get everything. I can live with the overall volume being low, since my listening device has a volume control.
@@RetroBytesUK the video in general is too quiet, I have to artificially pump up loudness above 100% on my phone for it to be hearable with any sort of background noise (open window etc.)
I don't understand why a stack machine would be hard for a compiler? Stack architectures had another flaw in those times: they suffered very badly from the memory vs CPU speed gap, and in those times the gap was widening quickly.
The difficulty is the compiler must work out when to transfer things to/from RAM, if the stack is to have what it needs on it when it's needed. This is an extremely complex thing to get right, especially when you factor in a task-switching OS. That's for a hardware stack-based CPU; for stack-based VMs (e.g. Java) you can just ignore this problem, as the native CPU optimises for you (cache prefetch, branch prediction etc) at run time. Anything that needs a compiler to predict things in advance is going to create problems for the compiler author.
@@RetroBytesUK Tell it to Forth programmers, they do it in their heads 😅 It is not easy, but allocating registers is not easy either. If you have the stack in memory it can be organised, or have hardware spill/fill, but if you have a small internal stack and must manage it manually it can become a nightmare, especially with multitasking.
@@AK-vx4dy You can see why their compiler team told them getting decent performance out of the CPU was going to be challenging, to say the least. With registers there are at least some commonly used techniques for doing a reasonable job of optimising things. As well as common approaches with privileged-mode instructions for managing task switching, with registers getting stored and retrieved in a relatively low-latency way (in the majority case).
Ah, mention of the Cray-1 - the CPU cabinets are a work of art and I guess for $15m you can choose whatever colour you like 😊 Some argue that RISC stands for Really Invented by Seymour Cray.
Yeah the shift to ARM would have been cool, but why did they drop Power while IBM kept it to this day, and even the Sony PS3 used it? I can see it not being great for mobile use, but for workstations and servers... What am I missing? I realize that's a little beyond the scope of this video. Maybe next time :)
You answered your own question: the PS3 used it. So did the Xbox 360 and the Nintendo Wii. Steve Jobs was suddenly demoted from IBM's number 1 client to number 4. Intel promised he would always be number 1 with them. Obviously there were technical excuses for the move, with the idea that PowerPC was great for performance but Intel was better for performance per watt which was needed for mobile. It was very odd that two weeks after this announcement IBM came out with a new version of the PowerPC G5 optimized for low power. I find it hard to believe Jobs was not aware this was coming.
@@jecelassumpcaojr890While that's true, a mobile G5 was already too little too late. Apple had to keep the aging G4 around for the PowerBook line for ages due to the lack of a mobile G5; and while the G5s were fast they weren't all that much quicker and they used the power of a mid-sized town. The AIM alliance just wasn't producing the kind of chips that Apple needed, when they needed them.
@@3rdalbum My experience with a G5 iMac was that it was between 3 and 4 years before Intel iMacs seemed as fast running native applications (of course emulated software was expected to be slower). Perhaps IBM's "power efficient" G5 was still too hot for Apple laptops - I didn't look at its specification in detail at the time
The place I worked at the time was a big Sybase customer. When Microsoft licensed it (and renamed it SQL Server) they persuaded management to move from the Unix version to NT. It did not go well: first we hit issues with memory limits, so MS moved us to Pentium Pro and a specially patched version of NT that could address more memory. However, it still performed like a snail on Mogadon. Finally MS surrendered on x86 and spec'd out the most expensive Alpha box they could find, and switched us over. MS covered the whole cost of the machine. I seem to remember it having 4 Alpha CPUs, and it had more than 4GB of RAM, but I can't remember exactly how much. This is in the mid-to-late 90s, when 64MB was a lot of RAM; it had an incredible number of SIMM slots, all on removable cards. It was badged up as Compaq but it was clearly a DEC design with a Compaq badge just stuck on it. That thing finally managed to outperform the Unix version of Sybase running on the Sun kit we had; it must have cost MS a fortune. They really wanted to show off migrating a big Unix instance to the new MS SQL Server on NT, so I guess they figured it was worth it.
You shouldn't NEED a compiler - the whole point of a stack machine is that you can program it directly. That's how Forth works - you gradually build up higher and higher power constructs, until you have a "lexicon" for your application at hand. No - I imagine if you need to use a lot of existing legacy code then a stack machine likely wouldn't be the best way to go.
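Forth itself would be the natural notation for this, but as a rough Python sketch of the lexicon-building idea: each "word" is a function over a shared stack, and higher-level words are just compositions of lower-level ones.

```python
# A Forth-ish lexicon in Python: words operate on one shared stack, and new
# words are defined by composing existing ones (the comments show the Forth).

stack = []

def lit(n):   stack.append(n)                            # push a literal
def dup():    stack.append(stack[-1])                    # DUP
def star():   stack.append(stack.pop() * stack.pop())    # *
def square(): dup(); star()                              # : SQUARE DUP * ;
def cube():   dup(); square(); star()                    # : CUBE DUP SQUARE * ;

lit(4); cube()
print(stack)   # [64]
```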
I appreciate your videos and always have, but I would be remiss not to say that, while the jazz background is a trademark of the channel, it makes it very difficult to follow along, especially for people like myself who have hearing difficulties. Background music is always wonderful, but maybe something without a lively beat and lyrics could help create a more inclusive experience. Thank you.
I worked at Apple in the late 80s and most of the 90s, and I do know that the "case design" tasks that the Cray was used for were actually pretty cool and rather advanced. Whatever else the Cray might have been used for, I know they had a tool that modeled the flow of molten plastic into the *VERY* complex mold tooling that Apple was fond of creating for its distinctive enclosures. This modeling allowed virtual test runs of a tool design before it was made, which was a lengthy and complex process that cost huge amounts of money. At that time, Apple's designs favored case plastics that used minimal screws and other hardware fasteners (usually just a few, and quite often, none at all). The Mac LC, Mac IIcx/ci, and Mac Portable were all machines that I recall had zero screws. The machines could be built faster and more automatically with no fasteners, and were very easy to service as well. Also, doing a field teardown and rebuild of a Mac LC in about 30 seconds made for a pretty cool customer demo. You can debate the value of these kinds of designs, but given that they were going to be used, having a way to ensure that the mold tooling would actually work before making it was a big benefit to the company.
Impressive! Industrial design is Apple's secret sauce few understand. When it was first revealed at the West Coast Computer Faire, the Apple II wasn't the most impressive piece of hardware among the exhibitors. Yet Apple's booth was mobbed by the press and attendees. Most computer historians credited its successful launch to its color capabilities, but they dismissed the real reason even though it was right in front of them. Unlike the sea of intimidating, ugly sheet-metal boxes with flashing red LEDs at the show, the Apple II was a beauty to behold. That made it approachable and captured people's imaginations. It's amazing Apple continued to place a very high value on aesthetics well after Jobs' tenure, as demonstrated by the Cray purchase. Meanwhile, its competitors continued to churn out the same dull, beige, sheet-metal boxes.
Yes, Apple really led the pack in industrial design, even back then. Their enclosures were the nicest-looking and feeling, and easiest to open of any major manufacturer. Apple spent a lot of money on plastics design, which contributed to its status as the "aspirational brand," even through the turbulent 90s. It was fun to see and demo those products when they were introduced. Everyone wanted a peek. Fun fact: the texture finish was applied to the mold tooling only after the designs were frozen, right before production. Up to that point, all samples and test articles came in smooth, untextured plastics. When we showed preproduction demo samples to customers under NDA, it was usually a unit in those smooth-finish plastics. The texture was applied by etching the mold tool in an acid bath, so it was a one-way, non-reversible process. If a mold tool had to be redesigned because of a product design change, it could impart as much as a six-month delay on the product introduction, because making the tooling had such a long lead time. A change like this would have cost many millions of dollars in tooling changes alone. After I left Apple, they started building finished products using smooth, glossy-finish, untextured plastics. This is now much more common than textured plastics, but at the time was unheard-of. Products finished this way require lots of special protective packaging and handling to ensure that the finish is not marred before the customer receives it. Everyone makes electronics that way now, but Apple was the leader. I'm pretty sure the iPod was the first product Apple sold in smooth plastics.
Thanks for sharing! When I got to that part of the video I figured they must have been using the Cray for case air-flow / thermal simulations or something...
If only they stuck to the minimal screw methodology 😤
Should have included a few words on Apple's late-1980s project Mobius: the ARM-based prototype that could emulate both the Apple IIe and Macintosh at faster than native speeds, and went on to power the Newton / eMate line for 36+ hours of continuous use on just 4 AA batteries.
Cheers!
I have worked at a Hugh Martin company! He's a terrific executive; honest, transparent, and working in the best interest of the company and its employees, not just the bottom line.
I get the impression he's a stand-up guy. Lots of people have very nice things to say about him.
I thought you made a typo for a second and meant a 'huge martini' company 🤓
Compiler engineer here. I have literally never once heard someone claim that it's hard to write a compiler for a stack-based machine. Not only is it in fact easier than writing a compiler for a register-based system, but there are a number of production compilers targeting stack-based systems - the Java Virtual Machine, for instance, is entirely stack-based, and consequently javac compiles to a stack machine.
Something else had to have been going on there, because this story simply does not hold water.
It's not about writing a basic compiler like the one for the JVM. They needed a compiler that would be performant on their hardware. This is where it gets difficult: the compiler has to schedule all the transfers between RAM and the stack. The compiler must get the timing right so what is needed on the stack is there when it is needed. Doing that when compiling just one application is tricky. Now imagine doing it for a task-switching OS; how does the compiler do that in an optimal way?
Compilers targeting stack-based VMs can be nice and efficient, as they can ignore the bandwidth limits between the stack and RAM; the native CPU does its best to optimise that with cache prefetch, branch prediction, etc. If it's for a HW stack processor, everything is the compiler's problem.
Do you think the difference in compiler complexity might be between a conceptually infinite stack growing in general RAM, with operations that can source from arbitrary offsets on the stack, and a system with a small on-chip stack for which operations (possibly) have fixed offsets that they source from? Then there'd have to be a general program stack to track recursive procedure calls per thread, as well as this hot on-chip stack that data gets pushed onto and popped out of, to and from external memory.
"At some point I really should do a video about [the Transputer]"
YES, PLEASE DO!
I LOVED working with them in the early nineties, they were so advanced that even now we struggle with concepts they had nailed 35 years ago.
We had a 32-node rig back then (plus some smaller ones), and used it for research in advanced robot control, like controlling flexible robots. As well as teaching PROPER software development.
I'm literally creating a clone of it in an FPGA now, because everyone seems to be doing traditional 8-bit computers on YouTube. I thought I might try something different.
I love that thing, and I think now is the time for such an architecture to bear fruit, especially with all the systolic-array stuff on GPUs nowadays.
@@monad_tcp Could transputers be used for a new GPU or NPU design?
SARA University compute center in The Netherlands had a Parsytec transputer-based cluster at one time.
@@johndododoe1411 Certainly, that was one of the proposed use cases when the Transputer was created. They had started with some general-purpose cores, but were planning specialised ones, including GPUs.
@@johndododoe1411 No. The idea is similar to modern GPUs with programmable pipelines (especially the compute shaders), but those have access to far more RAM over a far wider bus and are much more closely integrated. The Transputer is closer in function to a modern DSP anyway.
Anyone who makes a Red Dwarf reference in their tech history video deserves a thumbs-up.
Came here to say exactly this.
agreed
One note: at 17:28, the PowerPC 601 is the result of AIM combining the cache controller of the MC88100 with the single-chip version of the POWER1 CPU and then making a lot of modifications at Apple's behest, including eliminating thirty-some instructions and adding almost 60 new ones. It's very largely an enhancement of the POWER1 architecture with extras, but some facets of operation were fundamentally changed, like how system calls are performed at the lowest levels. It's nit-picking, but I think it all becomes perfect if you say IBM came along with their new RSC (RISC Single Chip) CPU and asked Apple about the OS, finally resulting in the specification for the 601. Very entertaining video, but you could cut the music volume by 3/4 to make future production even better.
Amazing that there really are just a few enduring, efficient processor designs. We Apple users are running RISC processors based on 1980s designs, with an OS originating in the '60s.
Not all that amazing really. It comes down to time and money. Software developers and operating system developers simply didn't, and even today don't have the time and resources (cash and technical know-how) to support more than about 3 CPU architectures across 1-3 operating systems. I'm a software developer who owns a company doing LIDAR development and I only support 1 operating system across 2 CPU architectures. Paying the salaries for programmers who are coding for rare CPUs and rare operating systems simply isn't wise or feasible from a business standpoint. When the market settled on the 3 major CPU architectures (ARM, Intel, AMD), and 2 major operating systems (Windows and MacOS) it allowed developers to produce better quality products for these few architectures/operating systems instead of creating a crap-ton of shoddy products for more varied operating systems and architectures. It became a matter of quality over quantity from a software development standpoint. Software testing and validation is also quite expensive so supporting 3 architectures triples the cost over just one. I dropped MacOS development because it made up less than 1% of my sales, yet the cost for MacOS development was 2/3 more (mainly due to Apple's app store 30% fees) than developing the same apps for Windows, where 99% of my revenue originated.
Neither is true. The modern CPUs used in Macs have absolutely nothing in common with the 1980s designs. Nothing.
@@rosomak8244 It's my understanding that the ARM RISC design originated at Acorn in the UK in 1983 (the Acorn RISC Machine), which was spun off into ARM Ltd in 1990, with Apple as a partner. The A- and M-series chips in Apple devices are children of this architecture.
Genuinely delighted each time I see a new video from you pop up; you're an excellent presenter and it's always a fun journey! You certainly made the Transputer sound fascinating and very deserving of its own video as well ❤
I really need to do a video on that.
Very interesting video. Small history quibble: The Apple team built some proto boards with 88k chips and so had some ASICs built around the 88k bus before they canceled that project. When IBM and Apple talked about developing the POWER-derived RISC Single Chip for use in Macs, they convinced IBM to bring Motorola in and graft the RSC core onto the 88k bus to save Apple money on redesigning ASICs and logic boards (to the detriment of IBM's team who already had some designs based around a different bus). Thus, the 88k bus became the 60x bus when the RSC project became PowerPC.
I think I have found the source for the "compilers for stack machines are hard" statement in the section from 03:50: an article in The Chip Letter Substack. I think it's a misreading of this, with my emphasis added: "The first version of Aquarius wasn’t RISC at all, but a stack architecture design. Rather than use a conventional cache, the design *also* placed heavy reliance on software and the compiler to manage the transfer of memory into and out of the processor when needed. When the compiler team indicated that building the software to do this was impossible, the design was changed to a more conventional RISC design."
So it used user-managed on-chip memory, like the Tightly Coupled Memory of an ARM microcontroller, the local/shared memory of a GPU in compute mode, or the local store of the Cell BE, instead of a transparently mapped conventional cache. That was what made it hard to generate code for, not the fact that it was stack-based. I'd add a link but that might delete my comment.
That's basically it. With stack processors of the time, the stack the processor could access was held entirely on chip. Instructions (except load/store to and from the stack) could not access conventional RAM; they only operated on the stack. This leaves the compiler having to schedule all load/store operations to conventional RAM, a more or less impossible task for it to do optimally.
Other non-pure stack designs muddy that water a bit. The key advantage of a pure hardware stack processor is that everything you need is in the stack on the chip; there are no slow calls to/from RAM. Which is great where everything that will be run fits in the on-chip stack. Where that's not the case, these designs have an issue with scheduling stuff off/on the stack. One improvement on that type of design (in terms of making it more useful for regular computers) is the stack register, which allows multiple stacks to exist on chip, with the stack register controlling which stack is in use. As the OS schedules different processes (or threads), the stack register is updated so it points to the stack related to the scheduled process. That can be combined with load/store operations for the non-active stacks, so when the task related to that stack is running it's not waiting on load/store operations.
However, there is always an issue with how much SRAM you can fit on die, as the gate count for doing that soon gets very large.
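As a rough illustration of the scheduling problem described above, here is a toy model (entirely my own, with an assumed on-chip stack depth of 4 and made-up instruction names, not the actual Aquarius design) in which every transfer between the small hardware stack and external RAM is an explicit operation the compiler would have had to plan:

```python
# Toy model: the evaluation stack lives in a small on-chip SRAM, and every
# spill to / fill from external RAM is an explicit, compiler-scheduled move.

ON_CHIP_DEPTH = 4   # assumed hardware stack size, purely illustrative

def run(program):
    hw_stack, ram_spill, ram_traffic = [], [], 0
    for ins in program:
        op = ins[0]
        if op == "PUSH":
            if len(hw_stack) == ON_CHIP_DEPTH:     # no room on chip, so the
                ram_spill.append(hw_stack.pop(0))  # compiler must have scheduled
                ram_traffic += 1                   # a spill of the bottom entry
            hw_stack.append(ins[1])
        elif op == "POP":
            if not hw_stack:                       # operand not on chip:
                hw_stack.append(ram_spill.pop())   # explicit fill from RAM
                ram_traffic += 1
            hw_stack.pop()
        elif op == "ADD":
            while len(hw_stack) < 2:               # refill operands spilled earlier
                hw_stack.insert(0, ram_spill.pop())
                ram_traffic += 1
            b, a = hw_stack.pop(), hw_stack.pop()
            hw_stack.append(a + b)
    return ram_traffic

# Push eight values then add them all up: the deep pushes overflow the
# on-chip stack, so the "compiler" pays for spills and later fills.
prog = [("PUSH", i) for i in range(8)] + [("ADD",)] * 7
print(run(prog), "words moved between RAM and the on-chip stack")  # 8
```

Every one of those moves is something a real compiler would have to place, and place well, ahead of time; a conventional cache does the equivalent job transparently at run time.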
There was a joke that Apple bought a Cray to design the next Mac, and Seymour Cray bought a Mac to design the next Cray. Also, some ARM processors supported the Jazelle extension, which is basically a hardware implementation of Java bytecode. It was all but mandatory for running Java on early phones.
Very good point. I keep meaning to dig into how ARM implemented that. Next time I'm in the pub with one of them, I'm going to try to find out.
@@RetroBytesUK There were two versions of Jazelle, originally called 1 and 2 but later renamed to DBX (direct bytecode execution) and RCT (runtime compilation target). The latter eventually merged with Thumb and is still available, while Jazelle 1 was completely killed off. There wasn't much information about it to start with, and ARM has put some effort into making what was there go away. Jazelle 1 could translate the simpler Java bytecodes directly to a corresponding ARM instruction, with a little bit of register renaming so that a group of 4 (I think) registers cached the top of the stack. For the more complicated Java bytecodes it would jump to a small routine in ARM machine language to interpret them. But you replaced the main interpreter loop with hardware.
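A conceptual sketch of the DBX idea as described above (purely illustrative, not ARM's actual hardware behaviour): the common, simple bytecodes are executed directly, and anything complicated traps out to a software handler:

```python
# Conceptual model of "direct bytecode execution": handle simple bytecodes
# inline, trap to software for complex ones. Names of the simple opcodes are
# real JVM bytecodes; everything else here is an illustrative stand-in.

SIMPLE = {
    "iconst_1": lambda st: st.append(1),
    "iconst_2": lambda st: st.append(2),
    "iadd":     lambda st: st.append(st.pop() + st.pop()),
    "imul":     lambda st: st.append(st.pop() * st.pop()),
}

def software_handler(opcode, stack):
    # Stand-in for the ARM routines Jazelle jumped to for complex bytecodes
    # (object allocation, method dispatch, etc.).
    raise NotImplementedError(f"complex bytecode {opcode} handled in software")

def execute(bytecodes):
    stack = []                    # in the real thing, the top few entries
    for op in bytecodes:          # were cached in a small group of registers
        handler = SIMPLE.get(op)
        if handler:
            handler(stack)        # the "direct execution" fast path
        else:
            software_handler(op, stack)
    return stack[-1]

print(execute(["iconst_1", "iconst_2", "iadd", "iconst_2", "imul"]))  # (1+2)*2 = 6
```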
TI’s TMS9900 series CPUs were stack based, and were found in the TI99/4a. They had one register, which was a pointer to the current stack location in memory. The problem with stack machines was that they didn’t scale well with the improving speed of the CPU. RAM didn’t keep pace with the improvements in silicon (which is why cache memory, and different levels of cache were invented).
The TMS9900 was not stack based. It was a simplistic design that used a 32-byte memory structure to hold all the architectural registers in what would otherwise have been a neat RISC design. If only they had loaded those registers into a CPU buffer, it could have been much faster.
The 8087 was a stack processor without all the basic CPU functions, which were done by the 8086 CISC processor.
Using faster (local SRAM) memory for the registers and slower memory (external DRAM) for the stack is more a matter of tradition than a technical hardware decision. Modern ARM processors talk about register files (physical local SRAM blocks with individual addresses). That local SRAM block could very well hold a fast stack instead of registers.
Architecturally, you need pipeline and flag registers all over the place in modern designs - so you might as well use the "register" as the base abstraction, and physically as a design block. That's why the stack becomes a second order abstraction, even though it's crucial for all compilers with subroutines, and a great model for general computing.
The control logic defines what abstraction you're implementing to make sense of a memory block. The ALU can work with either register concepts or stack concepts for input and output, it's all just latches at the end of a wire.
C compilers want fast registers (because that's all of the fast memory they could fit into old hardware), so hardware designers implement local SRAM registers. That's why. If Forth were more popular than the Fortran and C languages, we'd have fast memory mapped as stacks.
@@PaulSpades Going way back, some early CPUs had their registers on mechanical drum memory, in the days when even DRAM was still wild sci-fi. I think the formal technical definition is that registers are in whatever is the fastest memory you can afford to pay for at the time.
Are you perhaps confusing the TMS9900 workspace pointer register with an overall stack architecture?
My Dad had to deal with John Sculley in the Pepsi days. His very first impression was that the man was a fucking douchebag who didn't care at all what he was selling, or whether he even understood what he was selling. Poaching John Sculley was Steve Jobs' worst business decision of all time, and I am not even talking about the later firing/betrayal of Jobs. The entire decision was the worst thing for Apple.
That chimes with what more or less everyone who has spoken about meeting him had to say about Sculley.
In the developer community in the late 1980s there was a common saying... "Lynch the rat-bastard Scully."
He may have been a bastard, but I don't credit the claim that he didn't understand Pepsi. It's sugar water with a colorant and carbon dioxide. What's hard to understand about that?
@@paulie-g The problem was that he didn't care what the business was. Having been hired by Jobs (someone who was overly passionate about products and their impact on society), the end result was Sculley just running a computer company the way he had run a soft-drink company. In a purely money sense, maybe those differences don't matter, but they very much do. Sculley left the company in a state (nearly insolvent) from which it did not recover until Apple rehired Jobs after the NeXT acquisition. I don't idolize Jobs and I own no Apple products, but Sculley was always wrong for this company. He did do wonders for Pepsi in his time there, but I just don't see him ever caring about the difference in importance between soft drinks and technology. Jobs' famous quip while convincing Sculley to come on board ("sugar water" vs "changing the world") was a good pitch, but it doesn't seem like Sculley took it to heart, and he helped ruin the company anyway.
@@MicrophonicFool So, long story short, dirtbag Jobs nearly ruined his own company and got himself the boot by foolishly hiring Sculley, a bigger douchebag than himself? LOL
It would be interesting to see Apple's *"White Magic"* project covered. It was a hardware 3D accelerator that was very close in concept to the PowerVR GPU. The QuickDraw 3D Accelerator was the only product released from that project, though. The follow-up design might have been available in 1997 if Steve Jobs had not fired every employee who was not directly involved with the Macintosh core development teams when he came back to Apple.
Stack architectures are trivial to write compilers for. The reason you don't see them now is that you can't load a value from memory long before you use it. Memory latency has been the dominant force in processor design since before RISC was a thing; if you go back to when memory was fast, you'll find high-end mainframes like the Burroughs B5000 series (from the early 60s) were stack machines.
So it's trivial to write a compiler if you don't mind the resultant code performing very badly on a modern hardware design. If you have to write a compiler so that things get taken from memory to the stack and are ready when needed, it gets a lot more complex. That's what the compiler team were trying to get across: sure, they could make a compiler, but it would perform badly, as they couldn't get to the point where it would be good at getting things onto the stack in an optimal way.
@@RetroBytesUK Compilers for stack machines are much simpler, and initially their code performed as well as compiled code for register machines. That is why machines such as the Burroughs 5000, HP3000, Transputer, iAPX432, RISC I to III and SPARC (a mix of stack and registers) and so many others took that route. Improvements to compiler technology in the 1980s, in the form of new register allocation algorithms, changed this dynamic, making register machines more desirable since they were simpler to pipeline. I wonder if advanced out-of-order implementations with register renaming wouldn't make stack machines interesting again?
@@Bebtelovimab What makes you think CISC code is smaller/denser? The demand for small code hasn't gone away - the ability to fit code into L1 (especially) and L2 is still king. My theory of the problem with stack machines is that they create unnecessary dependency chains, which makes extracting instruction-level parallelism, superscalar execution, pipelining and OoO harder - all things that are core to the performance of modern processors. It's why the few modern stack implementations are very small microcontrollers focused on realtime control.
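To illustrate the dependency-chain argument (a rough sketch of my own; real cores, and any stack-slot renaming they might do, are far more subtle), compare the ALU critical path of (a*b)+(c*d) when every operation is implicitly ordered through the top of stack versus a register encoding where the two multiplies are independent:

```python
# Sketch: critical-path length of the ALU ops, assuming a stack core that does
# not rename stack slots (so each op is ordered after the previous one via the
# stack pointer) versus a register encoding with explicit operands.

def critical_path(deps):
    """deps: {instruction: set of instructions it must wait for}."""
    memo = {}
    def depth(i):
        if i not in memo:
            memo[i] = 1 + max((depth(d) for d in deps[i]), default=0)
        return memo[i]
    return max(depth(i) for i in deps)

# Stack form: push a, push b, mul, push c, push d, mul, add
stack_deps = {"mul1": set(), "mul2": {"mul1"}, "add": {"mul1", "mul2"}}

# Register form: mul r1,a,b ; mul r2,c,d ; add r3,r1,r2
register_deps = {"mul1": set(), "mul2": set(), "add": {"mul1", "mul2"}}

print("stack ALU critical path:   ", critical_path(stack_deps), "ops")     # 3
print("register ALU critical path:", critical_path(register_deps), "ops")  # 2
```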
@@paulie-g Literally, textbooks to this day teach CISC instructions as being denser than RISC. They use Chinese versus English as an example. And this was not that old for a college textbook (2018), so they knew of RISC, although they mention Itanium's EPIC without mentioning the whole Itanic thing. I had a laugh when reading it before class (not in public). They also still cover optical media (I can understand LTO). For reference, this was a basic introductory computing course (in Information Systems - Comp Sci with less low-level programming and more business admin) that dealt with basic concepts.
@@alext3811 OK, so the answer is "someone said so in a textbook", with the textbook being generalist rather than specialist. Gotcha. Density of the instruction set ≠ density of the code. The comparison is not RISC per se but load-store architectures. I bet they hadn't heard of Thumb. There are enough pointers here for you to educate yourself if interested.
"The SI unit of failure": Excellent!
Another +1 from me for that Transputer video. I once worked next to an office in Notting Dale that had a large poster of a Transputer on the wall, funnily enough above their single original model Mac, back when it was 'the next big thing' and would love to see you cover what happened.
Some thoughts:
The loud background music and the rather "slurry" way you pronounce words make it hard for a non-UK citizen to understand what is being said.
That being said, the content and the enthusiastic way you present it make for an outstandingly interesting video! I don't mind a bit of dialect either. This is meant to be a positive and constructive critique. Keep up the good work!
Commenting before watching further- my ears legitimately were ringing after that loud white noise at 3:23. Please duck it next time. (Otherwise a huge fan of your videos)
Sorry, I will duck that one a bit more in future.
That background muzak though 😪
I love learning about the history of computing and have to say finding your channel has been a real gem. Thank you for all your efforts.
You claimed that it would be difficult to write a compiler for a stack-based machine, but actually it is the reverse: it is much easier to write a compiler for those machines.
That is also the reason why a JVM is stack-based. It is a machine designed by compiler writers instead of hardware designers.
In a stack-based machine the compiler does not have to worry about efficiently using the available processor registers, which can be quite difficult especially on machines where there are only a few of them and they have specialized purposes.
The reason why stack-based machines are not popular is because they normally are slower.
Indeed. I was also a bit surprised by that statement. Virtual machines (in the language sense, not whole-computer VMs like VMware) are an intermediate step during the compilation of a program. That makes it (comparatively) easy to write the compiler from the language to the intermediate code. The second compile phase (just in time) is when the code for the stack machine is translated into architecture-specific code.
But you are right. Real-world applications would run incredibly slowly on a stack-based machine because it would require lots of (slow) memory accesses. And it would probably also not work well with pipelines (where the processor executes several instructions at the same time by slicing them into stages like fetch, load, execute and store).
I probably should have been clearer: it's hard to write a performant compiler for hardware-based stack designs, as working out when and what to get onto the stack, so it's there ready when needed, is hard. Making it the compiler's problem rather than the CPU's (prefetch, branch prediction etc.) hugely increases the complexity for the compiler writer. For solutions like the JVM you can basically ignore that stuff, as the native CPU will do all the prefetching etc. They were targeting performance, so when the compiler team told them it was too complex to make a performant compiler, they moved on. Occam and the Transputer cleverly avoided the big performance issue by using message passing (which everything was optimised for) where other systems used shared memory. This avoided the technique that was a big source of issues with parallel computing, and also the performance issue of having the compiler plan all transfers from RAM to the stack.
@@RetroBytesUK In a stack-based machine, all data lives on the stack. It is either at the top of the stack, or just below it for temporary data used in expression evaluation, or in stack frames below that for the local variables of functions. That is why a stack architecture fits compilers for structured languages so well: there is a direct mapping between variables in the language and locations on the stack, without the detour via processor registers.
This has other advantages as well: context switching becomes much simpler. To switch thread, you just reload the stack pointer and program counter. No need to save other registers because there aren't any. Process context switching of course is more complex as memory management is also involved.
But there are performance problems because everything is in memory. However, an optimized processor would have cache to hold the most frequently accessed stack locations, at least the words at the top of the stack and a dynamically assigned cache for the locations beyond that.
It is not the task of the compiler writer to manage that. Maybe at Apple they decided that (at that time) making the processor clever enough to manage the cache was too difficult, and that the task would be passed on to the compiler. Yes, that would make it complex. But it is also a bad design decision to begin with.
Great video! And, yes, I'd love to see one on the Transputer. In the late 90's, I wrote code for several network analyzer platforms, which all used transputer-based coprocessor cards in portable PCs (along with some FPGA-type programmable hardware).
Xilinx FPGAs (4000 series)? Two 32-bit Transputers and two 16-bit Transputers?
Burroughs used a stack machine architecture in their mainframes and minicomputers. They were primarily aimed at COBOL applications, but there was also a C compiler which made porting applications from Unix systems possible. Possible, but by no means easy.
BTW, the Cray computer at Manchester University in the '70s was purple too.
Burroughs is a more complicated story, because it was specifically hardware/software co-designed (something we desperately need in the modern era) for what were then high-level languages. The system-level language was ALGOL (60, probably) with a really fast one-pass compiler, because they departed from the standard by requiring definition before use, enabling one-pass compilation - which was a huge deal in the days of punch cards. Incidentally, none of those HLLs would be implemented with a stack machine today. A good example of a modern attempt at something similar is the Mill, at least conceptually.
@@paulie-g Interesting, thanks. My only direct experience was on an A series using that C compiler I mentioned.
Good video, but the music is too loud compared to the narrative. Makes it difficult to follow.
Damn, I love hearing Django Reinhardt during a video like this :) (gotta admit it was pretty distracting though, my brain just goes dancing when Minor Swing kickes in xD)
I would love a video on Hitachi's SuperH line of CPUs.
1:56. The PPC was a three-way development effort between Motorola, Apple, and IBM, so it's not exactly fair to say they opted for the PPC over their own design, as they had their fingers in the PPC design.
There was an Apple connection to ARM as well, since it was Apple and Acorn together who worked to spin off ARM from Acorn. Apple sold off ARM stock for years after Jobs returned, and didn't start on their own ARM designs until well after they had divested their shares in ARM.
The MMX did have its own little jingle on top of the Intel one though. So that's a thing....
Fascinating story, thanks RetroBytes! Always lovely seeing a Cray.
In all fairness, the AMD Jaguar architecture found its way into the PS4 and Xbox One. Although low-power semi-custom chips probably aren't the biggest money maker, getting that much coverage in the console market was a bright spot on their balance sheet before Zen came around.
So we somehow got one non-failing Jaguar.
There was also the Mac OS X release named Jaguar, which was pretty successful too.
One of my favorite videos you’ve done so far! Very interesting.
What did the A in AIM stand for again?
“Other companies”, apparently! 🤣
It's actually fascinating how often in history something is invented, or almost invented, years before it's re-invented by someone else and makes it into the mainstream much later.
Completely agree, it just goes to show how even the best ideas need the right timing.
...or stolen, like the "iPod", look up that story and weep.
Great video! Would love more info on the 88K. I played with one at VCF this year running OpenBSD and it was actually snappy, so I wonder why no one wanted to move to it.
One fairly common example of a popular stack processor is the 8087 FPU and its successors.
That stupid old-time music in the background was as annoying as it was distracting when I was trying to actually listen to the dialogue.
@1:28 specifically is where I had to pause and write this.
Indeed - what if they had gone ARM in the first place and saved all that chip development money? I remember getting the specs for the instruction set in the late 80s and being quite excited, so I ended up buying an A440 (still works).
Cracking machine for its time. Sadly I only have later Acorn ARM machines, but I do love them.
Same: I saw the ISA and hardware specs, and immediately realised its brilliance, so rushed out immediately and bought an A310, which is still in full working order, including the monitor, and still in very good condition with all the original manuals and packaging, and no, it is absolutely not for sale. I also had the same thought on RBUK doing a Transputer episode, about 15 seconds into this video.
Excellent as usual and some fascinating insights into Apple's history. Thanks very much, keep them coming!
I think the assumption that a chip that included a bunch of features that ended up being important a decade later would automatically be a great chip is probably misplaced.
Multiple cores, SIMD, and the other advanced architectural features are cool, but the main reason we see them in mainstream computing today is that we hit a wall in chip design where you couldn't just keep cranking the clock speed and making the execution pipeline more advanced. A hypothetical chip with multiple cores would have been de facto slower than a comparably priced single core. It may have theoretically been faster, but it would also have been much harder to actually get that performance out of it, especially with the development tools available at the time. It likely would have ended up being remembered a lot like the Cell chip in the PS3: a chip with a ton of potential that never really got utilized outside of a few genius devs.
The Cell chip in the PS3 had some serious design-level mistakes culminating in serious performance issues. There is certainly a way to make multi-core work fast by reducing synchronisation overhead, eg by loosening the coherence guarantees like Alpha did.
@@paulie-g Right, but the Alpha and other multi-processor designs were going for performance beyond what could practically be achieved with a single core/chip using the manufacturing capabilities of the time, and had costs to go along with that capability. There was still a lot of room to just make a faster single-core chip at the consumer end of the market, and even if it was theoretically slower than a fully-utilized multi-core design, it still would have been much easier to program for, and so most software would have run better.
Forget the CPU - Apple's far bigger problem during this period was on the software side. They had bad ideas, executed badly, leading to the clustershambles of Copland. (They also built a perfectly serviceable Unix fusion with Mac OS in A/UX and did nothing with it.) Those missteps, more than any of this, were what nearly killed them. They were what led to the reverse takeover by NeXT, who had the one thing Apple could never build - a great Unix-based operating system that was portable to any underlying CPU. (There aren't many good YouTube videos on this topic, BTW.)
Excellent video. I've had an interest in retro for many years and have fond memories of reading about all the different RISC processors in BYTE magazine, but was completely unaware of Apple building their own CPU. Would definitely be interested in a video on the Transputer and love your presentation and delivery.
A video on Pink and/or Taligent would also be very interesting.
I would love a video about PowerPC architecture chips and also the new M series chips
The Sun workstations ran the code at the same speed as the Cray because the code was serial and didn't make use of the parallel vector processing in the Cray. That was the first of many reasons why this project failed, all stemming from the fact that the people involved just didn't understand processors.
Interesting - the fabled Atari Jaguar had a pretty efficient RISC CPU in it. (Pretty sure the Atari Jaguar RISC design was licensed, though...)
Shame almost everything released for it only used the 68000 it had.
That RISC core was copied for it’s GPU, too. Problem is it was a new bespoke RISC architecture, rushed out with hardware bugs and immature dev tools. Makes sense to write code for the processor you already understand, rather than the one you don’t and was a nightmare to debug.
@@Toothily I think the difficulty for Atari was that they put every last dollar they had into the Jag and it was still not enough. So they had no money left to do dev support properly, hence the lack of dev tools, which ultimately undermined the whole effort. It's a huge contrast to the effort Sony put into supporting devs for the PlayStation.
@@RetroBytesUK Yeah, it was do or die, but I respect what the engineers were trying to do. It's still a clever design, even if it only got to ~80% of where it needed to be.
@@Toothily It was a small platform with a small install base. Devs didn't want to write exclusive games for it, so they either wrote games they could port to other platforms or ported existing games, all of which dictated the exclusive use of the 68k.
The Atari Jaguar is now considered the SI unit of failure. 👏👏👏👏
I'm totally stealing that.
Sooo good!😂
Very detailed, thoroughly enjoyed it. Yes to Transputer!
I was only a young lad back then, but you're right - it did seem like there was a gap in the story, where Apple was introducing '040 machines at 33 MHz for a long time rather than pushing anything faster. (They did have the Quadra 840 AV at 40 MHz though).
It would seem that the 68040 design just couldn't be run at frequencies beyond a certain point, unlike the 80486 which went into its DX2 and even DX4 phases. So, even though the 68040 kept Motorola competitive with the 80486 initially, it just wasn't able to keep up. Eventually, the 68060 did come along and seems to have been more scalable in terms of frequency, but that was only useful to those few platforms that had stuck with the 68000 family like the Amiga (although there were other accelerators for that, too).
Fascinating. And so well produced! What a great video about something I literally knew nothing about. Namaste.
Funny to think that that huge purple Cray supercomputer has less computing power than an iPhone.
The Japanese also poured lots of resources into parallel computing in the '80s without much to show for it.
HP did in fact create its own stack-based processor, called 'Focus', in the early 1980s.
It was actually the first commercial single-chip 32-bit CPU.
It was used in UNIX (HP-UX) servers and workstations (the HP 9000 Series 500), so they had a working C compiler, obviously. HP did have prior experience with 16-bit stack-based CPUs (the HP 3000 series).
Maybe Apple did get some inspiration from HP?
Interestingly, I've been thinking about the whole RISC/ARM project this week, which did of course involve Apple, and what you said in a previous video about how the first prototype chip had powered up before they even connected the power, because the tiny voltages from the test leads were enough...
To be fair, the voltages were not tiny; they were the regular operating voltage of the CPU, and the CPU was operating from the normal voltage minus one Schottky junction. It's uncommon to parasitically power a chip by abusing the I/O protection diodes, but I've seen it done a number of times in circuit challenges and the like.
"SI unit of failure..." LOL.
Excellent video.
Small correction though: Apple has used its own silicon designs in its phones since the A4 in the iPhone 4. Before that they used Samsung-based ARM designs. They have "just" scaled up their mobile designs for the desktop now.
("Just" in the biggest quotation marks you can think of, because scaling up a design is incredibly hard.)
Awesome content, Great to learn about how the tech industry has evolved.
Great video mate! I love nostalgia. 😁
Apple is a company that is best to its customers when it's struggling, and can't afford to force people into walled gardens and locked down hardware. So all I can say is bring on the struggle.
Are you sure about a private instruction set? Isn't the M-series actually all the ARM instruction set?
Love the Red Dwarf reference.
Things I never knew I wanted include: A giant, purple supercomputer!!!!!
ARM means Acorn RISC Machines, and those are the same people who created the BBC computer.
It did mean that; sadly, quite a while back now they changed it to mean Advanced RISC Machines. I guess they wanted to get away from their Acorn roots back then.
Where's the location at 3:50? After sleuthing for 30 minutes I can't seem to come up with it... Anyway, sharp-looking building with the atrium in front.
Thumbs up for a Transputer video please! I was so enamoured with the Atari Abaq/ATW when it was announced, and follow ons like the Kuma Transputer add-on for the Atari ST. Later in life I was doing PC support and visiting one of my customers, he had an Inmos T400 encased in a block of acrylic, presented to him by Inmos before they imploded...
Scorpius would’ve been killer for creative applications, in theory. No way they could’ve fit 4 cores + SIMD onto a single affordable die back then though, what were they smoking? The skunkworks vibe is cool, still.
They really did seem to think, for quite a while and at the highest level of the company, that this would be the new CPU for the Mac. As far as I can tell they were still a way out from doing any layout work, so as you said it may have been far too expensive to be practical in the time frame they were planning on. It's the problem with a secret project: there are not many sources to go off, so you never can be sure.
You didn't mention that MMX forced the implementation of a multi-planar system, because... well, you can make a vid on that debacle.
I remember reading in Byte that Apple had prototype machines running on an Am29000 RISC chip with performance comparable to a 68LC040. So they had Rosetta JITting 68K code to run well on the 88K and Am29K, and maybe MIPS too, before PowerPC.
Apple was one of the three in the AIM Alliance, where AIM means Apple, IBM, and Motorola. Apple 🍎 came first in the acronym. It wasn't IAM or MIA or IMA or MAI. It was AIM.
I daresay if IBM hadn't been involved, Apple would still be making those CPU chips.
And the ARM AArch64 architecture... well, right? RISC again.
🍎 Apple already used and developed its operating system for a RISC architecture once before, late last century. Apple silicon is the second time around for Apple with RISC.
The transputer is one of my absolute favorites!! Do one!!
Loved the Red Dwarf reference. Albanian State Washing Machine Company.
Thank goodness someone spotted it. Last time I made a Red Dwarf reference, most missed it and told me I'd got what ASCII stood for wrong.
I came here to see whether anyone else had spotted it, and two people already had! I remember the ASC-II thing too :) @@RetroBytesUK
Slight correction, Apple has been designing mobile ARM chips since the A4 introduced in 2010 with the iPhone 4.
16:50 - Data General also used the 88K, a little anyway…
The bit about having a Cray you have no idea what to do with is the only truly endearing thing Apple has ever done - well, that and the IIGS.
I just love the idea of "we must keep the project a secret", and then one of the first things they do is buy the least subtle computer known to man - and what's more, they paint it purple.
Often forgotten as well: Apple consulted on and had considerable input into the WDC 65816, which they used for the Apple IIGS.
Yep. Even some of the WDC 65C02 was designed in deference to Apple. One or more instructions were made slightly less efficient and retained certain MOS 6502 quirks to remain compatible with the Disk II controller firmware.
A processor like Scorpius would indeed have jumped technology forward, but you know it would have also been fantastically expensive, given how much less mature semiconductor processes were at that time.
It really is wild how if just a few hallway conversations and meetings had gone a bit differently, there are dozens of significant ways in which the whole story could have turned out *completely* differently.
If the first design had been a bit more grounded, they could have gotten to market with a first implementation very quickly and left the quad-core design for a version 2.0. The "big" RISC CPUs of the 80s they would have been competing with were really all very simple. Designing a world-class CPU around 1987 really didn't take a huge investment. Designing one that would still be world class when it came out some years later in the 90s absolutely did, and they made a huge mistake by trying to boil the ocean as the first step in the process, while everybody else got entrenched and revised their designs with incremental complexity.
Interpreted virtual machines use a stack. VMs targeting JITs mostly use virtual CPUs with registers (three-address code).
Writing a compiler for a stack machine is MUCH easier than for a register machine. The lexer and parser are, obviously, exactly the same. Most of the optimizations work the same. Local variables and function parameter passing require some load/store-like instructions. But otherwise, code generation is super easy (no register allocation, no register aliasing, no instructions that operate only on specific registers or clobber them, no register pressure, and expression evaluation is stack-like in nature) - to the point that some compiler designs use stack-machine-style code as their intermediate representation. The problem with stack CPUs is memory performance. In such a CPU, the L1 cache would be the fastest storage, since there are no general-purpose registers, and most instructions generate quite a few memory/cache cycles. This is the performance killer.
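A back-of-the-envelope sketch (my own accounting, ignoring any top-of-stack caching the hardware might do) of why a memory-resident evaluation stack generates so many memory cycles compared with a register machine evaluating the same expression:

```python
# Rough count of memory accesses for x = (a + b) * (c + d) when the evaluation
# stack lives in memory, versus a register machine that only loads the
# variables and stores the result. Purely illustrative numbers.

stack_code = [
    ("push", "a"), ("push", "b"), ("add",),   # add pops two slots, pushes one
    ("push", "c"), ("push", "d"), ("add",),
    ("mul",), ("pop", "x"),
]

mem_accesses = 0
for ins in stack_code:
    if ins[0] == "push":
        mem_accesses += 2        # read the variable, write the stack slot
    elif ins[0] == "pop":
        mem_accesses += 2        # read the stack slot, write the variable
    else:                        # add/mul: read two slots, write one
        mem_accesses += 3

print("memory-resident stack:", mem_accesses, "accesses")  # 19

# Register machine: load a, b, c, d; two adds and a mul in registers; store x.
print("register machine:     ", 4 + 1, "accesses")         # 5
```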
That was the key to Apple's problem: making a compiler that produced code that would run in a performant way is difficult. How does the compiler know when is a good time to move data from RAM to the stack and back? Especially when using a task-switching OS. That's the hard part, requiring a degree of prescience no compiler can have, hence the switch to RISC. A VM stack processor is an easier prospect, as the native code generated from the p-code runs on a CPU with cache prefetch, branch prediction etc. That's why a stack is a popular design for virtual machines: the compiler is relatively simple (as you pointed out) and you don't have to worry about managing when things move from RAM to the stack.
@@RetroBytesUK I've seen you use the RAM / stack distinction several times when answering comments which say some version of _compilers for stack machines are easy_. I think all our default assumptions are that the stack in a stack machine is just a region of RAM, with the processor holding a single stack-top address on-chip in its architectural state. Are you implying that this particular architecture used a fixed-size, small on-chip SRAM buffer to hold the stack? An SRAM stack accessible only via push-from-RAM and pop-to-RAM instructions, with ALU operations implicitly sourcing the top one or two stack entries and writing to the top of stack? Perhaps something too small to hold the full state of each thread's function call stack, so the compiler's problem becomes shuffling data between the conceptually infinite stack in general RAM, which is the easy one to generate code for, and the fast on-chip stack, which imposes tricky constraints on code gen?
PCBWAY is the way!
The Cray appears to be an X-MP; a quick Google indicates its max speed was 500 MFLOPS. An iPhone 11 can do 690 MFLOPS...
It was indeed an X-MP, and yes, incredibly, an iPhone 11 would be able to outperform it in almost all regards. Apart from being a space heater - as a heater the X-MP still outperforms it.
You got your iPhone numbers way off. That is capable of hundreds of gigaflops, depending on what benchmark you are looking at, but even a $5 Raspberry Pi Zero from five years ago can do several gigaflops.
The thing is, if it had made it to manufacture, this thing could have been a massive basket case. Either it wouldn't perform, because the complexity of the chip meant they just couldn't push the clock speed, or it would end up costing too much to go into a desktop machine.
The thought of them jumping straight to ARM is fascinating. You could almost imagine a parallel universe where instead of using Next as a basis for a reborn Mac it was Acorn and RISC OS. Possibly with Olivetti somehow in the mix.
As soon as you mentioned stack processors, my mind immediately went to a slightly more successful company (which went belly up as soon as the second AI winter came in). If memory serves, Symbolics and their LISP processor kin worked on stack processors and eventually created a VLSI version... but MIPS, SGI and a certain version of CLISP outshone them in the end. (There is to this day a commercial Symbolics emulator for the VLSI version that used to work only on the Digital Alpha RISC processor... it seems the source was shared.)
I think there was also a stack processor created by Sun Microsystems to run Java natively...on a network computer?
Anyways, they're all just footnotes in the annals of computer history...
On the intro it’s mentioned Apple was using ARM designs for iPhone. But I think since A4 it’s the Apple own design and this is the base for the M1+
'Own design' wrt ARM is a spectrum. You could just license the ISA and do everything yourself, but almost no one does that. The next step is doing some bits yourself and glueing it to bits an ARM partner fab like TSMC provides as a ready-made high-level design + process implementation. Then you can combine IP cores from various ARM partners, like a Qualcomm CPU and someone's GPU into a SoC. Or you can get a whole SoC either from ARM or one of the partners. This also interacts with fabs, where these partners have process implementations and possibly orders reserved in the pipeline.
And Forth - you see it in Forth, and it works EXTREMELY well if it's done with half a drop of sense. There really isn't a better way to bring up a new embedded design than with Forth.
No mention of Apple also looking at SPARC?
They didn't start it in 2019/2020; they just knew it had got to an inflection point of useful power and speed. (See the A7's customised cores as laying the groundwork for Apple to improve their processors more and more.)
Is it just my computer or does this audio get quieter as it goes along? (great video though!)
Wow. I love when I find out about something I didn't know about.
I have your SSE extensions right here, in my PC computer purchased with money I got from the ATM machine using my PIN number.
Did you buy a CPU unit at 10am in the morning?
Amazing video as usual, still too quiet though!
I'll look into that. My editing software has gained better support for YT's LUFS level, so I should get a better view on loudness.
@@RetroBytesUK For me it is your voice being too low compared to the music that is the problem. I need to concentrate very hard to get everything.
I can live with the overall volume being low since my listening device has a volume control.
@@RetroBytesUK the video in general is too quiet, I have to artificially pump up loudness above 100% on my phone for it to be hearable with any sort of background noise (open window etc.)
@@balukrol Same: I needed +3 dB to follow the narrative, but apart from that it is an excellent video.
Please please please do a Transmeta video. I loved my little Fujitsu laptop with one of those in it back in 2003.
Have to admit, this is the first time I ever heard that Al Alcorn ever worked at Apple!
I don't understand why a stack machine would be hard for a compiler?
Stack architectures had another flaw in those times... they suffered very badly from the memory vs CPU speed gap, and at that time the gap was widening quickly.
The difficulty is that the compiler must work out when to transfer things to/from RAM, if the stack is to have what it needs on it when it needs it. This is an extremely complex thing to get right, especially when you factor in a task-switching OS. That's for a hardware stack-based CPU; for stack-based VMs (e.g. Java) you can just ignore this problem, as the native CPU optimises for you (cache prefetch, branch prediction etc.) at run time. Anything that needs a compiler to predict things in advance is going to create problems for the compiler author.
@@RetroBytesUK Tell it to Forth programmers, they do it in their heads 😅
It is not easy, but allocating registers is not easy either.
If you have the stack in memory it can be organised, or have hardware spill/fill, but if you have a small internal stack and must manage it manually, it could become a nightmare, especially with multitasking.
@@AK-vx4dy You can see why their compiler team told them that getting decent performance out of the CPU was going to be challenging, to say the least. With registers there are at least some commonly used techniques for doing a reasonable job of optimising things, as well as common approaches, using privileged-mode instructions, for managing task switching, with registers getting stored and retrieved in a relatively low-latency way (in the majority case).
Was this the Apple which put its tongue out in the advert in the 90s? 😝😉
Love Mrs Retro Bytes x
🤣
It also got its speed by being implemented with ECL logic (which is also why it was so power hungry).
Brilliant Again!
Ah, mention of the Cray-1 - the CPU cabinets are a work of art and I guess for $15m you can choose whatever colour you like 😊 Some argue that RISC stands for Really Invented by Seymour Cray.
Yeah the shift to ARM would have been cool, but why did they drop Power while IBM kept it to this day, and even the Sony PS3 used it? I can see it not being great for mobile use, but for workstations and servers... What am I missing? I realize that's a little beyond the scope of this video. Maybe next time :)
You answered your own question: the PS3 used it. So did the Xbox 360 and the Nintendo Wii. Steve Jobs was suddenly demoted from IBM's number 1 client to number 4. Intel promised he would always be number 1 with them. Obviously there were technical excuses for the move, with the idea that PowerPC was great for performance but Intel was better for performance per watt which was needed for mobile. It was very odd that two weeks after this announcement IBM came out with a new version of the PowerPC G5 optimized for low power. I find it hard to believe Jobs was not aware this was coming.
@@jecelassumpcaojr890 While that's true, a mobile G5 was already too little, too late. Apple had to keep the aging G4 around for the PowerBook line for ages due to the lack of a mobile G5; and while the G5s were fast, they weren't all that much quicker, and they used the power of a mid-sized town. The AIM alliance just wasn't producing the kind of chips that Apple needed, when they needed them.
@@3rdalbum My experience with a G5 iMac was that it was between 3 and 4 years before Intel iMacs seemed as fast running native applications (of course emulated software was expected to be slower). Perhaps IBM's "power efficient" G5 was still too hot for Apple laptops - I didn't look at its specification in detail at the time
Had a cluster of PowerEdge 7250s running SQL Server! Only way to get around the memory limits at the time (4/8 GB).
The place I worked at the time was a big Sybase customer. When Microsoft bought it (and renamed it SQL Server) they persuaded management to move from the Unix version to NT. It did not go well: first we hit issues with memory limits, so MS moved us to the Pentium Pro and a specially patched version of NT that could address more memory. However, it still performed like a snail on Mogadon. Finally MS surrendered on x86, spec'd out the most expensive Alpha box they could find, and switched us over. MS covered the whole cost of the machine. I seem to remember it having 4 Alpha CPUs, and it had more than 4 GB of RAM, but I can't remember exactly how much. This was in the mid-to-late 90s, when 64 MB was a lot of RAM; it had an incredible number of SIMM slots, all on removable cards. It was badged up as Compaq, but it was clearly a DEC design with a Compaq badge just stuck on it. That thing finally managed to outperform the Unix version of Sybase running on the Sun kit we had; it must have cost MS a fortune. They really wanted to show off migrating a big Unix instance to the new MS SQL Server on NT, so I guess they figured it was worth it.
You shouldn't NEED a compiler - the whole point of a stack machine is that you can program it directly. That's how Forth works - you gradually build up higher and higher power constructs, until you have a "lexicon" for your application at hand. No - I imagine if you need to use a lot of existing legacy code then a stack machine likely wouldn't be the best way to go.
8:53 so basically Cray had SIMD in the 1970s/1980s?
I appreciate your videos and always have, but I would be remiss not to say that, while the jazz background is a trademark of the channel, it makes it very difficult to follow along, especially for people like myself who have hearing difficulties. Background music is always wonderful, but maybe something without a lively beat and lyrics could help create a more inclusive audience. Thank you.