Oh Itanium… I did NTDDK work, writing drivers, and I got called in to help on an Itanium effort. I had low-level MC680x0, x86, VAX and Alpha experience, so they figured I could help with IA64 too, I suppose. Boy were the performance anomalies bizarre. You'd have an ISR that was nearly functionally and syntactically identical to an x86 (or later AMD64) equivalent (in C), and yet it would perform terribly, but only under certain circumstances. Given that VLIW/EPIC was supposed to largely predetermine/hint ideal branching and ILP at object generation time, you'd expect fairly consistent performance. Instead, depending on what other ISRs or DPCs had been serviced, performance for our code would vary massively (I literally would break the system to remote windbg and drill through step by step, instruction by instruction). Eventually it became clear that there were a lot of bad pre-fetch gotchas, lack of ILP "bundling" that the compiler was padding with nops, and the resulting instruction cache miss rate was high (along with MP L2 cache coherency problems). Most of this had no clear recovery path, as IA64 (IPF) was an in-order architecture; they'd really gone all-in on compiler-directed, hinted speculation. To fix certain issues literally required hand-coded assembler, which oddly enough was actually quite excellent. The ISA itself, with abundant GPRs and clear mnemonics, made for a nice programming front-end (especially compared to our old friend, x86). But none of that amounted to a hill of beans, because the uArch's commitment to predetermined/hinted instruction level parallelism was just a huge handful. The people writing the compilers were fighting a nearly impossible battle. Sad thing is, there were a lot of good things about IA64; had Intel taken a slightly less religious approach to the whole affair, things might have gone differently (a problem that stemmed from their abundant insecurity around being "the CISC" people, during an era when RISC was the end-all of every flamewar discussion around processor technology - they wanted very badly for IA64 and EPIC (VLIW) to be the next big buzzwords). I still have, sitting in a storage unit, the McKinley (Itanium 2) HP zx6000 workstation that I was provided for that project - the company didn't want it back! Thing is, unlike my Amigas, vintage Macs, bread-bin C= machines and my beloved Alpha box, I just can't get excited by IA64. The reason is pretty simple: it just wasn't very good (and, as another commenter pointed out, the IA64 bandwagon took down Alpha, PA-RISC and MIPS, all more worthy than IA64).
You hit it right on. When I first learned of this upcoming chip, it was supposed to take the best of the Alpha (which was 64-bit and at the time the fastest CPU), the best of the HP PA-RISC CPU, and of course the best of Intel, all rolled into one chip. Seemed like a possible big, big win. But it was a disaster. I kind of wonder if Intel did this to wipe out the Alpha and the PA-RISC, and they tried hard to get Sun and IBM in the game too. In the long run Intel did gain in the market, as two competitive chips were gone, replaced mostly by Intel.
@@jimbronson687 Nah man, the Intel people really, really believed they had the next big thing (and had bought into a lot of weird HP research-think). And a whole lot of other companies thought it was the next big cheeseburger too. The big problem though, IMHO, was that Intel's people got all religious about IA64's VLIW'ness. They also just didn't seem to get that the decoupled CISC decode penalty was increasingly a nanoscopic (literally) bit of transistor budget, and that the whole CISC/RISC debate didn't matter in an era of decoupled CISC, chip multiprocessing, SMT, etc. These were all things THEY helped bring about, but simultaneously didn't believe in. RISC still had (and has) benefits in relative efficiency, but not so much in ultimate performance, especially as the relative latency and performance delta between register file and the rest of the hierarchy of memory subsystems (l1, l2, l3, etc.) grew substantially (even weirder, x86's code compactness, thanks to being CISC, meant that they had an advantage in relative cache performance). In the end, CISC vs RISC vs. VLIW, etc., didn't matter. Meanwhile, as they were blowing it with IA64 and Spentium 4 NutBust, AMD was having one of their good spells, and delivered a lot of hurt, with the ultimate (and sustained) humiliation of AMD64 (something they're no doubt still annoyed by, every time they see a binary labeled AMD64). I was at Microsoft during the up and down of Itanium, and Intel was super-duper upset with us for throwing our weight behind AMD64. But it was simple survival for us: IA64 was never going to get out of its own way, and AMD64, while inelegant, actually worked really well (especially the long-mode sub-modes for 32bit, which made implementing things like WOW64 a snap). And just look at where AMD64 is today: Alder Lake, Zen 3, etc.... it's pretty amazing. Hell, I used to hate x86, but I've grown to have great affection for the weird little ISA "that could". Against all odds, for 46 years, x86 has proven the eternal winner, even though its own creators tried to kill it three successive times (iAPX 432, i860 and IA64). The only thing weirder is that the other big ISA started life as the CPU in the Acorn Archimedes. You gotta love this business.
@@smakfu1375 What are your thoughts on The Other Big ISA? It's doing very well in the likes of the M1 processor, but looks absolutely nothing like ARM2. Could you even really call it RISC when half the silicon is a bunch of special purpose heterogeneous cores?
@@kyle8952 Well, M1 as a "package" is a whole bunch of stuff, but the general-purpose CPU cores are ARM (AARCH64) load-store RISC. I don't tend to look at the whole die as "the CPU"... then again, I also consider general purpose cores to be "the CPUs", and the rest of it is other neat stuff (GPU cores, etc.). Modern processor packages (whether monolithic dies, MCMs with chiplets and tiles, etc.) are hugely complex things with tons of strange and interesting stuff going on. Take even the lowly BCM2711 in the Raspberry Pi: the VideoCore VI is actually the primary processor responsible for bootstrapping the system. It runs an embedded hypervisor-like RTOS, stored in on-die ROM, that, among other things, provides integrity for the VideoCore processor. This makes sense, given the BCM2711 started life as part of a family of SOCs for set-top boxes. The "color palette" test you see on boot is the VideoCore processor initializing. So, what looks like a simple SBC with some ARM CPU cores and an iGPU is actually a lot stranger and more exotic than it appears. I like AARCH64 quite a lot, which was my starting point for ARM. Apple's M1 is really impressive, but they're also able to make the most of the ARM RISC ISA by nature of how closely coupled the "SOC" package is. Overall, it's nice to finally see real competition in the industry, as things are really interesting again (just look at AMD's new Ryzen 6000 series mobile processors). Hell, I even witnessed an online flame war over RISC (Apple M1) versus CISC (Ryzen 3), which took me right back to the good old days of usenet.
I was doing research at a university that had an Itanium-based supercomputer. It produced a neat wow factor when you had to mention it in papers, but the thing was a colossal failure and I was able to outperform it with my power mac G5 at home for most things. Probably cost millions of dollars, and certainly tens of thousands a year just in electricity and air conditioning.
@@8BitNaptime Well, get one while you can. They are probably still in that phase where you can pick one up real cheap before they become "retro". I swear they will be discovering them in barns on American Pickers soon. Not a lot of people willing to overhaul an entire building's electrical and HVAC for 64 weak CPU cores anymore.
I can confirm that, despite the fact that HP-UX on Itanium was not an astounding success, many companies with links to HP had entire server clusters based on that architecture. My second job in IT (around 2007), in the TLC (telecoms) sector, started with a migration of lots of custom software (written in C++) from version 11.11 of HP-UX to 11.31. However, many years later (around 2013 I think), I was asked to migrate the same software from HP-UX on Itanium to RHEL on Intel. I still remember fighting with the discrepancies not only of the two Unix flavors, but of the two target C++ compilers (aCC vs gcc), each one with its own unique quirks - e.g. slightly different handling of IPC ("System V compliant" my foot), very different handling of read-only memory regions etc. Fast forward to my current job: last November I started working in the banking sector. Guess what was my first project? Migrating a server hosting different batch processes (with a mixture of pure shell scripting and Java applications) from HP-UX on Itanium to Linux (of course, lots of different basic Unix commands behave differently, e.g. ps). Fate is an odd mistress, indeed...
I only had one customer on HP-UX who made the change from PA-RISC to Itanium; most of my HP-UX customers ran the same ERP system, which gained a Linux port when x86-64 came along, so they migrated to RHEL on x86-64. The one who stayed with HP-UX and Itanium had their own custom in-house application, so I think they decided Itanium was cheaper than the porting effort.
Reading all that was very nostalgic. The first time I booted an HP-UX system (PA-RISC, I'm sure) I recall reading something like "IPL HP-UX", and all this time later I have the same reaction - laughing and thinking, "You go way over there with that IPL stuff, you're an open-systems OS".
Want me to pass a deck of Cobol cards? (Not me, but a poker buddy*.) It pays the bills. * Serious. I cannot support your cobol card needs. I have only fortran cards. *_If I say "column," Geeks say what?_*
I remember Itanium and the hammer blow that was AMD64, but I didn't know much about it and the deals that were done at the time, nor did I know it continued as a zombie for as long as it has. This was a FANTASTIC video that really answered some questions I had and was well worth watching. Great work!
Intel up to that point had been suing AMD for a number of years, trying unsuccessfully to block AMD from making x86 CPUs. When Itanium became Itanic they were forced into settling with AMD to license x86-64. As part of it AMD got access to some of the x86 extensions Intel had developed. Of course AMD would move the market again when they launched the first dual-core chips, something we take for granted now. The other big failure of Intel was the Pentium 4 of course, which AMD trounced performance-wise. Intel had previously tried to replace x86 as well with what became the i960 (and another I forget), which ended up being used in printers a fair bit.
@randomguy9777 They had lawsuits against AMD and had multiple times tried to prevent AMD from making x86 chips. The legal battles only ended when Intel realised Itanium was not going to replace x86 the way they wanted, and they cross-licensed x86-64 from AMD to end the legal disputes. The only reason AMD had a licence to produce x86 chips in the first place was that IBM insisted on multiple suppliers for the CPU in the original IBM PC. By the time the 386 came around, however, Intel had stopped sharing its designs with AMD, so since then AMD have had to design their own chips compatible with Intel's instruction set.
@randomguy9777 They won't bother suing now, as they need those x86-64 extensions and AMD own the patents for them. As the market has pretty much transitioned towards 64-bit only, suing AMD would be rather stupid at this point. If anyone might be sued by Intel it is potentially Nvidia, who want to cut AMD and Intel out of their data centre compute systems, and they use ARM. Depending upon how they implement it, they could be flying a bit close to the sun with x86 translation.
An Epyc 500-core CPU coming in two to three years will leave Intel in the dust. Seems we are seeing a repeat of Intel servers being gigantic compared to AMD servers with similar workload performance. 10 chips to compete against 1? Possible. Intel, why? You have the money, the market share, the industry clout!??
Some would say that Itanium was Intel executives' attempt to eliminate the pesky x86 competition (IOW, AMD) through a new architecture for which AMD had neither a license nor a clean-room implementation. Ironically AMD handed Intel their asses with x86-64, and it must have been humiliating and humbling for Intel to have to approach AMD for an x86-64 ISA license. Hopefully the Intel executive who green-lighted it got fired for this.
I don’t think Intel needed AMD’s permission to adopt AMD64. They had a mutual perpetual licensing pact as a result of settling a bunch of early lawsuits.
I had a different hypothesis. Intel's compiler suite was widely known as the best optimising compiler by the early 2000s. They curiously took two independent stabs at making a dumber, more straightforward, more pipelined, deeper-prefetching processor that required higher code quality to exploit anywhere near fully, and the angle might have been the long game: they have the best compiler tech, so as long as they can penalise the other processors artificially (which they did repeatedly do) and optimise better for their own, they can win the compiler and processor markets at the same time and create a soft walled garden. The other one besides Itanic was NetBurst. It's not quite enough of a pattern to call it conclusively though. But it is enough to suggest at least a sort of strategic arrogance, where they thought they could corner the market by making a worse product.
"Hopefully the Intel executive who green lighted got fired from this." Sadly it was the other way around. Two or possibly more of the senior engineers went to the CEO and explained why it was not going to work. The CEO fired them on the spot.
@randomguy9777 You've got an interesting idea. Indeed it stands to reason that up until the Pentium Pro they were catching up to the state of the art as established in academia and by CPU designers in other areas, such as workstation- and server-oriented processors; at that point they did catch up, and the pool of ideas was exhausted. Before P6, there was a large pool of things you'd want to build but for which the semiconductor budget just wasn't anywhere near there in a PC. I think a few things that came in since were opcode fusion and perceptron branch predictors, with the rest of the IPC improvement being thanks to better underlying semiconductor tech and resulting fine adjustments rather than revolutionary insights. Academia was enamoured with VLIW in the 90s and it was considered a future next big thing, so it makes sense that they'd shop for their strategy there. But the compilers that could make good use of such an architecture never happened. Maybe now's actually the time, with things like V8, LuaJIT and PyPy, which can repeatedly recompile the same piece of code to specialise it under profile guidance, or maybe they can provide a key insight that makes the approach practical in 10-20 years. I suspect it's still unclear how to merge these things with the C-based compiled software ecosystem.
Place I worked at, we were porting software from 1.1GHz PA8900 Makos running 11iv2 to 11.23 on, in theory, 'nice' (tm) 4/8-socket Itanium 2s. Oodles more cores, oodles more MHz and internal memory bandwidth (let's forget the discussion about the 'generous' amount of cache on the Itanium... because maybe she didn't lie and it apparently doesn't matter). Sadly, it all ran at best about 1/4 the speed of the older PA system for some sections of the code where it had a good tailwind (other sections much worse), irrespective of whether it was running a single thread or 120 concurrent threads. Four of HP's optimiser engineers were hired at an eye-watering rate for 3 weeks. In the end the result was "Wait for the newer, better compiler and it'll magically get better... delay the HP product 8 months". We waited (no choice), it didn't happen, but on the plus side they were able to afford lovely 5-star holidays in exotic places that year. It was embarrassing that the 3-4 year old Pentium 3 server and the older dual-processor IBM pSeries workstation (275) outran it as well. It was all just sad, and it killed three of my favorite RISC architectures.
It's good to hear from other people with real-world experience of Itanium's performance issues. It took so long for improved compilers to really get there for Itanium, and even then x86-64 had moved beyond the practical performance levels Itanium achieved. The loss of some really great RISC architectures was the worst part of Itanium. I really liked running HP-UX on PA-RISC, and Alpha was great as a next step up from x86 Linux.
Losing Alpha is the big one for me. For a good few years in the late 90's Alphas were the fastest CPUs out there. And then Compaq bought Digital and the PC sellers didn't know what to do with it. At least some of the engineers went to AMD and worked on Athlon (which used the EV6 bus).
@@IanTester I was not happy about Alpha being discontinued. I mostly used it with Linux, and it was a great option when you needed a machine with far more power than x86 could muster. We used most of them for database servers. Also for the odd customer who had far too many exchange users, and yet wanted to keep everyone on one server.
Itanium was a truly excellent idea... just not for CPUs... Should have been a bare metal GPU and DSP standard - a lower level than CUDA / OpenCL etc.. Proper STANDARDISED assembly code for GPUs and CPUs.. Could have replaced SSE with a far more powerful parallel processing framework.. Some X86 functions (tight loops) could have been rolled out to the Itanium SIMD unit, with a much simplified X86 core.. Now they're stuck with crappy SIMD and CISC on RISC X86 / X64 in each core... -- Could have had 32 x64 cores (minus SSE) and 1 Itanium core ages ago, giving superb CPU performance and bare metal GPU-compute programming power. Could even be 2 separate chiplets for serial X64 and parallel processing, with their own dedicated and shared cache chips for many combinations rapidly rolled out at all price points.. -- Current APUs are cursed by SSE's increasingly CISC approach to parallel programming. Extreme architecture bloat, yet no standard low level GPU or DSP architecture - the role Itanium could and should have been rolled out for, along with far simpler, non-SIMD (non-SSE) X64 cores and better scheduling and cache usage between cores for optimum load sharing...
@@PrivateSi I wonder if it might be better to have a new module for "do a=a+b on 10k items starting at address X" type instructions and use microcode to convert SSE into instructions for it instead.
On the other hand, it was the initial delay of the iAPX 432 project (started in 1975) that made them hire Steve Morse to design the "temporary" 8086 in 1976-78. And the bad performance of the iAPX 432 compared to the 80286 in early 1982 made them decide to develop the 80386, a 32-bit version of the 80286.
Could not agree more. I was a summer hire on that processor development and knew even then that it was way too costly (read "slow") at the time. The only thing good that came out of it was developing another CPU team in Oregon where the Pentium Pro was developed.
@@randyscorner9434 you wouldn't happen to know anything about High Integrity Systems' involvement with the 432? I've got their 'clone' multibus system sitting behind me right now, and information is scarce! It's basically a rebadged Intel System 310 with custom multibus cards replacing a few of Intel's cards. And yes you are right, without their efforts, it is arguable there wouldn't have been a Pentium Pro at all!
A few things - I was not yet working at Intel in 1999 when AMD-64 was released, but I was working at a BIOS vendor, and there is a tiny detail that conflicts with this video. Two BIOS vendors were already booting x86 with EFI by 2000, and we had EFI customers in 2001 (namely a Nokia x86 clamshell set-top box). Thus, EFI BIOS has been a continuously selling "BIOS product" since the beginning. There was never a need to bring it back; it was always there with a non-zero customer base. Mainly AMD was the holdout. To be fair, Phoenix BIOS was also a holdout. But primarily it was AMD that refused to use or support EFI boot, and so any customers using both Intel and AMD would simply use legacy BIOS so they did not have to buy two entirely different BIOS codebases. UEFI then was set up specifically to detach EFI from Intel so AMD could help drive it as an open source project not directly attached to Intel. When AMD finally got on board with UEFI, legacy BIOS finally started to lose its customer base.
I used to work for DEC and remember getting my hands on an Alpha 150MHz workstation for the first time in the early-mid 1990's. It was running Windows NT for Alpha and had a software emulator built in for x86 Windows support. The first thing I did with it was to try to play Doom2 in the emulator window, and it actually ran - and much, much faster than my personal Pentium computer could do. It was shocking how fast it was. It also freaked me out when I looked at the chip and saw that the heatspreader actually had two gold-plated screws welded on it to bolt on the heatsink. The Alpha was a beast!
@@RetroBytesUK Sadly, almost all of them. When my plant closed, I moved to NH where DEC had an e-waste recycling center. They would send all obsolete/overstock systems there for dismantling and destruction. Every day, truckloads of VAX 8000s, Alpha Servers, DEC PCs, etc. would come in to get ripped apart and sorted. We routinely filled C containers (48"×48"×48" cardboard boxes on pallets) with unsorted CPUs - x86 Pentiums, Alphas, everything in between - and other ICs to be bought by recyclers. It was all an enormous waste. Most of the systems were less than a few years old and in great working condition. All HDDs were unceremoniously dumped in a bin and they didn't even save or test the PC DRAM modules - everyone just threw them in with PCB waste. Then Compaq bought us...
If Intel had acquired Alpha for the purpose of developing it rather than killing it (they bought the IP from Compaq and then buried it), and had put a tenth of the R&D money they sunk into the Itanic, computing today would be decades ahead of where it is now.
I still remember when our department at university got its HP-Itanium-1 workstation: I ran a few application benchmarks… 20% slower than my AMD-K7 PC at home:)
I worked on the SGI version of the IA64 workstation and porting/tuning 3rd party apps. I would sit in on the SGI Pro64 compiler team meetings, and one time they called in the Intel team to clear up some issues. The Pro64 compiler could handle the multiple instruction queues and all that just fine, given that it could predict what the CPU would do. It had an internal table of op-codes and their various execution times (I think it was called a NIBS table), and on the MIPS R10K CPU there was a precise mapping of those, a core principle of the RISC architecture. There was a whole family of IA64 op-codes that had variable execution times. The compiler guys asked Intel if there was a deterministic execution time for those op-codes. The Intel engineer proudly stood up and said "why yes, there is". Then he went on to explain that this or that instruction would take something like 6, 8, 10, or 12 cycles to execute. At that point, the compiler guys just about got up and left. In the RISC world, it's typically 1 or maybe 2 clock cycles/instruction (fixed). In the Intel CISC_trying_to_be_RISC world, there's a formula for the execution time. Faced with that, the compiler team had to use the worst-case execution times for every one of these instructions and pad the rest of the pipeline with NOPs, which killed performance. On the MIPS CPU, they could jam the instruction pipelines nearly full and keep every part of the CPU busy. On Intel, it was way less than that.
I don't get how there weren't red flags when making this thing. Let's assume you can assemble a program well enough in assembly; was there any way to determine the cycle time? Was this all hardware-based scheduling?
@@warlockd I wasn't involved in any of the design part of the Itanium, just working with the Merced based development systems, 3rd party developers and our Pro64 compiler team. My guess is that Intel was somewhat arrogant and also didn't really grasp the core principles of RISC architecture. It seems like they had some instructions that were inherently CISC and figured if they waved a "RISC wand" over them, that was good enough. As I recall, Intel thought that having some sort of predictable cycle time for a given instruction was good. But that "predictable" cycle count wasn't a fixed number, instead it was a lookup table of clock cycles vs. other machine states. So yes, someone coding assembler could figure out cycle times given a good understanding of what was going on in the CPU. But the Pro64 RISC compiler wanted a table of "opcode"- "cycle time". I think the fix, in the end, was that the compiler guys had to put in worst case cycle times and then pad the unused slots with NOPs in the code generator back end.
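To make that "worst-case cycle counts plus NOP padding" idea concrete, here is a tiny toy in C (nothing to do with the real Pro64 back end; the opcodes and latency numbers are made up) that schedules a dependent chain of operations for an in-order machine using only a static worst-case latency table, and prints the NOP slots a compiler would be forced to emit:

```c
#include <stdio.h>

/* hypothetical opcodes with made-up worst-case latencies (in cycles) */
struct op { const char *text; int worst_case_cycles; };

int main(void) {
    /* a linear dependence chain: each op consumes the previous result */
    struct op chain[] = {
        { "ld   r4 = [r5]",   12 },  /* variable-latency load: assume the worst */
        { "add  r6 = r4, r7",  1 },
        { "ld   r8 = [r6]",   12 },
        { "mul  r9 = r8, r8",  6 },
    };
    int n = sizeof chain / sizeof chain[0];
    int cycle = 0;

    for (int i = 0; i < n; i++) {
        printf("cycle %3d: %s\n", cycle, chain[i].text);
        /* with no other independent work to schedule, every remaining cycle
         * of the worst-case latency becomes a NOP slot in the bundle stream */
        for (int pad = 1; pad < chain[i].worst_case_cycles; pad++)
            printf("cycle %3d: nop\n", cycle + pad);
        cycle += chain[i].worst_case_cycles;
    }
    printf("cycles assumed by the static schedule: %d\n", cycle);
    return 0;
}
```

An out-of-order core only waits as long as each load actually takes; a static schedule built from a table like this has to assume the full 12 cycles every single time.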
Not having an implicit or compact no-op format was a huge blow to Itanium. And the whole "let's hide the real latency" thing was to prevent the need for a recompile when the microarchitecture was updated.
Then 10 years later the transistor count for stuff other than core CPU logic, like cache and branch prediction, surpassed this; today it's probably 99% or higher. Still, x86 is not very power efficient.
Pretty much. All of the promises of RISC turned out to be lies. Making a CPU with fewer and less complex instructions didn't actually make it faster. It _did_ make programs significantly larger, and compilers absolute voodoo. (I do still have a white-box [NT] Alpha, one of the later/last models, after they'd worked out how to keep them from melting.)
I recall Itanium being hot stuff, so unreachable, so unseeable. We had Itanium servers at Novosibirsk State University, but not for ordinary people. And now I find it marked "retro", causing cognitive dissonance. If you are interested in VLIW architecture, the Russian Elbrus e2k is also VLIW, but it is a living project. With better x86 emulation, aided by CPU ISA adjustments. With Mono, JVM and JavaScript JITs. With a C++ compiler that is not bleeding edge, but still passable.
I spent about 18 months porting software to Itanium Windows. Nearly all of that was digging out and reporting compiler bugs, of which there were lots. Eventually got it working fairly well, but few customers took it up. I gradually transitioned onto porting the same software to AMD64 Windows. When we shipped that, I got a call from my Intel contact asking how much work AMD64 had been relative to Itanium. "Under 5% of the work of Itanium." "Oh..."
1:20 Very peculiar that HP came to that conclusion (about a limit of one instruction per clock cycle) at around the same time IBM introduced its POWER machines, which were capable of _superscalar_ operation -- more than one instruction completed per clock cycle. In other words, the very assumptions of their design were flawed from the get-go.
Yup. The VLIW assumptions, especially that dynamic instruction scheduling was just too complicated for hardware, were broken from day 1. That seemed pretty apparent to a lot of the CPU people even back in the mid-to-late '90s. The instruction density on Itanium (my fingers *really* want to type Itanic) was terrible, and the s/w prefetch needs didn't help much. There were math kernels that ran tolerably well on Itanium, and some DB code did OK, but it was ultimately pretty pathetic: very very late, and very slow.
It was not by chance; out-of-order execution was the next big thing in CPUs and everyone was trying to implement it. Alpha had a lot of influence on CPU architecture, it was the first in a lot of things (including the best implementation of out-of-order execution till then) and everyone tried to surpass it. Check the timeline: all these NEW CPUs came after the Alpha and its success. I was there at the time and had the pleasure of working on one... It was night and day. I still remember compiling Samba for SPARC took almost 20 minutes. When I tried it on the Alpha, it took less than a minute (I thought the compilation had failed).
Great video knitting together all the OS & CPU craziness of the time. On the AMD64 you mentioned as one of the nails in Itanium's coffin, I remember just how toasty their TDP was (hotter these days of course).
Thanks for this. We used to run our telecoms applications and database on HP-UX on PA-RISC machines (please not "risc-pa"), some K-class servers, then L-class and we wondered if the new Itanium systems would be our future hardware platform. However the next step for us was the rp4400 series servers and after that we migrated to Red Hat Linux on the HP c7000 BladeSystem with x86-64 cpu architecture.
I remember this thing. I dealt with an Itanic (love that name) machine at work for a bit. WOW that thing was slow. The only slower machine I had access to was a Pentium 4. Like the Pentium 4, the entire architecture was complete garbage from the ground up. There was no fixing it. These were Linux machines too...and Linux had the best support for this thing. The Pentium 3 systems ran rings around the P4 and Itanic. Any architecture without out of order execution is DOA.
And any architecture with out-of-order execution is likely insecure (exposing some sort of side channel somewhere). Really, the promise to vectorize most loops was pretty impressive. Unfortunately the generated code could only vectorize some loops, and the code size would choke out all your caches; then you'd miss a load (even with reasonable pre-fetch) and the whole damn thing would stall. I'm not convinced the problem is entirely insurmountable, and in the tasks where it's well suited VLIW does quite well.
@@WorBlux The other possible CPU architecture is Decoupled Access Execute, which tries to adapt to variable latency on memory loading by splitting the CPU into an "addressing" part and a "math" part, and passing data through queues. But the complexity of doing this in practice balloons up so it doesn't save much over just doing another Out-of-Order CPU.
I remember comparing an Itanium (OpenVMS) server to a Core 2 server (Linux) running the Cox-Ross-Rubinstein formula written in C. That was around 2010. No matter how hard I tried to tweak the C code, the Itanium could not outperform the x86 (it was about 40 per cent slower). And the x86 server was likely a fraction of the Itanium server price. The compiler just never delivered on Intel's promises and projections. Companies bought Itanium servers because of their legacy applications written for the VMS OS (VAX, Alpha). Or because of their mainframe-like enterprise features (hot-swappable CPUs and whatnot).
This is easily the most coherent and straightforward evaluation of the failures of EPIC and Itanium I've seen on YouTube. We had a number of Alpha boxes at DuPont for evaluation and were very impressed with the kit, only to be told months later that DEC/Compaq were abandoning the platform. Damn you Intel.
@@MultiPetercool Thanks for saying this - put a smile on my face. I was the technical architect of HP's PA-RISC -> Itanium software migration effort. HP handled backwards compatibility via dynamic translation (i.e. translating PA-RISC instructions into their Itanium counterparts on the fly while the program was "executing"). I tried to convince Intel to do the same, but at the time they didn't trust a software-only solution. Instead, Intel just embedded an older and less capable x86 processor front end on the Itanium die. Though, I think I heard that later they also went with a software solution. In my mind, two of the biggest reasons that Itanium failed is that we failed to recognize just how resistant people would be to recompile their programs into native Itanium code. If you didn't recompile, you wouldn't get anywhere near peak performance. Secondly, and I say this with some shame because I was quite guilty of it, we grossly overestimated how much instruction parallelism the smart compilers would be able to find. Much of our early performance analysis was done on loopy computational code that had regular code execution patterns. If you know where the execution flow is going, you can go fast by computing in advance and overlapping parallel execution. But, that kind of execution is limited to a few markets (which is also part of the reason Itanium lived on for awhile - for some computational tasks it did really well). For general computing, Itanium-style VLIW execution proved to be much less effective than hardware-based super-scalar solutions.
@@Carewolf Well said. Backwards compatibility is important for the PC, and it is unlikely that AMD64 would have taken off so well if it didn't have backwards compatibility, but that is only in that market with a giant install base and tons of proprietary programs that could not be recompiled for new architectures (plus bad coding practices of the time that made it hard to really port to a new system). Today we see all sorts of processor architectures with limited to no compatibility with other architectures, and we are getting closer to most user level code being run in a JIT.
I used to work on these things some years back. The chip was incredibly elegant, but it put a *HUGE* onus on compiler writers to properly optimize the code.
Great vid! I started at HP servers in 2004 (x86 servers; internally it was called ISS [Industry Standard Servers]), and I quickly figured out from various hallway chats that I lucked out by being in ISS. The Itanium division (internally called BCS [Business Computing Solutions, I think]) was always spoken of in a cringey context. Folks were constantly jumping ship from BCS, coming to ISS or anywhere they could, while others were getting let go left and right. I was just a kid at the time and didn't really understand the details of why BCS was on life support. I was never really a part of all that. But after watching this video, some of those old conversations I overheard in the break room make more sense to me now. :)
Until two years ago I worked at an organization involved in a huge project to port a critical piece of software from OpenVMS (running on Itanium) to a more modern x86 Linux platform. This was mostly influenced by Itanium clearly hitting the end of the road and OpenVMS at the time not yet being able to run on x86. It is worth noting that all this is no longer owned by HP, as they sold off these parts of their business. Anyway, millions of lines of OpenVMS Pascal code were first ported to C++, and after that the entire thing was ported to Linux. Some of it with native system calls, but for much of it a translation layer was used that would translate OpenVMS-specific system API calls to equivalent Linux calls. I personally had not worked with OpenVMS before that and it was an interesting experience. Even more so as it approaches some things we take for granted with other operating systems in a completely different way. Directory structure for example, and disks in general. Would be worth a video in itself I imagine.
Just stumbled on your channel and now wonder why I haven't seen it until now 😊. Very well made and informative video, will definitely subscribe and watch your other content! As for Itanium - yup, I remember when it came out. I was in high school back then, so didn't have a hope in hell of ever using an Itanium machine, but definitely remember the hype and then the massive thud that happened when Itanic finally launched 😃. Also remember seeing some Itanium processors for sale on a local PC hardware site for absolutely absurd prices and giggling. Amazed its support is only ending now, thought it was long dead.
That's very nice of you to say. Thanks for the subscription. I think Itanium's continued existence comes as a surprise to most, as it is only used in a very niche high-end market. I think given a free choice Intel would have killed it years ago.
Thanks for the video! You forgot to mention the HP NonStop product line. That was moved from MIPS to Itanium processors, only to be moved to x86 processors a decade later. The Itanium architecture was sound, but at Intel it was always on the backburner. The x86 development was always higher priority and Itanium was always based and produced on previous generation nodes.
Tandem and NonStop are probably interesting enough to get their own video at some point. There are some things I decided not to really reference in the video, like NonStop and HP Superdome (I also want to do a video on Convex), as they felt like they may have taken the narrative of the video too far off on a tangent. It's a tricky balance doing these videos, what you put in and what you keep out. With this video I was trying to keep it a little more succinct, as I thought I could convey the history of Itanium fairly well in a 10min video without skimping on the details, but that meant not looking at some of the more niche product lines that fell into HP's hands. I don't always get the balance of what goes in/out with these things right. Sometimes in hindsight there are extra details I wish I had focused on a little more or included, and other times I feel including something ended up pulling the narrative off track a bit. Sometimes I feel I've gotten a call about right and avoided going down a big rabbit hole.
@@RetroBytesUK the SX1000 and SX2000 Superdomes could have either PA-RISC or Itanium cells. You could have both in the platform, but any nPar had to have cells with the same type of CPU. I briefly worked on HP gear in the mid-late aughts (as a Sun FSE, strange support contract for a customer who shall remain nameless) Of course, with years of supporting SPARC/Solaris systems, HP-UX, the Superdomes and N-class servers were a different animal. HP-UX itself was easy enough to sort out as it drives pretty much like any other Unix. The Management Processor was different from any of the various flavors of LOM, ILOM, RSC, XSCF and of course OBP, the Open Boot PROM, essentially what does the BIOS like function for a SPARC based machine. I'm sure I'm leaving out some as the E10K, Sunfire E12K, E15K, E20K, E25K all had SPARC based Service Processor management machines either external to the cabinet (SSP for the E10K), or built in in the case of the Sunfire machines. The Exxxx machines had MicroSPARC System Controller boards instead of a machine running Solaris to manage the platform... (Okay, I'm in the weeds, living out the glory years, sorry.) Only did that for two years until the customer let HP back into their data center. Of course, the same customer still had a Sun E10000 with 250MHz processors until a year or two ago. I'm not sure they had many folks who knew how to administer it. Anyway, thanks for the trip down memory lane. Having to figure out if an HP cell board was a PA-RISC or Itanium board. Didn't really have to go through that with the Sun gear as they were all SPARC. Just a question of what clock speed for any replacement CPU. All I know about Tandem is I want one of their promotional coffee mugs with two handles.
This de-lid was the hardest I've tried to do: an Itanium 9260. It was successful; only lost one decoupling cap. It cleaned up nicely with 15k polishing. Now I need to get some etching done. Nice video.
It's worth watching the Computer History Museum's interview with Dave Cutler where he more-than-suggests he encouraged AMD to continue their x86 to 64-bit efforts and that Microsoft would be extremely interested to support it.
From 1989 to 1998 I worked for Unisys, DEC, Sequent and HP in Silicon Valley. I left DEC shortly after Alpha was released. Part of reason I left HP was because I knew Itanic would be a loser. The extra $30k a year offer from Sun was too good to pass by. 😉
I’ve seen ‘em all. ZILOG, Intel, Motorola, Cypress, Nat Semi... I worked for Plexus who built the prototype systems and boards for MIPS. Plexus was founded by Bob Marsh and Kip Myers of Onyx Systems where a young Scott McNealy ran manufacturing.
The sad thing is it was a good idea on paper. They just grossly underestimated how horrendously difficult it would be to determine what instructions could be run in parallel at compile time, probably because it would sometimes depend on the result of another instruction, something you could only know at runtime.
Afaik, the real killer for Itanium and similar architectures is that the compiler has no idea which memory loads come from L1 cache and are going to load fast, and which ones come in slowly from L2 etc or Ram and need to wait as long as possible while running other operations. As far as I can tell this is 90% of why CPUs have to be out-of-order to perform well.
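A quick C sketch of that point (generic code, not tied to any particular compiler or machine): in the first loop every load address is known in advance, so a static schedule or a prefetch hint can hide most of the latency; in the second, the address of each load is the result of the previous one, so the compiler cannot know whether it will hit L1 or go all the way to DRAM, and that is exactly the case where out-of-order hardware earns its keep.

```c
#include <stddef.h>
#include <stdio.h>

/* Streaming loop: addresses are predictable, so latency can be hidden
 * statically (unrolling, software pipelining, prefetch hints). */
static double sum_array(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];               /* the next address is always a + i + 1 */
    return s;
}

/* Pointer chase: the next address is the *result* of the previous load,
 * so its latency is unknowable at compile time and a static schedule
 * has to assume the worst. */
struct node { struct node *next; double value; };

static double sum_list(const struct node *p) {
    double s = 0.0;
    for (; p != NULL; p = p->next)
        s += p->value;           /* L1 hit or DRAM miss? The compiler can't tell. */
    return s;
}

int main(void) {
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    struct node n2 = { NULL, 3.0 }, n1 = { &n2, 2.0 }, n0 = { &n1, 1.0 };
    printf("array: %.1f  list: %.1f\n", sum_array(a, 4), sum_list(&n0));
    return 0;
}
```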
It wasn't a good idea on paper either. Just assume your compiler always figured out the perfect 3 instructions to run in parallel for your code: what happens if the next generation of CPUs can execute not 3 but 4 or 6 instructions in parallel? Then code compiled for the first IA-64 version would still only run 3 in parallel and never make use of that new ability, as the compiler would have to re-compile and re-arrange all instructions to now figure out which 4 or 6 instructions can run in parallel. Without doing that, old code will leave 1 or 3 instruction slots totally unused and never unlock the full potential of the new CPU. With current CPUs on the other hand, the CPU itself figures out which instructions it can run in parallel at runtime, and when the current generation is doing that for up to 3 instructions at most, the next one can maybe do it for up to 4 or 6 instructions, meaning that without re-compiling anything, existing code will immediately benefit from the new CPU capabilities and have more instructions run in parallel. If the CPU makes that decision, a new CPU can make better decisions, but if the compiler does it, you won't get a change in decision without a new compiler and re-compiling all code. And with closed source, abandoned projects, lost source code and tons of legacy compilation dependencies, it's not like you could just re-compile everything whenever Intel released a new CPU; also you'd then need multiple versions, as the one optimized for the latest CPU would not have run on the previous one.
@@xcoder1122 Having instructions grouped by 3 is not totally useless even if you had a faster later-generation CPU and had to load 2 or 3 instruction blocks every cycle (for a total of 6 or 9). You still get the code size penalty but it's not the end of the world. Your dependency tracker and issue queues get marginally simpler. IMHO what really did Itanium in is the complexity from all the extra mechanisms they had added in - predication on every instruction, special alias checking and speculative memory loads...
Ah GCC. Ol' reliable. Not surprised that they managed to squeeze Itanium's performance out better than Intel could. The demigods that develop GCC are truly a mysterious and awe-inspiring bunch.
I would love to see more content like this. As a collector I like to see these architectures get talked about. With the exception of Itanium, I have computers based on nearly every one of these: SPARC, Alpha, PA-RISC, VAX, MIPS, and even a variation of a PDP-8E.
How odd that he started out listing Intel's failures including Spectre, but then ignored that Itanium, due to its lack of speculative out-of-order execution, is one of the few architectures *not* vulnerable to Spectre.
We had acquired an Itanium workstation for development purposes. The boot process was incredibly slow. We never really got started on our development since AMD64 was announced and really took off, much to Intel’s chagrin.
I remember when Intel was previewing the Itanium. I didn't know the specifics. I just thought it was basically a 64 bit x86. But it seemed that no one was adopting it. Now, I know why. This was some time ago, and I've been in the business before IBM made their 1st 8088 (original) PC. I go back to 8-bitters running CP/M !
Intel designing Itanium: "how do we give the software engineers the steepest possible learning curve?" AMD designing x86-64: "how do we give the software engineers the smoothest possible learning curve?"
I think that AMD got it right due to pure luck. When Intel pushed IA-64 with HP, AMD had nothing to challenge it with because AMD was a pretty small company. They did not have the money to follow suit and embark on their own all-new 64-bit architecture. They had no choice but to bet the house on x86 and extend it to 64 bits.
On paper I thought the EPIC concept was a great idea, to explicitly schedule instructions to maximise IPC throughput. I remember seeing a hand coded assembler routine for solving the 8 queens chess problem which was beautifully efficient. I really liked the rotating register windows concept for function parameters and local variables. But the dependence on compilers to exploit this was grossly underestimated and there was a use case mismatch. I believe they (HP initially) designed this to run workstation (scientific) code efficiently where the code blocks (like in Fortran) are much larger and more locally visible to the compiler to provide much more scope for the compiler optimisation to shine in its EPIC optimisations. But in the boring reality of business and systems code, the code blocks are much smaller, the data structures are much more dynamic and the opportunity to exploit parallelism up front using compiler/(feedback driven) analysis was far less than anticipated. Thus regardless of the silicon realisation (not great in V1 Itanium), the combination of compilers, the kind of code they had to optimise and the silicon was not going to shine. So when AMD-64 came along as a very elegant compromise that used dynamic micro architectural techniques to efficiently run otherwise difficult to statically compiler optimise code, AND ran 32 bit x86 code efficiently the writing was on the wall. So I'm sad because it was an interesting architecture for scientific Fortran like workloads, but not modern ones.
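A rough C illustration of that code-shape mismatch (generic examples, not from any real workload): the first function is the regular, loopy numeric code EPIC was designed around, where every iteration is independent and a compiler can unroll and software-pipeline to fill its bundles; the second is the branchy, pointer-shaped business logic where very little parallelism is visible at compile time.

```c
#include <stddef.h>
#include <stdio.h>

/* "Fortran-like" kernel: independent iterations, no branches in the body;
 * a static scheduler can unroll and software-pipeline this heavily. */
void saxpy(float *y, const float *x, float a, size_t n) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* "Business logic" shape: small blocks, data-dependent branches and a
 * pointer chase, so the compiler can prove very little about what runs next. */
struct order { int is_open; double amount; const struct order *related; };

double total_open(const struct order *orders, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (!orders[i].is_open)             /* branch decided by the data */
            continue;
        total += orders[i].amount;
        if (orders[i].related != NULL)      /* unpredictable extra load */
            total += orders[i].related->amount;
    }
    return total;
}

int main(void) {
    float x[3] = { 1.0f, 2.0f, 3.0f }, y[3] = { 0.0f, 0.0f, 0.0f };
    struct order o[2] = { { 1, 10.0, NULL }, { 0, 99.0, NULL } };
    saxpy(y, x, 2.0f, 3);
    printf("%.1f %.1f %.1f  open total: %.1f\n", y[0], y[1], y[2], total_open(o, 2));
    return 0;
}
```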
In 2023, I still use an OpenVMS system based on an Itanium 2 (rx2800-i2) every day. I also use CentOS-7.9 on x86-64 (DL385p_gen8) every day. Installing Python-3.9 on both, then comparing the results, can be very instructive (the x86-64 system is much faster). That said, Intel's main mistake was trying to isolate two markets (16/32-bit for home and small office; 64-bit for server and large business; who remembers the iAPX 432 chip, which was shelved before Itanium?). Anyway, 64-bit-only chips became irrelevant when AMD added 64-bit extensions to their implementation of the x86 instruction set.
In retrospect, this may have been more of a New Coke moment than an outright disaster. All those competing high-end chips falling by the wayside as companies bet on Itanium gave Xeon a chance to be taken seriously.
Between 1995 and 2005 I worked for a company selling used IBM RS/6000, HP9000 and Sun Microsystems UNIX workstations, servers and spares. Some of the larger PA-RISC HP servers were my favourites as I loved their amazing CPU heatsinks and giant fans, which on start-up sounded like a Vulcan bomber taking off. I think I still have a D-class server and boxes of K-Class boards in my garage. When the Itanium systems started to appear, it all got very boring from my perspective. As an aside, it was always PA-RISC and not RISC-PA as you have said a few times in this video.
"None of Intels' other mistakes can hold a candle to this one." He said... Not knowing that years later, Intel would reply with "hold my beer" and release an entire line of CPUs that brick themselves.
The Itanium did perform really well... when one compiled purely for IA-64 and *not* x86. The poor performance came with Itanium's implementation of the x86 compatibility layer. To stay backwards-compatible with old 32-bit x86 code, it had to emulate it: translate the x86 code into IA-64. That was never intended to be the main way to run it. Yet the software developers did it anyway. And then they complained. Developers didn't want to move to a pure IA-64 environment, as it was scary. AMD's x86_64 extension is backwards-compatible with 32-bit x86 with no performance loss. Intel realised in the end that this backwards compatibility was much more important than they anticipated, and chose to license AMD's x86_64 for their future products. That is where we are today. The reason Intel's own 64-bit attempt failed: the software industry was too stubborn to move to a new architecture. AMD knew this of course.
No. I own an rx8640. Giant itanium machine. 16 sockets. It's dog slow. I run gentoo on it. Everything compiled natively with the right flags, etc. It's slooow.
@@detaart Slow compared to other machines from 2003? Intel core architecture was launched in 2006 so that itanium machine would have been competing against netburst based xeons.
"The Itanium did perform really well" - On certain code, sometimes. Not as the general case even for native code. IA64 would have done better if the x86 emulator have never been expose. You need it for some drivers of the time, but it just confused the issue of who the platorm has actually for.
I love that you use the Solaris colors in your Thanks For Watching outro. I recognize them because I was low-key obsessed with purple and orange colorways for a while. 😜
You throw a ton of events and timelines in without giving hard dates in this video. I’m sitting here trying to take it in and I’m like “BUT WHEN IS THIS HAPPENING?!?”.
Here's a fun tidbit that, understandably, got left out. Intel was still developing Itanium long enough to make a 32nm process version, and that processor was launched in 2017. That's amusing to think about for a million reasons.
Intel is an interesting story. In business, if you swing for the fences, you will strike out a lot. The shareholders won't like that. Of course if you hit a home-run, then you're the hero. I think with Itanium, we may see those concepts come back at some point. Modern processors have to keep their pipelines and pre-fetch components powered up and consequently the Intel and AMD processors consume enormous amount of energy. A less complex, high speed processor that runs super-compiled executables would run fast. I still think some pre-fetch hardware may be necessary though. Let's see how things evolve.
I was closely following the 64-bit battle at the time (which was around the time of my masters degree in CS). At first I found AMD's solution a bit blunt, and IA64 in principle more elegant; but execution matters (in multiple interpretations of the word). Poor Intel got hammered (pun fully intended). Around 2009 I had a login on a big Itanium-based beast of a machine with 160 or so cores and 320GB of RAM. The performance of my software was terrible; partly due to the architecture (instruction set, NUMA), partly the fact that Java wasn't particularly optimized for IA64.
AMD's solution for 64-bit was the right one. As crazy and gothic as x86 is, it's still a better architecture than Itanium. Most elegant would have been just another standard RISC.
What I can't believe when I hear these kinds of stories is that all these companies (HP, Compaq, SGI etc.) agreed to leave their own platforms behind while the new platform wasn't even around yet.
Costs of development were rising and the market was shrinking. Plus Itanium was supposed to come out a few years earlier and be a bit better. That is why the MIPS CPUs stagnated; SGI had to do something when they saw that Itanium would not be there "in a few months" or even years. The R16000 had DDR, but apart from that it was still the old SysAD bus, made for the R4000 at 100 MHz.
3:47 Just a note that “PDP” was not _one_ range of computers, it was a number of different ranges, with 12-bit, 18-bit and 36-bit word lengths (OK, all of these were pretty much obsolete by the 1980s), plus the famous PDP-11, which was 16-bit and byte-addressable, and was the precursor to the VAX. Maybe when you said “PDP”, you really meant “PDP-11”.
@@lawrencedoliveiro9104 I thought the 14 and 16 were not really general-purpose machines like the 11, but were just for automation systems. I know the 11 was the last to stay in production though, and that was the final machine in the line they were still selling. I suspect if you had a service contract you probably could still get the 14 and 16 etc., but they would have been NOS by the time they shut down the production line for the 11. So I tend to think of the 11 as where the PDP line ended, as it was the last one you could buy.
I worked on some pre-production Itanium workstations my company was developing. Those machines ran very hot. We had a small lab of 8-10 of them in a locked conference room. It was like each one was somewhere between a hair dryer and a blast furnace, constantly blowing out hot air in a futile attempt to keep them cool. And it wasn't just because they were pre-production either. Other non-Itanium pre-release workstations ran much, much cooler than Itanium did.
I think Microsoft still uses the AMD64 name for x86_64. Interesting note about EFI. Does this mean that Apple started using EFI after it got created for Itanium?
Yep, the 64-bit version of Windows is targeting x86-64, which basically forced Intel's hand into supporting it. You're also spot on about Apple: Intel developed EFI for Itanium, then Apple adopted it when they moved to x86; it's not like they needed BIOS compatibility, and it's a nice clean environment to do bootloader code for.
Actually there are two names thrown around for the two implementations, and it is said they have small differences. AMD's was AMD64; Intel's was EM64T.
At the time, Macs were on PowerPC CPUs, and Apple used firmware known as Open Firmware. Granted, up until the iMac G3 they used a combination of Open Firmware with the old Mac ROM, and the Open Firmware implementation was buggy (one of the main reasons why Apple dropped support for older computers running Mac OS X). But yes, Apple used an early version of EFI for their Intel Macs. For ARM, they use iBoot (carried over from iOS devices).
Intel Itanium was a great success for Intel! It killed the market for other CPUs that were being developed at the time by IBM, DEC and other manufacturers. Plus, it was proof (albeit highly slanted proof) that nothing was better than x86, that no other architecture could be created that would beat the mighty x86. Both of these were great strategic successes for Intel.
The one thing I wonder about, which may have been left out of the story, is that this time was broadly the end of the... non-PC micro- and mini-computer. Did SGI die just because they went with IA-64 for their specialist workstations, or because a PC with some plug-in hardware was good enough? The list of "people who went with IA-64" reads like a list of people who thought "we still need something better than a Windows box" at the same time. Maybe they'd have gotten another couple of years if the IA-64 rollout had been stronger, but by the end of that gasp the writing would have totally been on the wall? Dunno, maybe I'm totally on crack. Interestingly enough, I recently specced a new PC (that I'll not be buying any time soon) and it came out over $50k. Not Infinite Reality territory, but...
In the mid 90s, I had this triangle of love with MIPS and Alpha. We used to have Sparc and PPC visiting us for a nice cup of tea. Such fun we had sharing jokes about SCO, and well, everybody knew that Netware didn't have virtual memory, right? But then came Itanium, always wearing that eau de depression. It was the end of good times.
Wasn't there some way Intel could have figured out that Itanium would be a dog before investing so dearly in making the hardware? Looks like the answer was a firm no. With compilers being wishware, ouch. There was no reliable way to rate it without compilers. Extending the x86 to 64 bits made far more sense, with compilers for the x86 being old hat by then. Good on AMD for embracing the obvious. CISCs today are implemented internally as RISCs anyhow, which is how they beat out the older explicitly designed CISCs so well. Maybe at some point automation of compiler creation and resulting code analysis will be good enough to actually evolve architectural prescriptions, and flubs like Itanium won't be as easy to make. But this will probably run on arrays of, you guessed it, x86-64 processors. Thus, new giants in creation stand on the shoulders of old giants.
I think there was a big circle jerk of executives AND engineers in this. You've got HP with a bunch of old Unix systems that need to be migrated, and you've got Intel with the idea to build a new ISA, something for the future AND all locked to Intel. You're locked in with Intel and HP. Everyone was looking at the horizon when they should have seen the fire starting at fab design :P
I used to volunteer at my local Free Geek. Mostly we got the standard Intel and AMD home desktop computers but ISTR we had an Itanium based system or two during the time I was there.
I think I would share your regret if I had seen that too. I could not find anything that was not hugely expensive, but it was all very highend server kit.
When I supported hp-ux I managed to buy a zx6000 workstation running Itanium 2 processors for just £240 from eBay. Now retired, nostalgia prevents me from getting rid of it as even now it still runs hp-ux 11.31 as well as Windows 2003 and Windows 2008. In the end I’ll probably donate it to the Cambridge Computer Museum.
Got handed an SGI Prism that was discovered under a desk at work, with all of the accompanying software and manuals. Free to good home. Not the greatest home, 'Barney' lives in the garage because I could never get the dual AGP graphics cards working on anything other than the aged SuSE distro it was supplied with. What good is a graphics workstation without graphics?
Mentioning some dates would have been handy...I had to go to Wikipedia to find out Itanium was launched in 2001 (I thought it was the 1990s) and only discontinued last year.
I think every man and his dog got their hands on early versions of an Itanium based machine prior to the official launch. We had one in the very late 90's. That's probably why you were thinking of the 90's, as that was when everyone was talking about Itanium, then being disappointed by it.
That was a critical time in the history of PC computing at the turn of the century: switching from 32 bit to 64bit. Windows XP64 was coming out. I was working on 64bit drivers for XP64, and the only available 64bit HW was Itanium on a beta Intel server. Was it slow!!! To reboot it took 20 minutes! It was a painful time.
Not a mistake, it scared all the other major "high end" chip makers out of the market. SGI stopped investing in MIPS, HP in PA-RISC, Compaq/DEC in Alpha. Even Sun blinked, getting tempted away from SPARC after initial resistance. The rest is history. Intel won, and only now, after Apple has invested in a performant ARM architecture with the M1, do we see the performance advantage they all gave up for Intel's failed promise of VLIW parallel execution. All the high end executives duped by a pipe dream. Not the only architectural fiasco from Intel. Larrabee followed on the heels of this to tackle the massively parallel GPU threat.
I wonder if the Itanium might have done better if someone had used it in a games console? That parallel architecture seems like a good fit for working on vectors in a 3D game.
Code generation wasn't really ever solved for Itanium due to a number of peculiarities, and you always have a lot of flow control heavy code on the CPU. The architecture type employed by Itanium is also known as VLIW (very long instruction word) and it has been used in DSPs on and off for a long time, with varying success, but is seen as a very versatile DSP architecture. So yeah the GPU of Xbox360 would be a valid example! Also, with some intense cringing, maybe the Samsung DSP in the SEGA Saturn. And yet at the end it lost out to SIMD in the GPUs. Vectors make a weak case for VLIW, because most of the time, you apply the same operation to all elements of a vector or several vectors, such as vector dot product or matrix vector products. So using VLIW your instructions are several times as large as they have to be, and take correspondingly that much more power and logic to decode. When your operations don't neatly fit like that, but you're working with arrays of data, you can improve the SIMD weak point just with low cost data shuffling operations, so say you can extract the same element out of 4 consecutive vectors into a new vector and apply a SIMD operation to that. A couple more tricks down the line, and you have covered up the hypothetical advantages of VLIW with practical advantages of SIMD in code density, code cache utilisation, power consumption and footprint.
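The dot-product point is easy to see in code. Here's a minimal sketch of my own (plain C with SSE intrinsics, purely illustrative and not from the video or the comment above): one multiply instruction covers all four lanes at once, and a couple of cheap shuffles handle the horizontal sum, which is exactly the low-cost data shuffling being described.

```c
/* Minimal illustrative sketch: a 4-wide dot product with SSE.  One SIMD
 * opcode drives every lane, so regular vector math needs no VLIW-sized
 * instruction words, and cheap shuffles cover the awkward cases. */
#include <xmmintrin.h>

static float dot4(const float a[4], const float b[4])
{
    __m128 va   = _mm_loadu_ps(a);
    __m128 vb   = _mm_loadu_ps(b);
    __m128 prod = _mm_mul_ps(va, vb);                 /* a[i]*b[i] in all lanes at once */

    /* Horizontal add via shuffles: the "data shuffling" trick mentioned above. */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(prod, shuf);             /* pairwise sums */
    shuf = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 0, 3, 2));
    sums = _mm_add_ss(sums, shuf);                    /* final scalar sum in lane 0 */
    return _mm_cvtss_f32(sums);
}
```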
That's how we ended up with the CELL cpu in the PS3, which had similar tradeoffs to Itanium (high performance on numerical tasks but requiring special programming). It ended up being *not* faster than the XBOX360 despite the special programming. Sony learnt their lesson, did a 180° turn and put bog standard out-of-order x86 cores in the PS4.
At least they had slightly better naming schemes back then, instead of this i3, i5, i7 malarkey (but which gen of iN...?); the Xeons aren't much better for easily identifying without searching on the Intel site for the specs either.
0:12 "like the floating point bug in the very first Pentiums which recent security problems meltdown and spectre of Intel's other mistakes can hold a candle to this one however" 0:20 "no failures cost Intel so much have been so public at the time" 0:33 "if you're around at the time you remember all the hype about titanium" 1:15 "risk va" 2:07 "the only problem hp saw with this was damn expensive to develop a new cpu architecture" 2:25 "risk ba" 2:44 "after all unix systems were hp's cache cam" 3:10 "until we're really keen on developing this thing" 3:20 "well that was all seven up by hp with its risk pa" 3:23 "deck with its fax platform" 3:32 "development of titanium took a while" 3:40 "first deck had run into some really big problems" 4:06 "deck had lost way too much money" 4:19 "however compact was soon in trouble" 4:22 "it decided it would drop the development of the deck alpha and move titanium for all its big systems and also shift vax's os vms from alpha to titanium" 4:38 "so he decided that he'd move to titanium" 4:40 "this would reduce sgi's development costs you'd no longer have to spend on mips" 4:52 "son unlike the others decided it would hold back and it would keep working on its ultra spark architecture but it did announce that it would put solaris to titanium" 5:03 "even the big unix vendors for the pc platform got behind titanium" 5:08 "brought together all the big unix vendors of the time seo sequence ibm to work on one unix variant for titanium" 5:24 "deck now owned by compact renamed digital unix to true64 and announced it would be moving to titanium which of course never happened because compact went bust" 5:35 "unsurprisingly microsoft would also port windows nt to titanium as well so business-wise titanium was going well" 5:51 "then intel released the first version of the titanium cpu" 6:08 "but again its performance just well disappointing" 6:14 "until they've been working on their own compiler" 6:42 "interesting titanium started to dry up" 6:58 "with vitanium floundering" 7:03 "it felt like titanium was in limbo" 7:10 "amd produced amd 64 a 64-bit version of the xx6 instruction set" 7:21 "something that the itunes well couldn't do" 7:35 "as they could run 32-bit code much faster than titanium" 7:51 "so when the pressure from microsoft he wanted to drop titanium with their market share at risk from amd until agreed to license amd 64's instruction set" 8:07 "i'll show you the now infamous graph of intel's expected earnings from titanium and of course it's actual earnings" 8:15 "it's at this point in time that everyone assumed titanium was dead" 8:20 "however if remember that deal with hp back cursed fairy tale deal well that meant as long as hp kept buying cpus into that to keep to the road map so for the last 20 years intel has had to keep paying to develop titanium" 8:37 "it's only this year that linux torval announced" 8:58 "this is the part of the video where i'd normally start showing you around some bitter kit but not this time you'd still buy a titanium server second hand but well there's still over a grand" 9:14 "anything i could show you running on titanium would be well more fun to show you running on another piece of kit wanna see vms running well that's more interesting than alpha" 9:32 "titanium's just not a fun platform and most titanium servers are huge and well who wants to use that much storage space on titanium i mean i can fit a few more amigas in that space or an sdi box or two hell even a sun server so that's it for titanium" 9:56 "as the pcs bias 
was riddled with limitations" Great subtitles.
I noticed you refer to the HP RISC architecture as "risk pee ay" in a couple of videos now. It's "PA RISC", not "RISC PA". I honestly never knew how to actually pronounce it, though. If RISC can be pronounced "risk", I figured it could be "pah risk". Sounds like something from Deep Space Nine, maybe 😁
I previously worked for an electronics recycler, and I remember seeing several racks of HP Superdome systems come in for decommissioning (powered by Itanium) It was fascinating seeing how Itanium was implemented on a hardware level.
IRIX never actually was ported to IA64. SGI had their Altix and Prism systems that ran Itanium 2, but they only ran Linux, usually SLES or RHEL. I *think* SGI had a special software package that allowed users to run IRIX applications in Linux, but I don't know that for sure
So a number of distributors were shown an Itanium port of IRIX (I remember sitting and watching the demo). However they did not release it, and shipped Linux with it as you said. I guess they decided completing and maintaining the port was too expensive. We were not allowed to play with the demo so I guess it had a lot of rough edges. Normally we were invited to have a hands-on session with new products, so it was notable that it did not happen with their Itanium version.
On the face of it, reducing hardware complexity and aggressively targeting parallelism seems like a great idea. But for a hardware company to offload the (enormous) complexity into software... seems like an obvious red flag. "The competent programmer is fully aware of the strictly limited size of their own skull; therefore they approach the programming task in full humility, and among other things they avoid clever tricks like the plague." - Edsger Dijkstra [paraphrased]
The EPIC adventure proved one thing -- VLIW is simply not sustainable as a general purpose ISA. There are too many run-time hazards that even the most optimized compiler can't account for. The overarching goal of Itanium, with its statically scheduled in-order pipeline, never materialized into the brave new world of compiler "magic" everyone hoped would lead to perpetual scaling. Intel actually knew this well in advance, and with the last architecture update (Poulson) they broke with some fundamentals of EPIC and introduced limited out-of-order execution and dynamic scheduling, to ease some of the burden on the software. That still didn't solve the issue with the bloated cache sizes Itanium was known for, due to fundamentally inefficient instruction packing and ordering.
I completely agree with everything you've said. Compilers have improved in this area, and are significantly better than they were at Itanium's first release, but they still don't have the level of omniscience needed for VLIW to work out as Intel initially predicted it would. Without the introduction of out-of-order execution into the design, Itanium would not perform as well now as it does.
It is not so much about run-time hazards, it is more about CHANGING from one generation to the next. And traditionally, latencies always got larger. Unroll a loop or software-pipeline it for one set of latencies, and it will have stalls on the new CPU. That is why wide in-order CPUs went out of fashion quickly. Intel never made such a CPU.
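To make the latency-assumption point concrete, here's a small hand-rolled example of my own (a sketch only, assuming that issuing a load one iteration early roughly hides its latency): the schedule bakes that assumption in, so a later CPU with slower loads stalls on exactly this kind of hand-tuned code.

```c
/* Illustrative sketch: a sum loop manually software-pipelined on the
 * assumption that a load issued one iteration ahead arrives in time.
 * If a new CPU generation has longer load latency, this schedule stalls,
 * which is the "retune for every generation" problem described above. */
#include <stddef.h>

long pipelined_sum(const long *a, size_t n)
{
    if (n == 0) return 0;

    long sum = 0;
    long cur = a[0];                 /* prologue: first load issued early */
    for (size_t i = 1; i < n; i++) {
        long next = a[i];            /* load for the *next* add, issued now */
        sum += cur;                  /* add uses the value loaded last time */
        cur = next;
    }
    return sum + cur;                /* epilogue: last pending value */
}
```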
Well, to be honest, Intel has made exactly 4 good CPU designs: the Pentium 2 & 3, as well as the Nehalem & Sandy Bridge Core i generations (and their die-shrink subvariants). All others were garbage. Itanium yes, but Pentium 4 as well, Pentium D even more so. Pentium and before were horrible true CISC designs. The Pentium Pro fixed that, actually by copying AMD's desperation move of emulating x86 on a RISC core, but it was way too big of a die to be priced remotely sanely. The Core & Core 2 fixed the microarchitecture by throwing away everything Pentium 4 did and restarting with the Pentium 3 design, but they still ran with an FSB & north bridge as memory controller, until Intel fully copied AMD again, while already licensing AMD64. Haswell was buggy and super energy hungry, and was completely discarded when the in-parallel developed Skylake was finally ready, but it proved even more buggy. Both of those buggy generations were reportedly the reason for Apple to commit to ARM Macs. And after Skylake, Intel's screw-ups left us with one xlake generation after the next. Honestly it is surprising Intel still exists, let alone outperforms other chip designers in numbers.
@@lawrencedoliveiro9104 it suffered low yields too, so was expensive. It could be very fast in the right circumstances, but the pentium mmx cpu could match or beat it in some cases.
I'd argue that the 486 was a good design (the 50MHz version aside!). The 486, Pentium 2, Pentium 3, Core 2 (the first Core wasn't all that great but better than Pentium 4!), 2nd gen i, 6th gen Skylake, and now Alder Lake were all pretty good, and the 386 was a big leap at the time too. As for the duff ones, well, P4, i960 (another one that was supposed to replace the x86 CPUs), Atom, 8th-11th gen. Probably more, as well as the subject of this video of course.
Nice, I own a 3820 and later built a second system with a 3930 on LGA2011, and I am still super happy today. Seems I've been lucky with picking Sandy Bridge. They are now over ten (!) years old and are still up to any task.
Itanium was still more successful than their early RISC processors. I'm not sure if it was their greatest blunder. Larrabee was another massive waste of energy and money.
It's always fun watching the "look, it's code!" clip art. Usually it's some web page. This one is starting to look like a high school programming class text only utility. "And I wrote the grade admin software for the entire school system on the PDP-8 in junior year." Come on, you know that guy.
Oh man…I was working at SGI on the R10K/R12K team at the time. Itanium was the big boogie man that Intel used to scare us (MIPS) and Dec (Alpha). Makes me really mad - it felt like that was the beginning of the end of SGI.
I worked at a prominent mathematics-software company during Itanium's launch and subsequent train wreck. I helped manage our Itanium build machines...A first-gen Gateway 2000 deskside machine running XP/ia-64 that had more internal heat-ducting than an air-cooled VW engine, all of it labelled 'Required - Do not discard', and a second-gen 2U HP dual-Itanium rack box running HP/UX that had a habit of baking the CPUs...Which HP had to replace under warranty. At one point I had (according to HP invoices) half a million bucks worth of dead Itanium2s sitting on my desk at work, just waiting for HP to send us enough carriers to send them back in.
While it may have had some issues, I have to say that the Itanium/VMS systems I ran for the Navy could run RINGS around any other platform in our site - until of course Moore's Law caught up with what we ran. We ran PDP-11/70s, VAXen, Alphas, and Itaniums (over a period of over 30 years) and loved every one of them. From our viewpoint the Itanium was NOT disappointing. I managed to maintain an availability ratio of 99.93% over the 8 or 9 years we actually ran with Itanium/VMS. We never thought of those machines as being that bad. And physically, they didn't take up that much space, either. I understand the E.P.I.C. issue was a killer for compiler writers, but for typical linear computational code, they screamed when they ran. Eventually, what killed our system was that our apps were client/server data management based on a centralized DB, and things started to become more distributed - and more web-oriented. They finally turned off the Itaniums - but I can't speak ill of them.
I remember hearing about Itanium in all the magazines and then one day, poof, never heard anything about it ever again. Kind of thought I dreamt it or something.
I got to work on TWD as an intern in Mass around 2004-5. Fun gig. Loved the technique of putting 9 cores on the chip and cutting one out later to allow for fabrication defects.
And that's why the 64-bit x86 architecture is often called amd64 instead of x86-64: AMD invented it and Intel had to follow after the Itanium fiasco. That will be a never-healing wound for Intel....
I think it had to do with support for those that bought the equipment that had them. Can't just say whoops buy a server farm's worth of new equipment on the spot.
Another OS not mentioned that ran on Itanium is NonStop OS - originally from Tandem, then bought by HP who ported it to Itanium. This OS runs COBOL code and is still used in a number of banks today, although with COBOL programmers slowly drying up, this too is coming to an end. That being said, HP have ported NonStop to x86 finally and are offering it in the cloud for an obscene cost.
Sounds amazing
@@smakfu1375 What are your thoughts on The Other Big ISA? It's doing very well in the likes of the M1 processor, but looks absolutely nothing like ARM2. Could you even really call it RISC when half the silicon is a bunch of special purpose heterogeneous cores?
@@kyle8952 Well, M1 as a "package" is a whole bunch of stuff, but the general-purpose CPU cores are ARM (AARCH64) load-store RISC. I don't tend to look at the whole die as "the CPU"... then again, I also consider general purpose cores to be "the CPU's", and the rest of it is other neat stuff (GPU cores, etc.). Modern processor packages (whether monolithic dies, MCM's with chiplets and tiles, etc.) are hugely complex things with tons of strange and interesting stuff going on.
Take even the lowly BCM2711 in the Raspberry Pi: the VideoCore VI is actually the primary processor responsible for bootstrapping the system. It runs an embedded, on-die-ROM-stored hypervisor-like RTOS that, among other things, provides integrity for the VideoCore processor. This makes sense, given the BCM2711 started life as part of a family of SOCs for set-top boxes. The "color palette" test you see on boot is the VideoCore processor initializing. So, what looks like a simple SBC with some ARM CPU cores and an iGPU is actually a lot stranger and more exotic than it appears.
I like AARCH64 quite a lot, which was my starting point for ARM. Apple's M1 is really impressive, but they're also able to make the most of the ARM RISC ISA by nature of how closely coupled the "SOC" package is. Overall, it's nice to finally see real competition in the industry, as things are really interesting again (just look at AMD's new Ryzen 6000 series mobile processors). Hell, I even witnessed an online flame war over RISC (Apple M1) versus CISC (Ryzen 3), which took me right back to the good old days of usenet.
I was doing research at a university that had an Itanium-based supercomputer. It produced a neat wow factor when you had to mention it in papers, but the thing was a colossal failure and I was able to outperform it with my power mac G5 at home for most things. Probably cost millions of dollars, and certainly tens of thousands a year just in electricity and air conditioning.
Great, now I want one.
@@8BitNaptime Same
@@8BitNaptime Well, get one while you can. They are probably still in that phase where you can pick one up real cheap before they become "retro". I swear they will be discovering them in barns on American Pickers soon. Not a lot of people willing to overhaul an entire building's electrical and HVAC for 64 weak CPU cores anymore.
@@benjaminsmith3151 guess I'll stick with commodore 64s then...
@@8BitNaptime I think I want a G5 mac now.
I can confirm that, despite the fact that HP-UX on Itanium was not an astounding success, many companies with links to HP had entire server clusters based on that architecture. My second job in IT (around 2007), in the TLC (telecoms) sector, started with a migration of lots of custom software (written in C++) from version 11.11 of HP-UX to 11.31. However, many years later (around 2013 I think), I was asked to migrate the same software from HP-UX on Itanium to RHEL on Intel. I still remember fighting with the discrepancies not only of the two Unix flavors, but of the two target C++ compilers (aCC vs gcc), each one with its own unique quirks - e.g. slightly different handling of IPC ("System V compliant" my foot), very different handling of read-only memory regions etc.
Fast forward to my current job: last November I started working in the Banking sector. Guess what was my first project? Migrating a server hosting different batch processes (with a mixture of pure shell scripting and Java applications) from HP-UX on Itanium to Linux (of course, lots of different basic Unix commands behave differently, e.g. ps). Fate is an odd mistress, indeed...
I only had one customer on HP-UX who made the change from PA-RISC to Itanium, but most of my HP-UX customers ran the same ERP system, which developed a Linux port when x86-64 came along, so they migrated to RHEL on x86-64. The one who stayed with HP-UX and Itanium had their own custom in-house application, so I think they decided Itanium was cheaper than the porting effort.
I have a vita I may have to try chronicals, I will probably miss the shanties however.
Reading that all was very nostalgic. I remember the first time I booted an HP-UX system (PA RISC, I’m sure) I recall reading something like “IPL HP-UX” and all this time later I have the same reaction - laughing and thinking, “You go way over there with that IPL stuff, you’re an open system OS”.
Want me to pass a deck of Cobol cards? (Not me, but a poker buddy*.) It pays the bills.
* Serious. I cannot support your cobol card needs. I have only fortran cards.
*_If I say "column," Geeks say what?_*
@@77thTrombone How about a bucket of objects as a counter to your cards, that's how most of us now roll.
I remember Itanium and the hammer blow that was AMD64, but I didn't know much about it and the deals that were done at the time, nor did I know it continued as a zombie for as long as it has. This was a FANTASTIC video that really answered some questions I had and was well worth watching. Great work!
Nice of you to say, I appreciate it.
Intel up to that point had been suing AMD for a number of years, trying unsuccessfully to block AMD from making x86 CPUs. When Itanium became Itanic they were forced into settling with AMD to license x86-64. As part of it AMD got access to some of the x86 extensions Intel had developed. Of course AMD would move the market again when they launched the first dual core chips, something we take for granted now.
The other big failure of Intel was the Pentium 4 of course, which AMD trounced performance-wise.
Intel had previously tried to replace x86 as well with what became the i960 (and another I forget), which ended up being used in printers a fair bit.
I still maintain environments on Itanium and HP-UX; we are getting rid of it.
@randomguy9777 they had lawsuits against AMD and had multiple times been trying to prevent AMD making x86 chips.
The legal battles only ended when Intel realised Itanium was not going to replace x86 how they wanted, and they cross-licensed x86-64 from AMD to end the legal disputes.
The only reason AMD had a licence to produce x86 chips was because for the original IBM PC, IBM insisted on multiple suppliers for the CPU. By the time the 386 came around, however, Intel had stopped sharing its design with AMD, so since then AMD have had to design their own chips compatible with Intel's instruction set.
@randomguy9777 they won't bother suing now as they need those x86-64 extensions and AMD own the patents for them. As the market has pretty much transitioned towards 64-bit only, suing AMD would be rather stupid at this point.
If anyone might be sued by Intel it is potentially Nvidia, who want to cut out AMD and Intel from their data centre compute systems, and they use ARM. Depending upon how they implement it, they could be sailing close to the sun with x86 translation.
AMD calling their server cpu's EPYC is actually such an amazing insult
LOL
you could say it's an....
*epyc* insult ..
I'll show myself out.
Thread-Ripper because it R.I.P's Intel in Threads
And yes, I know that didn't make any sense
@@t0stbrot It's okay if it didn't make sense. What we all know is that there is no such thing as too many cores, or too many logical processors.
An EPYC 500-core CPU coming in two to three years will leave Intel in the dust. Seems we are seeing a repeat of Intel servers being gigantic compared to AMD servers for similar workload performance. 10 chips to compete against 1? Possible. Intel, why? You have the money, the market share, the industry clout!??
Some would say that Itanium was an Intel executive's attempt to eliminate pesky x86 competition, IOW AMD, through a new architecture for which AMD had neither a license nor a clean-room implementation. Ironically AMD handed Intel their asses with x86-64, and it must have been humiliating and humbling for Intel to have to approach AMD for an x86-64 ISA license. Hopefully the Intel executive who green-lit it got fired for this.
I don’t think Intel needed AMD’s permission to adopt AMD64. They had a mutual perpetual licensing pact as a result of settling a bunch of early lawsuits.
@@lawrencedoliveiro9104 Nope, that was only for IA32.
I had a different hypothesis. Intel's compiler suite was widely known as the best optimising compiler by the early 2000s. They curiously took two independent stabs at making a dumber, more straightforward, more pipelined, deeper-prefetching processor that required higher code quality to exploit anywhere near fully, and the angle might have been the long game: well, they have the best compiler tech, so as long as they can penalise the other processors artificially (which they did repeatedly do), and optimise better for their own, they can win the compiler and processor markets at the same time and create a soft walled garden.
The other one besides Itanic was NetBurst.
It's not quite enough of a pattern to call it conclusively though. But it is enough to suggest at least a sort of strategic arrogance, where they thought they could corner the market by making a worse product.
"Hopefully the Intel executive who green lighted got fired from this."
Sadly it was the other way around. Two or possibly more of the senior engineers went to the CEO and explained why it was not going to work. The CEO fired them on the spot.
@randomguy9777 You've got an interesting idea. Indeed it stands to reason that up until the Pentium Pro they were catching up to the state of the art as established in academia and by CPU designers in other areas, such as specific workstation and server oriented processors etc; at that point, they did catch up, and the pool of ideas was exhausted. Before P6, there was a large pool of things you'd want to build but for which the semiconductor budget just wasn't even nearly there in a PC. I think a few things that came in since were opcode fusion and perceptron branch predictors, with the rest of the IPC improvement being thanks to better underlying semiconductor tech and resulting fine adjustments rather than revolutionary insights.
Academia was enamoured with VLIW in the 90s and it was considered a future next big thing; so it makes sense that they'd shop for their strategy there. But the compilers that could make good use of such an architecture never happened. Maybe now's actually the time, with things like V8, LuaJIT and PyPy, which can repeatedly recompile the same piece of code to profile-guided specialise it, or maybe they can provide a key insight that makes the approach practical in 10-20 years. I suspect it's still unclear how to merge these things with the C-based compiled software ecosystem.
Place I worked at, we were porting software from 1.1GHz PA-8900 Makos running 11iv2 to 11.23 on, in theory, 'nice' (tm) 4/8-socket Itanium 2s. Oodles more cores, oodles more MHz and internal memory bandwidth (let's forget the discussion about the 'generous' amount of cache on the Itanium... because maybe she didn't lie and it apparently doesn't matter). Sadly, it all ran at best about 1/4 the speed of the older PA system for some sections of the code where it had a good tailwind (other sections much worse...), irrespective of single or 120 concurrent threads. Four of HP's optimiser engineers were hired at an eye-watering rate for 3 weeks. In the end the result was "Wait for the newer, better compiler and it'll magically get better... delay the HP product 8 months". We waited (no choice), it didn't happen, but on the plus side... they were able to afford lovely 5-star holidays in exotic places that year. It was embarrassing that the old 3-4 year old Pentium 3 server and the older dual-processor IBM pSeries workstation (275) also outran it. It was all just sad, and it killed three of my favorite RISC architectures.
It's good to hear from other people with real world experience of Itanium's performance issues. It took so long for improved compilers to really get there for Itanium, and even then x86-64 had moved beyond the practical performance levels Itanium achieved. The loss of some really great RISC architectures was the worst part of Itanium. I really liked running HP-UX on PA-RISC, and Alpha was great as a next step up from x86 Linux.
Losing Alpha is the big one for me. For a good few years in the late 90's Alphas were the fastest CPUs out there. And then Compaq bought Digital and the PC sellers didn't know what to do with it. At least some of the engineers went to AMD and worked on Athlon (which used the EV6 bus).
@@IanTester I was not happy about Alpha being discontinued. I mostly used it with Linux, and it was a great option when you needed a machine with far more power than x86 could muster. We used most of them for database servers. Also for the odd customer who had far too many exchange users, and yet wanted to keep everyone on one server.
Itanium was a truly excellent idea... just not for CPUs... Should have been a bare metal GPU and DSP standard - a lower level than CUDA / OpenCL etc.. Proper STANDARDISED assembly code for GPUs and CPUs.. Could have replaced SSE with a far more powerful parallel processing framework.. Some X86 functions (tight loops) could have been rolled out to the Itanium SIMD unit, with a much simplified X86 core.. Now they're stuck with crappy SIMD and CISC on RISC X86 / X64 in each core...
--
Could have had 32 x64 cores (minus SSE) and 1 Itanium core ages ago, giving superb CPU performance and bare metal GPU-compute programming power. Could even be 2 separate chiplets for serial x64 and parallel processing, with their own dedicated and shared cache chips for many combinations rapidly rolled out at all price points..
--
Current APUs are cursed by SSE's increasingly CISC approach to parallel programming. Extreme architecture bloat, yet no standard low level GPU or DSP architecture - the role Itanium could and should have been rolled out for, along with far simpler, non-SIMD (non-SSE) x64 cores and better scheduling and cache usage between cores for optimum load sharing...
@@PrivateSi I wonder if it might be better to have a new module for "do a=a+b on 10k items starting at address X" type instructions and use microcode to convert SSE into instructions for it instead.
There's an odd little bit of symmetry. The CG rendering in Titanic was done on DEC Alpha CPUs.
That's fantastic, I love little details like that.
And then there is the story of the completely forgotten Intel iAPX 432 architecture... arguably Intel's first costly mistake..!
On the other hand, it was the initial delay of the iAPX 432 project (started in 1975) that made them hire Steve Morse to design the "temporary" 8086 in 1976-78. And the bad performance of the iAPX 432 compared to the 80286 in early 1982 made them decide to develop the 80386, a 32-bit version of the 80286.
Could not agree more. I was a summer hire on that processor development and knew even then that it was way too costly (read "slow") at the time. The only thing good that came out of it was developing another CPU team in Oregon where the Pentium Pro was developed.
@@randyscorner9434 you wouldn't happen to know anything about High Integrity Systems' involvement with the 432? I've got their 'clone' multibus system sitting behind me right now, and information is scarce! It's basically a rebadged Intel System 310 with custom multibus cards replacing a few of Intel's cards.
And yes you are right, without their efforts, it is arguable there wouldn't have been a Pentium Pro at all!
A few things - I was not yet working at Intel in 1999 when AMD-64 was released, but I was working at a BIOS vendor and there is a tiny detail that conflicts with this video. Two BIOS vendors were already booting x86 with EFI by 2000 and we had EFI customers in 2001 (namely a Nokia x86 clamshell set-top box). Thusly, EFI BIOS has been a continuous selling "BIOS product" since the beginning. There was never a need to bring it back, it was always there with a non-zero customer base. Mainly AMD was the hold out. To be fair, Phoenix BIOS was also a hold out. But primarily it was AMD that refused to use or support EFI boot and so any customers using both Intel and AMD would simply use legacy BIOS so they did not have to buy two entirely different BIOS codebases. UEFI then was setup specifically to detach EFI from Intel so AMD could help drive it as an open source project not directly attached to Intel. When AMD finally got on board with UEFI - legacy BIOS finally started to lose customer base.
Imagine trying to run windows 10 on an Itanium!!!!🤣🤣🤣🤣🤣
I used to work for DEC and remember getting my hands on an Alpha 150MHz workstation for the first time in the early-mid 1990's. It was running Windows NT for Alpha and had a software emulator built in for x86 Windows support. The first thing I did with it was to try to play Doom2 in the emulator window, and it actually ran - and much, much faster than my personal Pentium computer could do. It was shocking how fast it was. It also freaked me out when I looked at the chip and saw that the heatspreader actually had two gold plated screws welded on it to bolt on the heatsink. The Alpha was a beast!
The Alpha was a great processor. I do wonder how many of them were sadly scrapped for their gold content.
@@RetroBytesUK Sadly, almost all of them. When my plant closed, I moved to NH where DEC had an e-waste recycling center. They would send all obsolete/overstock systems there for dismantling and destruction. Every day, truckloads of VAX 8000s, AlphaServers, DEC PCs, etc. would come in to get ripped apart and sorted. We routinely filled C containers (48"×48"×48" cardboard boxes on pallets) with unsorted CPUs - x86 Pentiums, Alphas, everything in between - and other ICs to be bought by recyclers. It was all an enormous waste. Most of the systems were less than a few years old and in great working condition. All HDDs were unceremoniously dumped in a bin and they didn't even save or test the PC DRAM modules - everyone just threw them in with PCB waste.
Then Compaq bought us...
If Intel had acquired Alpha for the purpose of developing it rather than killing it (they bought the IP from Compaq and then buried it), and had put a tenth of the R&D money they sunk into the Itanic, computing today would be decades ahead of where it is now.
Very good cpu. Trans people sure loved this. I wasn't partial to it myself, though. Still a good chip.
I still remember when our department at university got its HP-Itanium-1 workstation: I ran a few application benchmarks… 20% slower than my AMD-K7 PC at home:)
They didn't call it Athlon for nothing
> 20% slower than my AMD-K7 PC at home
Executing x86 code or native Itanium (VLIW/EPIC) code?
Ironic that the K7/Athlon was actually made in part by laid off DEC engineers.
EDIT: Punctuation
I worked on the SGI version of the IA64 workstation and porting/tuning 3rd party apps. I would sit in on the SGI Pro64 compiler team meetings and one time they called in the Intel team to clear up some issues. The Pro64 compiler could handle the multiple instruction queues and all that just fine, given that it could predict what the CPU would do. It had an internal table of op-codes and various execution times (I think it was called a NIBS table) and on the MIPS R10K CPU, there was a precise mapping of those, a core of the RISC architecture. There was a whole family of IA64 op-codes that had variable execution times. The compiler guys asked Intel if there was a deterministic execution time for those op-codes. The Intel engineer proudly stood up and said "why yes, there is". Then he went on to explain that this or that instruction would take something like 6, 8, 10, or 12 cycles to execute. At that point, the compiler guys just about got up and left. In the RISC world, it's typically 1 or maybe 2 clock cycles/instruction (fixed). In the Intel CISC_trying_to_be_RISC world, there's a formula for the execution time. Faced with that, the compiler team had to use the worst case execution times for every one of these instructions and pad the rest of the pipeline with NOPs, which killed performance. On the MIPS CPU, they could jam the instruction pipelines nearly full and keep every part of the CPU busy. On Intel, it was way less than that.
You have to wonder sometimes how Intel failed to see the performance issues coming with variable cycle lengths like that for instructions.
I don't get how there weren't red flags when making this thing. Let's assume you can assemble a program well enough in assembly - was there any way to determine the cycle time? Was this all hardware-based scheduling?
@@warlockd I wasn't involved in any of the design part of the Itanium, just working with the Merced based development systems, 3rd party developers and our Pro64 compiler team.
My guess is that Intel was somewhat arrogant and also didn't really grasp the core principles of RISC architecture. It seems like they had some instructions that were inherently CISC and figured if they waved a "RISC wand" over them, that was good enough. As I recall, Intel thought that having some sort of predictable cycle time for a given instruction was good. But that "predictable" cycle count wasn't a fixed number, instead it was a lookup table of clock cycles vs. other machine states.
So yes, someone coding assembler could figure out cycle times given a good understanding of what was going on in the CPU. But the Pro64 RISC compiler wanted a table of "opcode"- "cycle time". I think the fix, in the end, was that the compiler guys had to put in worst case cycle times and then pad the unused slots with NOPs in the code generator back end.
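For illustration, here's a toy sketch of what "assume the worst case and pad with NOPs" ends up looking like in a code generator. The names, opcodes and latencies are hypothetical and this is not the actual Pro64 internals, just the shape of the problem:

```c
/* Toy sketch of the scheduling problem described above: the back end looks
 * up a latency per opcode and, when that latency is a range rather than a
 * fixed number, it must plan for the worst case and fill the unused issue
 * slots with NOPs. */
#include <stdio.h>

enum op { OP_ADD, OP_LOAD_FAST, OP_LOAD_SLOW, OP_NOP };

struct latency { int best; int worst; };

/* Hypothetical per-opcode table, standing in for something like the "NIBS"
 * table mentioned above. */
static const struct latency table[] = {
    [OP_ADD]       = { 1, 1 },   /* fixed latency: easy to schedule tightly */
    [OP_LOAD_FAST] = { 2, 2 },
    [OP_LOAD_SLOW] = { 6, 12 },  /* variable latency: must assume 12 */
};

static void emit(enum op o) { printf("  issue op %d\n", o); }

/* Issue one op, then pad the following slots so a dependent op is only
 * issued after the worst-case latency has elapsed. */
static void schedule(enum op o)
{
    emit(o);
    for (int i = 1; i < table[o].worst; i++)
        emit(OP_NOP);            /* wasted slots: the performance killer */
}

int main(void)
{
    schedule(OP_LOAD_SLOW);      /* 1 useful op followed by 11 NOPs */
    schedule(OP_ADD);
    return 0;
}
```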
Not having an implicit or compact no-op format was a huge blow to Itanium. And the whole "let's hide the real latency" thing was to prevent the need for a re-compile when the micro-architecture was updated.
When the engineer said that, I feel like he might as well have said "The Aristocrats"
Thirty years ago I overheard DEC engineers say that RISC (reduced instruction set computing) really means "Relegate Important Stuff to the Compiler"
Then 10 years later the transistor count for stuff other than core CPU logic, like cache and branch prediction, surpassed the core itself; today it's probably 99% or higher.
Still x86 is not very power efficient.
Pretty much. All of the promises of RISC turned out to be lies. Making a CPU with fewer and less complex instructions didn't actually make it faster. It _did_ make programs significantly larger, and compilers absolute voodoo. (I do still have a white-box [NT] alpha. one of the later / last models after they'd worked out how to keep them from melting.)
I recall Itanium being some hot stuff, so unreachable, so unseeable. We had Itanium servers in Novosibirsk State University, but not for any usual people. And now I find it marked "retro", causing cognitive dissonance.
If you are interested in VLIW architecture, the Russian Elbrus e2k is also VLIW, but it is a live project. With better x86 emulation, aided by CPU ISA adjustments. With Mono, JVM and JavaScript JITs. With a C++ compiler that, although not bleeding edge, is still passable.
They thought the hardware scheduler was obsolete and unneeded. They thought wrong, so wrong 🤣🤣🤣🤣🤣
I spent about 18 months porting software to Itanium Windows. Nearly all of that was digging out and reporting compiler bugs, of which there were lots. Eventually got it working fairly well, but few customers took it up. I gradually transitioned onto porting the same software to AMD64 Windows. When we shipped that, I got a call from my Intel contact asking how much work AMD64 had been relative to Itanium. "Under 5% of the work of Itanium." "Oh..."
1:20 Very peculiar that HP came to that conclusion (about a limit of one instruction per clock cycle) at around the same time IBM introduced its POWER machines, which were capable of _superscalar_ operation -- more than one instruction completed per clock cycle. In other words, the very assumptions of their design were flawed from the get-go.
You're right, it is odd. I wonder if they were so wedded to the EPIC idea that they took their eye off the ball with finding ways to push PA-RISC further.
Sun too were going for superscalar with SPARC.
Yup. The VLIW assumptions, especially that dynamic instruction scheduling was just too complicated for hardware, were broken from day 1. That seemed pretty apparent to a lot of the CPU people even back in the mid-to-late '90s. The instruction density on Itanium (my fingers *really* want to type Itanic) was terrible, and the s/w prefetch needs didn't help much. There were math kernels that ran tolerably well on Itanium, and some DB code did OK, but it was ultimately pretty pathetic: very very late, and very slow.
It was not by chance; out-of-order execution was the next big thing in CPUs and everyone was trying to implement it.
Alpha had a lot of influence on CPU architecture, it was the first in a lot of things (including the best implementation of out-of-order execution till then) and everyone tried to surpass it.
Check the timeline: all these NEW CPUs came after the Alpha and its success.
I was there at the time and had the pleasure of working on one.... It was night and day. I still remember compiling Samba for SPARC took almost 20m. When I tried it on the Alpha, it took less than a minute (I thought the compilation had failed).
PA-8000 supported out of order, superscalar execution. By the mid 90s nearly everything was superscalar (even the original Pentium).
Small nit: It was PA-RISC, not RISC-PA. The PA stands for precision architecture.
It's not a small nit. Every time he say "RISC-PA" I want to throw things.
It's bad enough to give me fits of _NIC-PA_
Great video knitting together all the OS & CPU craziness of then. On the AMD64 you mentioned as one of Itanium's nails in the coffin, I remember just how toasty their TDP was (hotter these days of course)
Thanks for this. We used to run our telecoms applications and database on HP-UX on PA-RISC machines (please not "risc-pa"), some K-class servers, then L-class and we wondered if the new Itanium systems would be our future hardware platform. However the next step for us was the rp4400 series servers and after that we migrated to Red Hat Linux on the HP c7000 BladeSystem with x86-64 cpu architecture.
I remember this thing. I dealt with an Itanic (love that name) machine at work for a bit. WOW that thing was slow. The only slower machine I had access to was a Pentium 4. Like the Pentium 4, the entire architecture was complete garbage from the ground up. There was no fixing it. These were Linux machines too...and Linux had the best support for this thing. The Pentium 3 systems ran rings around the P4 and Itanic. Any architecture without out of order execution is DOA.
And any architecture with out-of-order is likely insecure (it'll expose some sort of side channel somewhere). Really, the promise to vectorize most loops was pretty impressive. Unfortunately the code generated could only vectorize some loops, and the code size would choke out all your caches; then you'd miss a load (even with reasonable pre-fetch) and the whole damn thing would choke.
I'm not convinced the problem is entirely insurmountable, and in the tasks where it's well suited VLIW does quite well.
@@WorBlux The other possible CPU architecture is Decoupled Access Execute, which tries to adapt to variable latency on memory loading by splitting the CPU into an "addressing" part and a "math" part, and passing data through queues. But the complexity of doing this in practice balloons up so it doesn't save much over just doing another Out-of-Order CPU.
8:07. That “expectations over time miss reality over and over again” graph is just legendary!-)
You could even call it . . . E.P.I.C. ;)
I remember comparing an Itanium (OpenVMS) server to a Core 2 server (Linux) running the Cox-Ross-Rubinstein formula written in C. That was around 2010. No matter how hard I tried to tweak the C code, the Itanium could not outperform the x86 (it was about 40 per cent slower). And the x86 server was likely a fraction of the Itanium server's price. The compiler just never delivered on Intel's promises and projections.
Companies bought Itanium servers because of their legacy applications written for the VMS OS (Vax, Alpha). Or because of their Mainframe-like enterprise features (hot-swappable CPUs and whatnot).
This is easily the most coherent and straightforward evaluation of the failures of EPIC and Itanium I've seen on YouTube. We had a number of Alpha boxes at Dupont for evaluation and were very impressed with the kit, only to be told months later that DEC/Compaq were abandoning the platform. Damn you Intel.
A lot of good CPU architectures were killed for the promise of Itanium
The whole Itanium fiasco is a reminder how important backwards compatibility is.
I have a comment. Oddly enough, Itanium was more compatible with PA-RISC than with X86. 🙄
@@MultiPetercool Thanks for saying this - put a smile on my face. I was the technical architect of HP's PA-RISC -> Itanium software migration effort. HP handled backwards compatibility via dynamic translation (i.e. translating PA-RISC instructions into their Itanium counterparts on the fly while the program was "executing"). I tried to convince Intel to do the same, but at the time they didn't trust a software-only solution. Instead, Intel just embedded an older and less capable x86 processor front end on the Itanium die. Though, I think I heard that later they also went with a software solution. In my mind, two of the biggest reasons that Itanium failed is that we failed to recognize just how resistant people would be to recompile their programs into native Itanium code. If you didn't recompile, you wouldn't get anywhere near peak performance. Secondly, and I say this with some shame because I was quite guilty of it, we grossly overestimated how much instruction parallelism the smart compilers would be able to find. Much of our early performance analysis was done on loopy computational code that had regular code execution patterns. If you know where the execution flow is going, you can go fast by computing in advance and overlapping parallel execution. But, that kind of execution is limited to a few markets (which is also part of the reason Itanium lived on for awhile - for some computational tasks it did really well). For general computing, Itanium-style VLIW execution proved to be much less effective than hardware-based super-scalar solutions.
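Since dynamic translation comes up above, here's a very rough sketch of the general idea (all names are hypothetical, and the real HP translator was vastly more sophisticated): translate a guest block the first time it's reached, cache the result, then chain from block to block so old binaries run without a recompile.

```c
/* Rough sketch of dynamic binary translation with a translation cache.
 * Purely illustrative; not HP's actual implementation. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t (*native_block)(void);     /* translated block; returns next guest PC */

struct tcache_entry {
    uint64_t     guest_pc;                   /* source (e.g. PA-RISC) address   */
    native_block host_code;                  /* translated host (e.g. IA-64) code */
};

#define TCACHE_SIZE 1024
static struct tcache_entry tcache[TCACHE_SIZE];

/* Stand-in for the hard part: decoding a guest basic block and emitting
 * equivalent host instructions.  Here it hands back a canned block so the
 * sketch stays self-contained. */
static uint64_t demo_block(void)
{
    puts("executing a translated block");
    return 0;                                /* 0 = guest program finished */
}

static native_block translate_block(uint64_t guest_pc)
{
    (void)guest_pc;                          /* a real translator decodes from here */
    return demo_block;
}

static native_block lookup_or_translate(uint64_t guest_pc)
{
    struct tcache_entry *e = &tcache[guest_pc % TCACHE_SIZE];
    if (e->host_code == NULL || e->guest_pc != guest_pc) {
        e->guest_pc  = guest_pc;
        e->host_code = translate_block(guest_pc);   /* translate on first use only */
    }
    return e->host_code;
}

void run_guest(uint64_t entry_pc)
{
    uint64_t pc = entry_pc;
    while (pc != 0)                          /* execute, then chain to the next block */
        pc = lookup_or_translate(pc)();
}
```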
How about webpage backwards compatibility - programs might be backwards compatible, but the information shown on them becomes inaccessible
It wasn't the lack of backwards compatibility that killed it. What killed it was that it was shit.
@@Carewolf Well said.
Backwards compatibility is important for the PC, and it is unlikely that AMD64 would have taken off so well if it didn't have backwards compatibility, but that is only in that market with a giant install base and tons of proprietary programs that could not be recompiled for new architectures (plus bad coding practices of the time that made it hard to really port to a new system). Today we see all sorts of processor architectures with limited to no compatibility with other architectures, and we are getting closer to most user level code being run in a JIT.
I used to work on these things some years back. The chip was incredibly elegant, but it put a *HUGE* onus on compiler writers to properly optimize the code.
I did always wonder why the Itanium didn’t make it. At the time it totally seemed like the next big thing. This was a great overview, thank you.
Great vid! I started at HP servers in 2004 (x86 servers, internally it was called ISS [Industry Standard Servers]), and I quickly figured out from various hallway chats that I lucked out by being in ISS. The itanium division (internally called BCS [Business Computing Solutions, I think]) was always spoken of in a cringey context. Folks were constantly jumping ship from BCS, coming to ISS or anywhere they could, while others were getting let go left and right. I was just a kid at the time and didn't really understand the details of why BCS was on life support. I was never really a part of all that. But after watching this video, some of those old conversations I over heard in the break room make more sense to me now. :)
Until two years ago I worked at an organization involved in a huge project to port a critical piece of software from OpenVMS (running on Itanium) to a more modern x86 Linux platform. This was mostly influenced by Itanium clearly hitting the end of the road and OpenVMS at the time not yet being able to run on x86. It is worth noting that all this is no longer owned by HP as they sold off these parts of their business.
Anyway, millions of lines of OpenVMS Pascal code were first ported to C++; after that the entire thing was ported to Linux. Some of it with native system calls, but for much of it a translation layer was used that would translate OpenVMS-specific system API calls to equivalent Linux calls.
I personally had not worked with openVMS before that and it was an interesting experience. Even more so as it approaches some things we take for granted with other operating systems in a completely different way. Directory structure for example and disks in general. Would be worth a video in itself I imagine.
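A translation layer like the one described usually boils down to shims: the ported code keeps calling something that looks and behaves like the old API, and the shim does the real work with Linux/POSIX calls underneath. Here's a minimal hypothetical sketch; the names and status values are made up for illustration and are not the actual OpenVMS services:

```c
/* Hypothetical compatibility shim: VMS-flavoured call sites on top,
 * POSIX underneath.  vmsish_*, SS__NORMAL and SS__ABORT are invented names. */
#include <fcntl.h>
#include <unistd.h>

#define SS__NORMAL 1   /* made-up "success" status; VMS-style success codes are odd  */
#define SS__ABORT  2   /* made-up "failure" status; VMS-style failure codes are even */

/* The shim returns a condition code rather than setting errno, so millions
 * of lines of ported call sites can keep their old error-handling style. */
int vmsish_open_for_read(const char *path, int *fd_out)
{
    int fd = open(path, O_RDONLY);          /* the Linux call doing the real work */
    if (fd < 0)
        return SS__ABORT;
    *fd_out = fd;
    return SS__NORMAL;
}

int vmsish_close(int fd)
{
    return close(fd) == 0 ? SS__NORMAL : SS__ABORT;
}
```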
Just stumbled on your channel and now wonder why I haven't seen it until now 😊. Very well made and informative video, will definitely subscribe and watch your other content!
As for Itanium - yup, I remember when it came out. I was in high school back then, so didn't have a hope in hell of ever using an Itanium machine, but definitely remember the hype and then the massive thud that happened when Itanic finally launched 😃. Also remember seeing some Itanium processors for sale on a local PC hardware site for absolutely absurd prices and giggling. Amazed its support is only ending now, thought it was long dead.
That's very nice of you to say. Thanks for the subscription. I think Itanium's continued existence comes as a surprise to most, as it is only used in a very niche high-end market. I think given a free choice Intel would have killed it years ago.
Thanks for the video!
You forgot to mention the HP NonStop product line. That was moved from MIPS to Itanium processors, only to be moved to x86 processors a decade later.
The Itanium architecture was sound, but at Intel it was always on the backburner. The x86 development was always higher priority and Itanium was always based and produced on previous generation nodes.
Tandem and NonStop are probably interesting enough to get their own video at some point. There are some things I decided not to reference in the video, like NonStop and HP Superdome (I also want to do a video on Convex), as they felt like they might have taken the narrative of the video too far off on a tangent.
It's a tricky balance doing these videos, deciding what you put in and what you keep out. With this video I was trying to keep it a little more succinct, as I thought I could convey the history of Itanium fairly well in a 10-minute video without skimping on the details, but that meant not looking at some of the more niche product lines that fell into HP's hands. I don't always get the balance of what goes in/out with these things right. Sometimes in hindsight there are extra details I wish I had focused on a little more or included, and other times I feel including something ended up pulling the narrative off track a bit. Sometimes I feel I've gotten the call about right and avoided going down a big rabbit hole.
@@RetroBytesUK Oh yeah, Superdome was another one. Good point.
@@RetroBytesUK the SX1000 and SX2000 Superdomes could have either PA-RISC or Itanium cells. You could have both in the platform, but any nPar had to have cells with the same type of CPU.
I briefly worked on HP gear in the mid-late aughts (as a Sun FSE, strange support contract for a customer who shall remain nameless)
Of course, after years of supporting SPARC/Solaris systems, HP-UX, the Superdomes and N-class servers were a different animal. HP-UX itself was easy enough to sort out as it drives pretty much like any other Unix. The Management Processor was different from any of the various flavors of LOM, ILOM, RSC, XSCF and of course OBP, the Open Boot PROM, essentially what provides the BIOS-like function for a SPARC-based machine. I'm sure I'm leaving out some, as the E10K, Sunfire E12K, E15K, E20K and E25K all had SPARC-based Service Processor management machines, either external to the cabinet (SSP for the E10K) or built in, in the case of the Sunfire machines. The Exxxx machines had MicroSPARC System Controller boards instead of a machine running Solaris to manage the platform...
(Okay, I'm in the weeds, living out the glory years, sorry.)
Only did that for two years until the customer let HP back into their data center.
Of course, the same customer still had a Sun E10000 with 250MHz processors until a year or two ago. I'm not sure they had many folks who knew how to administer it.
Anyway, thanks for the trip down memory lane. Having to figure out if an HP cell board was a PA-RISC or Itanium board. Didn't really have to go through that with the Sun gear as they were all SPARC. Just a question of what clock speed for any replacement CPU.
All I know about Tandem is I want one of their promotional coffee mugs with two handles.
The hardest de-lid I've ever attempted was an Itanium 9260. It was successful - only lost one decoupling cap.
It cleaned up nice with 15k polishing. Now I need to get some etching done.
Nice video.
It's worth watching the Computer History Museum's interview with Dave Cutler where he more-than-suggests he encouraged AMD to continue their x86 to 64-bit efforts and that Microsoft would be extremely interested to support it.
From 1989 to 1998 I worked for Unisys, DEC, Sequent and HP in Silicon Valley. I left DEC shortly after Alpha was released. Part of reason I left HP was because I knew Itanic would be a loser. The extra $30k a year offer from Sun was too good to pass by. 😉
I can imagine the extra money and Sun was too much to resist. How long did you stay at Sun for?
@@RetroBytesUK Larry is my new boss. 😉
I touched my first UNIX system in 1977
I’ve seen ‘em all. ZILOG, Intel, Motorola, Cypress, Nat Semi... I worked for Plexus who built the prototype systems and boards for MIPS. Plexus was founded by Bob Marsh and Kip Myers of Onyx Systems where a young Scott McNealy ran manufacturing.
@@MultiPetercool Still there then :-)
The sad thing is it was a good idea on paper. They just grossly underestimated how horrendously difficult it would be to determine what instructions could be run in parallel at compile time, probably because it would sometimes depend on the result of another instruction, something you could only know at runtime.
You would be surprised at the level of speculation and the tricks people came up with to solve such situations.
Seems like a task for AI to solve.
Afaik, the real killer for Itanium and similar architectures is that the compiler has no idea which memory loads come from L1 cache and are going to load fast, and which ones come in slowly from L2 etc or Ram and need to wait as long as possible while running other operations. As far as I can tell this is 90% of why CPUs have to be out-of-order to perform well.
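A small C sketch of that point (illustrative only; the names are made up): the loads below look identical to a compiler, so a static, in-order, EPIC-style schedule has to assume one fixed latency, whereas an out-of-order core simply keeps executing whatever is ready while a cache-missing load is still outstanding.

#include <stddef.h>

/* Illustrative only: the compiler cannot know whether table[idx[i]] hits L1
   or misses all the way to DRAM, so it cannot statically schedule around the
   latency; an out-of-order core hides it at run time instead. */
long sum_indexed(const long *table, const size_t *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += table[idx[i]];        /* latency unknown until run time */
    return sum;
}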
It wasn't a good idea on paper either. Assume your compiler always figured out the perfect 3 instructions to run in parallel for your code: what happens when the next generation of CPUs can execute not 3 but 4 or 6 instructions in parallel? Code compiled for the first IA-64 version would still only run 3 in parallel and never make use of that new ability; the compiler would have to re-compile and re-arrange all the instructions to figure out which 4 or 6 could run in parallel. Without doing that, old code leaves 1 or 3 instruction slots totally unused and never unlocks the full potential of the new CPU. With current CPUs, on the other hand, the CPU itself figures out at runtime which instructions it can run in parallel: when the current generation does that for up to 3 instructions, the next one can maybe do it for 4 or 6, meaning existing code immediately benefits from the new CPU's capabilities without re-compiling anything. If the CPU makes that decision, a new CPU can make better decisions; but if the compiler does it, you won't get a better decision without a new compiler and re-compiling all the code. And with closed source, abandoned projects, lost source code and tons of legacy compilation dependencies, it's not like you could just re-compile everything whenever Intel released a new CPU; you would also need multiple versions, as the one optimized for the latest CPU would not have run on the previous one.
@@xcoder1122 Having instructions grouped by 3 is not totally useless even if you had a faster later-generation CPU and had to load 2 or 3 instruction blocks every cycle (for a total of 6 or 9). You still get the code size penalty, but it's not the end of the world. Your dependency tracker and issue queues get marginally simpler. IMHO what really did Itanium in is the complexity from all the extra mechanisms they added in - predication on every instruction, special alias checking and speculative memory loads...
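A tiny C illustration of the bundle-width argument in this thread (the 3-wide and 6-wide widths are assumed purely for the example): the work below is six independent multiplies, but a compiler targeting a 3-wide EPIC part bakes that grouping into the binary, whereas an out-of-order core can re-discover the full width from unchanged machine code.

/* Hypothetical example: six independent multiplies followed by three adds. */
void scale_mix(float r[3], const float v[3], float a, float b)
{
    float t0 = v[0] * a;   /* a 3-wide EPIC compiler packs t0..t2 into one bundle */
    float t1 = v[1] * a;
    float t2 = v[2] * a;
    float t3 = v[0] * b;   /* ...and t3..t5 into a second bundle; a later 6-wide  */
    float t4 = v[1] * b;   /* part still sees two 3-op groups unless the code is  */
    float t5 = v[2] * b;   /* recompiled, while an out-of-order core can issue    */
    r[0] = t0 + t3;        /* all six multiplies together from the same binary.   */
    r[1] = t1 + t4;
    r[2] = t2 + t5;
}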
Ah GCC. Ol' reliable. Not surprised that they managed to squeeze Itanium's performance better than Intel could. The demigods that develop GCC are truly a mysterious and awe-inspiring bunch.
In my mind they sit on thrones like all the characters from Shazam.
Architecture engineers are summoned through magic doors.
Yes, GCC which took years to be comparable to MIPSPro on SGI machines, even for "easy" benchmarks like matrix multiply.
I would love to see more content like this. As a collector I like to see these architectures get talked about. With the exception of Itanium, I have computers based on nearly every one of these: SPARC, Alpha, PA-RISC, VAX, MIPS, and even a variation of a PDP-8E.
How odd that he started out listing Intel's failures including Spectre, but then ignored that Itanium, due to its lack of branch prediction, is one of the few architectures *not* vulnerable to Spectre.
And for those who didn't already know, now you know why AMD named their newest server CPUs "EPYC" :D
Now that's a burn if anything.
We had acquired an Itanium workstation for development purposes. The boot process was incredibly slow. We never really got started on our development since AMD64 was announced and really took off, much to Intel’s chagrin.
I remember when Intel was previewing the Itanium. I didn't know the specifics. I just thought it was basically a 64 bit x86. But it seemed that no one was adopting it. Now, I know why. This was some time ago, and I've been in the business before IBM made their 1st 8088 (original) PC. I go back to 8-bitters running CP/M !
Intel designing Itanium: "how do we give the software engineers the steepest possible learning curve?"
AMD designing x86-64: "how do we give the software engineers the smoothest possible learning curve?"
I think that AMD got it right due to pure luck. When Intel pushed IA-64 with HP, AMD had nothing to challenge it with, because AMD was a pretty small company. They did not have the money to follow suit and embark on a brand-new 64-bit architecture of their own. They had no choice but to bet the house on x86 and extend it to 64 bits.
On paper I thought the EPIC concept was a great idea: explicitly schedule instructions to maximise IPC throughput. I remember seeing a hand-coded assembler routine for solving the eight queens chess problem which was beautifully efficient. I really liked the rotating register windows concept for function parameters and local variables. But the dependence on compilers to exploit this was grossly underestimated, and there was a use-case mismatch. I believe they (HP initially) designed this to run workstation (scientific) code efficiently, where the code blocks (as in Fortran) are much larger and more locally visible to the compiler, giving its EPIC optimisations far more scope to shine. But in the boring reality of business and systems code, the code blocks are much smaller, the data structures are much more dynamic, and the opportunity to exploit parallelism up front using compiler/(feedback-driven) analysis was far less than anticipated. So regardless of the silicon realisation (not great in the V1 Itanium), the combination of the compilers, the kind of code they had to optimise, and the silicon was never going to shine. When AMD64 came along as a very elegant compromise that used dynamic micro-architectural techniques to efficiently run code that is otherwise difficult to optimise statically, AND ran 32-bit x86 code efficiently, the writing was on the wall.
So I'm sad because it was an interesting architecture for scientific Fortran like workloads, but not modern ones.
Register windows are an idea used in SPARC. Rotating registers as in Itanium are something totally different, and that concept is taken from Cydrome.
In 2023, I still use an OpenVMS system based on Itanium 2 (rx2800-i2) every day. I also use CentOS 7.9 on x86-64 (DL385p_gen8) every day. Installing Python 3.9 on both, then comparing the results, can be very instructive (the x86-64 system is much faster). That said, Intel's main mistake was trying to isolate two markets (16/32-bit for home and small office; 64-bit for servers and large business; who remembers the iAPX 432 chip, which was shelved before Itanium?). Anyway, 64-bit-only chips became irrelevant when AMD added 64-bit extensions to their implementation of the x86 instruction set.
In retrospect, this may have been more of a New Coke moment than an outright disaster. All those competing high-end chips falling by the wayside as companies bet on Itanium gave Xeon a chance to be taken seriously.
Between 1995 and 2005 I worked for a company selling used IBM RS/6000, HP9000 and Sun Microsystems UNIX workstations, servers and spares. Some of the larger PA-RISC HP servers were my favourites as I loved their amazing CPU heatsinks and giant fans, which on start-up sounded like a Vulcan bomber taking off. I think I still have a D-class server and boxes of K-Class boards in my garage. When the Itanium systems started to appear, it all got very boring from my perspective. As an aside, it was always PA-RISC and not RISC-PA as you have said a few times in this video.
Another blunder: Selling off their ARM assets (StrongARM/XScale) in 2006.
They have an ARM licence, so they can use any ARM core from Arm; plus Intel is going to make Qualcomm chips and custom ARM chips soon.
"None of Intels' other mistakes can hold a candle to this one." He said... Not knowing that years later, Intel would reply with "hold my beer" and release an entire line of CPUs that brick themselves.
Even for Intel, self immolation is probably a design issue too far.
The Itanium did perform really well... when one compiled purely for ia64 and *not* x86.
The poor performance came from Itanium's implementation of the x86 compatibility layer. To stay backwards compatible with old 32-bit x86 code, it had to emulate it - translate the x86 code into IA-64. That was never intended to be the main way to run things. Yet software developers did it anyway, and then they complained. Developers didn't want to move to a pure IA-64 environment, as it was scary.
AMD's x86_64 extension is backwards compatible with 32-bit x86 with no performance loss. Intel realised in the end that this backwards compatibility was much more important than they had anticipated, and chose to license AMD's x86_64 for their future products. That is where we are today.
The reason Intel's own 64-bit attempt failed: the software industry was too stubborn to move to a new architecture. AMD knew this, of course.
No. I own an rx8640. Giant itanium machine. 16 sockets. It's dog slow. I run gentoo on it. Everything compiled natively with the right flags, etc. It's slooow.
@@detaart Slow compared to other machines from 2003? Intel core architecture was launched in 2006 so that itanium machine would have been competing against netburst based xeons.
@@jaaval even netburst xeons would run circles around it.
"The Itanium did perform really well" - On certain code, sometimes. Not as the general case even for native code. IA64 would have done better if the x86 emulator have never been expose. You need it for some drivers of the time, but it just confused the issue of who the platorm has actually for.
I love that you use the Solaris colors in your Thanks For Watching outro. I recognize them because I was low-key obsessed with purple and orange colorways for a while. 😜
I always wondered why Itanium was such trash and why it didn't last that long...
You throw a ton of events and timelines in without giving hard dates in this video.
I’m sitting here trying to take it in and I’m like “BUT WHEN IS THIS HAPPENING?!?”.
Hardware People: We'll leave the hard part to the Software People
Software People: Don't look at me.
Here's a fun tidbit that, understandably, got left out. Intel was still developing Itanium long enough to make a 32nm process version, and that processor was launched in 2017. That's amusing to think about for a million reasons.
Intel is an interesting story. In business, if you swing for the fences, you will strike out a lot. The shareholders won't like that. Of course if you hit a home-run, then you're the hero. I think with Itanium, we may see those concepts come back at some point. Modern processors have to keep their pipelines and pre-fetch components powered up and consequently the Intel and AMD processors consume enormous amount of energy. A less complex, high speed processor that runs super-compiled executables would run fast. I still think some pre-fetch hardware may be necessary though. Let's see how things evolve.
I was closely following the 64-bit-battle at the time (which was around the time of my masters degree in CS). At first I found AMDs solution a bit blunt, and IA64 in principle more elegant; but, execution matters (in multiple interpretations of the word).
Poor Intel got hammered (pun fully intended).
Around 2009 I had a login on a big Itanium-based beast of a machine with 160 or so cores and 320GB of RAM. The performance of my software was terrible, partly due to the architecture (instruction set, NUMA), partly because Java wasn't particularly optimized for IA64.
AMD's solution for 64-bit was the right one. As crazy and gothic as x86 is, it's still a better architecture than Itanium. Most elegant would have been just another standard RISC.
What I can't believe when I hear these kinds of stories is that all these companies (HP, Compaq, SGI, etc.) agreed to leave their own platforms behind while the new platform wasn't even around yet.
That's when you know it was a management decision and not an engineer decision :)
Costs of development were rising and the market was shrinking. Plus Itanium was supposed to come out a few years earlier and be a bit better. That is why the MIPS CPUs stagnated; SGI had to do something when they saw that Itanium would not be there "in a few months", or even years. The R16000 had DDR, but apart from that it was still the old SysAD bus, made for the R4000 at 100 MHz.
3:47 Just a note that “PDP” was not _one_ range of computers, it was a number of different ranges, with 12-bit, 18-bit and 36-bit word lengths (OK, all of these were pretty much obsolete by the 1980s), plus the famous PDP-11, which was 16-bit and byte-addressable, and was the precursor to the VAX.
Maybe when you said “PDP”, you really meant “PDP-11”.
What I meant was that the whole PDP range, from the PDP-1 through to the PDP-11, was coming to an end with the PDP-11 at that point.
The numbers went up to the PDP-16.
@@lawrencedoliveiro9104 I thought the 14 and 16 were not really general-purpose machines like the 11, but were just for automation systems. I know the 11 was the last to stay in production though, and that was the final machine in the line they were still selling. I suspect if you had a service contract you probably could still get a 14 or 16 etc., but they would have been NOS by the time they shut down the production line for the 11. So I tend to think of the 11 as where the PDP line ended, as it was the last one you could buy.
I worked on some pre-production Itanium workstations my company was developing. Those machines ran very hot. We had a small lab of 8-10 of them in a locked conference room. It was like each one was a hair dryer or a blast furnace, constantly blowing out hot air in a futile attempt to keep itself cool. And it wasn't just because they were pre-production either. Other non-Itanium pre-release workstations ran much, much cooler than Itanium did.
I think Microsoft still uses the AMD64 name for x86_64. Interesting note about EFI. Does this mean that Apple started using EFI after it got created for Itanium?
Yep, the 64-bit version of Windows targets x86-64, which basically forced Intel's hand into supporting it. You're also spot on about Apple: Intel developed EFI for Itanium, then Apple adopted it when they moved to x86. It's not like they needed BIOS compatibility, and it's a nice clean environment to do bootloader code for.
EFI or later UEFI was intended to be somewhat processor agnostic in a similar way to Open Firmware
A number of Linux distros also use the AMD64 name.
Actually there are two names thrown around for the two implementations, and it is said they have small differences.
AMD had AMD64.
Intel had EM64T.
At the time, Macs were on PowerPC CPUs, and Apple used firmware known as Open Firmware. Granted, up until the iMac G3 they used a combination of Open Firmware and the old Mac ROM, and the Open Firmware implementation was buggy (one of the main reasons why Apple dropped support for older computers running Mac OS X).
But yes, Apple used an early version of EFI for their Intel Macs. For ARM, they use iBoot (based off of iOS devices).
Intel Itanium was a great success for Intel! It killed the market for other CPUs that were being developed at the time by IBM, DEC and other manufacturers. Plus, it was proof (albeit highly slanted proof) that nothing was better than x86 - that no other architecture could be created that would beat the mighty x86. Both of these were great strategic successes for Intel.
The one thing I wonder about, which was left out of the story, is that this time was broadly the end of the ... non-PC micro- and mini-computer. Did SGI die just because they went with IA-64 for their specialist workstations, or because a PC with some plug-in hardware was good enough? The list of "people who went with IA-64" reads like a list of people who thought "we still need something better than a Windows box" at the same time. Maybe they'd have gotten another couple of years if the IA-64 rollout had been stronger, but by the end of that gasp the writing would have totally been on the wall? Dunno, maybe I'm totally on crack.
Interestingly enough, I recently specced a new PC (that I'll not be buying any time soon) and it came out over $50k. Not Infinite Reality territory, but...
In the mid 90s, I had this triangle of love with MIPS and Alpha. We used to have Sparc and PPC visiting us for a nice cup of tea. Such fun we had sharing jokes about SCO, and well, everybody knew that Netware didn't have virtual memory, right? But then came Itanium, always wearing that eau de depression. It was the end of good times.
Wasn't there some way Intel could have figured out that Itanium would be a dog before investing so dearly in making the hardware? Looks like the answer was a firm no. With compilers being wishware, ouch. There was no reliable way to rate it without compilers. Extending the x86 to 64 bits made far more sense, with compilers for the x86 being old hat by then. Good on AMD for embracing the obvious. CISCs today are implemented internally as RISCs anyhow, which is how they beat out the older explicitly designed CISCs so well.
Maybe at some point automation of compiler creation and resulting code analysis will be good enough to actually evolve architectural prescriptions, and flubs like Itanium won't be as easy to make. But this will probably run on arrays of, you guessed it, x86-64 processors. Thus, new giants in creation stand on the shoulders of old giants.
I think there was a big circle jerk of executives AND engineers in this. You got HP, which has a bunch of old Unix systems that need to be migrated, and you got Intel, who has the idea to build a new ISA, something for the future AND all locked to Intel. You're locked in with Intel and HP. Everyone was looking at the horizon when they should have seen the fire starting at fab design :P
Yet another channel where I just cannot comprehend how it doesn't have more subscribers and views.
Genuinely excellent content.
You might want to also look into the iAPX 432. Arguably an even bigger blunder.
I used to volunteer at my local Free Geek. Mostly we got the standard Intel and AMD home desktop computers but ISTR we had an Itanium based system or two during the time I was there.
A few years ago I saw an Itanium workstation in good will for $75 dollars. Still wish I had bought it, just for the sake of oddity.
I think I would share your regret if I had seen that too. I could not find anything that was not hugely expensive, but it was all very high-end server kit.
When I supported hp-ux I managed to buy a zx6000 workstation running Itanium 2 processors for just £240 from eBay. Now retired, nostalgia prevents me from getting rid of it as even now it still runs hp-ux 11.31 as well as Windows 2003 and Windows 2008. In the end I’ll probably donate it to the Cambridge Computer Museum.
Got handed an SGI prism that was discovered under a desk at work, with all of the accompanying software and manuals. Free to good home.
Not the greatest home - 'Barney' lives in the garage because I could never get the dual AGP graphics cards working on anything other than the aged SuSE distro it was supplied with.
What good is a graphics workstation without graphics.
I was recommended your channel on the main page. No regrets. Awesome content!!
Mentioning some dates would have been handy...I had to go to Wikipedia to find out Itanium was launched in 2001 (I thought it was the 1990s) and only discontinued last year.
I think every man and his dog got their hands on early versions of an Itanium-based machine prior to the official launch. We had one in the very late 90s. That's probably why you were thinking of the 90s, as that was when everyone was talking about Itanium, and then being disappointed by it.
That was a critical time in the history of PC computing at the turn of the century: switching from 32 bit to 64bit.
Windows XP64 was coming out.
I was working on 64bit drivers for XP64, and the only available 64bit HW was Itanium on a beta Intel server.
Was it slow!!! To reboot it took 20 minutes!
It was a painful time.
Not a mistake - it scared all the other major "high end" chip makers out of the market. SGI stopped investing in MIPS, HP in PA-RISC, Compaq/DEC in Alpha. Even Sun blinked, getting tempted away from SPARC after initial resistance. The rest is history. Intel won, and only now, after Apple has invested in a performant ARM architecture with the M1, do we see the performance advantage they all gave up for Intel's failed promise of VLIW parallel execution. All the high-end executives duped by a pipe dream. Not the only architectural fiasco from Intel either: Larrabee followed on the heels of this to tackle the massively parallel GPU threat.
Intel did a great job of helping Alpha live on by motivating the AXP engineers to go to AMD and develop the Athlon and Athlon 64.
I wonder if the Itanium might have done better if someone had used it in a games console? That parallel architecture seems like a good fit for working on vectors in a 3D game.
Code generation wasn't really ever solved for Itanium due to a number of peculiarities, and you always have a lot of flow control heavy code on the CPU. The architecture type employed by Itanium is also known as VLIW (very long instruction word) and it has been used in DSPs on and off for a long time, with varying success, but is seen as a very versatile DSP architecture. So yeah the GPU of Xbox360 would be a valid example! Also, with some intense cringing, maybe the Samsung DSP in the SEGA Saturn.
And yet at the end it lost out to SIMD in the GPUs. Vectors make a weak case for VLIW, because most of the time, you apply the same operation to all elements of a vector or several vectors, such as vector dot product or matrix vector products. So using VLIW your instructions are several times as large as they have to be, and take correspondingly that much more power and logic to decode. When your operations don't neatly fit like that, but you're working with arrays of data, you can improve the SIMD weak point just with low cost data shuffling operations, so say you can extract the same element out of 4 consecutive vectors into a new vector and apply a SIMD operation to that. A couple more tricks down the line, and you have covered up the hypothetical advantages of VLIW with practical advantages of SIMD in code density, code cache utilisation, power consumption and footprint.
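A quick C sketch of why regular vector math suits SIMD so well (illustrative only, not from the video): one packed instruction applies the same operation to every lane, so there is no need to spell out each lane's operation separately in a wide instruction word.

/* Illustrative only: a vectorising compiler can turn this loop into a handful
   of packed multiply/add instructions plus a horizontal reduce - far denser
   than encoding each lane's operation separately in a VLIW bundle. */
float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}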
That's how we ended up with the CELL cpu in the PS3, which had similar tradeoffs to Itanium (high performance on numerical tasks but requiring special programming). It ended up being *not* faster than the XBOX360 despite the special programming. Sony learnt their lesson, did a 180° turn and put bog standard out-of-order x86 cores in the PS4.
A painful walk down -Memory- Cemetery Lane.
HP-UX still out there? Wow. I'd've lost that bet.
At least they had slightly better naming schemes back then, instead of this i3, i5, i7 malarkey (but which gen of iN...?); the Xeons aren't much better for easily identifying without searching the Intel site for the specs either.
0:12 "like the floating point bug in the very first Pentiums which recent security problems meltdown and spectre of Intel's other mistakes can hold a candle to this one however"
0:20 "no failures cost Intel so much have been so public at the time"
0:33 "if you're around at the time you remember all the hype about titanium"
1:15 "risk va"
2:07 "the only problem hp saw with this was damn expensive to develop a new cpu architecture"
2:25 "risk ba"
2:44 "after all unix systems were hp's cache cam"
3:10 "until we're really keen on developing this thing"
3:20 "well that was all seven up by hp with its risk pa"
3:23 "deck with its fax platform"
3:32 "development of titanium took a while"
3:40 "first deck had run into some really big problems"
4:06 "deck had lost way too much money"
4:19 "however compact was soon in trouble"
4:22 "it decided it would drop the development of the deck alpha and move titanium for all its big systems and also shift vax's os vms from alpha to titanium"
4:38 "so he decided that he'd move to titanium"
4:40 "this would reduce sgi's development costs you'd no longer have to spend on mips"
4:52 "son unlike the others decided it would hold back and it would keep working on its ultra spark architecture but it did announce that it would put solaris to titanium"
5:03 "even the big unix vendors for the pc platform got behind titanium"
5:08 "brought together all the big unix vendors of the time seo sequence ibm to work on one unix variant for titanium"
5:24 "deck now owned by compact renamed digital unix to true64 and announced it would be moving to titanium which of course never happened because compact went bust"
5:35 "unsurprisingly microsoft would also port windows nt to titanium as well so business-wise titanium was going well"
5:51 "then intel released the first version of the titanium cpu"
6:08 "but again its performance just well disappointing"
6:14 "until they've been working on their own compiler"
6:42 "interesting titanium started to dry up"
6:58 "with vitanium floundering"
7:03 "it felt like titanium was in limbo"
7:10 "amd produced amd 64 a 64-bit version of the xx6 instruction set"
7:21 "something that the itunes well couldn't do"
7:35 "as they could run 32-bit code much faster than titanium"
7:51 "so when the pressure from microsoft he wanted to drop titanium with their market share at risk from amd until agreed to license amd 64's instruction set"
8:07 "i'll show you the now infamous graph of intel's expected earnings from titanium and of course it's actual earnings"
8:15 "it's at this point in time that everyone assumed titanium was dead"
8:20 "however if remember that deal with hp back cursed fairy tale deal well that meant as long as hp kept buying cpus into that to keep to the road map so for the last 20 years intel has had to keep paying to develop titanium"
8:37 "it's only this year that linux torval announced"
8:58 "this is the part of the video where i'd normally start showing you around some bitter kit but not this time you'd still buy a titanium server second hand but well there's still over a grand"
9:14 "anything i could show you running on titanium would be well more fun to show you running on another piece of kit wanna see vms running well that's more interesting than alpha"
9:32 "titanium's just not a fun platform and most titanium servers are huge and well who wants to use that much storage space on titanium i mean i can fit a few more amigas in that space or an sdi box or two hell even a sun server so that's it for titanium"
9:56 "as the pcs bias was riddled with limitations"
Great subtitles.
You realise it's machine generated automatically by YouTube?
I noticed you refer to the HP RISC architecture as "risk pee ay" in a couple of videos now. It's "PA RISC", not "RISC PA". I honestly never knew how to actually pronounce it, though. If RISC can be pronounced "risk", I figured it could be "pah risk". Sounds like something from Deep Space Nine, maybe 😁
Lol Risk apee I...
I previously worked for an electronics recycler, and I remember seeing several racks of HP Superdome systems come in for decommissioning (powered by Itanium) It was fascinating seeing how Itanium was implemented on a hardware level.
IRIX was never actually ported to IA64. SGI had their Altix and Prism systems that ran Itanium 2, but they only ran Linux, usually SLES or RHEL. I *think* SGI had a special software package that allowed users to run IRIX applications on Linux, but I don't know that for sure.
So a number of distributors were shown an Itanium port of IRIX (I remember sitting and watching the demo). However they did not release it, and shipped Linux with it as you said. I guess they decided completing and maintaining the port was too expensive. We were not allowed to play with the demo, so I guess it had a lot of rough edges. Normally we were invited to have a hands-on session with new products, so it was notable that that did not happen with their Itanium version.
There is a reasonable chance this was IRIX's userspace on a Linux kernel. As we could not play with it, we could not dig in further.
On the face of it, reducing hardware complexity and aggressively targeting parallelism seems like a great idea. But for a hardware company to offload the (enormous) complexity into software... seems like an obvious red flag.
"The competent programmer is fully aware of the strictly limited size of their own skull; therefore they approach the programming task in full humility, and among other things they avoid clever tricks like the plague." - Edsger Dijkstra [paraphrased]
The EPIC adventure proved one thing -- VLIW is simply not sustainable as a general-purpose ISA. There are too many run-time hazards that even the most optimized compiler can't account for. The overarching goal of Itanium, with its statically scheduled in-order pipeline, never materialized the brave new world of compiler "magic" everyone hoped would lead to perpetual scaling.
Intel actually knew this well in advance, and with the last architecture update (Poulson) they broke with some fundamentals of EPIC and introduced limited out-of-order execution and dynamic scheduling, to take some of the burden off the software. That still didn't solve the issue of the bloated cache sizes Itanium was known for, due to fundamentally inefficient instruction packing and ordering.
I completely agree with everything you've said. Compilers have improved in this area, and are significantly better than they were at Itanium's first release, but they still don't have the level of omniscience needed for VLIW to work out as Intel initially predicted it would. Without the introduction of out-of-order execution into the design, Itanium would not perform as well now as it does.
It is not so much about run-time hazards; it is more about CHANGING from one generation to the next. And traditionally, latencies always got larger. Unroll a loop or software-pipeline it for one set of latencies, and it will have stalls on a new CPU. That is why wide in-order CPUs went out of fashion quickly. Intel never made such a CPU.
Stumbled upon this channel and glad I did. Good job.
Well, to be honest, Intel has made exactly four good CPU designs: the Pentium 2 & 3, as well as the Nehalem & Sandy Bridge Core i generations (and their die-shrink subvariants). All others were garbage. Itanium, yes, but Pentium 4 as well, and Pentium D even more so. The Pentium and earlier were horrible true CISC designs. The Pentium Pro fixed that, actually by copying AMD's desperation move of emulating x86 on a RISC core, but it was way too big a die to be priced remotely sanely. The Core & Core 2 fixed the microarchitecture by throwing away everything Pentium 4 did and restarting with the Pentium 3 design, but they still ran with an FSB and a north bridge as the memory controller, until Intel fully copied AMD again, while already licensing AMD64. Haswell was buggy and super energy hungry, and was completely discarded when the in-parallel-developed Skylake was finally ready, but that proved even more buggy. Both of those buggy generations were reportedly the reason for Apple to commit to ARM Macs. And after Skylake, Intel's screw-ups left us with one xlake generation after the next.
Honestly it is surprising Intel still exists, let alone outperforms other chip designers in numbers.
alder lake
Pentium Pro was not a big success -- remember it had poor 16-bit performance, which was still rather important for Windows apps at the time.
@@lawrencedoliveiro9104 It suffered low yields too, so it was expensive. It could be very fast in the right circumstances, but the Pentium MMX could match or beat it in some cases.
I'd argue that the 486 was a good design (the 50MHz version aside!).
486, Pentium 2, Pentium 3, Core 2 (the first Core wasn't all that great, but better than Pentium 4!), 2nd-gen Core i, 6th-gen Skylake, and now Alder Lake were all pretty good, and the 386 was a big leap at the time too.
As for the duff ones, well: P4, i960 (another one that was supposed to replace the x86 CPUs), Atom, 8th-11th gen. Probably more as well, plus the subject of this video, of course.
Nice. I own a 3820 and later built a second system with a 3930 on LGA2011, and I am still super happy today. Seems I've been lucky picking Sandy Bridge. They are now over ten (!) years old and are still up to any task.
I love that Itanium sales forecasts chart. It is a record of the slow death of a dream.
Itanium was a truly excellent idea... just not for CPUs. It should have been a bare-metal GPU and DSP standard - a lower level than CUDA / OpenCL etc. Proper STANDARDISED assembly code for GPUs and CPUs. It could have replaced SSE with a far more powerful parallel processing framework. Some x86 functions (tight loops) could have been rolled out to the Itanium SIMD unit, with a much simplified x86 core. Now they're stuck with crappy SIMD and CISC-on-RISC x86 / x64 in each core...
--
Could have had 32 x64 cores (minus SSE) and 1 Itanium core ages ago, giving superb CPU performance and bare-metal GPU-compute programming power. It could even have been 2 separate chiplets for serial x64 and parallel processing, with their own dedicated and shared cache chips, with many combinations rapidly rolled out at all price points.
--
Current APUs are cursed by SSE's increasingly CISC approach to parallel programming. Extreme architecture bloat, yet no standard low-level GPU or DSP architecture - the role Itanium could and should have filled, along with far simpler, non-SIMD (non-SSE) x64 cores and better scheduling and cache usage between cores for optimum load sharing.
It was not the first time that Intel botched the introduction of a new architecture. Remember the 432?
Itanium was still more successful than their early RISC processors. I'm not sure if it was their greatest blunder. Larrabee was another massive waste of energy and money.
The CGI in Titanic was actually rendered on DEC Alphas, soooo you managed to make it relevant.
I would love your videos if the music in the background was removed; can't stand it, unfortunately.
It's always fun watching the "look, it's code!" clip art. Usually it's some web page. This one is starting to look like a high school programming class text only utility. "And I wrote the grade admin software for the entire school system on the PDP-8 in junior year." Come on, you know that guy.
Oh man…I was working at SGI on the R10K/R12K team at the time. Itanium was the big boogie man that Intel used to scare us (MIPS) and Dec (Alpha). Makes me really mad - it felt like that was the beginning of the end of SGI.
The R18000 not being sent to the foundry was one of SGI's largest mistakes. It would have wiped out the original Itanium, and probably Itanium 2 too.
I worked at a prominent mathematics-software company during Itanium's launch and subsequent train wreck. I helped manage our Itanium build machines...A first-gen Gateway 2000 deskside machine running XP/ia-64 that had more internal heat-ducting than an air-cooled VW engine, all of it labelled 'Required - Do not discard', and a second-gen 2U HP dual-Itanium rack box running HP/UX that had a habit of baking the CPUs...Which HP had to replace under warranty. At one point I had (according to HP invoices) half a million bucks worth of dead Itanium2s sitting on my desk at work, just waiting for HP to send us enough carriers to send them back in.
While it may have had some issues, I have to say that the Itanium/VMS systems I ran for the Navy could run RINGS around any other platform in our site - until of course Moore's Law caught up with what we ran. We ran PDP-11/70s, VAXen, Alphas, and Itaniums (over a period of over 30 years) and loved every one of them. From our viewpoint the Itanium was NOT disappointing. I managed to maintain an availability ratio of 99.93% over the 8 or 9 years we actually ran with Itanium/VMS. We never thought of those machines as being that bad. And physically, they didn't take up that much space, either. I understand the E.P.I.C. issue was a killer for compiler writers, but for typical linear computational code, they screamed when they ran. Eventually, what killed our system was that our apps were client/server data management based on a centralized DB, and things started to become more distributed - and more web-oriented. They finally turned off the Itaniums - but I can't speak ill of them.
I remember hearing about Itanium in all the magazines and then one day, poof, never heard anything about it ever again. Kind of thought I dreamt it or something.
"And then came the *Hammer* blow from AMD."
I see, what You did there.
(For those wondering - search for "AMD Hammer")
I'm glad someone noticed; I thought that was the best gag in the script.
I got to work on TWD as an intern in Mass around 2004-5. Fun gig. Loved the technique of putting 9 cores on the chip and cutting one out later to allow for fabrication defects.
And that's why the 64-bit x86 architecture is often called amd64 instead of x86-64: AMD invented it, and Intel had to follow after the Itanium fiasco.
That will be a never-healing wound for Intel....
The crazy thing about Itanium is that it wasn't discontinued until very recently! I imagined it was killed off very early on, but it wasn't.
I think it had to do with support for those that bought the equipment that had them. You can't just say "whoops, buy a server farm's worth of new equipment" on the spot.
Another OS not mentioned that ran on Itanium is NonStop OS - originally from Tandem, then bought by HP who ported it to Itanium. This OS runs COBOL code and is still used in a number of banks today, although with COBOL programmers slowly drying up, this too is coming to an end. That being said, HP have ported NonStop to x86 finally and are offering it in the cloud for an obscene cost.