When Software Kills: Fatal Bugs in the Therac-25
- Published on 1 Oct 2024
- The story of the Therac-25, where a race condition bug caused patients to be irradiated with massive uncontrolled doses; how and why it could happen. For my book on the spectrum, see: amzn.to/3XLJ8kY
Join Dave as we explore one of the most shocking and tragic stories in medical history - the story of the Therac-25, a radiation therapy machine that went horribly wrong. In this episode, we'll delve into the fascinating yet disturbing tale of how a seemingly advanced technology ended up causing catastrophic harm to countless patients.
We'll examine the Therac-25's inner workings, explore the flaws in its software and design, and discuss the devastating consequences of its malfunctions. You'll hear the heart-wrenching stories of patients who suffered from radiation overdoses, and learn about the heroic efforts of medical professionals who worked tirelessly to treat these victims.
This episode is a must-watch for anyone interested in technology, medicine, or the human side of innovation gone wrong. So sit back, relax, and get ready to uncover one of the most chilling tales in the history of science and medicine!
Thanks to BobT for the episode idea!
Some more details for those wondering what happened under the hood -
It wasn't the lack of a beam spreader; it was the X-ray target that was incorrectly configured. We make medical-use X-rays by hitting a [tungsten] target with high-energy electrons. Most of the energy goes into heat, so more than 100x as many electrons need to hit that target to produce a similar X-ray dose compared with delivering a beam of electrons for treatment. One reason this race condition was hard to catch was that it required the tech to erroneously select an electron treatment (the machine begins to remove the target from the beam line) and re-enter the correct setting of X-ray mode while the target was still moving. Since the console saw the target was in motion, it didn't check which way it was moving or what state it ended in, so X-ray outputs were applied with the target out of the beam line. The devices used to monitor the radiation produced by the machine do not operate well at 100x the dose rate and significantly under-measure what has left the machine, further increasing the dose to the poor patient. You normally can't feel radiation, but the current of electrons was so high that the patients were essentially zapped as if they had touched a high-voltage wire, one that also shredded their cells and DNA.
Unfortunate that these lessons had to be written in blood, but I'm glad that to this day radiation devices have many layers of redundancy and that interlock codes have gotten more descriptive.
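For readers who think in code, here is a minimal sketch of the failure mode described above. It is a toy model only: every name in it (`Turntable`, `buggy_ready_for_xray`, `safe_ready_for_xray`, `simulate`) is hypothetical and nothing here is taken from the real Therac-25 software. It assumes, as the comment describes, that the flawed check accepted "target in motion" without asking which way it was moving or where it ended up.

```python
from dataclasses import dataclass

# Hypothetical position labels; not taken from the real machine.
TARGET_IN_BEAM = "target_in_beam"
TARGET_OUT_OF_BEAM = "target_out_of_beam"


@dataclass
class Turntable:
    """Toy model of the moving X-ray target."""
    position: str = TARGET_IN_BEAM
    moving: bool = False
    destination: str = TARGET_IN_BEAM

    def start_move(self, destination: str) -> None:
        self.moving = True
        self.destination = destination

    def finish_move(self) -> None:
        self.position = self.destination
        self.moving = False


def buggy_ready_for_xray(table: Turntable) -> bool:
    # Flawed check: "in motion" is accepted as "being set up",
    # without looking at the direction of travel or the final state.
    return table.moving or table.position == TARGET_IN_BEAM


def safe_ready_for_xray(table: Turntable) -> bool:
    # Safer check: only a stationary target confirmed in the beam line counts.
    return (not table.moving) and table.position == TARGET_IN_BEAM


def simulate(check) -> bool:
    """Return True if X-ray output would be approved with the target out."""
    table = Turntable()
    table.start_move(TARGET_OUT_OF_BEAM)  # electron mode selected by mistake
    approved = check(table)               # X-ray mode re-entered mid-move
    table.finish_move()                   # target settles out of the beam line
    return approved and table.position == TARGET_OUT_OF_BEAM


print(simulate(buggy_ready_for_xray))  # True: hazardous state reached
print(simulate(safe_ready_for_xray))   # False: the check refuses mid-move
```

The point of the sketch is the design choice in `safe_ready_for_xray`: verify the final, stationary state rather than inferring it from the fact that something is moving.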
Do you have a source for the computer-side technical stuff? I have read a lot of what's online about Therac-25 but never seen what exactly happened, e.g. that it didn't check which way the turn table was turning.
so, the operator entered "execute small dose radiation" and the machine went "execute order 66". i've been hit with mains voltage some times but those electrons would have felt like hell's whip. man, those poor patients.
@@ΝίκοςΙστοσελίδα the references section on the Wikipedia article about the Therac-25 has some good technical details. In particular, the articles by Nancy Leveson.
@@tammymakesthings Alright, thanks! Funny thing, I looked up Frank Borger (mentioned in another comment) and I ended up finding an IEEE publication I don't remember reading. The more info, the better.
I wonder why on earth the machine didn't just have an "execute" button, and only THEN start moving the target, check that everything is OK, etc. It seems like a LOT safer way to do it.
Radiotherapy linac engineer here, this is why we have layers and layers of interlocks, both physical and software. It is a very intentional act to strip back those layers (we can and do, but very rarely and always taking appropriate measures to ensure safe practice)
The safety innovations of today are written in the blood of the past
There is much more to say about cancer and money looters
The only time those safeguards are removed/disabled is for testing/maintenance; at no other time should safeguards be removed/disabled.
@@tonysolar284 yup, and even then if we can do it without, we do that first
The very first day of my apprenticeship I was told "overriding interlocks is an action of last resort. It is never to be done lightly and we test it has been reinstated before handing the machine back to clinical"
And of course, most of these can only be stripped away in service mode, which you are very intentionally told repeatedly to not treat any human or animal with.
@@tonysolar284 they usually can't be. Treatment, at least on the Varian machines, needs to happen in "record and verify" mode. No one should be in the room when it's beaming on in service mode.
the fact that you had to change the thumbnail to *this*…
I guess it proves viewers liked that version better. What else do YOU think it proves?
Thanks for touching on the "any idiot shouldn't be able to just call themselves a software *engineer*" subject
Back in the 1980s & 1990s, I was a mainframe system administrator in a state govt. bureau agency. When our Director approved the creation of the first LAN, the task went to a member of the Clerical team because he was the only employee who had his own PC at home, and nobody in my team knew what to do. It took a lot of years to develop a culture of discipline in the "gifted amateurs" (as one of my colleagues referred to them).
In 1977, I did a Fortran programming project in one of our client departments, and saw a piece of code written by one of their Clerical officers. It achieved Integer Division by repeatedly subtracting the divisor from the dividend, in a DO loop counting the iterations! He had no idea that there were integer & floating point division instructions in a CPU.
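To make the anecdote above concrete, here is a rough sketch of division by repeated subtraction next to the built-in operator. The original was Fortran; this is a Python rendering, and the function name `divide_by_subtraction` is made up for illustration.

```python
def divide_by_subtraction(dividend: int, divisor: int) -> int:
    """Integer division done the slow way: count how many times we can subtract."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch only handles dividend >= 0 and divisor > 0")
    quotient = 0
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1
    return quotient


# Same answer as the built-in floor division, but the loop runs once per
# unit of the quotient instead of using a single division instruction.
print(divide_by_subtraction(17, 5))  # 3
print(17 // 5)                       # 3
```

The loop is correct for these inputs; the cost is that its running time grows with the size of the quotient.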
The title "Engineer" is reserved in every jurisdiction in the United States. In order to bill oneself as an Engineer, one must pass professional exams and work under the supervision of an experienced Registered Engineer for a number of years in a sort of apprenticeship before acquiring a registration or license. However, industry has arranged exceptions in the statutes to allow them to grant "engineer" titles to employees who haven't spent a single day in a college classroom studying the discipline in which they are employed.
@@Milosz_Ostrow OTOH, some of us dropped out of college, and ended up, say, programming for the DoD. It happens. You cannot discount "gifted amateurs".
Is Bill Gates a "software engineer"?
I worked for a company that thought programmers were simply overpaid clerks. Ironic when you consider that they were selling a software product. It was a hellacious experience dealing with these idiots.
@@Milosz_Ostrow Many of the best software developers are mostly self-trained, because it's a new science where there is a constant need for new learning, which requires a huge amount of will to constantly improve.
This isn't that compatible with a traditional engineering education.
Dave, people got into your channel because it felt conversational, with a knowledgeable guy who has a knack for walking through complicated situations and explaining them clearly.
Gratuitously sexual thumbnails on a video about a tragedy, and weird AI images in the video itself, don’t feel like what “Dave’s Garage” should be about, IMO.
Well said.
Can you explain how adding images to a script detracts? If it's the same content, but more interesting visuals, I'm not sure how that's a step backwards. But interested to learn!
@@DavesGarage I don't really understand what the *sexy* thumbnail has to do with this story. As for the AI images, it takes us out of the story when we are shown photos of obviously nonexistent machines and plastic looking people. Real historical pictures of similar machines and scenarios would be much better. Your audience tends to be one with a bit more technical knowledge, seeing utter gibberish displayed on a fake computer screen in a photo detracts from that technical and historical feel you have often with your channel. Not to mention, many people dislike AI images due to copyright and ethical usage issues surrounding it all right now.
@@DavesGarage Allow me to explain Dave:
Lose the tits on the inappropriate, sexually suggestive, insensitive thumbnail and regarding adding AI imagery for the sake of it, it's a matter of content for content sake, not always needed.
He's right, not appropriate, bad judgement
Thumbnail choice needs some explaining...
Software engineering has taught me a lot about being intellectually honest and humble. Between compiler errors, my own pre-review testing, code review feedback, automated test failures, and bug reports, I've been repeatedly reminded over the years that no matter how confident I can be that I thought through something correctly, I can still make logical errors, fail to consider certain contexts, etc. I'm glad I work in the video game industry. Heheh, the stakes are much lower.
Yeah. Getting told by the boss that "Some gamers are so upset they started a new subreddit about the problem, which is in your code" is quite different from "The plane suddenly went into a nose-dive and all 300 passengers and the crew died... the problem has been identified as being in one of your code blocks".
@@andersjjensen Imagine how stressful it would be to write code for launch systems used for nuclear weapons!
Yeah, I could never write software people's lives depend on. I'm waaaaaay too clumsy and forgetful
Anecdote about bugs: There is no such thing as bug free code. Once I was responsible for the safety of a software driven robot handling plutonium products. The developers were given 2 weeks to specify their smallish modules, 2 weeks to code and deliver it bug free (?) .. and then I would use a Monte-Carlo rig that I developed to test their code on a very fast computer for EIGHT WEEKS. In every case I would find maybe 4 or 5 bugs in the first couple of days. Then 2 or 3 more in the first 2 weeks. Then it would go quiet .. BUT .. in a couple of cases a rare timing bug would be detected in week 6 or 7 after working through millions of random tests which passed. Just as well we found & fixed these bugs!
(After this I used my Monte Carlo rig on many other less critical projects .. I found zillions of bugs in 'bug free' code)
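As a rough illustration of the idea (not the commenter's actual rig, whose details are not given), a Monte Carlo test harness can be sketched as random inputs fed to both the code under test and a trusted reference. Everything below is hypothetical: `buggy_max3` is a deliberately flawed toy function, and the harness only exercises input values, not the timing bugs mentioned above.

```python
import random


def reference_max3(a: int, b: int, c: int) -> int:
    """Trusted oracle."""
    return max(a, b, c)


def buggy_max3(a: int, b: int, c: int) -> int:
    """Deliberately flawed: wrong when a == b and both exceed c."""
    if a > b and a > c:
        return a
    if b > a and b > c:
        return b
    return c


def monte_carlo_test(candidate, reference, trials, seed, low=0, high=3):
    """Run random trials; return every input where candidate and reference differ."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    failures = []
    for _ in range(trials):
        args = tuple(rng.randint(low, high) for _ in range(3))
        if candidate(*args) != reference(*args):
            failures.append(args)
    return failures


failures = monte_carlo_test(buggy_max3, reference_max3, trials=10_000, seed=1)
print(len(failures), "failing inputs; first few:", failures[:3])
```

Seeding the generator is the important design choice: a failure found after millions of trials is only useful if the exact input sequence can be reproduced.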
To take it one step further. There is no such thing as bug-free code because such a construct would require bug-free OS's, bug-free Firmware, and Bug-free Hardware. Based on issues we have seen in just the past few years when it comes to CPU hardware-level errors, it stands to reason that the conclusion that bug-free software is impossible is correct even for seemingly simple constructions.
I heard that there was a bug free version of "Hello World", written in machine language on bare metal, running on a COSMAC Elf, in 1977. Things were simpler then.
@@JamieStuff Oddly, I was an RCA 1802 programmer in 1978-1979.
@@JamieStuff I remember the RCA COSMAC ELF and the 1802 processor. Subroutines were a pain to code. I worked for a lab which had a job developing a communication device to be deployed on a satellite. The 1802 was chosen for the processor because its CMOS architecture made it more impervious to effects of cosmic radiation in space. I believe the first CMOS Intel processor was a version of the 8085 in about 1985. We would have preferred the 8085, but it wasn’t going to be space qualified soon enough.
@@trapfethen exactly, not sure such a thing is even possible because entropy
One of the key people who helped find the software race condition bug was Frank Borger at the University of Chicago Medical school. He was active in the Chicago Area Real Time Society, the local DECUS group for the Chicago area. I met him at a number of these meetings but did not hear about his role in figuring out what happened to the THERAC systems until years later.
But how did someone manage to put WinDOS code on the Therac-25? OR was this Therac-25 code such sloppy hot garbage that it appealed enough to Dave to base DOS/Windows on it? YES! That would explain how he knows enough about it to make a video about it!
@@JPs-q1o No idea what nonsense that is about. Moreover I have no idea whether it makes sense to try to explain things to you.
@@Amstelchen Don't feed the trolls, no matter how hungry we look.
For those curious, the ring he mentioned is related to the Quebec Bridge in Canada, which collapsed twice during its construction. It’s a fascinating story.
Would you happen to know a good read or watch on that there collapsery?
@@hedlund lmk if you find one. Thanks
The US version is known as the Order of the Engineer, and is a stainless steel ring worn on pinky.
Excellent video.
Sadly, having worked for decades on the software of nuclear and high security systems, I can safely say that fewer than maybe 5% of even highly experienced & qualified engineers have any regard or 'feel' for safety or security. Additionally I found that they CANNOT be trained to be safer or more security aware even if their weaknesses are noted. I also found that if you find major - but subtle - defects in the code produced by supposedly very senior engineers, they will vehemently assert that their code is perfect.
And they are willing to bet their job on that?
There is testing and the results.
@@20chocsaday I would edit their code because they wouldn't. It would then pass the extensive tests I was using.
Therein lies the ERROR. Humans have ERRORS. Not the code. Code is produced by HUMANS. We have yet to find a way to protect ourselves from OURSELVES. - - - - I wonder who produced our code? 😆😆😆😆😆
Also, management won't allocate any time for a belt and braces approach where errors that get through one level of code get trapped by the next.
@@_Mentat For nuclear work, you get the time and money! For example, Canada used Formal methods to write a few lines of code which controlled the safety of their reactors. Those few lines of totally bug free code cost a fortune!
This is one of the reasons why I have reservations about autonomous vehicles. The risk of loss of life is huge, and the automotive industry tends to rush development to be the first to market.
The insurance industry won't underwrite it.
Agreed
The risk of loss of human life is also huge when humans are driving the vehicles. Would it still not be better to use autonomous vehicles (including bugs) if the death toll is less than human drivers?
@@chrisg6597 Try to solve the question of why an accident happened.
You will have arguments going back to the mines where children pull rare ores out of the earth.
As well as the years of testing software.
No more so than with human drivers, I suspect.
I was a developer on a medical device when the Therac-25 tragedy happened. It was a truly sobering event for those of us in the industry.
You are absolutely correct that many lessons were learned, many regulations written, many procedures changed. But a handful of the core causes remain a problem to this day, more so in concurrent and real-time systems.
My hope around this is twofold. First, that every developer realizes that these problems can occur regardless of the program, the language, the operating system, or the quantity of unit tests.
Second, Nancy Leveson's seminal paper on the disaster should be required reading for every software engineer.
When doing something where people's lives are at stake, it requires very different mindset.
Operating system: Preferably no operating system. I don't think there are many that can be used.
Language: That is easy. A garbage collection cycle, or a memory allocation that fails because of memory fragmentation, can cause airplanes to drop or a nuclear meltdown if used in the wrong place. The safety-critical scene is likely still C, Ada and perhaps a very small subset of C++.
Unit tests: Good for developing normal business application to cover something but they are nowhere near enough if human lives are at stake
Software usually doesn't need to be error free. Quality, and process to achieve desired quality should be specified.
@@gruntaxeman3740 Exactly this - as a programmer with 30 years of professional experience (mostly in the medical field, no less), I'm constantly shocked at the number of "engineers" that we hire that have absolutely zero idea of how the full software stack that they're using actually works. They've been trained to see everything below them as a black box that is guaranteed to work every time and will magically solve problems with parallelism, threading, memory management, and data persistence. In a perfect world, I'd agree with you that we'd ditch everything but the application, and make the application its own operating system. In the real world, I'd very much like to see hardened, full real-time operating systems that can guarantee responsiveness in a given number of clock cycles.
What I get is Windows. Even though my work is mostly EMR related (as opposed to medical device), I've just seen so much crap stacked SO HIGH that it seems a special miracle every time a patient encounter succeeds.
In the medical field, if you don't understand the full stack from the machine code writing the registers all the way up the stack to the user interface, you're simply begging for trouble.
As always Dave, a great video. I'd like to add, though, that it was a combination of a race condition and an overflow error. The code that checked whether the collimator was in place would return a zero before the beam could fire. To make sure zero wasn't present until the check had been performed, code in the setup loop would increment the variable in question, named Class3. The variable Class3 was 8-bit, and would therefore overflow back to zero when incremented past 255. This meant that when Class3 overflowed, approximately 0.4% of the time (1 pass in 256), the beam could fire even if the collimator wasn't in place. Courtesy of Matt Parker's brilliant book: Humble Pi - A Comedy Of Maths Errors.
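The arithmetic in that comment can be sketched in a few lines. This is an illustration under stated assumptions only: the variable name `class3` follows the comment, the function `count_false_clear_passes` is made up, and Python's unbounded integers are masked with `& 0xFF` to imitate an 8-bit byte. It is not a reconstruction of the real code.

```python
def count_false_clear_passes(setup_passes: int) -> int:
    """Count setup-loop passes on which an 8-bit flag wraps to 0 ("all clear")."""
    class3 = 0
    false_clears = 0
    for _ in range(setup_passes):
        # Incrementing instead of assigning a fixed non-zero value:
        # 255 + 1 wraps around to 0 in a single byte.
        class3 = (class3 + 1) & 0xFF
        if class3 == 0:
            false_clears += 1  # zero here is read as "no check needed"
    return false_clears


passes = 2560
print(count_false_clear_passes(passes))           # 10
print(count_false_clear_passes(passes) / passes)  # 0.00390625, i.e. about 0.4%
```

Setting the flag to a constant non-zero value, rather than incrementing it, would avoid the wraparound altogether, which is why 1 pass in 256 matches the "approximately 0.4%" figure quoted above.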
Thank you for bringing this issue up. I work in medical devices, mostly in automated robotic surgery. Reading about the Therac-25 is mandatory for anyone working on mission-critical code that could hurt/kill people. This was why IEC 62304 was created, which sets out the software life cycle requirements for software used in medical devices.
Glad you’re talking about the Therac, but I was hoping for a much more in-depth technical explanation of the code flaws from your expert perspective given how delightfully detailed you’ve been for many other technical topics on this channel. If you ever were inclined to do a part 2 of this with more details, I for one would be very interested to see it.
I'm wondering what accidents will happen in the future because code has been taken from AI output, and only cursorily checked by some cheap developers in a "best cost" country.
I work for the company (AECL was legally continued on to be CNL) that made the THERAC; we went on to make nuclear reactors that are controlled by software code. It is so expensive to make quality code, and that is the real difference between someone calling themselves a "software engineer" and being an actual software engineer.
0:20 When Elvis gets THERAC'd
A family member of mine was friends with the woman who was injured by the Therac-25 machine in Hamilton, Ontario. Although she unfortunately later ended up passing away from her cancer within a few weeks, her autopsy revealed that her hip was completely destroyed and that she would have needed a full hip replacement. I remember the first time I heard about that and being absolutely horrified at the idea of being killed by the revolutionary machine meant to save you.
AECL made a bunch of changes afterwards to try and make it safer but they didn't correctly identify the issue and more people ended up being hurt and killed afterwards.
I'm in my last year of engineering at UofA right now, and there's a mandatory risk management and safety class. So many computer / software engineers roll their eyes about safety. There was even that controversy a few years back with APEGA ordering job boards to stop using "Software Engineer" for non engineering jobs. All this to say that even today software is regarded as this ultimate safe tool, perhaps even more so because of the prevalence of the PC. Thanks for your breakdown Dave!
sorry dude, the slightest sniff of AI imagery makes me unsubscribe by reflex, that shit wrinkles my brain
To me what I find more interesting is how the programmer has never been identified or seen again.
I will shamefully admit to a morbid curiosity in trying to identify the programmer. His name is known to a handful of university professors who worked on the investigation.
Ironically, I do take great pleasure in thinking of him as one of the first internet celebrities that has never been doxxed
@@miscellaneousHandle "I will shamefully admit to a morbid curiosity in trying to identify the programmer"
Same here. For years I have always wondered how this person has avoided being named. It is fascinating....
Seen again?
The programmer wasn’t the person most responsible for the problem, it was whoever decided to remove the hardware safety interlock. The same software bug was on the previous model, and no one got injured because of a hardware safety interlock.
@@romangeneral23 The programmer wasn’t the person primarily responsible, it was whoever decided to not have a hardware interlock like the previous model (which had the same software bug, but no one was injured).
Hi Dave .. your videos are all amazing but this one is one of your best from my perspective. As an engineer in Canada, wearing the ring since the 1990s and going from structural to software careers, I can say you articulated exactly what I tell myself, colleagues and younger aspiring programmers and engineers. This story should be part of anyone's studies in school.. university .. and work. I will share with many. Be well and thanks again for sharing on your channel.
The United States now has a variant of the Engineer's Ring. My diploma doesn't technically say "engineer" so I wasn't invited, but my partner was.
The Therac-25, the poster child of bad user interface design and bad software design meets death.
There's one valuable lesson that every software engineer needs to learn, and that is, "You can't use 9 women to make a baby in one month". In other words, don't rush it.
Most software engineers understand that one but manglement usually has a problem understanding it.
But you can try..?
Tell that to the "AGILE" idiots 😃
"The Mythical Man Month"
ugh, don't use "ai" generated images, it's really bad - it feels like filler and low effort.
His scripts have been sounding AI-generated lately as well… sad, used to love this channel
I knew someone who worked for Picker International (now part of Marconi/Philips) in Solon, Ohio in the late 1980s and he told me about the Therac-25 linear accelerator incident back then and it sent shock waves through the industry. I actually interviewed for a job at Picker and they were using PDP-11s, I believe 11/34, to control the systems built in the department where I interviewed. I was pretty humbled by what I heard about the incident and I did not trust myself to be working on code that had the potential to cause grave harm if something went wrong. We’re pretty accustomed to PCs being reliable these days, but in the 1980s DEC reliability was good but not perfect. What would happen in the event of a computer failure, or worse a partial failure such as disk head crash?…
"We’re pretty accustomed to PCs being reliable these days"
Ehh.. I wouldn't trust any PC to run anything critical without much lower level safety catches in place (external to the PC).
@@patrikfloding7985 Also - in the late 1980s or maybe early 1990s, I went with friends to Cedar Point amusement park near Sandusky, Ohio. There was a ride called Demon Drop that dropped a car about 50 feet and then dissipated energy by rolling with the occupants flat on their backs down a short track with retarders. There was a hut at the bottom of the ride. Clearly visible through the window was a generic tower case PC, probably a ‘286 machine, with a no-name amber screen monitor. I couldn’t make out what was on the screen, but I’m glad I saw that _after_ going on the ride. In all honesty my guess is the PC was there collecting performance data and the ride was controlled by ladder logic - at least I hope.
I would love to think we have learned , but i work in IT in the medical sector so i know better.
I won't say who I worked for but I was an electronic medical records dev. I found a device driver 'feature' on db2 that added a null terminator at the end of its buffer. Problem was we used rtf to store notes in the db. In some cases there were entire sections of chart notes just gone. Drs had no idea it was happening. I told management and I got "We don't take responsibility for FDA approved, that is on the Dr " .
I quit and got out entirely.
Thank you for shining a light on this issue!
The thumbnail gave me a chuckle. A/B testing the cleavage to click ratio? Would love a video on the results! Great video as always.
Yup. I A/B tested it on a lark (it's an AI creation, and I didn't ask for cleavage!), and it performed vastly better.
@@DavesGarage I do wonder about the viewer retention difference though. Did people who clicked on the cleavage thumbnail still stick around? Please make a video about this!
Plainly Difficult is an excellent channel for covering many of these sort of cock ups, see also his video on this goof up.
My father was an electroradiology technician in a clinic. Many times as a child I was amazed by all that equipment from Soviet times. It was like a sci-fi movie in real life. I love that old design but am scared as f*ck of radiation. If you as a kid see how that shit works, the noise when the cathodes are spinning, transformers buzzing, warning lights lighting up... damn, it's so cool and traumatizing at the same time.
Like all those strange machines and devices in those old Frankenstein movies.
You might like the “diode gone wild” YouTube channel. I think he’s in the Czech Republic. He’s had quite a few episodes demonstrating and taking apart Soviet era electronics. The guy is super smart.
When I was a kid and went to the hospital, I wasn't even allowed to look at the machines. I would ask doctors how they worked, but they didn't answer. A very toxic environment.
@@belstar1128 that's sad
I’ve heard about this story before but it’s always interesting to revisit from different perspectives. This is one of those seemingly inexcusable events. I’d be curious to see the chain of decisions that led to this software control without redundancy.
I worked in the elevator industry for a bit. Modern elevators are software-controlled and have been since the 70s or so but on highly specialized/hardened hardware and every piece of the software chain has an electromechanical backup of some sort. It was sometimes difficult to test design changes because we had to bypass 3-4 layers of redundant safeties to cause the “failure”, and even then, the components had such a large safety margin that almost nothing was truly catastrophic. That’s not to say failures don’t happen, but when they do, it’s almost never a systematic design issue.
dave is a man of culture, the thumbnail lol
Ha yep, I have vague memories of this story from my CS ethics course. Definitely a tale to learn about for software engineers, but the thumbnail left me wondering what it had to do with the subject, lol. The AI generated vignette left me chuckling too.
I guess Dave is trying to spice things up?
Thumbnail got my view!
/!\
Warning: Are you sure you want to nuke this patient?
[Yes] [No]
Even the first code I wrote included sanity checks on entered data, perhaps comes naturally with a Fortran mindset !
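A sanity check of the kind mentioned above might look something like this. It is a generic sketch: the function name `parse_bounded_number` and the limits used in the example are arbitrary placeholders, not clinical values and not anything from a real treatment console.

```python
import math


def parse_bounded_number(text: str, low: float, high: float) -> float:
    """Parse operator input and reject anything outside [low, high]."""
    try:
        value = float(text)
    except ValueError:
        raise ValueError(f"not a number: {text!r}") from None
    if not math.isfinite(value):
        raise ValueError(f"not a finite number: {text!r}")
    if not (low <= value <= high):
        raise ValueError(f"{value} is outside the allowed range {low}..{high}")
    return value


# Placeholder limits, chosen only for the example.
print(parse_bounded_number("200", 0.0, 300.0))  # 200.0
try:
    parse_bounded_number("25000", 0.0, 300.0)
except ValueError as err:
    print("rejected:", err)
```

Rejecting the value outright, rather than clamping it into range, keeps the operator in the loop when something unexpected has been typed.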
I was very interested when a co-worker doing embedded software told me about a document titled 'MISRA C' (Motor Industry Software Reliability Association, C Language).
While some might read it as a simple list of good programming practices, I read it as a list of rules that were developed because something bad happened when it was done another way, and this new practice would address the issue. Much like the safety labels on step ladders, which were added to prevent another person from suffering that type of ladder injury.
Very similar to the Federal Air Regulations. There's a saying in the aviation community, "The FARs are written in blood.". A large percentage of them are the result of lessons learned the hard way.
Also, the primary reason for warning labels on ladders is not to prevent accidents, it is to prevent successful lawsuits when accidents do occur. It's possible to manufacture ladders that are many times safer than the ones you buy at the home store. It would just be impossible for a homeowner to afford, store, or even lift one. It's also why the user manuals for even complex power tools spend very few words on how to use the tool. Instead they are filled with "warnings" like "Do not operate with guards removed"..
Considering how tragic this story is, I find your thumbnail in quite poor taste.
It's pretty bizarre. The vid had a more tasteful one initially, and the AI cheesecake isn't at all this channel's usual style.
So an ugly person in the thumbnail would have been better? Can you explain why?
hm. i see the point on both sides. however, i do find it a bit unnecessarily risque. but we shouldn't let the image on the cover take away from a very well put together video.
@@DavesGarage The best way I could explain it is that I (and a few other commenters) feel that imagery that's designed to generate a prurient interest in the viewer cheapens the grim reality of the story being told. It's a question of tone mismatch as much as anything else.
I've worked in a company in the MRI field, nothing therapy related just imaging. Still it's possible to e.g. overheat a patient through absorbed RF energy, and there are physical interlocks in the RF amplifier path. The Therac-25 case was in every engineer/scientist employee's onboarding orientation material, just to remind of past mistakes.
The Therac-25 incident is something studied in my computer engineering course in university. As a high school computer science teacher, I tell my students on Day 1 it's my job to teach them good programming skills and critical thinking because I do not want to die! I'm hoping my students go on to write great code and build awesome things and I want to set a good base for them to build upon. Yes code is written for computers and machines, but it's people who interact with it and people who are affected by what it computes. Something we must be responsible for.
The reason that code went into production in as egregious a state as it did is linked to the reason the individual responsible for it not only wasn't jailed or even fined; we aren't even allowed to know their name. When someone has had that level of impunity, how long do you reckon they'll keep caring, realistically?
I thought this would be about pagers!
Since when has Therac made a pager...
I suspect the pager software was reprogrammed to trigger a small explosive on a specific text msg. Took some know how and quite a bit of planning in fairness.
And now two-way radios aka Walkie Talkies. A bunch exploded today.
@@mlann2333 But like, did they have access to the _whole_ pager supply line?
I wonder if this is more a Stuxnet situation, and they were just normal pagers that somehow had a way to weaponize (overvolt) their batteries?
It's quite a sophisticated mod to add an explosive charge _and_ trigger it when a particular alphanumeric string is displayed - not received; displayed!
And somewhere there must be an auto-dialer messaging each pager.....
I got one better… COVID 19
I was taught this case in a Software Quality Management course at uni, and it is singlehandedly responsible for my enduring insistence on quality process
Trashy taste of thumbnail for the subject matter
Hey, at least I don't strap my cat to my vacuum for a photo op ;-)
Concise and plainly stated. Well done! One of those sayings that really annoys me is, "Good enough for government work." I was a federal civilian welder in a navy shipyard - if people understood just how damned good our work actually had to be, particularly on SUBSAFE components, only perfectionists would use that saying.
Perfectionism isn’t universal, sadly. Remember the metallurgist who got done for having pencil-whipped decades of strength test results on submarine steel because she thought the test standard was too extreme? We are fortunate that there was so much extremity built into the standard, because no amount of perfectionism downstream can compensate for that kind of confident incorrectness hidden in the supply chain.
@@tinad8561 Very well put. Think of the nuclear power industry. Adm. Rickover was right in his management of the Navy nuclear power program. High standards in ALL aspects, human and machine.
Yep. In government, we are presumed correct and therefore we have much greater responsibility to be so.
I'm amazed that they did away with the mechanical safety fallbacks. My first four years of work experience was in nuclear power and for every electrical system we had a mechanical backup.
Starring Worried AI Elvis
Seen from the pictures and redundancy in speech patterns I have the feeling Dave tried having an AI produce this episode.
This story reminds me of a similar situation that happened in the rail transportation industry, where I spent the majority of my career. Prior to the 1980s, safety systems (called interlockings) for railroads and railways involved the use of relays all of which were designed and tested for failure conditions, with critical ones being guaranteed to be failsafe, thanks to springs and gravity. In the 1980s, the industry started developing electronic systems to replace the relays ... a large room of relays could be replaced with a rack of 2 or 3 microprocessors, an obvious economic and maintenance advantage.
Since the relay circuits are, in essence, Boolean expressions, the microprocessors and their software were designed to handle similar Boolean expressions. Each relay circuit was broken into its equivalent Boolean expression and entered into the software as data for processing. Fortunately, a problem with this method was found during early testing. You see, relay circuits are in essence a large machine with massive parallel processing, while a microprocessor only does one thing at a time, even when many software threads are involved. This caused the kind of race conditions in the software that Dave describes in the video. Fortunately, this was found and rectified in two ways: processes were created involving multiple developers, validators and testers to ensure correct and safe operation; and some failsafe relays were maintained, just in case the software did do something other than intended. The industry has been using electronic interlockings for years now, with a wrong-side unsafe failure a very rare occurrence.
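A minimal sketch of the hazard described above, not taken from any real interlocking product: two hypothetical interdependent "relays", `x` and `y`, are written as Boolean expressions. Evaluating them one at a time, as a single processor would, gives a result that depends on evaluation order. Evaluating them all against the same frozen inputs, which is closer to how parallel relays behave, does not. The function names and the equations are assumptions made for illustration.

```python
def step_in_place(state, equations):
    """Evaluate equations one at a time; each one sees the results of earlier ones."""
    state = dict(state)
    for name, expr in equations:
        state[name] = expr(state)
    return state


def step_snapshot(state, equations):
    """Evaluate every equation against the same frozen copy of the inputs."""
    frozen = dict(state)
    new_state = dict(state)
    for name, expr in equations:
        new_state[name] = expr(frozen)
    return new_state


# Two hypothetical relays: each picks up only if the other is down.
equations = [
    ("x", lambda s: not s["y"]),
    ("y", lambda s: not s["x"]),
]
start = {"x": False, "y": False}

forward = step_in_place(start, equations)                    # x evaluated first
backward = step_in_place(start, list(reversed(equations)))   # y evaluated first
snapshot = step_snapshot(start, equations)
snapshot_reversed = step_snapshot(start, list(reversed(equations)))

print(forward, backward)              # order-dependent results
print(snapshot == snapshot_reversed)  # order-independent result
```

Neither strategy is automatically the safe one; the point is only that sequential evaluation introduces an ordering the original relay circuit never had, so it has to be designed and validated deliberately.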
Great video. Any chance of a more in-depth technical explanation of the code flaws as a follow-up? I would be extremely interested in that, especially as I did a lot of DEC assembler programming in the 1970s-80s.
From reading some documentation re the Therac-25 and the software design, it seems that a real-time executive system was written specifically for the PDP11 controlling computer using assembler. This executive then processed various tasks (also written in assembler) using a pre-emptive scheduler. One of the issues was race conditions between the various tasks.
Why was such a complicated software system designed and used? From my understanding (very limited, I admit) there seem to be only 3 stages involved in administering a treatment: 1) input of parameters and verification, 2) positioning of the various machine components (and verification), and 3) treatment execution as per the given parameters. To me these seem to be sequential tasks, so why were a scheduler and various separate tasks used? Wouldn't a much simpler linear design have removed the software issues around scheduling and race conditions? Doing this using several pre-emptive tasks wouldn't have been my first thought when designing a system like this.
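The linear design the commenter has in mind could be sketched roughly as follows. This is a hypothetical illustration, not the Therac-25 code or any vendor's design: `TreatmentSession`, `Phase`, and the mode strings are all invented here. The one rule it tries to show is that any parameter edit discards all later progress, so treatment can never start against stale positioning.

```python
from enum import Enum, auto


class Phase(Enum):
    ENTER_PARAMS = auto()
    POSITIONED = auto()
    VERIFIED = auto()


class TreatmentSession:
    """Hypothetical strictly sequential controller: enter -> position -> verify -> treat."""

    def __init__(self):
        self.phase = Phase.ENTER_PARAMS
        self.mode = None           # mode requested by the operator
        self.hardware_mode = None  # mode the (simulated) hardware is set up for

    def enter_params(self, mode):
        # Any edit, at any time, throws away all later progress.
        self.mode = mode
        self.phase = Phase.ENTER_PARAMS

    def position(self):
        if self.mode is None:
            raise RuntimeError("no parameters entered")
        # Stand-in for moving the hardware and waiting until it has stopped.
        self.hardware_mode = self.mode
        self.phase = Phase.POSITIONED

    def verify(self):
        if self.phase is not Phase.POSITIONED or self.hardware_mode != self.mode:
            raise RuntimeError("hardware does not match requested mode")
        self.phase = Phase.VERIFIED

    def treat(self):
        if self.phase is not Phase.VERIFIED:
            raise RuntimeError("refusing to treat: parameters changed or not verified")
        return f"treating in {self.mode} mode"


session = TreatmentSession()
session.enter_params("electron")
session.position()
session.verify()
session.enter_params("xray")  # late edit resets the sequence
try:
    session.treat()
except RuntimeError as err:
    print(err)
```

Even with a design like this, a software check is only one layer; several commenters in this thread point out that independent hardware interlocks are what catch the cases the software gets wrong.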
the company I work for still makes machines like this, albeit not medical devices. I showed them ISO 13849, they said it was "too expensive" to buy the standard. oh well.
It's quite disturbing that such devices weren't rigorously tested, with radiation levels measured, before widespread use 🙁
At the time very few people in any engineering discipline understood some of the unique issues with software based systems, including most programmers. Testing would likely not have revealed anything as the problem surfaced when the operator did certain things at a certain pace (according to this video).
@@patrikfloding7985 I don't know. I think that's a bad excuse. For a device that needs such a high level of safety, every single possible scenario should've been taken into account. But it's bizarre that physical safety locks weren't there. Working in industry, PLC programmers f' up all the time, working in a hurry and a bit left-handed. But the equipment I'm involved with doesn't hurt people if done wrong. Well, in some equipment there are safety relays to prevent a door opening while the machine is in production, even when the program should handle it. But if there are such things in packaging equipment, it's absurd that radiation-related devices were made in such haste. Especially considering how much money is involved in medical equipment.
In the 1980's, while working at a computer/software consulting firm I received a request from a local university to develop a machine for psychological research. The machine was to administer electronic shocks to people taking tests. I took one look at the requirements, decided that it was an unethical and dangerous application and refused to participate. I'm sure our insurance company is still thanking us!
But was the real research how many developers would write such code? (I hope so!)
@@afaulconbridge Why do you hope that?
@@chrismofer Because it is less unethical and dangerous than running the machines for the experiment they are asking the code to be developed for. It is removed by several steps.
@gm2407 Great that you replied. Ethics seems like an afterthought these days in business and many other areas.
@@halbos7637 The thing that bothers older people about younger people is lack of forethought about others and consequences. But the truth is that most people, in a lot of situations, don't consider things deeply. Habit, time constraints, lack of observed repercussions. Ethics is a subject as deep as the universe but can only really be estimated, like the movement of celestial bodies via calculus.
What, is it some rite of passage that every computer and nuclear technology youtuber has to do a piece on the Therac? People *died*. This is a *tragedy* that happened to real actual people. Not your fucking content farm. Get real.
Management went for cheap and quick, they should have gone for good.
This is a great story that i would like to share, but can't because the gratuitous thumbnail would offend the intended audience.
If you had clicked on it with the original thumbnail, you could have at that time. But it appears you did not visit the video until I updated to this thumbnail. Perhaps you missed it the first time around.
@@DavesGarage Hey Dave, maybe take your attitude down a notch and realize that not everybody is subscribed to your channel, and even if someone is, then maybe they don't watch the video immediately when you release it. People do have lives and responsibilities, you know. Not all of us are rich and can do whatever the hell we want. And for the record, it would have been worse if the OP *HAD* shared the video before you changed the thumbnail, because their intended audience would have viewed the video AFTER you changed the thumbnail and they would have seen this thumbnail.
Taught this in first year computer science.
Ethics with computers really makes you realise…
Code kills.
A tragedy for all involved 😢
Something I have said for a long time, "Stop calling them software engineers until they're held to the same standard as real engineers."
Therac-25 , The real reason Elvis died.
Click baity thumbnail? Not the usual for you Dave?
You need to do a postmortem on the Post office scandal?
Good idea. Massive scandal that one.
I have seen this covered by a half dozen YouTubers and almost didn't click. I'm glad I did. You provided a much better representation of the low level issues than any other recount I've seen.
Heard this story ages ago. I thought you were going to discuss the software much more closely; otherwise why revisit?
When did Dave start using clickbait thumbnails?😂
I find the use of "AI" images super distracting. just looks so bad
Elvis.
Agreed
Were they using Windows? That would explain it ;)
nuclear submarines being computer controlled, i just about fell off my chair laughing when i heard that. we barely had electronic controls much less computer control.
you should do a video on the Apollo 11 1202 error that almost aborted the landing.
Yes, So what's the solution? Given how much care Microsoft puts into their code, Heck it runs some nuclear weapons consoles. Windows shouldn't have any bugs. Right?
Good one!
Dave, makes me think about how much “old code” still lies in modern product, used by us in every day life?
IIRC some of his original Task Manager code is still in the current version. But I might be wrong.
This question makes me think back to the Y2K crisis. Code that assumed the first two digits of a year were "19" was likely to break in the year 2000. I know people who were fixing code written in the late 1970s, which was 20 years earlier. That code was sometimes written in languages that were only known by retired former employees, who made a fortune consulting!
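As a small illustration of the two-digit-year assumption described above, here is a sketch with one common style of fix, date windowing. The function names and the pivot value of 50 are assumptions chosen for the example, not taken from any particular legacy system.

```python
def expand_year_naive(yy):
    """The classic assumption: every two-digit year belongs to the 1900s."""
    return 1900 + yy


def expand_year_windowed(yy, pivot=50):
    """Windowing fix: values below the pivot are read as 20xx, the rest as 19xx."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return (2000 if yy < pivot else 1900) + yy


# Age of someone born in '75, computed for a date stored as year '00.
age_naive = expand_year_naive(0) - expand_year_naive(75)
age_windowed = expand_year_windowed(0) - expand_year_windowed(75)

print(age_naive)     # -75
print(age_windowed)  # 25
```

Windowing only moves the problem to the pivot year, which is one reason storing the full four-digit year is usually preferred where the data format can be changed.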
@@HweolRidda
Lessons were learned. Some companies make big offers for information systems to government. To maximize profits, they just put Oracle Forms in there, or build things around a proprietary CRM/ERP or some other turd. Then they make a fortune later and the stock goes up!
I work in industrial automation, I always conduct myself aware and knowing that both my life and other's lives are on the line when I'm working on or making changes to equipment, whether that be to the wiring or PLC programs. Never be complacent and do professional work you are proud of, never cut corners in our field.
"I'm in this for the like and subs", proceeds to put unrelated eye candy in thumbnail for maximum click bait returns. You're better than that Dave.
Explain why it's bad. Maybe I have a blind spot. But I don't see why an attractive person in the thumbnail is a problem....
@@DavesGarage sexy click bait is going to drag in views that you don't want. It damages the brand you've been building. You've done such an amazing job of being an authority on technology. I'd hate to see that reputation squandered. Ultimately it's your channel and you have the right to run it the way you please, best of luck.
@@DavesGarage "Explain why it's bad. Maybe I have a blind spot. But I don't see why an attractive person in the thumbnail is a problem" - I know that clickbait thumbnails have become almost standard on YouTube, but there are reasons that people hate them. Speaking only for myself, I expect a thumbnail to be relevant to the video. Unless there is a Julie Gonzalo doppelgänger in this story, the thumbnail is not relevant, thus the only purpose it serves is to trick people into clicking on your video. That's another thing -- most people don't like to be tricked like that.
Was that thumbnail really necessary? 🙄
It worked exactly as intended, so yes.
@@RS-ls7mm Oh yeah, and what did he intend with it?
It would be nice that for once tech bros would understand that women work in this industry too and acting like wankers is just cringe, is not "fun" or "cool" in any way.
I had respect for him but it seems this issue runs deep in the "culture" of men in the tech industry, not being able to see women as normal humans and the constant need to sexualize us.
@@squirrelingaround As your kind is so quick to point out, it's not for you.
@@squirrelingaround How arrogant that you think everything has to be designed just for you.
@@squirrelingaround I agree - the video is a bit of an aberration in terms of what Dave's videos are usually like, i.e. very good.
This video went hard on the AI pictures - not necessarily just the thumbnail, but they detract from what was supposed to be a factual representation.
The most egregious moment was the use of a stereotype sysadmin picture - long haired slim white dude fresh out of the 70s - when the script implied nothing of the sort.
Maybe lay off the genAI for a while - the videos are usually awesome😊
Is it possible that the software engineer that was never identified is now working for Boeing as a senior quality assurance engineer???
Nope. Boeing had great senior quality assurance engineers.
The problem? They were too good. So they got fired. Because the issues they found would affect the time plan. Managers wanted people who weren't so strict when producing/evaluating all quality documents, and when verifying everything was done to spec.
That was 40 years ago, so hopefully the coder is retired by now. Not to give a pass to the coder, but race conditions in software can be very difficult to detect and duplicate for analysis, especially when they’re triggered by an anomalous state of peripheral devices.
When I did mainframe work we had a GIGO (garbage in - garbage out) card deck of random unformatted data which we forced programs to process as input to see if the program properly handled totally unexpected exceptions.
I’m not sure you could call it a race condition, but rather a cascading failure when the entire Bell System long distance telephone network collapsed for almost a day in 1990. The problem happened when a switching fabric experienced an overload. The exception was handled with a _break_ statement which was intended to drop the thread of processing out of an _if_ conditional. The problem is that in C, _break_ has no effect on an _if_: it exits the nearest enclosing loop or _switch_. As the incident is usually described, execution jumped out of the enclosing _switch_ statement instead, skipping code that should have run and leaving the switch locked up. That locked up switch caused load to transfer to other switches which in turn overloaded with the unexpected traffic until the entire system was gridlocked. The vagaries of C, an apparently inexperienced programmer, not enough code review, and a rare but possible condition conspired to set off a disaster.
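The GIGO card deck described above is an early form of what is now usually called fuzz testing. A minimal sketch, assuming a hypothetical fixed-width record format and a hypothetical `parse_record` function (neither comes from the comment): random bytes are fed to the parser, and the only acceptable outcomes are a parsed record or a clean `ValueError`. Any other exception propagates and fails the run.

```python
import random


def parse_record(raw: bytes):
    """Hypothetical fixed-width record: 6-character ASCII id, then a 4-digit amount."""
    if len(raw) != 10:
        raise ValueError("record must be exactly 10 bytes")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError("record is not ASCII") from exc
    ident, amount = text[:6], text[6:]
    if not amount.isdigit():
        raise ValueError("amount must be numeric")
    return ident, int(amount)


def run_garbage_deck(parser, rounds=1000, seed=0):
    """Feed random, unformatted input to the parser and count how it responds."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    accepted = rejected = 0
    for _ in range(rounds):
        size = rng.randint(0, 20)
        raw = bytes(rng.randrange(256) for _ in range(size))
        try:
            parser(raw)
            accepted += 1
        except ValueError:
            rejected += 1  # clean, expected rejection
        # Any other exception type is a bug and is allowed to propagate.
    return accepted, rejected


accepted, rejected = run_garbage_deck(parse_record)
print(accepted + rejected)  # 1000 inputs handled without an unexpected crash
```

Random input like this mostly exercises the rejection paths; it complements, rather than replaces, tests built from valid and nearly valid records.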
boeing probably hired people based on everything with the exception of being good at their job.
@@Geek-A-Hertz8707 As I understand it, having a (detectable) heart beat was and still is an absolute requirement.
As an industrial control systems specialist I have seen software bugs persist for 30+ years before being discovered, because the exact conditions needed to produce them arise so rarely. You can never be too careful with software that is controlling machines.
I watched a few other vids on this machine. Crazy story
and then there is the firmware bug in some cheap self-cleaning litter boxes that kills kittens :( ...
A physical device that will kill if there's a firmware bug, in today's software environment? Holy...
I heard of another such device with the much simpler problem that if the operator made a typo entering the dose and used DEL to correct it all was fine, but if they used BACKSPACE then the digits on the screen vanished but the number received by the control program contained _all_ the digits entered!
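The failure described above, where the screen and the control program disagree about what was typed, is what happens when the display and the submitted value are kept in two separate places. A minimal sketch of one way to avoid it, with both editing keys going through the same code path and the display derived from the single buffer that is later submitted. `DoseEntry` and the key constants are hypothetical names used only for this example.

```python
BACKSPACE = "\x08"
DELETE = "\x7f"


class DoseEntry:
    """Hypothetical dose field: one buffer is the source for both display and value."""

    def __init__(self):
        self._digits = []

    def press(self, key):
        if key in (BACKSPACE, DELETE):
            # Both editing keys take the same path, so they cannot diverge.
            if self._digits:
                self._digits.pop()
        elif len(key) == 1 and key in "0123456789":
            self._digits.append(key)
        else:
            raise ValueError(f"unsupported key: {key!r}")

    @property
    def display(self):
        # The screen text is always computed from the buffer, never stored separately.
        return "".join(self._digits)

    def value(self):
        if not self._digits:
            raise ValueError("no dose entered")
        return int(self.display)


entry = DoseEntry()
for key in "2000":
    entry.press(key)
entry.press(BACKSPACE)  # operator corrects the typo
print(entry.display, entry.value())  # 200 200
```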
Dave,
Thanks for showing this case study. It was informative.
This case is one of the case studies used in training System Safety and System Software Safety engineers. The case study points to the code reuse and the lack of hardware interlocks. This is the first I heard that they used a single, nonprofessional programmer.
We know today it is possible for well designed software to contain flaws (look at the USMC's Osprey during test). Military and industrial entities want to minimize risk to their people and investments. Many put processes in place to minimize software risk through processes based off of standards (for example IEEE and MIL-STD-882). Design and code reviews are essential to this process. System Safety engineering got started with nuclear weapons and submarines, where minor flaws could be fatal.
As an old Chief Petty Officer told me, "There isn't a safety regulation in the US Navy that wasn't paid for in blood."
As a cancer patient, I feel terrified by the idea that a device which is meant to cure me can be my killer because of a software bug!
as described: it was not a software bug - it was leaded at sabotage based on profits.
perfect murder ?
Modern machines have both hardware and software interlocks. That same fault would be impossible to happen for several reasons - I work on these exact type of machines (Linear accelerators) and if something even 1/1000 as bad happened the machine would stop treating and alert everyone, and, if you're getting treatment in Manhattan, I might get a call.
@@blevin591 thank you, that is relieving.. I live in Jordan and the country is well known with advanced healthcare treatments.
You are the fourth high-profile channel I have seen bringing this tragedy to light. I love that your rendition is unique, much like the others. For additional viewpoints, see Kyle Hill, Low Level Learning and Fascinating Horror.
Yup, Kyle did a great job. Enough so that I had to go in a totally different direction, so I did!
I usually love your videos but aren't you a bit old to be using click bait thumbnails?? I lost some respect with that tactic.
You didn't watch it until I changed it, so you tell me!
@@DavesGarage I only came in to comment, not watch and removed from my watch list. Normally I block channels that do that but maybe I should.
I'm here for the babe in the thumbnail!
And the scariest part is , that is not just "based on a true story" , it is a real deal and it happened with real people...
Who else learned about this at uni
lmao the thumbnail change
I hope you don't mind a little unsolicited opinion on the video production itself, rather than the content: I really don't feel that the AI images add anything to your videos, Dave. It's such a controversial topic at the moment as well and I don't think I'll be the only person who rolls their eyes every time they see a clearly AI generated image in the middle of a video that doesn't need it
I appreciate the feedback. I wanted to do a different approach for this video, or the first couple of minutes. And so I did... and made it the way I wanted. I worry less about the "right" way or algorithm-friendly some days, I guess.
@@DavesGarage totally fair. Thank you for taking the feedback in the manner in which it was intended. I'll still be watching your videos regardless ☺
Sometimes I would press enter rapidly twice to rename a folder and open it. Just yesterday it happened to me that I tried to press ctrl+v, and it threw me an error that the location does not exist.
Upon investigation, it turned out, that when I pressed ctrl+v, the file manager still tried to move the files to "New Folder". I pressed consecutive enters too quickly. I was inside a folder that was referred to by a name that didn't exist anymore.
Not lethal, but it happened yesterday, and it was for the same reason... software still can't handle input that's too rapid in 2024.
Absolutely. Much of application level software has loads of race condition issues. And by application level I include all OS software that's not part of the kernel or drivers (utilities, etc).
Autonomous cars are nothing, at least they're on the ground. No modern plane can fly without a computer. What about Air Traffic Control, Nuclear power, submarines, and traffic lights? Even your average car would quit without the computer. Every day, we put our lives in a computer's hands in 1000 ways we aren't even aware of.
Air Inter in France crashed due to a software design issue in the late 80s.
Federal Regulatory Agencies. 😂 LIke they care.
Run by industries they’re supposed to regulate.
Inputting parameters too quickly resulting in the software glitching out reminds me of the computers at high school. Bizarrely, if you entered in your password too fast and hit Enter, you'd login to the school I.T. admin's computer, even though you used your own username and password.
Gee, software in safety critical systems not being tested thoroughly? And written by non-professional programmers? Where have we heard about this before?
He or she got paid to program, so was a professional programmer. Pretty standard to be mostly self-taught back in those days.
Boeing managed to kill 300 with the same lack of testing and managers rushing a product to market.
Boeing employees are striking now in protest to return those quality checks, but management is threatening them as a consequence.
I'm afraid Boeing is going to continue to kill customers and astronauts.
Great video Dave and thank you for showing us.
People don't believe it when I tell them computers make mistakes all the time. Reminds me of: Did they try turning it off and back on again?
Perhaps you should tell them: "computers rarely make mistakes. Software makes them all the time".
Dave, I found all the developers of the Therac-25, they must be working for GM writing the Infotainment system for their new EVs, as my Blazer EV for sure has never been tested in real world scenarios for flaws the infotainment contains. It is so bad that my car has had error codes from the day it left the dealership a year ago and no dealership has ever figured out how to fix it. Thank you for shedding some light on the importance of great code and better testing.
Now, for what it's worth, all of the machines which actually exhibited this problem were reported to have flagged and displayed an error - Some of them more than 200 times. Part of this blame absolutely lies with the operators who continued to use the equipment regardless.
Admittedly, providing full text error messages would likely not have helped much. "Beam turntable did not reach expected position for selected operating mode. Contact service" (the actual error) is still not all that helpful.
It'll be fine
I killed a herd of cows with an extra 0
I was amending a feed formulation program (written in DIBOL running on a Vax). In those days we coded in coding sheets then went to site and keyed in the changes directly onto the live system.
Working on the calculation for quantity of active ingredient (drugs) to add to the feed. Active ingredients had potency values which need to be factored in.
I had miskeyed an extra 0 in the calculation, which resulted in 10x the intended amount of active ingredient being added to the feed.
When a farmer fed the feed to his cows, it killed them all.
The animal feed company was liable because they were supposed to be running the new system alongside their old system and cross-checking, but they didn’t; they just ran with the new, untested code.
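The parallel run that was skipped in the story above can be sketched as an automated cross-check: compute the quantity with both the old and the new code and refuse to proceed if they disagree. Everything here is an assumption made for illustration, including the formula (product mass = required active mass divided by potency), the function names, and the 1% tolerance; the original DIBOL calculation is not known.

```python
import math


def product_needed_kg(target_active_kg, potency):
    """Assumed formula: product mass = required active mass / potency (fraction in (0, 1])."""
    if not 0 < potency <= 1:
        raise ValueError("potency must be a fraction between 0 and 1")
    return target_active_kg / potency


def product_needed_kg_buggy(target_active_kg, potency):
    """The same calculation with a stray extra zero keyed in."""
    return target_active_kg * 10 / potency


def cross_check(old_value, new_value, rel_tol=0.01):
    """Parallel-run guard: reject the new result if it disagrees with the old system."""
    if not math.isclose(old_value, new_value, rel_tol=rel_tol):
        raise RuntimeError(
            f"new system result {new_value} disagrees with old system result {old_value}"
        )
    return new_value


old = product_needed_kg(0.5, 0.25)
try:
    cross_check(old, product_needed_kg_buggy(0.5, 0.25))
except RuntimeError as err:
    print(err)  # the tenfold discrepancy is caught before any feed is mixed
```

An independent plausibility limit on the final quantity would be a sensible second guard, since a cross-check only helps while the old system is still available to compare against.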