Gotta add mine. Was a C static variable initialization order problem. It resulted in reading 1 byte beyond the end of an array. It only showed up on SunOS when built with Sun’s C++ compiler, without a debugger attached, and with NO code modifications. Turned out the stars had aligned to place the array at the very end of a page of memory, so that 1-byte over-read triggered a bus error. Obviously this was well before we had tools like Valgrind. I’m surprised I still had hair by the time it was found.
Great video! Talking about software, I'd love to see a video on every big Intel CPU (8088, 286, 386, and so on) explaining the key architectural differences and the instruction additions, like MMX and SSE, that made them faster and easier for software developers to use. I don't think many videos exist about the improvements through the 90s especially, and it would be cool to see how Intel was a leading innovator through the 90s compared to the harder times it is facing now. Thanks so much, and I hope you get a chance to see this!
[3:57] The attic is under the roof. Where's the roof? On top of the house. Where is the house? You're not going to believe this, but it's next to the garage.
I liked the BASIC on the C128, since you could merge existing programs and then run RENUMBER to tighten up the layout. I wrote a lot of subroutines, including primitive printer drivers, that I could combine for most of my programming projects.
The story of the byte reminded me of many arguments I had in the early years. At university I was taught that a BYTE = 8 bits, but a WORD is whatever the register size of the CPU you're using is. So if the registers in your CPU are 8 bits, a WORD is 8 bits; if the registers are 16 bits, a WORD is 16 bits, etc. It makes sense, because you can think of it as the amount your CPU can say in one go: a WORD. The problem was that at the time, 16-bit CPUs (or 16-bit computers) were very popular, so a lot of people were taught, or taught themselves, that a WORD is 16 bits. Even now, a casual Google search will probably tell you that a WORD = 16 bits (or 2 bytes), but when you dig a bit deeper you find it's not true, or at least it's only true for 16-bit CPUs 🙂
Dave, I can't help but tear up every time I see the Friendly Giant ending... and I'm really starting to enjoy these episodes. Well done, sir! Do you remember why you had latex gloves on?
I had a cheap Atari ST rollerball mouse that needed frequent cleaning whenever it stuttered or stopped. One weekend, it worked left/right but not up/down. After three tries at cleaning it, I bought a new one, and an hour later I was back up and running. The next weekend, same thing, but I knew it couldn't be dirty yet. It turned out that at that specific time of day, the sun through the window hit the side of the mouse and interfered with the rollerball's timing wheel. Can you imagine a user with that problem calling a help desk and being told to close the blinds?
Oh, I used GFA BASIC! Loved it. It was a great implementation for the Atari. I made a library of widgets so I could have standardized UI elements. 15 years later, I took that same approach to ASP web development, with libraries for database connections, UI elements, user input sanitization and a state machine template. Another 15 years later, I created a GTK framework for making Linux GUI apps in Python. So I owe GFA BASIC a lot of thanks. Oh, and I learned BASIC on a PDP-11/70. So you've really been poppin' off for me the last month or so!
I was involved with a bug that had an entire group stumped. A report was not printing properly. The program was quite simple, and had been in use for months. Wrong printer control ribbon.
8:30 I was so happy when DEFRAG came along in MS-DOS 6.x. I used the database program DataEase to store all the products in the warehouse for a well-known worldwide retailer, and we used it to process picking lists and a lot more. Once a week we ran history reports, which took a whole night to process because of where the information was stored on the disk, and then came the backups. Defrag cut the time those overnight jobs took to under an hour and made the system more user-friendly: less waiting for the paint to dry, so to speak. Thanks, Dave.
The wraparound error reminds me of a bug in the TRS-80 Editor/Assembler for the TRS-80 Model I. It was sold as a Radio Shack product, but was written by Microsoft. The assembler would happily assemble a relative jump of 128 bytes forward. When the processor executed the instruction, it jumped backward by 128 bytes instead. It took a bit of head-scratching to figure out the bug. I ended up disassembling the Editor/Assembler, fixing the bug, then reassembling it.
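For anyone curious why +128 turns into -128: a Z80 relative jump (`JR`) stores its displacement in a single signed byte, measured from the address after the 2-byte instruction, so the only valid range is -128..+127. Here's a minimal Python sketch (the function names are made up for illustration) showing how an assembler that skips the range check encodes +128 as `0x80`, which the CPU reads back as -128.

```python
def encode_displacement(offset: int, checked: bool = True) -> int:
    """Encode a relative-jump offset as one byte (two's complement).

    `offset` is measured from the address after the 2-byte JR instruction.
    With checked=False this mimics an assembler that forgot the range
    check and simply truncates the value to 8 bits.
    """
    if checked and not -128 <= offset <= 127:
        raise ValueError(f"relative jump out of range: {offset}")
    return offset & 0xFF  # keep only the low 8 bits


def decode_displacement(byte: int) -> int:
    """Interpret a displacement byte the way the CPU does: as signed 8-bit."""
    return byte - 256 if byte >= 128 else byte


# A buggy assembler happily encodes +128...
raw = encode_displacement(128, checked=False)
print(hex(raw), decode_displacement(raw))  # 0x80 -128: a jump *backward*
```

With `checked=True` (the default) the same request raises an error at assembly time instead of producing a silently wrong jump.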
Once upon a time (1998) I was tasked with fixing a bug in a banking application. It turned out the code was a total mess of patches on top of bugs, including patches to code that had been commented out. The 10,000 lines of code, spread across numerous source files, were also riddled with Y2K bombs, which nobody had yet thought of identifying. Since the interfaces to the code could be identified, I gave the project manager the option of re-crafting it from scratch or fixing the hundreds of previously unidentified bugs individually. I gave them an estimate of two weeks to get it done. The 1,000 lines of source code that were delivered, working and on time, included nearly 700 lines of comments describing the functionality; i.e., 300 lines of code were all it took. That included "novel" plausibility checking of input parameters and, this being a database application, trapping all errors. The actual work was done in less than 50 lines. Had the code been left untouched until 2000, customers could have written cheques and only had their accounts debited by 10% of the amount.
I was slinging copper tube in my attic when I was installing my heat pump AC. It was 53°C up there! :O So I know how you felt about the "they'll find my dead body" thing! I maxed out at about 10 minutes before I had to come down to rehydrate.
The hardest bugs are always the ones that are hardest to reproduce. If I can reproduce it in 10 seconds by running a test case, I'll find it very fast. If it takes a couple of hours to reproduce, and only crashes half of the time... well, good luck finding that one. My favorite one was a user who claimed the system sometimes changed the numbers she entered. It was a warehouse application with numbered boxes. Most were handled automatically, but users could manually fill a box and register it in the interface. Needless to say, entering the wrong box number caused quite a few issues down the line. I couldn't reproduce it at all, not in a test environment and not even in the live environment. So in the end, I drove a couple of hours to that warehouse in the hope I would see something just by observing what she did. I watched her type in the number "063", and I thought I saw it change in a flash when she hit the save button. It all came together: it was a bug in the interface, not in one of the communication protocols. I had never tried entering numbers with leading zeroes, but apparently the interface framework we used had a bug and treated those as octal numbers (octal 063 is decimal 51). Finding that bug was such a strange feeling. Happy that I finally found it, but mad that I had spent a couple of days in total finding a bug in a library.
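The leading-zero trap comes from the old C convention (used, for example, by `strtol` with base 0) that a number starting with `0` is octal. Below is a hypothetical Python re-creation of such an auto-detecting parser, not the actual framework from the story; it shows "063" silently becoming 51, and the fix of always parsing user-entered box numbers as base 10.

```python
def parse_auto_base(text: str) -> int:
    """Mimic C's strtol(text, NULL, 0): '0x' prefix -> hex, leading '0' -> octal.

    Hypothetical stand-in for the buggy UI framework's number parsing.
    """
    s = text.strip()
    if s.lower().startswith("0x"):
        return int(s[2:], 16)
    if len(s) > 1 and s.startswith("0"):
        return int(s[1:], 8)  # "063" -> octal 63 -> decimal 51
    return int(s, 10)


def parse_box_number(text: str) -> int:
    """The fix: user-entered box numbers are always decimal."""
    return int(text.strip(), 10)


print(parse_auto_base("063"))   # 51, not 63
print(parse_box_number("063"))  # 63
```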
I also grew up watching black & white broadcasts of The Friendly Giant on a console TV on the farm near Yorkton. The young'uns give me the side eye when I mention the Tickle Trunk!
Dave, some of the DEC operating systems had flags you could set on files for contiguous (mandatory) and contiguous-best-try allocation. With the mandatory flag set, file creation would fail if there wasn't enough contiguous free space to complete the allocation, even if there was plenty of free space remaining overall. This was used for some OS- and performance-critical files. If contiguous allocation failed, you had to delete stuff or defragment the disk before you could create the file.
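To see why a mandatory-contiguous request can fail on a disk with plenty of free space, here's a minimal first-fit sketch over a block bitmap. It's a hypothetical model, not how DEC's file systems actually stored their free-space map: `contiguous=True` demands one unbroken run, while the best-try mode falls back to scattered blocks.

```python
from typing import List, Optional


def find_contiguous(free: List[bool], count: int) -> Optional[int]:
    """Return the start of the first run of `count` free blocks, or None."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == count:
                return run_start
        else:
            run_len = 0
    return None


def allocate(free: List[bool], count: int, contiguous: bool) -> Optional[List[int]]:
    """Allocate `count` blocks, mark them used and return their indices.

    contiguous=True  -> "mandatory": fail unless one unbroken run exists.
    contiguous=False -> "best try": use a contiguous run if possible,
                        otherwise take any free blocks (a fragmented file).
    """
    start = find_contiguous(free, count)
    if start is not None:
        blocks = list(range(start, start + count))
    elif contiguous:
        return None  # enough free space in total, but not in one piece
    else:
        blocks = [i for i, f in enumerate(free) if f][:count]
        if len(blocks) < count:
            return None  # genuinely out of space
    for b in blocks:
        free[b] = False
    return blocks


# 6 free blocks in total, but the longest free run is only 2 blocks long.
disk = [True, True, False, True, True, False, True, True]
print(allocate(disk[:], 4, contiguous=True))   # None
print(allocate(disk[:], 4, contiguous=False))  # [0, 1, 3, 4]
```

Defragmenting is what turns those scattered free blocks back into one long run, so the mandatory request can succeed.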
Borland's BGI graphics in Turbo C used calloc() to allocate memory. Borland's calloc() did call malloc(), and then never checked whether the allocation had worked before clearing the returned memory. In the far memory model, a NULL pointer was address 0000:0000 on that MS-DOS machine, right where the interrupt vector table is. And MS-DOS runs a timer tick 18.2 times per second. So at some random time later, the timer tick fired, but with its interrupt pointer trashed. So every single time I ran my code, it crashed at a random time and place. And there was no hardware debugger that could step in and show how everything ended up exploding. It was one of two runtime library bugs in the Borland C libraries that bit me and had me chasing them for far longer than "normal" bugs. The advantage? I learned (the hard way) to spend extra time making sure my own code doesn't have random failures from memory overwrites or from unsynchronized access to shared data. And for all newer development, I always have hardware debuggers that can help when writing interrupt service routines and hardware drivers, where traditional debugging doesn't work well.
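The Borland failure mode can be modelled in a few lines. This Python sketch is a toy, not real DOS: memory is a `bytearray`, bytes 0..15 stand in for the interrupt vector table at 0000:0000, and the hypothetical `malloc` returns 0 (NULL) when the heap is exhausted. The buggy `calloc` clears memory at whatever address it got back, so a failed allocation wipes the vectors; the checked version just returns NULL. (Assumption: requests stay smaller than the toy address space.)

```python
NULL = 0


class ToyMachine:
    """Toy address space; bytes 0..15 play the role of the interrupt vector table."""

    def __init__(self, size: int = 64) -> None:
        self.memory = bytearray(b"\xAA" * size)  # 0xAA = "valid" vector contents
        self.next_free = 16  # heap starts right after the vector table

    def malloc(self, size: int) -> int:
        """Bump allocator; returns NULL (0) when the heap is exhausted."""
        if self.next_free + size > len(self.memory):
            return NULL
        addr = self.next_free
        self.next_free += size
        return addr

    def calloc_buggy(self, nmemb: int, size: int) -> int:
        total = nmemb * size
        p = self.malloc(total)
        self.memory[p:p + total] = bytes(total)  # BUG: runs even when p == NULL
        return p

    def calloc_checked(self, nmemb: int, size: int) -> int:
        total = nmemb * size
        p = self.malloc(total)
        if p == NULL:
            return NULL  # fail cleanly; never touch address 0
        self.memory[p:p + total] = bytes(total)
        return p

    def vectors_intact(self) -> bool:
        return all(b == 0xAA for b in self.memory[0:16])


m = ToyMachine()
m.calloc_checked(1, 40)         # uses heap addresses 16..55
print(m.calloc_checked(1, 12))  # 0: failure reported, vectors untouched
print(m.vectors_intact())       # True
m.calloc_buggy(1, 12)           # also gets NULL, but clears bytes 0..11 anyway
print(m.vectors_intact())       # False: the next timer tick would jump into garbage
```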
I'm trying to figure out how this is baked into Windows currently. I'm working on the assumption that drivers are being rolled back in Windows updates because of this memory allocation synchronization issue, although I can't prove it yet. What would be a good avenue to try to narrow down this issue?
@sirseven3 The BGI (Borland Graphics Interface) was a device-independent graphics library for MS-DOS. On MS-DOS, each individual program had to include the graphics code it needed to draw text, circles, lines, etc. The Borland languages shipped such drivers for Hercules (monochrome), CGA, EGA, VGA and a few other graphics standards. In Windows, everything is graphics, except the magic blue screen of death. So very early in startup, Windows loads either a vendor driver (from AMD, Intel or NVIDIA, for example) or falls back to a generic SVGA driver. By the time normal Windows applications start, everything is already set up for accessing all the graphics drawing primitives.
The funniest one I ever heard came from a tester who had a report about live deployed software running under Windows. Every so often, CPU usage would hit 100% and the software would stop responding. The devs tried to recreate it and failed; no one could get to the bottom of it… Eventually someone went on site to observe it happening, and the answer emerged. The dev went for lunch, came back, and the site guys excitedly announced it was happening. When they got to the machine, it was the Pipes screensaver… The machine wasn't very powerful, and some idiot had enabled it. It basically ate CPU cycles for fun…
Uh, a bug flea market. Mine was a XAML font glyph that had been removed from the actual font. It took three months, because the editor didn't notice and the runtime crashed at random moments without any information.
For me it was a self-inflicted wound. I was just learning C, and I was writing a text windowing system for our company’s brand of terminals. I couldn’t use curses because of some weird things the terminals supported, like using the same control sequence to turn a feature on and then off again (insert mode, for example). After banging my head trying to get curses to work, I decided to roll my own. I used vprintf to create my own functions to move the cursor, etc. Under some circumstances I was passing NULL instead of an empty string (""), and it worked fine. People used the program without a problem until we upgraded the OS. All of a sudden, the cursor was jumping everywhere and terminals were locking up; it was a nightmare. Finally, I realized that when I passed a null pointer, it was reading from address 0 in memory, which used to hold a zero, so it was treated as "". But now that location held 0x1B (ESC) followed by random data, so every time I passed vprintf a NULL it would generate a random escape sequence. Fixed with a simple ternary expression.
3:43 In Germany we have "Laden", the store, and "Werkstatt", the workshop area where you do repairs and assembly. But some places, especially in IT, have both. I guess you shortened "workshop" to "shop", then ended up conflating it with the "shop" that means "store"?
Englishman here, but I was in Kassel for two years and always thought "laden" meant "load" (yes, I was programming). Is the difference the capitalisation of the word, or am I missing an umlaut?
The hardest bug I've addressed was in a smartphone: on a certain carrier, at certain times, everyone's phone would seem to run out of battery around the same time. Sure, it was near the end of the day, so people just assumed the battery had been drained by the day's use (it actually happened around 5-6 pm, on Fridays, in certain European countries). It turned out that under certain circumstances, a cellular protocol puts the phone into particular power-saving modes; without this feature, most smartphones would run out of power within 15-30 minutes. What happened was a race condition between the different cores: the modem cores and the application cores. In mobile, both systems think they are the master and the others are secondary... Besides being impossible to replicate, the modem core and all of its code are tightly protected from anyone outside the modem team. Out of the entire world, it was me and one HTC engineer, each with a binary dump of the system, going through it step by step to identify the bits. The modem team, who could map the code to those bits, then determined the race condition. The second-hardest bug I've addressed was when I was a BIOS engineer and systems would die randomly. This was back in the days when we first made all-in-one PC laptops. Instead of using the $300+ Intel mobile CPU, we decided to use the $100 desktop CPU. We had tens of thousands of computers in the factory running burn-in tests, and they would just fail at random. It turned out that under certain conditions, the fans wouldn't perform and would melt... yes, you read that right, melt...
I had a whole 18-inch cube-shaped box of floppies until 2 years ago. It was marked "baby stuff", and I had moved it from house to house without opening it. One day I opened it, and it was filled with old 5.25" floppies. I kept a few and tossed the rest.
Recall seems to be more of a tool Microsoft can use to train a large model on how users engage with the HMI. It wants to learn how to do the same things, but at a much faster pace.
For me it was a feature of Microsoft's C 5.1 compiler. It seems the keyword "volatile" was accepted syntactically but NOT semantically: it would not generate an error, but it also did not generate the correct code.
I believe shingled (SMR) hard drives are the worst invention of the century. Hard drives with high capacity and a low price, with no indication other than a model number and a red label, have ruined the day for many people who didn't know better. They're the closest thing to a write-once hard drive. BTW, the excessive logging, reporting and file duplication made Windows barely acceptable after Vista. But Rec-All has been the last straw: I now have Debian on all my computers. It has been a really poor endgame on Microsoft's part...
Absolutely. I still use Windows, but I go to great lengths to remove Recall and a lot of other data-harvesting nonsense, then create a custom image. It's getting harder and harder, and continually having to delve into undocumented components is really not fun anymore...
My hardest bug to fix was on a VAX running Fortran. As you know, all arguments to FORTRAN subroutines are passed by reference, meaning the subroutine can change the argument's value. Unfortunately, someone passed the literal value 0 (zero) as one of the arguments, and the subroutine assigned 1 (one) to it. From then on, every use of 0 in the program used 1 instead. I had to debug in assembly to find out why 0 = 1. It took a few days.
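The effect is easier to picture with a model of how such compilers often handled literals: each constant lives once in a shared literal pool, and passing it "by reference" hands the subroutine the address of that pool slot. This Python sketch is a hypothetical simulation, not VAX FORTRAN's actual code generation; `set_to_one` plays the misbehaving subroutine.

```python
from typing import Dict, List


class LiteralPool:
    """One shared storage cell per distinct constant, as a compiler might emit."""

    def __init__(self) -> None:
        self.cells: List[int] = []
        self.index: Dict[int, int] = {}

    def address_of(self, value: int) -> int:
        """Return the 'address' (cell index) holding this constant."""
        if value not in self.index:
            self.index[value] = len(self.cells)
            self.cells.append(value)
        return self.index[value]

    def load(self, value: int) -> int:
        """What compiled code does when it uses the literal: read its cell."""
        return self.cells[self.address_of(value)]


def set_to_one(memory: List[int], arg_address: int) -> None:
    """A subroutine whose argument is passed by reference; it assigns 1 to it."""
    memory[arg_address] = 1


pool = LiteralPool()
print(pool.load(0))                          # 0
set_to_one(pool.cells, pool.address_of(0))   # like CALL SETONE(0)
print(pool.load(0))                          # 1: every later "0" now reads as 1
```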
Mine was somewhat similar. I was working in assembly on either a Z80 or an MC68xx. I wanted to load a register with a CONSTANT equal to the number 6 as a loop counter, but instead of marking it as a literal by beginning it with a "#" as I should have, I omitted the "#", and the register was instead loaded with the contents of memory location $0006. I stared at that line of code for hours and hours without seeing the error. The hard part was that the contents of $0006 were dynamic, so the error happened regularly but with different results, since that address held 6 only rarely, but occasionally. I had to bring in a colleague to help, and all I had to do was walk through my code out loud, and the error magically revealed itself. DUH!
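The one-character difference between immediate and direct addressing can be shown with a toy operand decoder. This is a hypothetical Python sketch using Motorola-style syntax, where `#6` means the constant 6 and a bare `$0006` means the byte stored at address $0006; it is not a real 6800 emulator (Z80 assemblers spell the same distinction as `LD A,6` versus `LD A,(6)`).

```python
def load_operand(operand: str, memory: bytearray) -> int:
    """Return the value a load like LDAA <operand> would put in the register.

    '#n' or '#$hh' -> immediate: the number itself.
    '$hhhh' or 'n' -> direct/extended: the byte stored at that address.
    """
    immediate = operand.startswith("#")
    text = operand[1:] if immediate else operand
    value = int(text[1:], 16) if text.startswith("$") else int(text, 10)
    return value if immediate else memory[value]


mem = bytearray(256)
mem[0x06] = 42  # whatever happens to live at $0006 right now

print(load_operand("#6", mem))     # 6: the intended loop count
print(load_operand("$0006", mem))  # 42: contents of address $0006
```

Because the loop count silently depends on whatever is at $0006, the bug only "works" on the rare occasions that byte happens to be 6.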
Is that what they call orthogonal for an instruction set? I mean the ability to use all types of data and addressing modes as operands... and with great power come powerful bugs. I wonder if ChatGPT would have caught them...
Your explanation of shingled recording is much less horrible than the drives themselves! I have an uncanny ability to find bugs. Not so much to fix them. When I report them, the response is usually "Never happens. Won't fix." Well, /almost/ never! Fix it, ya bastiches!
AmigaBASIC was hellishly slow! It was only usable once you had a compiler, such as HiSoft BASIC. But I didn't do much in BASIC on my Amiga; I was into 68000 asm to control the hardware directly.
The channel being listed as from the States was throwing me off because I *KNEW* I was hearing a Canadian accent this whole time. I thought I was going nuts! 😂 Anyway, I love your content, sir. Greetings from a Canadian American. 🇨🇦🇺🇸
Hi Dave, back in my university days we had DEC Rainbows, which used an incompatible disk format. We could make floppy disks readable on both the Rainbows and the IBM compatibles by formatting them single-sided with 8 sectors per track. That's from a memory that has taken a few cosmic rays 😮.
!!!DAVE!!!! Can't get to the bottom of this tech question online... I've got a laptop with USB/Thunderbolt ports and no Ethernet. For gaming I need to use a USB-to-Ethernet dongle. Will a poor-quality dongle, compared to a top-quality one, affect ping and general connection quality? Thanks
It's immensely rewarding to chase bugs to their root cause and fix them.
@@kwazar6725 at least the fixing thing, yeah
I sometimes enjoy just reading about all the work someone has put into fixing some obscure bug, because I can see how much effort it took, and I can empathize with the sense of satisfaction one gets when one finally pins down the actual root cause of some sporadic but infuriating bug!
Speaking for myself, finding & understanding the root cause is the rewarding part (dopamine kick)... actually fixing it is "just" work.
Not as rewarding as writing well designed and crafted code that runs for years without a significant bug.
@@LTVoyager - Well, I've done that too. For me there's more of a rush of satisfaction in finding one bug *in someone else's code* which has been causing me headaches for months or even years. It's a very short-term rush though. Just a few hours of thinking "I finally got the bastard! [meaning the bug]".
I had a friend who was a customer engineer working on IBM equipment, and he got called out to a machine which had a vertically mounted 8" floppy drive as the boot drive. The report said that it wouldn't boot and was making a horrible grinding noise.
Anyway, he inserted a new 8" diskette, turned the machine on, there was a horrible grinding noise and the machine didn't boot. He pulled out the diskette and found it was badly scratched.
Tried again, same result. So he took the machine apart.
In the drive he found cigarette ends, aluminium foil and other debris.
Turned out the cleaning lady thought it was a flip top bin and was emptying the trash into it.
bruh 😮
(Fortunately, she apologized, and went back to unplugging the server so she could plug in her vacuum!)
Wtf!
@@madbradfreeman in 97 or 98 I had complaints from operators at a plant that sometimes their operator HMI screens acted up for a little while so they couldn't see the graphics.
We were blaming the UPS and wasted much time troubleshooting that intermittent problem with no luck.
One night I was working late, and at about 9 pm old Ron the janitor came into the control room, plugged his vacuum into the one unused orange UPS receptacle and turned it on. The 21-inch CRT monitor I was at immediately lost its mind until I reached over and unplugged Ron's vacuum.
I finally just clipped the little jumper bars in the receptacle between the two plugs so that the unused plug stayed dead.
Problem solved lol.
I also caught Ron spraying his cleaner solution over those big CRT monitors and wiping them down with a cloth.
Janitors man 🙄🙄
Back in about 1983 I had one of those transformative debugging experiences almost exactly as you described. I was working on an adventure game in Z-80 assembly on my TRS-80. Everything had been going just fine for a few days. Then I added an item to a table and ... boonies. As I recall, a full-on crash was not very eventful: a couple of clicks on the floppy disk and the TRS-80 prompt appeared on screen. I feel silly explaining it now, as it seems so obvious, but at the time I just couldn't see it. Stared that mother down for a day and a half. Turns out that my 'items' table (for which I had allocated what seemed like lots of memory) overwrote an adjacent pointer table by a couple of bytes. I've told this story many times, suggesting that until you've stared at a monitor for 13 hours straight trying to figure out how the simplest of changes brutally crashes the computer, you don't know what it's like to be a programmer.
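In flat assembly memory, a table that outgrows its allocation doesn't fault; it just writes into its neighbour. This Python sketch models that with one `bytearray` holding a hypothetical 8-byte items table followed directly by a pointer table (the layout and names are made up for illustration). The unchecked store corrupts a little-endian 16-bit pointer, as it would on the Z-80, while the checked version refuses.

```python
ITEMS_BASE, ITEMS_SIZE = 0, 8        # hypothetical layout: items table first...
POINTERS_BASE, POINTERS_SIZE = 8, 4  # ...with a pointer table right after it
memory = bytearray(ITEMS_SIZE + POINTERS_SIZE)
# Store a 16-bit pointer little-endian, Z-80 style.
memory[POINTERS_BASE:POINTERS_BASE + 2] = (0x1234).to_bytes(2, "little")


def store_item_unchecked(index: int, value: int) -> None:
    memory[ITEMS_BASE + index] = value  # no bounds check, like plain assembly


def store_item_checked(index: int, value: int) -> None:
    if not 0 <= index < ITEMS_SIZE:
        raise IndexError(f"item {index} does not fit in the {ITEMS_SIZE}-byte table")
    memory[ITEMS_BASE + index] = value


def first_pointer() -> int:
    return int.from_bytes(memory[POINTERS_BASE:POINTERS_BASE + 2], "little")


store_item_unchecked(8, 0xFF)  # "just one more item"
print(hex(first_pointer()))    # 0x12ff: the pointer is now garbage
```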
Yep, the most difficult ones have always been the most obvious ones. That's why at work we always used the 'fresh eyes' approach: getting someone else to take a quick look to see if they can spot something obvious that you can't see because you've been staring at it for hours or days. The first few times it's extremely frustrating and annoying when someone else comes along and spots your mistake in less than 5 minutes, but after a few times you come to appreciate it as a valuable tool. And of course you get the opportunity to do it to other people's code as well 🙂
@@farab4391 Exactly. We used to refer to the 'janitor syndrome.' Same idea ... A janitor walks by after you've been staring down your code for hours and asks, "Why did you do that?"
This is really a blast from the past. I remember installing a hard drive on a floppy-only system for one of my lab mates. I chose RLL over MFM, and the people we were buying the drive from kept questioning me. She said they didn't sell that many. I told her she should be happy that I was reducing her slow-moving inventory.
I think the hardest bug I had to track down was when HP wrote a new implementation of AppleTalk as part of a new network card. I had a customized version of CAP (Columbia AppleTalk Package) which worked absolutely fine with this new implementation, except that occasionally a network connection would just hang up *forever.* After months of debugging, I finally figured out that the problem was in the error handling of a dropped packet. And what's worse, it came down to subtle and ambiguous wording in the official AppleTalk specification.
The bug which took me the *longest* to track down was in a simple client for a chat system. It just read lines from the chat server and wrote them out while letting the person at the terminal keep typing in their own messages. Every once in a while my client would print out an extra blank line. It took me at least 18 years to track down that bug!
I was one of the coders on a PlayStation 2 game. It was near completion, but the game was unstable and would crash. All the coders were pointing fingers at each other; none of them were debugging the issue. It was obvious there was memory corruption, but the PS2 tools at the time had no memory protection and no bounds checking. I spent days making the C/C++ code compile and run on a Windows PC using MS DevStudio. I enabled full heap and stack checking using the MS compiler options. Leaving the code running in a loop through the title screen, into the game levels and back out again quickly showed that an old bit of code that handled a general-purpose linked list of objects was to blame. It did not handle unlinking correctly, which led to the wrong memory blocks being accessed and modified. In the end I made sure to point out which coder was responsible and to make sure they knew exactly how much work and risk they had caused.
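For reference, here's what a correct unlink looks like for a doubly linked list. It's a generic Python sketch with hypothetical names (`Node`, `ObjectList`), not the game's code; the comments mark the pointer updates that are easy to get wrong, such as forgetting to fix `head`/`tail` when removing an end node, or leaving the removed node's own links pointing into the list.

```python
from typing import Iterator, Optional


class Node:
    def __init__(self, value: object) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class ObjectList:
    """Minimal doubly linked list of game objects (hypothetical)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, value: object) -> Node:
        node = Node(value)
        node.prev = self.tail
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node  # list was empty
        self.tail = node
        return node

    def unlink(self, node: Node) -> None:
        # Re-point the neighbours, or head/tail when node is at an end.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        # Clear the removed node's links so stale pointers can't be followed.
        node.prev = node.next = None

    def __iter__(self) -> Iterator[object]:
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


objs = ObjectList()
a, b, c = objs.append("a"), objs.append("b"), objs.append("c")
objs.unlink(b)
objs.unlink(c)
print(list(objs))  # ['a']
```

In C the same four pointer updates apply; getting any one of them wrong is exactly the kind of thing that leaves a freed block still reachable from the list.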
Yo, that's so cool! Can you share more about PS2 development? I grew up on the PS2 and would love to know and/or learn more about what developing for the PS2 was like back then!
Thoroughly enjoyed the discourse!
@Dave's Garage: Thanks, Mr. Dave. I love your content and always enjoy the thrill of catching up on new episodes. Thanks again. Ahh, you take me down time's road, making me reminisce about my childhood growing up on an Atari, Apple IIe, Apple Mac, Cyrix 6x86; IBM PC-DOS; IBM 486DX 66 MHz with MS-DOS, then Windows 3.1, then Red Hat Linux.
Look up, look way up! I will never forget that show.
I LOVE UAC, which came with Vista. Thank you for that! OK, the Win7 version required a few fewer clicks, but the concept of session isolation and being able to switch that easily for just one command is great. Today: always set UAC to the highest level, and on servers go even one step beyond and require a UAC confirmation for shutdown or reboot (unless you run the shutdown command from a shell started with admin rights).
Dave, go back to Microsoft and fix the Windows 11/Server 2025 mess. There are so many weird bugs, like my current pet bug: data corruption in a nested-virtualization machine where the first level is Server 2022 or Server 2025 using deduplication on the drive for the second-level guests. Server 2019 is fine. I would post a link to the Server Insider threads, but da tube...
Oh, and make the UI fast, efficient and responsive again, so one doesn't need a fast GFX card to stay sane, and can use a Thunderbolt dock again without going insane over the slow speed, since it's still just Thunderbolt and not PCIe x16.
Great insights as always Dave!
The hardest bugs I've had to fix usually came down to hardware bugs in the ASIC that had to be worked around somehow. There is one bug fix I've always refused to sign off on, because it was all a lie by the sales team. It required an extra capacitor on the circuit, and there was no amount of code we could write that would fix that one. But somehow, that's how it was sold to the customer.
Gotta add mine. It was a C static variable initialization order problem that resulted in reading 1 byte beyond an array.
It only showed up on SunOS when built with Sun's C++ compiler, without a debugger attached and with NO code modification.
Turned out the stars had aligned to place the array at the end of a page of memory and triggered a bus error.
Obviously this was well before we had tools like valgrind. I’m surprised I still had hair by the time it was found.
Great video! Talking about software, I'd love to see a video on every big Intel CPU (8088, 286, 386 and so on) explaining the key architectural differences and the instruction additions, like MMX and SSE, that made them faster and easier for software developers to use. I don't think many videos exist about the improvements through the 90s especially, and it would be cool to see how Intel was a leading innovator through the 90s compared to the harder times they are facing now. Thanks so much, and I hope you get a chance to see this!
[3:57]
The attic is under the roof
Where's the roof?
On top of the house.
Where is the house?
You're not going to believe this but it's next to the garage.
I liked the BASIC on the C128, since you could merge existing programs and then run RENUMBER to tighten up the layout. I wrote a lot of subroutines, including primitive printer drivers, that I could combine for the majority of my programming projects.
The story of the byte reminded me of many arguments I had in the early years. At university I was taught that a BYTE = 8 bits, but a WORD is whatever the size of the registers in the CPU you're using. So if the registers in your CPU are 8 bits, a WORD is 8 bits; if the registers are 16 bits, a WORD is 16 bits, etc. It makes sense, because you can think of it as the amount your CPU can say in one go: a WORD. The problem was that at the time 16-bit CPUs, or 16-bit computers, were very popular, so a lot of people were taught, or just learnt themselves, that a WORD is 16 bits. Even now a casual Google search will probably tell you a WORD = 16 bits (or 2 bytes), but when you dig a bit deeper you find it's not true, or at least only true for 16-bit CPUs 🙂
Dave. I can't help but tear up every time I see the Friendly Giant ending ... and I'm really starting to enjoy these episodes. Well done sir! Do you remember why you had latex gloves on?
I had a cheap Atari ST rollerball mouse that had to be cleaned often when it stuttered or stopped. One weekend, it worked left/right but not up/down. After 3 tries at cleaning, I bought a new one, and an hour later I was back up and running. The next weekend, same thing, but I knew it wasn't dirty yet. It turned out that at that specific time of day, the sun came through the window, hit the side of the mouse, and affected the roller ball's timing wheel. Can you imagine a user with that problem calling a help desk and being told to close the blinds?
Europe here! "Shop" for us means a store, but we know what it means in the US. Great video!
Oh, I used GFA Basic! Loved it. It was a great implementation for the Atari. I made a library of widgets so I could have standardized UI elements. 15 years later, I took that same approach to ASP web development, with libraries for database connections, UI elements, user input sanitization and a state machine template. 15 years after that, I created a GTK framework for making Linux GUI apps in Python. So I owe GFA Basic a lot of thanks. Oh, and I learned BASIC on a PDP-11/70. So you're really poppin' off for me this last month or so!
I was involved with a bug that had an entire group stumped. A report was not printing properly. The program was quite simple, and had been in use for months. Wrong printer control ribbon.
8:30 I was so happy when defrag came in MS-DOS 6.2-ish. I used the database program DataEase to store all the products in the warehouse for a well-known worldwide retailer. We used to process picking lists from it and lots more, and once a week we ran history reports, which took a whole night to process because of where the information was stored on the disk, and then the backups. Defrag cut the overnight jobs to under an hour and made the system more user-friendly: less waiting for the paint to dry, so to speak. Thanks Dave.
The wraparound error reminds me of a bug in the Editor/Assembler for the TRS-80 Model I. It was sold as a Radio Shack product but was written by Microsoft. The assembler would happily assemble a relative jump of 128 bytes forward. When the processor executed the instruction, it jumped backward by 128 bytes. It took a bit of head-scratching to figure out the bug. I ended up disassembling the Editor/Assembler, fixing the bug, then reassembling it.
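The TRS-80 Model I runs on a Z80, whose relative jump stores its displacement in one signed byte, so only -128 to +127 fits. Below is a minimal sketch, with hypothetical function names, of the range check that assembler skipped and of how the CPU reads the byte back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a relative-jump displacement into one byte. Returns false when
   the offset doesn't fit in a signed 8-bit field (the missing check). */
bool encode_rel_jump(int offset, uint8_t *out) {
    if (offset < -128 || offset > 127)
        return false;
    *out = (uint8_t)offset;   /* two's complement: -2 becomes 0xFE */
    return true;
}

/* Interpret a displacement byte the way the CPU does: as signed. */
int decode_rel_jump(uint8_t byte) {
    return byte < 0x80 ? byte : byte - 0x100;
}
```

An assembler that skips the range check and simply writes `(uint8_t)128` stores 0x80, which `decode_rel_jump` (like the CPU) reads as -128: a jump backward instead of forward.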
Once upon a time (1998) I was tasked with fixing a bug in a banking application. It turned out that the code was a total mess of patches on top of bugs, including patches to code that had been commented out. The 10,000 lines of code across numerous source files were also riddled with Y2K bombs, which nobody had yet thought of identifying.
As the interfaces to the code could be identified, I gave the project manager the option of re-crafting it from scratch or fixing the hundreds of previously unidentified bugs individually. I gave them an estimate of two weeks to get it done.
The 1,000 lines of source code that were delivered, working and on time, included nearly 700 lines of comments describing functionality, i.e. 300 lines of code was all it took.
That included "novel" input parameter checking for plausibility, and being a database application, trapping all errors. The actual work was done in less than 50 lines.
Had the code been left untouched until 2000, customers could have written cheques and only had their account debited by 10% of the amount.
I was slinging copper tube in my attic when I was installing my heat pump AC. It was 53C up there! :O
So I know what you felt like about the "they'll find my dead body" thing! I maxed out at about 10 minutes before I had to come down to rehydrate.
I was expecting a schrödinbug, but that was far from disappointing. :)
Thank you Dave!
The hardest bugs are always the ones that are hardest to reproduce.
If I can reproduce it in 10 seconds by running a testcase, I'll find out very fast. If it takes a couple of hours to reproduce, and only crashes half of the time... Well, good luck finding that one.
My favorite one was a user who claimed the system changed her inputted numbers, sometimes. It was a warehouse application, with numbered boxes. Most worked automatically, but they could manually fill a box and register it in the interface.
Safe to say, putting in the wrong box number caused quite some issues down the line.
I couldn't reproduce it at all, not in a test environment and not even in the live environment. So in the end, I drove a couple of hours to that warehouse in the hope I would see something by just observing what she did.
I watched her type in the number "063", and I thought I saw it change in a flash when she hit the save button.
It all came together, it was an interface bug, not a bug in one of the communication protocols.
I had never tried entering numbers with leading zeroes, but apparently the interface framework we used had a bug and treated those as octal numbers.
Finding that bug was such a strange feeling. Happy that I finally found it, but mad that I had spent a couple of days in total finding a bug in a library.
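The comment doesn't name the framework, but C's own `strtol` shows the same trap. With base 0 it auto-detects the radix, and a leading `0` selects octal, so "063" becomes 6 × 8 + 3 = 51. A minimal sketch (the wrapper names are hypothetical):

```c
#include <stdlib.h>

/* Base 0 auto-detects the radix: "0x.." is hex and a leading "0" is
   octal, so "063" is read as 6*8 + 3. */
long parse_auto(const char *s) {
    return strtol(s, NULL, 0);
}

/* Forcing base 10 keeps a box number like "063" as sixty-three. */
long parse_decimal(const char *s) {
    return strtol(s, NULL, 10);
}
```

For user-entered IDs such as box numbers, passing an explicit base is the safer choice, because the input format never silently changes the value.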
For next. Give us the hardest bug you created! Bonus if you never told anyone about it before. Extra bonus if you took the credit for solving it.
I also grew up with black & white console tv broadcasts of Friendly Giant on the farm near Yorkton. The young uns give me the side eye when I mention the Tickle Trunk!
I'm in the UK. I can confirm that the term "shop" is an abbreviation of "workshop" here, although "workshop" is probably used more than "shop".
Dave, some of the DEC operating systems had flags you could set on files for contiguous (mandatory) and contiguous best-try allocation. With the mandatory flag set, file creation would fail if there wasn't enough contiguous free space to complete the allocation, even if there was plenty of free space remaining overall. This was used for some OS and performance-critical files. If contiguous allocation failed, you had to delete stuff or defragment the disk before you could create the file.
Borland's BGI graphics in Turbo C used calloc() to allocate memory. Borland's calloc() did call malloc(), and then never checked whether the allocation worked before clearing the returned memory.
In the far memory model, a NULL pointer is address 0000:0000 of the MS-DOS machine. Right where the interrupt table is. And MS-DOS runs a timer tick 18.2 times/second.
So at some random time later, that timer tick happened, but with the interrupt pointer trashed. So every single time I ran my code, it crashed at a random time and place. And I had no hardware debugger that could step in and show how everything ended up exploding.
It was one of two runtime library bugs in the Borland C libraries that bit me and had me chasing them for far longer than "normal" bugs.
The advantage? I learned (the hard way) to spend extra time to make sure my own code doesn't have random failures from memory overwrites or from unsynchronized access to common data.
And for all newer development, I also always have hardware debuggers that can help when writing interrupt service routines and hardware drivers, where traditional debugging doesn't work well.
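For contrast, here is a minimal sketch of a calloc-style allocator that checks what Borland's didn't. `checked_calloc` is a hypothetical name, not the Borland runtime. It rejects a `count * size` that would overflow, and it returns NULL instead of zeroing through a failed allocation. In that far memory model, zeroing through NULL meant writing over the interrupt table at 0000:0000.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate count * size zeroed bytes, or return NULL on failure. */
void *checked_calloc(size_t count, size_t size) {
    /* Reject requests whose total size would overflow size_t. */
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;
    size_t total = count * size;
    void *p = malloc(total);
    if (p == NULL)
        return NULL;          /* the check that was skipped: never clear through NULL */
    memset(p, 0, total);
    return p;
}
```

The caller still has to check the result. A NULL check in the allocator doesn't help if the graphics code above it uses the pointer without looking.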
I'm trying to figure out how this is baked into Windows currently. I'm under the assumption that drivers are being rolled back by Windows updates due to this memory allocation synchronization issue, although I can't prove it yet. What would be a good avenue for narrowing down this issue?
@sirseven3 The BGI (Borland Graphics Interface) was a device-independent graphics library for MS-DOS. On MS-DOS, each individual program had to include the required graphics code to draw text, circles, lines, etc.
The Borland languages shipped such drivers for Hercules [monochrome], CGA, EGA, VGA and some other graphics standards.
In Windows, everything is graphics, except the magic blue screen of death. So Windows very early in the startup loads either a custom driver [like from AMD, Intel or NVidia], or makes use of a generic SVGA driver.
When normal Windows applications start, everything is already prepared for accessing all the graphics drawing primitives.
The funniest one I ever heard was from a tester who said he had a report from live deployed software running under Windows. Every so often CPU usage would hit 100% and the software would stop responding. The devs tried to recreate it and failed; no one could get to the bottom of it. Eventually someone went to the site to observe it happening, and it finally emerged. The dev went for lunch, came back, and the site guys excitedly announced it was happening. When they went to the machine, it was the Pipes screensaver... The machine wasn't very powerful and some idiot had enabled it. It basically ate CPU cycles for fun...
The term "Byte" was used at IBM before the 360 was brought out.
Uh, bug flea market. Mine was a XAML font glyph that had been removed from the actual font. It took three months, because the editor did not notice and the runtime crashed without any information at random moments.
urgh XAML.
Thanks guys. I grew up with the Friendly Giant as well.
And I, but it was cancelled immediately before Mister Rogers' Neighborhood (produced at WQED, our public TV station) came on the schedule.
For me it was a self-inflicted wound. I was just learning C and I was writing a text windowing system for our company's brand of terminals. I couldn't use curses because of some weird things the terminals supported, like using the same control sequence to turn the same feature on and then off, like insert mode. After banging my head trying to get curses to work, I decided to roll my own. I used vprintf to create my own functions to move the cursor etc. Under some circumstances, I was passing NULL instead of "", and it worked fine. People used the program without a problem until we upgraded the OS. All of a sudden, the cursor was jumping everywhere and terminals were locking up; it was a nightmare. Finally, I realized that when I passed a null pointer it was looking at position 0 in memory, which used to hold a zero, so it was treated as "". But now it held 1B followed by random data, so every time I passed vprintf a NULL it would generate a random escape sequence. Fixed with a simple ternary expression.
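The fix described above is the usual guard: never hand NULL to a `%s` conversion, because the C standard leaves that undefined. Here is a minimal sketch. It uses `snprintf` into a buffer rather than the original vprintf-based functions so the result is easy to check, and `format_with_text` is a hypothetical helper.

```c
#include <stdio.h>

/* Write an escape sequence followed by optional text into buf.
   The ternary substitutes "" when text is NULL, because passing a
   NULL pointer for %s is undefined behaviour. */
int format_with_text(char *buf, size_t n, const char *seq, const char *text) {
    return snprintf(buf, n, "%s%s", seq, text ? text : "");
}
```

With this guard, `format_with_text(buf, sizeof buf, "\x1b[H", NULL)` produces just the 3-byte escape sequence. Whatever happens to sit at address 0 never reaches the terminal.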
3:43 In Germany we have:
"Laden", the store place.
"Werkstatt", the workshop area where you do repairs and assembly.
But some places - especially in IT - do have both.
I guess you shortened "workshop" to "shop", then ended up conflating it with the "shop" that means "store"?
Englishman here, but I was in Kassel for two years and always thought laden was "load" (yes, I was programming). Is the difference the capitalisation of the word, or am I missing an umlaut?
10:42 - What was the discussion leading up to this? (forgiving bad video, but not forgiving bad audio)
…questions from the AI voice video. It starts around 26:11 in the full Shop Talk episode on Dave’s Attic
@@GlenHHodges Thanks.
My hardest bug was back in college during punch card days where I typed a "0" instead of an "O". Fun times!!
Windows Recall always immediately reminds me of the movie Total Recall.
So yes, the name is bad 😂
"Recall" called undo in various applications is damm handy.
The hardest bug I've addressed was in a smartphone where, on a certain carrier, at certain times, everyone's phones would seem to run out of battery at around the same time. Sure, it's near the end of the day, so people would just assume it's the end of the day (actually around 5-6pm, on Fridays, in certain European countries). It turns out that under certain circumstances a cellular protocol puts the phone into certain modes to save power. Without this feature, most smartphones would be out of power within 15-30 minutes. What happened was a race condition between the different cores: the modem cores and the application cores. In mobile, both systems think they are the master and the other is secondary. Besides being impossible to replicate, the modem core and all its code are highly protected from anyone outside the modem team. Out of the entire world, it was me and one HTC engineer who both had a binary dump of the system and went through it step by step to identify the bits. The modem team then had visibility of the code mapped to those bits and could determine the race condition.
The second hardest bug I've addressed was when I was a BIOS engineer and our systems would die at random. This was back in the days when we first made all-in-one PC laptops. Instead of using the $300+ Intel mobile CPU, we decided to use the $100 desktop CPU. We had tens of thousands of computers in the factory running burn-in tests, and they would just fail at random. It turns out that under certain conditions, the fans wouldn't perform and would melt... yes, you read that right, melt...
I had a whole 18-inch cubed box of floppies until 2 years ago. It was marked "baby stuff" and I had moved it from house to house without opening it. One day I opened it and it was filled with old 5.25" floppies. I kept a few and tossed the rest.
Hey, I remember installing and formatting a Seagate half-height hard drive as MFM just like that. That takes me back to the 80's.
ST506 was named after Shugart Technology, I believe.
Recall seems like it is more a tool for Microsoft to utilize in training a Large Model based on how the user engages with the HMI. It wants to know how to execute similarly but at a much faster pace.
I'm in the UK, and I'd always assumed that shop was short for workshop.
It is
You know, I started with DOS 3.0 on an old IBM clone as well as a Tandy... and a Commodore VIC-20!!!!
DE N2JYG
BBC Master 512 here.
Hi Dave!
I LOVED my Commodore 64! I never played games on it, though. Wonderful machine for the times!
@@blakepace LOAD "*",8,1 !
I spent my money on a VIC-20 and didn't have any left when the C64 came out. Programming Pac-Man in 5K of RAM taught me a lot of discipline, though!
For me it was a feature of Microsoft's 5.1 C compiler.
It seems that the keyword "volatile" was allowed syntactically but NOT semantically.
As in it would not generate an error, but it also did not generate the correct code.
I believe the invention of shingled (SMR) hard drives is the worst invention of the century.
Hard drives with high capacity and a low price, with no indication other than a model number and a red label, have ruined the day for many people who didn't know better. The closest device to a write-once hard drive.
BTW, the excessive logging, reporting and file duplication made Windows barely acceptable after Vista. But Rec-All has been the drop that burst the barrel: I now have Debian on all my computers. It has been a really poor endgame on Microsoft's part...
Absolutely, I still use Windows but go to great lengths to remove Recall and a great number of other data harvesting nonsense then create a custom image. Getting harder and harder and continually having to delve into undocumented components is really not fun anymore...
Stop man, not gonna listen twice!! 😂
A couple from the USA invited people to their "garage sale".
Dave, do you have any thoughts about NDOS and what happened there?
Hey .. did you say .. ".. back when I was on Ottawa.." ?
Sounds like your attic needs some air flow! Too hot to handle.
My hardest bug to fix was on a VAX running FORTRAN. As you know, all arguments to FORTRAN subroutines are passed by reference, meaning the subroutine can change the argument's value. Unfortunately, someone passed the literal value 0 (zero) as one of the arguments, and the subroutine assigned 1 (one) to it. From then on, any use of 0 in the program used 1 instead. I had to debug in assembly to find out why 0 = 1. It took a few days.
Oh man thats enough to tear your hair out!
@@douglasphillips1203 I don't have much hair left now!
Mine was somewhat similar. I was working in assembly on either a Z80 or an MC68xx. I wanted to load a register with a CONSTANT equal to the number 6 as a loop counter, but instead of marking it as a literal with a leading "#" as I should have, I omitted the "#", and the register was instead loaded with the contents of memory location $0006. I stared at that line of code for hours and hours without seeing the error. The hard part was that the contents of $0006 were dynamic, so an error happened regularly but with different results, since the contents of that address were rarely, but occasionally, 6. I had to bring in a colleague to help, and all I had to do was walk through my code verbally and the error magically revealed itself. DUH!
Is that what they call orthogonal for an instruction set? I mean the ability to use all types of data and addressing modes as operands... and with great power come powerful bugs. Wonder if ChatGPT would have caught them...
I thought that oddity/bug was well-documented enough that no one fell into it, anymore. Nice to see someone else fell into it
Your explanation of shingled is much less horrible than the drives themselves!
I have an uncanny ability to find bugs. Not so much fix them. When I report them, the response is usually "Never happens. Won't fix."
Well, /almost/ never! Fix it, ya bastiches!
How does code perform two or more actions at once? Nested code? I am reading sensors and trying to act on the data in real time.
Overlays are from HELL.
AmigaBASIC was hellishly slow! It was only usable once you had a compiler, such as HiSoft BASIC.
But I didn't do much in BASIC on my Amiga; I was into 68000 asm to control the hardware directly.
Dave: What do you think about NDVD and AI?
Ah, I thought that was more in Bob Ross's realm.
I wanted to hear more about the C64 bug
Was trying to work out why CTRL+ALT+DEL wasn't working so well on Windows 11; guess I have my answer now!
The hardest bugs are hardware bugs....like timing problems when your variable spans a page.
👍Nice!
Re: temp in the attic... 90? 130? I know you've lived in the US for a long time, but still... 🙂
120F. The F is for Freedom ;-)
@@DavesGarageAnd freedom units is the freedom to create confusion. 😜
@@perwestermark8920 The right tool for the right job
@@DavesGarage Sorry to disagree, Englishman here, but the F is for what the "F***" XD
The channel being listed as from the States was throwing me off because I *KNEW* I was hearing a Canadian accent this whole time. I thought I was going nuts! 😂
Anyway I love your content sir. Greetings from a Canadian American. 🇨🇦🇺🇸
❤❤❤❤
👍👍
I was confused at first, as I watched this yesterday. Note, Dave: I need more mugs 😂 One isn't enough, as my missus stole my cup today and is using it 😂😂
Hi Dave, back in my university days we had DEC Rainbows, which used an incompatible format. We could make floppy disks readable on both the Rainbows and the IBM compatibles by formatting them single-sided with 8 sectors per track. That's from a memory that has taken a few cosmic rays😮.
!!!DAVE!!!!
Can't get to the bottom of this tech question online..
Got a laptop with USB/Thunderbolt ports. No Ethernet.
For gaming I need to use a USB-to-Ethernet dongle.
Will a poor-quality dongle, compared to a top-quality one, affect ping and general connection quality?
Thanks
Was windows recalled? What went wrong?
LOL!!
The old wrap around !
Ahh yes, the crap shingled drives that crap WD was selling for NAS without disclosing they were crap shingled drives. Still, FU WD.
Hey Dave, I don't know much about autism and folks having kids. Maybe too personal, but if you don't mind sharing, are any of your children autistic?
No, but a couple of them are certainly on the spectrum, but not to the point of actually having ASD!
@DavesGarage thank you for sharing. I wondered if it has genetic traits that get passed on. Heterozygous like.
@@DavesGarage Forgive my ignorance but I thought the "S" was for spectrum?
Third
First 😂
Technically not, it was on the other channel a day ago