For the ones with the DRAM LED... Tech Yes Bryan found he could revive a "dead" 3600 by undervolting in the BIOS with a good CPU installed, and then switching to the dead one. It would be interesting if you could make any of yours work by undervolting or reducing the clocks. [edit] lol, just got to the last one where you did just that. Worth re-checking some of the others similarly.
The 5900x seems similar to a few cases of degraded silicon I've seen. With some time you could probably dial in the voltage/clocks to get it running pretty close to stock. Not perfect but would still work well as a homelab CPU or something.
I have an old fx-8370 chip (use it as a test bench and backup media system now). Used to oc to 5.0GHz, but slowly became unstable over the years. Now it'll run fine forever at 4775MHz but not at 4800 or more.
As others said, the 5900X is "fixable" by disabling Global C-States control in BIOS, which disables core parking feature. My 5950x just did this last week, while it can be fixed by disabling this option, personally i decided to RMA it instead.
I love how you take everything as a learning experience. Too many people want to just be right all the time instead of realizing that there is always something to learn. Keep up the good work Greg! Awesome video!
Hey Greg, thanks to you and your Fix or Flop series I was able to fix my 10-year-old desktop, so let me share what happened. Out of the blue the system started blue-screening with a different error every time, so I thought maybe Windows was corrupt and tried a fresh install, but it wouldn't let me boot from the pen drive, so I thought maybe the HDD had given up the ghost. I had a spare HDD lying around, so I swapped it in, and sometimes it would just boot loop and sometimes it would give no output to the display. So I started disconnecting RAM sticks to narrow down the problem, and guess what: one of the RAM sticks had gone faulty. I tried putting it in the other slot to rule out a defective slot, but the RAM was at fault, so finally the system started with the other 4 gig stick. I didn't go to any repair shop as most of them just try to rip you off, so thanks to you my potato PC started again.
So the first 2 that get stuck on Dram light, I think you are correct mostly...I've seen this a bunch of times...It usually happens when the SOC voltage gets set too high and damages the memory controller...It really doesn't like getting overvolted too much unless you have extreme cooling...
I just want to say thank you for the fix or flop content! Thanks to your methodology I was able to fix a friend's "dead" PC in less than 15 minutes. He thought the mobo was dead and asked me if I could look at it knowing that I have experience building PCs, and I was quickly able to narrow it down to just 1 dead RAM stick. I cleared the CMOS, swapped in a known working GPU, and then tested RAM one stick and one slot at a time. It was so satisfying being able to save that PC as it is still a relatively powerful system.. i7 9700k on a gigabyte z370 motherboard with an RX 6700xt.
With the overheating CPUs, try delidding and liquid metalling the CPU and putting the IHS back. The thermal glue inside might've dried out or might not be making contact with the IHS anymore.
I had the exact same issue as the last CPU that you tested here. Tried underclocking as suggested and everything seems to be working now! Thought I had tried everything... Thanks!
Great vid and quite helpful- I build a lot of Ryzen based computer systems and this helps with a "heads up" of possible outcomes. I have not yet witnessed these symptoms, but good to have this knowledge.
I have a Ryzen 7 2700X where one memory channel crapped out. What happened was I changed my cooler for one of those tower coolers, and the weight of the tower cooler ripped the Ryzen out of its socket. I am guessing that caused some sort of electrical fault in the CPU, as it doesn't like being pulled out of its socket when the power is on. But that's what happened. The main fault lies with the top-heavy weight of the tower cooler and the CPU retention arm. The weight of the cooler pulled the arm out of its catch. There was an easy fix for this, and that was to bend the arm towards the socket again, making better contact with the catch. But by then it was too late. My verdict is: tower coolers are too top-heavy, and the CPU retention arms are too weak and don't latch hard enough.
I would love to try to delid that 5900x and see if it's a contact issue. Maybe there's an internal issue where the chip is barely touching the lid. Or maybe try turning one of the CCDs off and try running at stock settings.
A motherboard with debug codes instead of just LEDs (and a power button) would've probably been beneficial for these tests. Or regarding the power button, you could just use one from an unused case to connect it to the pins instead of having to short the jumpers all the time.
@@daveward4358 Greg did say he was trying to contact AMD about that but he just hasn't heard back from them yet. He did also say that he will update us once they do contact him back about it.
@@daveward4358 Not necessarily, only because we can neither confirm nor deny if that's true without proof or evidence. What I can say for sure is that a lot of people bought these CPUs, so it's possible that failure rates seem higher than they truly are overall.
@@notrixamoris3318 There's no problem with the CPU. The real problem here is the motherboard and PSU. I have 12 Ryzen CPUs and I don't have any problem at all. My MSI motherboard is my problem: it killed 1 of my CPUs. Asus and Gigabyte have been good in my experience.
I agree....cooling IS an issue with the R5-3600 if you use the pitiful cooler supplied with the CPU. With a good cooler that keeps it below 65C no matter what...mine has been absolutely reliable...so far.(knocks on wood)
If you install poorly it is generally VERY easy to tell why it is broken. Bent pins or massive damage to the socket itself or conductive thermal paste in the socket for example. Thermal loads could cook a chip in theory but in general its not something that will generally cause a chip to fry since people will normally notice something is wrong even if they arent tech literate. In reality though the truth is that no QC is perfect and every company makes lemons or has broken parts slip through the cracks. Or even parts that pass QC if only barely so they fail quickly. Thats one of the reasons you generally want to stress test a new system for a bit after you build it. If something passed the basic QC at the factory but has an issue when its ran hard for a long period its best for it to burn itself out early.
@@abdullahdanze2061 Not much point overclocking a ryzen. I suspect some mobo vendors may be too aggressive with auto voltage settings, causing weaker chips to fail. I've been having erratic issues with my chips.
for giggles I tried a older weed pen that stopped working (as they all do when the battery dies)....but today, I got two hits (with lights) and it still seems like it can give me more. I know it's not the same but I'm pretty stoked, literally :) Good vid Greg -- you're a genuine dude.
This should become a super sub-series where every 45 videos or 5 bad CPUs (or any part, really) of Fix or Flop, you go back and see if you can't fix them or if they are truly broken.
some AMD chips that have dual CCD(x900, x950, or some x600 series) may experience some impedance between CCDs especially when theres a degradation problem related to the silicon, their voltage needs heavy adjustment to work normally. TechYESCity demonstrated that by disabling cores/limiting boost frequencies some of these chips could be salvaged.
Preface: I may have a Ryzen 5 5600X with a failing memory controller. Throughout the past few weeks, I've been troubleshooting one of my PC builds. Around the same time as swapping graphics cards, my SFF rig became quite unstable, lots of random restarts. It freezes or reboots anywhere from immediately after POST to maybe 30 min in. The instability can be at idle, launching an app, activating the Start menu, closing an app, during a benchmark, etc. I've (re)verified the GPU, an RTX 4070 Ti, stability in another system. The previous card, an RX 6900 XT, is just as or more power hungry, therefore, I doubt(ed) a PSU problem. Many "clean" Windows 10 and 11 installs with various attempts at driver versions and driver install order made no improvement. When they would/could complete, benchmarks were fairly on par. I also changed the memory setting to Auto (i.e., base 2133 MT/s) and disabled onboard components (e.g., audio, Wi-Fi, BT) one-by-one via BIOS. Believing I've narrowed the culprit to RAM, CPU, or motherboard, I decided to do another DIMM reseating, though I swapped slots this time. Welp... This resulted in the board not making it through POST, lighting the DRAM Q-LED. No matter what, I couldn't get it past the DRAM stage, that is, until I ran the system with only one DIMM. We're back to booting but still very unstable. I felt it improbable both DIMMs had failed. So, I swapped in a single stick of Kingston Fury (also DDR4-3200). System again boots, although, no improvement of stability. So, the problem appears to be down to the motherboard or CPU. And after recalling the failures you showcase in this video, I'm leaning CPU. UPDATE: I purchased a 5700X3D... Same problems. So, turns out, it was the motherboard... Somehow... Ultimately, I decided this was time for a platform upgrade. I returned the 5700X3D and sold the 5600X and DRAM to SellGPU, which claimed the components passed their tests.
So far, the system is going strong on a Ryzen 7700X, ASRock B650E PG-ITX, and Flare X5 DDR5-6000. Same GPU, PSU, and SSD as before the platform replacement.
I was looking forward to this, since you had quite a collection of dead 3000s. To be honest I recently built two absolutely identical 7600x systems, did the same settings on both, same GPU, same EVERYTHING aside from hard drives. One was running at 1.4v, the other at 1.29. The former was almost hitting 95°C (on a Dark Rock Pro 4!!), the other was running relatively cool in the 80ish range, pumping better scores all over the place. I ended up forcing a Curve Optimizer offset on the 1.4v one that basically reduced it to the voltages of the other (even less at -30), and the thing began running cooler than the other and posting roughly the same results if not a tad better. For safety I set the owner's chip to -20, then instructed him how to lower it over time once he had checked that he was stable in his editing/gaming/daily routine. He now runs it at -30, and at least in CPU-Z/R23 he gets results slightly better than the other one that I left at stock, all while running about 5°C cooler overall. The above was kinda strange tbh.. same B650M-A Pro from MSI, same coolers, same Asus TUF 6800 XT, same EVGA SuperNOVA G6 850, same G.Skill 6000 C36 kit 2x16, same Crucial P5 Plus, same BIOS version, same BIOS settings (at default + just enabled XMP and fan profiles), same Phanteks P500. The only difference was one rig has a 2TB HDD + 256 SATA SSD, the other one just a 3TB HDD carried over from their previous builds. But that can't explain the humongous vcore difference. The only test I didn't do is to swap CPUs between the motherboards, but looking at how the CO cure turned out, I thought I didn't want to dismount and remount two Dark Rock Pro 4s 4 times in a day..
@GregSalazar for the 5900X, try enabling PBO instead of deactivating it, and try activating the Eco Mode (75W/90W) inside of PBO. I had the exact same issue with a 5800X. At stock settings it was boot looping all the time until I used the PBO Eco Mode, and it wasn't even much slower. Like around 5-10%.
When doing these sorts of diagnostics, keep in mind that CPUs have a system called by various names: "CPU Internal Error", "CPU IErr", or "CPU Machine Check". This system is basically in charge of double checking results, e.g. that the CPU didn't do 1 + 1 = 3. When such an error is detected, the CPU will (try to) tell the OS, and the OS will log the error and reboot itself (sound familiar?). This may require a few tries because sometimes the error is so bad the OS doesn't get the chance to do any of this and the system just hangs or reboots. In the case of Windows, you'll find this in the Windows Event Log, as WHEA-Logger entries (kinda hard to reach when you can't boot). In Linux, you should see it right on the boot terminal on screen right before reboot. This is very useful because it may indicate further info as to what is failing, and you may find workarounds online if others are having the same issue. I suspect the 5900X was having Machine Checks, and the 3900 from 9:55 possibly was too. Trying to boot into Linux may also yield more info (particularly if it gets stuck in a specific place).
Recently I had a family member complain that shortly after plugging in his new WD 2TB Elements portable USB 3.0 (the next day actually) his monitor, keyboard, and mouse quit working. I saw that his MSI X470 Gaming Plus Max AM4 MB had every USB port filled with various crap...looked like a porcupine. His poor Corsair VS 450 PSU had to handle a Kingston 120GB SSD (OS boot drive), a WD 250GB HDD, a Seagate Barracuda 500GB HDD, the Ryzen 5 3600 w/stock cooler, 16GB G. Skill Trident Z Neo 3600, 5 UpHere RGB case fans, MSI GTX 1650 Super GPU with 2 monitors plugged in, and some gadget that appeared to have a few mini monitor screens. All the fans and lights were working, as well as the GPU fans. I noted the PSU didn’t have a connector for the 4-pin CPU PWR2 on the board, which I’ve read the MB only really needs the 8-pin CPU PWR1 but I can’t help but wonder so I suggested he buy a more powerful PSU to handle all the stuff he had plugged in and to make sure it had the connector to fill that void. Well a new Tier A Corsair RM1000e PSU didn’t help so I tried a known working GPU but that didn’t help either. I should note that no settings were ever tinkered with. Everything was totally factory and his Metallic Gear Neo case w/2 Skiron RGB fans, plus the 5-pack of UpHere RGB fans offered plenty of ventilation. I thought for sure the motherboard had an issue. The next stop was Micro Center for component isolation, which took 3 weeks and they said the problem was the Ryzen 5 3600 failed and it would be $180 to fix. He added that although he just bought everything last summer, (from MC) he didn’t buy the warranty so that’s something he’d have to handle himself through AMD. Noting how many of those chips have been diagnosed as “bad” plus the one I mentioned, I’m beginning to think AMD may have a dependability issue starting to emerge. EDIT: MC scared the kid by saying the new CPU would require a new OS and it would wipe everything on the drive. 
When the kid asked about saving his games and the cost of the next CPU upgrade, the tech said the ($129.99) Ryzen 5 5600 was actually just $20.01 more than the ($109.98) Ryzen 5 3600 and mentioned flashing the MB. So with another $150 to save his data, the price approached $500. I suggested that since his OS was on a cheap $20 120GB SSD, he should just put in a higher capacity & faster NVMe for the OS, and the old SSD wouldn't have to be touched. Plus he already had the activation key from the previous OS installation. Next thing I know, the tech installed the ($49.99) Inland QN322 1TB NVMe and it ended up costing “$200 something.” (his words)
I’m able to boot with RAM in B1 & B2, but anytime I try to use A1 or A2 I get error code 55 and a yellow light on RAM. Everything lights up on the PC but there's no display, keyboard, or mouse. Do you think this is a problem with the CPU or the motherboard slots? Please let me know, anything is helpful.
Greg, I don't know whether this interests you, but for bench testing a motherboard, I use a removable, external power switch. I like this better than physically shorting the power switch pins when I have to start up the same board many times. Of course, if you are working with a number of different boards at the same time, this would be more time consuming than shorting with a screwdriver.
Hey Greg, found an interesting thing!! At 16:30 / 16:37 in the video, when you press on the RAM with your finger, the DRAM light changes to CPU. I don't know if I'm the only one noticing it.. Love your content!!
I have a viewsonic version of that portable monitor. I absolutely love it. High refresh rate and relatively good brightness and color. It’s a really nice thing to have around.
I find the number of memory issues interesting. Just replaced my 2700X as the memory controller is degrading. It stopped being able to run the memory on XMP, so I had to drop the voltages and speeds down a few months ago (6), and it was stable until yesterday when the problem made a return. However, I believe a bad BIOS was to blame in my case, as I found that the XMP had gone to 1.45 V on Auto, which was fixed with a BIOS update.
For the 3600 cpus I would try a slower ram speed. I have a 3600X that will do nothing but give me issues if I use anything over 2400mhz ram. Its a definite issue and I only use the chip in a b550 board to test gpus and other components.
@@Smakheed Wow! I set up my friends workstation pc with a 3900xt and 32gb of 3200 and it just worked. And that was on a b450 AsRock board. I was using my 3600x on an asus x570 and figured 3200 would be fine but the cpu just doesnt like faster speeds. I'm sure with some tweaking i could get it stable but I don't have the time or desire to do so.
I got a problem similar to this, I got a MSI B 550 A Pro motherboard with a Ryzen 5 5600. Worked great for a month, no heating issues. One day I go to power on the PC and it does nothing. Changed out the PSU and ram, gpu. still the same. Then I took out the CPU and it fired right up, but the motherboard will no longer let me bios flash. Keep in mind you do not need ram or cpu in most msi boards with bios button.
It makes sense that the IMC in AMD chips fail more than non K intel chips. The more freedom you give users the more variety you need to support ram wise. Running outside of its preferred settings with higher clocks and tighter timings is higher risk. Similar to turning up the boost on turbo cars. It puts pressure on other parts: clutch, fuel system etc.
I've been in IT for over 30 years and I've personally only seen 1 failed processor. Two users with failed processors would have been crazy to me, but 5!? Intel isn't the best but this is shenanigans.
Well done, young man. My father will be turning 91 in about 3 weeks and I will be turning 54 a week after that. We both still learn from mistakes. If you didn't there would not be much point in having your channel.
The 5900x is a lot like a 9900k I had once. If you overvolt the snot out of it (1.45V Vcore or something) it’ll go into windows and work fine for a while, then it will degrade and eventually require more voltage or lower clocks. It eventually got to the point on the 9900k for me where I was down to disabling 4 of its 8 cores, and locking clocks at 4.0GHz at 1.4V. I just trashed it at that point.
After many Intel builds since 2000 this was my first AMD, Three years of happiness, now? I've spent more than a month plus a good sum trying to fix it, this is the problem I'm having. Thank you
@12:35 hahah this happened to my first Ryzen 9 5950x after trying out the dynamic oc feature on the Asus Dark Hero x570 it kept cycling between this and corrupted boot device.
I am wondering if the silicon shortage over the last two years has contributed to this issue, are manufacturers reducing the failure threshold/overhead to pump out their products? I am in the process of building my new 13th gen system and it is a major concern that component quality might be compromised by the current lack of raw materials globally.
try disabling one of the cores or try baking them in the oven 150c for 30 min. the oven method has worked many times for me and some of the pc tech still works fine after 2 years. oh, and Greg u love being right, dont u;) pz
IIRC the 3600s have a max listed compatibility on XMP of 3200. I ran into a Ryzen 2700x that is listed as 2933 and it, more or less, meant it! I managed to get it set at 3000 and stable 99% of the time. I've learned to leave everything at "Auto" while testing, then once it's a fix or replace, I start playing with settings. The issue with the 2700x didn't start until they "upgraded" the DRAM to a faster, RGB type. The "upgrade" worked fine in my rig and a test rig, but I'm running 5000 series AMD and Gen 11 Intel. Just a thought. I understand that this video is 11 months old as my eyes see it, so you may, or may not, have already thought of this, retested, or got answers from AMD about these CPUs.
I built a new AM5 system with an 7950X in November. A couple of weeks ago it wouldn't wake from sleep so after a power cycle, got the DRAM debug light (ASUS Tuf B650 plus). I couldn't believe that two sticks of corsair vengeance had both failed at the same time. After testing the memory (single sticks in all slots - 8 tests) then RMA'ing the motherboard and CPU; turns out the CPU had died. CPU always ran cool as I was running it on a 120w envelope rather than 170w, with an AIO cooler. Never had a CPU fail in 40 years until now.
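For anyone wanting to keep track of the "single sticks in all slots" tests above (2 sticks x 4 slots = 8 boots), here is a rough Python sketch of the bookkeeping. The names `plan_tests` and `diagnose`, the stick/slot labels, and the verdict wording are made up for illustration, and the verdict is only a heuristic: it cannot tell a dead slot from a dead memory channel on the CPU.

```python
from itertools import product

def plan_tests(sticks, slots):
    """List every single-stick boot test: one stick in one slot at a time."""
    return list(product(sticks, slots))

def diagnose(results):
    """results maps (stick, slot) -> True if the system booted.

    Heuristic only: a stick that never boots in any slot is suspect,
    a slot that never boots with any stick is suspect, and if nothing
    boots at all the RAM is probably not the real culprit.
    """
    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})
    bad_sticks = [s for s in sticks
                  if not any(results[(s, sl)] for sl in slots)]
    bad_slots = [sl for sl in slots
                 if not any(results[(s, sl)] for s in sticks)]
    if not any(results.values()):
        verdict = "nothing boots: suspect CPU or motherboard"
    elif bad_sticks or bad_slots:
        parts = []
        if bad_sticks:
            parts.append("suspect stick(s): " + ", ".join(bad_sticks))
        if bad_slots:
            parts.append("suspect slot(s)/channel: " + ", ".join(bad_slots))
        verdict = "; ".join(parts)
    else:
        verdict = "no stick or slot fails consistently"
    return {"bad_sticks": bad_sticks, "bad_slots": bad_slots, "verdict": verdict}

if __name__ == "__main__":
    plan = plan_tests(["stick1", "stick2"], ["A1", "A2", "B1", "B2"])
    print(len(plan), "boot tests to run")
    # Made-up outcome: stick2 never boots, stick1 boots everywhere.
    outcome = {(stick, slot): stick == "stick1" for stick, slot in plan}
    print(diagnose(outcome)["verdict"])
```

If every one of the 8 tests fails, as in the story above, the sketch points away from the RAM, which matches how it turned out.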
The labeling of the cpu as "bad" reminded me of a system one of my clients asked me to diagnose and the only information I had to go on was the label which read "doesn't".
Given that a 3900x is just a failed 3950x I'd venture the guess that one or more of the remaining cores from the 12 in use and still 16 physically there is bad. Well not quite bad as in non-functional but bad as in cannot operate at stock clock within stock voltage, and may have worked if binned down into a 3600 with 200mhz less base and 400mhz less boost, or if sold into an even lower sku like a ryzen 3 where 4 cores on a die would be disabled instead of 2.
About the DRAM issue, try pressing the CPU a bit harder into the mount. I saw ONE CPU where the mount pressure needed to work consistently was a bit unreasonable. Maybe that wasn't a one-off weird chip. Just FYI, try different slots for the GPU and drive. Sometimes you have just a tiny part of the CPU shorted, and for example the M.2 slots are unusable, but otherwise the CPU works. Disable that part in the BIOS, and you have yourself a working, although slightly crippled, CPU. It's not weird that the same SKUs have the same issues; it's likely that the same type of machine made the same mistake. Likely there is some kind of trace that is just a touch too small, and if combined with some other issue, maybe mount pressure that's too weak or something, it just fries. A weak mount, I find, is the starting point for this kind of issue, where the CPU kinda works, but while using a very specific thing like AVX-512, it turns off or freezes. For DRAM, try LP modules. I have no idea if they would work, but it would be interesting to find out.
Tech YES City tested a few Ryzen 3000 series CPUs, and what he noticed is that they run pretty hot. He came to the conclusion that running those "older" CPUs with modern graphics cards makes them run a bit too hot, which will shorten their lifecycle. Like the modern GPU asks too much from the CPU. He was able to bring CPU temps down with undervolting.
I have a strange 3900x. It will work with 2 dimm's of memory as long as they are both in a1 b1 or a2 b2 configuration, or one dimm in any slot. It won't do anything with the standard dual rank motherboard configs. So it's always in single rank config. It does not matter what you toss in there or the speed/size, either single or dual rank. I had this chip on several motherboards now and it's always the same issue. If you only have 2 dimm slots you can only use 1 dimm. It will not work with 4 dimm's at all. No matter what it still runs at the rated dimm speeds when you select the Xmp profile. Size of the dimm's don't affect it either. It will run just fine on 64 gig's in single rank. I have never figured out what is going on with it and it's only used in a NAS server so I just kept it and never returned it. So both channels work but the chip won't work with both channels.
Greg, I envy your access to hardware, but I don't envy how much thermal paste you must go through in a week. I'd love to know what brand you use though.
Hi Greg, I have had two AMD 2700x cpu's with "B" channel memory controller's faulty. Still being used with 16Gb in "A" channel only. One died and replaced by AMD only for the "B" channel to fail again. Thought it was memory stick's at first as 4 sticks of Corsair 32Gb 3200 installed, only 16Gb usable on two M/B's, a MSi tomahawk and a crosshair VI hero. Love the channel. Mike (retired PC tec)
The last one I wonder if de-lidding and some thermal paste renewal, then replace the IHS (I know it can move around after de-lidding but still) and then carefully apply the CPU cooler again and try running it on normal just as an experiment to see if it will run on auto for vcore etc. Just because cheap paste is usually used. I just wonder if it has been run too hot and the paste is too crappy and now failed. I think that would be a good experiment.
The answer as to why the 5900x is stable in the UEFI environment, but not once Windows starts loading is pretty simple: UEFI environment is uniprocessor (e.g. only running on core 0). Windows' NT kernel, upon initialization, not only runs on each CPU (core), it has to initialize certain per-cpu data structures, etc.. A problem with one of the CPU cores, beyond 0, is likely crashing the system early in the OS boot process. My guess is it was overclocked and overvolted to the point of damage.
Maybe contact Steve from Gamers Nexus. This sort of situation might be right up his alley, especially if it turns out to be a batch-wide situation. There could be hundreds of Ryzen 3000 chips on the verge of going bad, and it could reveal exactly why they are failing. Might turn into an interesting collab.
I've had a friend's 1600AF restart after a short while in Windows, basically when it's loading apps. Ended up lowering the voltage to something like 1.15ish and the frequency by a few hundred MHz, and he ran it like that for a couple of months before upgrading. That CPU was part of a pre-built that had a warranty on the whole system, not by component (around here it's not uncommon for a pre-built with off-the-shelf parts to have individual warranties) AND IT HAD WARRANTY STICKERS ON THE CASE SIDE PANELS. My friend broke them since he isn't an animal and wouldn't keep his PC uncleaned for two years, and because of that they denied warranty... I wasn't a fan of the company before, and frankly their prices are far from great, but needless to say I don't even open their site anymore.
I have 2 "Dead" 3600's and if I undervolt them they work fine. I also have a 3900x that only boots if PBO is disabled. We have a few 1600's with the dead memory controller. My guess? These are all low quality binned chips that were created close to the end of the MFG cycle and were bottom bottom bottom of the barrel. They were sent out into the wild, overclocked, and that was that. I could be wrong, but I have run into so many degraded chips that work if you undervolt / underclock them. All of them are Ryzen. That flash that you saw when the mobo powered off and then back on is Ryzen memory training. It takes literally a few seconds.
1:58 While i agree that making a Test Bench for CPU check is complicated and hard. But making a cheap one specifically to test GPU's should be possible and save a lot of time checking for GPU related issues.
I changed some settings in the BIOS, I think CPU voltage... then the PC turned itself off and won't boot again. I tried everything, CMOS reset... what should I do? It's my new config 😔😔
Make sure to disable some cores in the BIOS to see if chips will run afterwards. You can try manually increasing voltages or lowering clocks too. Sometimes some cores will die completely, other times just enabling PBO or even any boost frequency at all will be the issue. I've also seen it where it would basically need an extra 50mV to hold the advertised boost frequency. Mostly, these chips can be saved if you're willing to make compromises.
looked on amd forums… one guy CrispyCrunch wrote… “I got a new angle on this. So deactivating PBO and CBS definetely works, PC was running stable for a week now. But you'll loose performance. So I wrote to the MSI support and the AMD support. MSI suggested to try increasing the DRAM Voltage by 0.05 V, which I did. System seems to be stable, no crashes so far - neither in idle or while gaming.”
Quite funny to see this video because I do the same from time to time. I go back to hardware lying around, trying to fix it again and again, and often forget what the problem was. I got some things repaired tho, so it's always good to try again later. Hope AMD will agree to take a look at it. It would be interesting to know what happened to the CPUs.
Hi Greg Thanks for the always informative, and thorough videos. I wonder if you would share some Fix or flop statistics? Do you have as many faulty intel processors as ryzens? Put up against how many you got it in?
My buddy gets Amazon returns and he has gotten a lot of Ryzen 3000 series. He seems to always have issues with 3600s and 3900xs. The 6 core 3000 series bins don't seem to fare well, while the 8 core bins, 3700x and 3950x, and even the 4 core bins, 3100 and 3300x, seem to fare much better.
I had a boot looping 7700K and was behaving like a bad overclock even though it was on stock settings. So I increased the voltage and lowered the clock speed. It then worked without crashing and after a couple of weeks set back to stock and it worked perfect.
The ryzen 5 3600s with dead memory channels would be worth something to dell so they have an excuse to use single channel ram
BROOO 💀💀
damn
Yep, and Dawid just LOVES that single channel RAM
Fix or flop is getting intense!
The 3900x was a hot chip, and if not properly cooled I could see the IHS becoming partially unsoldered from one or both of the CCX's. This would make it pretty much impossible to thermally manage from then on.
It would be awesome to see him delid it and try it again! It's already damaged so he's not really gonna break it more! lol
solder melts around 200 degrees Celsius. I don't think any 3900X was getting that hot, no matter how bad the cooling was.
all AMD's are hot chips.
I am guessing you ain't old enough to remember the old melting amd chips.
@@dremy746 Different solders melt at different temperatures. The solder used in CPU's is not the same as the lead-tin solder used in making electrical connections. The solder used in CPU's is indium solder, which melts at 157 degrees C, and can soften at lower temps.
@@testickles8834 if you're addressing me, at 58 I suppose I am old enough. I remember water-cooling first-generation athlon's which was quite a trick since you couldn't buy any water cooling Hardware at the time and had to make your own by hand.
"I wish I labeled my CPUs better."
"Let's label this one.... _stubborn_ "
for the chips that do boot in to bios you could try turning cores off, first try to see if it works ok with only 2 cores then enable 4 and so on to find out maybe a 6 core can work fine with 4 cores
Not worth it.
@@JoeNasr123 TF you mean not worth it? He is already in possession of the chips, it costs nothing.
@@TuffMcAwesome Too much work for a chip that's only good enough to throw in the garbage bin..
@@groenevinger3893 sometimes he is too scared to put his working CPUs into motherboards that he suspects might fry them, so I think these partially working CPUs could make for perfect test subjects, as he won't lose anything if they get fried anyway.
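The step-up idea from the top of this thread (boot with 2 cores, then 4, and so on) can be written down as a tiny search loop. This is only a sketch: `is_stable` is a hypothetical stand-in for the manual step of setting the core count in BIOS and seeing whether the system survives, and `simulated_test` just pretends the chip is fine up to 4 cores.

```python
from typing import Callable, Sequence


def max_stable_cores(steps: Sequence[int], is_stable: Callable[[int], bool]) -> int:
    """Walk the core counts in order and return the last one that passed.

    Stops at the first failure; returns 0 if even the first step fails.
    """
    best = 0
    for count in steps:
        if not is_stable(count):
            break
        best = count
    return best


def simulated_test(cores: int) -> bool:
    # Hypothetical result: this imaginary 6-core chip only behaves with 4 cores or fewer.
    return cores <= 4


print(max_stable_cores([2, 4, 6], simulated_test))
```

In practice each call to `is_stable` is a reboot and a stress run done by hand, so the loop is really just a checklist for not skipping a step.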
Choosing a motherboard with a Post Code display would have provided much more useful information.
He doesn't even have a BIOS chip reader, he just does these videos for money and isn't really into repairing PCs.
@Chris723 Yeah, YouTubers generally make videos to make money, super hot take you had there. But to say he's not into repairing PCs is clearly false. He shows the broken PC and then he makes it work, without charging the viewer. It's kind of a win win situation.
@@ignacio8597 I thought after tens of videos he would start buying some diagnostics equipment, it would make for better videos.
@Chris723 I mean it's not perfect, but people do get free repairs/parts, he makes money, and it does help some people at home try to fix their own stuff. I'd say the general viewers think they're good enough.
Most people don't have Dr Debug visual codes on their MOBO's so this is also useful to get to the bottom of issues
Ahh, the joys of troubleshooting used components. Loving this series, keep them coming!
Let's hope he doesn't have to keep 'em coming.. no one wants to deal with dead components
Tech Yes City has had similar problems with AMD cpu's recently. Great video as always, Greg. Much respect.
For the ones with dram led... Tech Yes Bryan found he could revive a "dead" 3600 by undervolting in the bios with a good cpu, and then switch to the dead one. It would be interesting if you could make any of yours work by undervolting or reducing the clocks.
[edit] lol, just got to the last one where you did just that. Worth re-checking some of the others similarly.
100%
Brian at Tech Yes City had similar issues with some Ryzen chips and found that some had bad cores and was able to get them working by disabling a core
The 5900x seems similar to a few cases of degraded silicon I've seen. With some time you could probably dial in the voltage/clocks to get it running pretty close to stock. Not perfect but would still work well as a homelab CPU or something.
I have an old fx-8370 chip (use it as a test bench and backup media system now).
Used to oc to 5.0GHz, but slowly became unstable over the years.
Now it'll run fine forever at 4775MHz but not at 4800 or more.
As others said, the 5900X is "fixable" by disabling Global C-States control in BIOS, which disables core parking feature. My 5950x just did this last week, while it can be fixed by disabling this option, personally i decided to RMA it instead.
Had a 5800 that would always work with pbo off. Probably a cstate issue but I just got a replacement also
I love how you take everything as a learning experience. Too many people want to just be right all the time instead of realizing that there is always something to learn. Keep up the good work Greg! Awesome video!
Hey Greg, thanks to you & your Fix or Flop series I was able to fix my 10-year-old desktop, so let me share what happened. Out of the blue the system started blue-screening with a different error every time, so I thought maybe Windows was corrupt and tried a fresh install, but it wouldn't let me boot from the pen drive, so I thought maybe the HDD had given up the ghost. I had a spare HDD lying around, so I swapped it in; sometimes it would just boot loop and sometimes it would give no display output. So I started pulling RAM sticks to narrow down the problem and, guess what, one of the sticks had gone faulty. I tried it in the other slot to rule out a defective slot, but the RAM itself was at fault, and the system finally started with the other 4 GB stick. I didn't go to any repair shop as most of them just try to rip you off, so thanks to you my potato PC started again.
So the first 2 that get stuck on Dram light, I think you are correct mostly...I've seen this a bunch of times...It usually happens when the SOC voltage gets set too high and damages the memory controller...It really doesn't like getting overvolted too much unless you have extreme cooling...
I just want to say thank you for the fix or flop content! Thanks to your methodology I was able to fix a friend's "dead" PC in less than 15 minutes. He thought the mobo was dead and asked me if I could look at it knowing that I have experience building PCs, and I was quickly able to narrow it down to just 1 dead RAM stick. I cleared the CMOS, swapped in a known working GPU, and then tested RAM one stick and one slot at a time. It was so satisfying being able to save that PC as it is still a relatively powerful system.. i7 9700k on a gigabyte z370 motherboard with an RX 6700xt.
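The "one stick and one slot at a time" routine above is basically a small test matrix. Here is a rough sketch of it; the stick names, the slot names (A1/A2/B1/B2) and the boot results are all made up for illustration, and the boot test itself is still something you do by hand.

```python
from itertools import product


def isolation_plan(sticks, slots):
    """Every (stick, slot) pairing to try, one stick installed at a time."""
    return list(product(sticks, slots))


def diagnose(results):
    """results maps (stick, slot) -> True if the system booted.

    A stick is suspect if it never booted in any slot; a slot is suspect
    if no stick ever booted in it. If everything fails, both lists fill up
    and the CPU or board becomes the next suspect.
    """
    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})
    bad_sticks = [s for s in sticks if not any(results[(s, sl)] for sl in slots)]
    bad_slots = [sl for sl in slots if not any(results[(s, sl)] for s in sticks)]
    return bad_sticks, bad_slots


sticks = ["stick1", "stick2"]
slots = ["A1", "A2", "B1", "B2"]
plan = isolation_plan(sticks, slots)

# Hypothetical outcome: stick2 never boots, stick1 boots in every slot.
results = {(stick, slot): stick == "stick1" for stick, slot in plan}
print(len(plan), diagnose(results))
```

With two sticks and four slots the plan comes out to 8 boots, which is tedious but leaves little room for guessing.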
With the overheating CPUs, try delid and liquid metalling the CPU and putting the IHS back. The thermal glue inside might've dried out or not making contact with the IHS anymore
I believe all Ryzen CPUs are soldered to the IHS
Yeah, they're all soldered... But maybe badly. I second delidding
I had the exact same issue as the last CPU that you tested here. Tried underclocking as suggested and everything seems to be working now! Thought I had tried everything... Thanks!
Great vid and quite helpful- I build a lot of Ryzen based computer systems and this helps with a "heads up" of possible outcomes. I have not yet witnessed these symptoms, but good to have this knowledge.
I have a Ryzen 7 2700X where one memory channel crapped out. What happened was I changed my cooler for one of those tower coolers and the weight of the tower cooler ripped the Ryzen out of its socket, and I am guessing that caused some sort of electrical fault in the CPU, as it doesn't like being pulled out of its socket when the power is on. But that's what happened. The main fault lies with the top-heavy weight of the tower cooler and the CPU retention arm. The weight of the cooler pulled the arm out of its catch. There was an easy fix for this, and that was to bend the arm towards the socket again, making better contact with the catch. But by then it was too late.
My verdict is --- Tower coolers are too top-heavy and the CPU retention arms are too weak and don't latch hard enough.
I would love to try to delid that 5900x and see if it's a contact issue. Maybe there's an internal issue where the chip is barely touching the lid. Or maybe try turning one of the CCDs off and try running at stock settings.
This video's great! the monologue of troubleshooting shows how people may want to try to do when they encounter problems with their systems.
A motherboard with debug codes instead of just LEDs (and a power button) would've probably been beneficial for these tests.
Or regarding the power button, you could just use one from an unused case to connect it to the pins instead of having to short the jumpers all the time.
Same here.
I thought they were going to be sent back to AMD to test them?
@@daveward4358 Greg did say he was trying to contact AMD about that but he just hasn't heard back from them yet. He did also say that he will update us once they do contact him back about it.
@@JCmeister9 So AMD probably know they sent out bad batches.
@@daveward4358 Not necessarily, only because we can neither confirm or deny if that's true without proof or evidence. What I can say for sure is that, there were a lot of people who bought these CPU's, so it's possible that failure rates can seem to be higher than what it truly is overall.
What percentage of the systems you've gotten in for repair are AMD vs Intel?
No idea. Probably more than half.
@@GregSalazar please ask Gamers Nexus if there's a systematic problem with AMD processors...
@@notrixamoris3318 There's no problem with the cpu.
The real problem here is the motherboard and psu.
I have 12 Ryzen CPUs and I don't have any problems at all.
But my MSI motherboard is my problem: it killed one of my CPUs.
Asus and Gigabyte have been good in my experience.
@@lelouchabrilvelda1794 opposite here. I’ve had way more issues with Gigabyte boards than MSI.
So, question: is there a possibility that improper installation could have caused these chips to fail? Also maybe not having proper cooling?
I agree....cooling IS an issue with the R5-3600 if you use the pitiful cooler supplied with the CPU. With a good cooler that keeps it below 65C no matter what...mine has been absolutely reliable...so far.(knocks on wood)
If you install poorly it is generally VERY easy to tell why it is broken. Bent pins or massive damage to the socket itself or conductive thermal paste in the socket for example. Thermal loads could cook a chip in theory but in general its not something that will generally cause a chip to fry since people will normally notice something is wrong even if they arent tech literate. In reality though the truth is that no QC is perfect and every company makes lemons or has broken parts slip through the cracks. Or even parts that pass QC if only barely so they fail quickly. Thats one of the reasons you generally want to stress test a new system for a bit after you build it. If something passed the basic QC at the factory but has an issue when its ran hard for a long period its best for it to burn itself out early.
Most properly working CPUs will shut themselves off well before any cooling related damage can happen,
Most likely the overclock fried the chips.
@@abdullahdanze2061 Not much point overclocking a ryzen. I suspect some mobo vendors may be too aggressive with auto voltage settings, causing weaker chips to fail. I've been having erratic issues with my chips.
Good vid, love the troubleshooting.
Was wondering when we'd get this video since you've hinted at it a few times
for giggles I tried a older weed pen that stopped working (as they all do when the battery dies)....but today, I got two hits (with lights) and it still seems like it can give me more. I know it's not the same but I'm pretty stoked, literally :) Good vid Greg -- you're a genuine dude.
Greg your channel is sooo good! I am really enjoying it. Please keep it up. 👏🏼
This should become a super sub-series where every 45 videos, or every 5 bad CPUs (or any part, really) from Fix or Flop, you go back and see if you can't fix them or if they are truly broken
Been waiting for this!
17:55 recently I also experienced this issue (boot to automatic repair), my solution is disable fast boot and longer time for display bios (3s to 5s)
Greg your videos are the highlight of my day
Love you vids again Greg, anyways that Blue Eyes Toon Dragon was cool at your back
Ryzen cpu are very peculiar with memory. Is it possible to try different AMD recommended memory modules?
Memory was isolated in their respective FoF episodes on top of the single DIMM used here at 2133 MHz.
"Died" Zen CPU - this is normal for Gigabyte mobos. There are many cases even on YouTube. There will always be an Aorus motherboard.
At 17:28, was that a spark under the motherboard when he flipped the switch on his PSU?
It sure looked like it! Good catch, I never would have noticed that...Though it could just be a trick of the light from him moving around
RGB Leds, on power first white light... ;)
some AMD chips that have dual CCD(x900, x950, or some x600 series) may experience some impedance between CCDs especially when theres a degradation problem related to the silicon, their voltage needs heavy adjustment to work normally. TechYESCity demonstrated that by disabling cores/limiting boost frequencies some of these chips could be salvaged.
Another great video. Always learn something new
Preface: I may have a Ryzen 5 5600X with a failing memory controller.
Throughout the past few weeks, I've been troubleshooting one of my PC builds. Around the same time as swapping graphics cards, my SFF rig became quite unstable, lots of random restarts. It freezes or reboots anywhere from immediately after POST to maybe 30 min in. The instability can be at idle, launching an app, activating the Start menu, closing an app, during a benchmark, etc. I've (re)verified the GPU, an RTX 4070 Ti, stability in another system. The previous card, an RX 6900 XT, is just as or more power hungry, therefore, I doubt(ed) a PSU problem. Many "clean" Windows 10 and 11 installs with various attempts at driver versions and driver install order made no improvement. When they would/could complete, benchmarks were fairly on par. I also changed the memory setting to Auto (i.e., base 2133 MT/s) and disabled onboard components (e.g., audio, Wi-Fi, BT) one-by-one via BIOS. Believing I've narrowed the culprit to RAM, CPU, or motherboard, I decided to do another DIMM reseating, though I swapped slots this time. Welp... This resulted in the board not making it through POST, lighting the DRAM Q-LED. No matter what, I couldn't get it passed the DRAM stage, that is, until I ran the system with only one DIMM. We're back to booting but still very unstable. I felt it improbable both DIMMs had failed. So, I swapped in a single stick of Kingston Fury (also DDR4-3200). System again boots, although, no improvement of stability. So, the problem appears to be down to the motherboard or CPU. And after recalling the failures you showcase in this video, I'm leaning CPU.
UPDATE: I purchased a 5700X3D... Same problems. So, turns out, it was the motherboard... Somehow...
Ultimately, I decided this was time for a platform upgrade. I returned the 5700X3D and sold the 5600X and DRAM to SellGPU, which claimed the components passed their tests.
So far, the system is going strong on a Ryzen 7700X, ASRock B650E PG-ITX, and Flare X5 DDR5-6000. Same GPU, PSU, and SSD as before the platform replacement.
I was looking forward to this, since you had quite a collection of dead 3000s.
To be honest I recently built two absolutely identical 7600x systems, did the same settings on both, same gpu, same EVERYTHING aside from hard drives.
One was running at 1.4v, the other at 1.29. The former was almost hitting 95°C (on a Dark Rock pro 4!!) the other was running relatively cool in the 80ish pumping better scores all over the place.
I've ended up forcing a Curve optimizer on the 1.4v basically that reduced it to the voltages of the other (even less at -30) and the thing began running cooler than the other and posting roughly the same results if not a tad better. For safety I set the owner to -20, then instructed him how to lower it by time as soon as he checked that he was stable in his editing/gaming/daily routine. He now runs it at -30 and at least in Cpu-z/R23 he gets results slightly better than the other that I've left at stock all while running about 5°C cooler overall.
The above was kinda strange tbh.. same b650M-A pro from MSI, same coolers, same Asus Tuf 6800Xt, same evga supernova G6 850, same g.skill 6000 c36 kit 2x16, same crucial P5 plus, same bios version, same bios settings (at default + just enabled XMP and fan profiles), same Phanteks p500. The only difference was one rig has a 2Tb HDD + 256 sata SSD, the other one just a 3Tb HDD carried over from their previous builds. But that can't explain the humongous vcore difference. The only test I didn't do is to swap CPUs between the motherboards, but looking at how the CO cure turned out, I thought I didn't want to dismount and remount two Dark Rock Pro 4s 4 times in a day..
@GregSalazar for the 5900X try to Enable PBO instead of Deactivating it. And try out to activate the Eco Mode (75W/90W) inside of PBO.
I had the exact same Issue, with an 5800X. At Stock Settings it was Boot Looping all the Time until i used the PBO Eco Mode and it wasnt even much slower. Like around 5-10%.
Great video thanks Greg.
When doing these sort of diagnostics, CPUs have a system called by various names "CPU Internal Error" "CPU IErr," or "CPU Machine Check".
This system is basically in charge of double checking results, e.g. that the CPU didn't do 1 + 1 = 3.
When such error is detected, the CPU will (try to) tell the OS and the OS will log the error and reboot itself (sounds familiar?).
This may require a few tries because sometimes the error is so bad the OS doesn't get the chance to do any of this and the system just hangs or reboots.
In the case of Windows, you'll find this in the Windows Event Log, under the WHEA-Logger entries (kinda hard to reach when you can't boot). In Linux, you should see it right on the boot terminal on screen right before reboot.
This is very useful because it may indicate further info as to what is failing; and you may find workarounds online if others are having the same issue.
I suspect the 5900X was having Machine Checks, and the 3900 from 9:55 possibly was too.
Trying to boot into Linux may also yield more info (particularly if it gets stuck in a specific place)
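If you do get a Linux boot far enough to save the kernel log, something like the sketch below can pull out the machine-check lines mentioned above. The sample log text is made up for illustration and the exact wording of real kernel messages varies by kernel version, so treat the pattern as a starting point, not a complete filter.

```python
import re
from typing import List

# Loose pattern: anything mentioning machine checks, "mce", or hardware errors.
MCE_PATTERN = re.compile(r"machine check|\bmce\b|hardware error", re.IGNORECASE)


def find_mce_lines(log_text: str) -> List[str]:
    """Return the log lines that look like machine-check reports."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


# Illustrative sample only, not captured from a real system.
sample_log = """\
[    0.512345] smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor
[   12.345678] mce: [Hardware Error]: Machine check events logged
[   12.345700] usb 1-1: new high-speed USB device number 2 using xhci_hcd
"""

for line in find_mce_lines(sample_log):
    print(line)
```

On a live system the text would come from the kernel log (for example the output of `dmesg` or `journalctl -k` saved to a file) instead of the hard-coded sample.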
Recently I had a family member complain that shortly after plugging in his new WD 2TB Elements portable USB 3.0 (the next day actually) his monitor, keyboard, and mouse quit working. I saw that his MSI X470 Gaming Plus Max AM4 MB had every USB port filled with various crap...looked like a porcupine. His poor Corsair VS 450 PSU had to handle a Kingston 120GB SSD (OS boot drive), a WD 250GB HDD, a Seagate Barracuda 500GB HDD, the Ryzen 5 3600 w/stock cooler, 16GB G. Skill Trident Z Neo 3600, 5 UpHere RGB case fans, MSI GTX 1650 Super GPU with 2 monitors plugged in, and some gadget that appeared to have a few mini monitor screens. All the fans and lights were working, as well as the GPU fans.
I noted the PSU didn’t have a connector for the 4-pin CPU PWR2 on the board, which I’ve read the MB only really needs the 8-pin CPU PWR1 but I can’t help but wonder so I suggested he buy a more powerful PSU to handle all the stuff he had plugged in and to make sure it had the connector to fill that void. Well a new Tier A Corsair RM1000e PSU didn’t help so I tried a known working GPU but that didn’t help either. I should note that no settings were ever tinkered with. Everything was totally factory and his Metallic Gear Neo case w/2 Skiron RGB fans, plus the 5-pack of UpHere RGB fans offered plenty of ventilation. I thought for sure the motherboard had an issue.
The next stop was Micro Center for component isolation, which took 3 weeks and they said the problem was the Ryzen 5 3600 failed and it would be $180 to fix. He added that although he just bought everything last summer, (from MC) he didn’t buy the warranty so that’s something he’d have to handle himself through AMD.
Noting how many of those chips have been diagnosed as “bad” plus the one I mentioned, I’m beginning to think AMD may have a dependability issue starting to emerge.
EDIT: MC scared the kid by saying the new CPU would require a new OS and it would wipe everything on the drive. When the kid asked about saving his games and the cost of the next CPU upgrade. The tech said the ($129.99) Ryzen 5 5600 was actually just $20.01 more than the ($109.98) Ryzen 5 3600 and mentioned flashing the MB.
So with another $150 to save his data, the price approached $500.
I suggested that since his OS was on a cheap $20 120GB SSD that he should just put in a higher capacity & faster NVMe for the OS and the old SSD wouldn't have to be touched.
Plus he already had the activation key from the previous OS installation.
Next thing I know, the tech installed the ($49.99) Inland QN322 1TB NVMe and ended up costing “$200 something.” (his words)
I’m able to boot in b1 & b2, but anytime I try to use a1 or a2 I get error code 55 and a yellow light on RAM. Everything lights up on the PC, but there's no display and nothing on the keyboard and mouse. Do you think this is a problem with the CPU or the motherboard slots? Please let me know, anything is helpful.
Greg, I don't know whether this interests you, but for bench testing a motherboard, I use a removable, external power switch. I like this better than physically shorting the power switch pins when I have to start up the same board many times. Of course, if you are working with a number of different boards at the same time, this would be more time consuming than shorting with a screwdriver.
Hey Greg, found an interesting thing!! At 16:30 / 16:37 in the video, when you press on the RAM with your finger, the DRAM light changes to CPU. I don't know if I'm the only one noticing it.. Love your content!!
I have a viewsonic version of that portable monitor. I absolutely love it. High refresh rate and relatively good brightness and color. It’s a really nice thing to have around.
I find the number of memory issues interesting. Just replaced my 2700X as the memory controller is degrading. It stopped being able to run the memory on XMP, so I had to drop the voltages and speeds down a few months ago (6) and it was stable until yesterday, when the problem made a return. However, I believe a bad BIOS was to blame in my case, as I found that XMP had gone to 1.45 V on Auto, which was fixed with a BIOS update.
Commenting before I finish the vid, I've only hit the 5 min mark. But what you said in the 4:30 - 4:59 is exactly why you're one of my favorites.
For the 3600 cpus I would try a slower ram speed. I have a 3600X that will do nothing but give me issues if I use anything over 2400mhz ram. Its a definite issue and I only use the chip in a b550 board to test gpus and other components.
I've seen 3950 chips do that too, anything over 3000Mhz RAM and they wont post.
@@Smakheed Wow! I set up my friends workstation pc with a 3900xt and 32gb of 3200 and it just worked. And that was on a b450 AsRock board. I was using my 3600x on an asus x570 and figured 3200 would be fine but the cpu just doesnt like faster speeds. I'm sure with some tweaking i could get it stable but I don't have the time or desire to do so.
@12:40 , Should set the RAM speed lower , I have same issues with mine , never OC'd and was always set in eco mode with 45w limits etc
I got a problem similar to this, I got a MSI B 550 A Pro motherboard with a Ryzen 5 5600. Worked great for a month, no heating issues. One day I go to power on the PC and it does nothing. Changed out the PSU and ram, gpu. still the same. Then I took out the CPU and it fired right up, but the motherboard will no longer let me bios flash. Keep in mind you do not need ram or cpu in most msi boards with bios button.
It makes sense that the IMC in AMD chips fail more than non K intel chips. The more freedom you give users the more variety you need to support ram wise. Running outside of its preferred settings with higher clocks and tighter timings is higher risk.
Similar to turning up the boost on turbo cars. It puts pressure on other parts: clutch, fuel system etc.
I loveeee buying "dead" bent or broken ryzen cpus, and now they power all my rigs
Any troubleshooting is a great thing.
I've been in IT for over 30 years and I've personally only seen 1 failed processor. Two users with failed processors would have been crazy to me, but 5!? Intel isn't the best but this is shenanigans.
Well done, young man. My father will be turning 91 in about 3 weeks and I will be turning 54 a week after that. We both still learn from mistakes. If you didn't there would not be much point in having your channel.
Used your code and got Win 11 key for $21- Thanks man!!
Greg did you notice the spark at 6:28 would that be the cause of the issue?
Oh shit! Good catch
The 5900x is a lot like a 9900k I had once. If you overvolt the snot out of it (1.45V Vcore or something) it’ll go into windows and work fine for a while, then it will degrade and eventually require more voltage or lower clocks. It eventually got to the point on the 9900k for me where I was down to disabling 4 of its 8 cores, and locking clocks at 4.0GHz at 1.4V. I just trashed it at that point.
After many Intel builds since 2000 this was my first AMD, Three years of happiness, now? I've spent more than a month plus a good sum trying to fix it, this is the problem I'm having.
Thank you
But can any of these get replaced under warranty?
Love this series
@12:35 hahah this happened to my first Ryzen 9 5950x after trying out the dynamic oc feature on the Asus Dark Hero x570 it kept cycling between this and corrupted boot device.
Just hanging on the dram light...great video
17:29 a spark was there under the motherboard :))
I am wondering if the silicon shortage over the last two years has contributed to this issue, are manufacturers reducing the failure threshold/overhead to pump out their products? I am in the process of building my new 13th gen system and it is a major concern that component quality might be compromised by the current lack of raw materials globally.
try disabling one of the cores or try baking them in the oven 150c for 30 min. the oven method has worked many times for me and some of the pc tech still works fine after 2 years. oh, and Greg u love being right, dont u;) pz
IIRC The 3600's have a max listed compatibility on XMP of 3200. I ran into a Ryzen 2700x that is listed as 2933 and it, more or less, meant it! I managed to get it set at 3000 and stable 99% of the time. I've learned to leave everything at "Auto" while testing, then once it's a fix or replace, I start playing with settings. The issue with the 2700x didn't start until they "upgraded" the DRAM to a faster, RGB type. The "upgrade" worked fine in my RIG and a test rig but I'm running 5000 series AMD and Gen 11 Intel. Just a thought.
Understand that this video is 11 months old as my eyes see it so you may, or may not, have already thought of this, retested or got answers from AMD about these CPU's.
I built a new AM5 system with an 7950X in November. A couple of weeks ago it wouldn't wake from sleep so after a power cycle, got the DRAM debug light (ASUS Tuf B650 plus). I couldn't believe that two sticks of corsair vengeance had both failed at the same time. After testing the memory (single sticks in all slots - 8 tests) then RMA'ing the motherboard and CPU; turns out the CPU had died. CPU always ran cool as I was running it on a 120w envelope rather than 170w, with an AIO cooler. Never had a CPU fail in 40 years until now.
The labeling of the cpu as "bad" reminded me of a system one of my clients asked me to diagnose and the only information I had to go on was the label which read "doesn't".
I have a suggestion. Would be interesting to try a direct die cooling in the stubborn CPU.
Given that a 3900x is just a failed 3950x I'd venture the guess that one or more of the remaining cores from the 12 in use and still 16 physically there is bad. Well not quite bad as in non-functional but bad as in cannot operate at stock clock within stock voltage, and may have worked if binned down into a 3600 with 200mhz less base and 400mhz less boost, or if sold into an even lower sku like a ryzen 3 where 4 cores on a die would be disabled instead of 2.
Love your debug GPU! Cute
About the DRAM issue, try pressing the CPU a bit harder into the mount. I saw ONE CPU where the mount pressure needed for it to work consistently was a bit unreasonable. Maybe that was not a one-time weird chip.
Just FYI, try different slots for the GPU and drive; sometimes just a tiny part of the CPU is shorted and, for example, the M.2 slots are unusable, but otherwise the CPU works. Disable that part in BIOS and you have yourself a working, although slightly crippled, CPU.
It's not weird that the same SKUs have the same issues; it's likely that the same type of machine made the same mistake. Likely there is some kind of trace that is just a touch too small and, if combined with some other issue, maybe mount pressure too weak or something, it just fries.
A weak mount, I find, is the starting point for this kind of issue, where the CPU kinda works but, while using a very specific thing like AVX-512, it turns off or freezes.
For Dram, try LP modules. I have no idea if they would work, but it would be interesting to find out.
At 19:50 the keyboard is not connected wirelessly, maybe something would happen with a wired one, no idea though
6:27 There was a spark on the cpu socket when you turned the power on
Techyescity tested few ryzen 3000 series cpu's and what he noticed that they are running pretty hot. He came to conclusion that running those "older" cpu's with modern graphics cards makes them run a bit too hot, which will shorten their lifecycle. Like the modern gpu asks too much from cpu. He was able to bring cpu temps down with undervolting
I have a strange 3900x. It will work with 2 dimm's of memory as long as they are both in a1 b1 or a2 b2 configuration, or one dimm in any slot. It won't do anything with the standard dual rank motherboard configs. So it's always in single rank config. It does not matter what you toss in there or the speed/size, either single or dual rank. I had this chip on several motherboards now and it's always the same issue. If you only have 2 dimm slots you can only use 1 dimm. It will not work with 4 dimm's at all. No matter what it still runs at the rated dimm speeds when you select the Xmp profile. Size of the dimm's don't affect it either. It will run just fine on 64 gig's in single rank. I have never figured out what is going on with it and it's only used in a NAS server so I just kept it and never returned it. So both channels work but the chip won't work with both channels.
Greg, I envy your access to hardware, but I don't envy how much thermal paste you must go through in a week. I'd love to know what brand you use though.
Did you clear the CMOS for every CPU?
The last processor is not a problem with temperatures, only this processor is already degraded. So lowering the clocks helped.
Hi Greg, I have had two AMD 2700x cpu's with "B" channel memory controller's faulty. Still being used with 16Gb in "A" channel only. One died and replaced by AMD only for the "B" channel to fail again. Thought it was memory stick's at first as 4 sticks of Corsair 32Gb 3200 installed, only 16Gb usable on two M/B's, a MSi tomahawk and a crosshair VI hero. Love the channel.
Mike (retired PC tec)
time for Hollister sponsorship ! Love this brand 👌
The last one I wonder if de-lidding and some thermal paste renewal, then replace the IHS (I know it can move around after de-lidding but still) and then carefully apply the CPU cooler again and try running it on normal just as an experiment to see if it will run on auto for vcore etc. Just because cheap paste is usually used. I just wonder if it has been run too hot and the paste is too crappy and now failed. I think that would be a good experiment.
i think the ihs is soldered on all ryzen cpus
The answer as to why the 5900x is stable in the UEFI environment, but not once Windows starts loading is pretty simple: UEFI environment is uniprocessor (e.g. only running on core 0). Windows' NT kernel, upon initialization, not only runs on each CPU (core), it has to initialize certain per-cpu data structures, etc.. A problem with one of the CPU cores, beyond 0, is likely crashing the system early in the OS boot process. My guess is it was overclocked and overvolted to the point of damage.
Maybe contact Steve from Gamers Nexus. This sort of situation might be up his alley, especially if it turns out to be a batch-wide situation. There could be hundreds of Ryzen 3000 chips on the verge of going bad, and it could uncover exactly why they are failing. Might turn into an interesting collab.
Love the graded Yu-Gi-Oh cards in the back !
I've had a friend's 1600AF restart after a short while in Windows, basically when it's loading apps. Ended up lowering the voltage to something like 1.15ish and the frequency by a few hundred MHz, and he ran it like that for a couple of months before upgrading.
That CPU was part of a pre-built that had a warranty on the whole system, not by component (around here it's not uncommon for a pre-built with off-the-shelf parts to have individual warranties) AND IT HAD WARRANTY STICKERS ON THE CASE SIDE PANELS. My friend broke them since he isn't an animal and wouldn't keep his PC uncleaned for two years, and because of that they denied warranty... I wasn't a fan of the company before and frankly their prices are far from great, but needless to say I don't even open their site anymore.
I'm about to get a 7800X3D, my first AMD CPU since Athlon. Should I be worried about the quality of these AMD chips?
TechYesCity also had this overheating issue with ryzen 3600s. You can also try disabling cpu cores if undervolting doesn't work and see if it works.
I have 2 "dead" 3600s, and if I undervolt them they work fine. I also have a 3900X that only boots if PBO is disabled. We have a few 1600s with the dead memory controller. My guess? These are all low-quality binned chips that were made close to the end of the manufacturing cycle and were bottom, bottom, bottom of the barrel. They were sent out into the wild, overclocked, and that was that. I could be wrong, but I have run into so many degraded chips that work if you undervolt / underclock them. All of them are Ryzen. That flash you saw when the mobo powered off and then back on is Ryzen memory training. It takes literally a few seconds.
1:58 While I agree that making a test bench for checking CPUs is complicated and hard,
making a cheap one specifically to test GPUs should be possible and would save a lot of time checking for GPU-related issues.
I changed some settings in the BIOS, I think CPU voltage... then the PC turned itself off and won't boot again. I tried everything, CMOS reset... what should I do? It's my new config 😔😔
8:06 did something short underneath the motherboard?
Make sure to disable some cores in the BIOS to see if chips will run afterwards. You can try manually increasing voltages or lowering clocks too. Sometimes some cores will die completely, other times just enabling PBO or even any boost frequency at all will be the issue. I've also seen it where it would basically need an extra 50mV to hold the advertised boost frequency. Mostly, these chips can be saved if you're willing to make compromises.
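If anyone wants to work through those compromises in a fixed order instead of guessing, here is a rough sketch of the idea in Python. Everything in it is made up for illustration: `Settings`, `candidate_settings` and `first_stable` are hypothetical names, nothing here talks to a real BIOS, and `is_stable` stands in for you setting the values by hand, booting, and running a stress test. The only number taken from the comment above is the extra 50mV.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class Settings:
    """One hypothetical BIOS configuration to try by hand."""
    active_cores: int
    boost_enabled: bool
    vcore_offset_mv: int  # positive means extra voltage


def candidate_settings(total_cores: int) -> Iterator[Settings]:
    """Yield configurations ordered from least to most performance lost."""
    # First keep every core and boost on, at stock and then with +50 mV.
    for offset in (0, 50):
        yield Settings(total_cores, True, offset)
    # Then give up boost, and after that turn cores off two at a time.
    for cores in range(total_cores, 0, -2):
        yield Settings(cores, False, 0)


def first_stable(
    candidates: Iterable[Settings],
    is_stable: Callable[[Settings], bool],
) -> Optional[Settings]:
    """Return the first configuration the check accepts, or None."""
    for settings in candidates:
        if is_stable(settings):
            return settings
    return None
```

You would call `first_stable(candidate_settings(6), my_check)` for a 6-core chip, where `my_check` is whatever manual test you trust. The ordering is just one guess at "least painful first", so reorder the candidates if you would rather lose cores than boost.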
looked on amd forums…
one guy CrispyCrunch wrote…
“I got a new angle on this. So deactivating PBO and CBS definitely works, PC was running stable for a week now. But you'll lose performance.
So I wrote to the MSI support and the AMD support.
MSI suggested to try increasing the DRAM Voltage by 0.05 V, which I did. System seems to be stable, no crashes so far - neither in idle or while gaming.”
Quite funny to see this video because I do the same from time to time: go back to hardware lying around and try to fix it again and again, often having forgotten what the problem was. I got some things repaired tho, so it's always good to try again later.
Hope AMD will agree to take a look at it. It would be interesting to know what happened to the CPUs.
Hi Greg
Thanks for the always informative, and thorough videos.
I wonder if you would share some Fix or flop statistics?
Do you have as many faulty Intel processors as Ryzens, relative to how many of each you've gotten in?
My buddy gets Amazon returns and he has gotten a lot of Ryzen 3000 series. He seems to always have issues with 3600s and 3900Xs. The 6-core 3000 series bins don't seem to fare well, while the 8-core bins, 3700X and 3950X, and even the 4-core bins, 3100 and 3300X, seem to fare much better.
I had a boot-looping 7700K that was behaving like a bad overclock even though it was on stock settings. So I increased the voltage and lowered the clock speed. It then worked without crashing, and after a couple of weeks I set it back to stock and it worked perfectly.