11
cuddlyogre
124d

I decided to do a daily Blender render just as a creative challenge for myself. I was so excited because I built an awesome computer this year and was eager to put it to use, only to be hit with seemingly random BSODs and crashes a couple seconds into the render.

As it turns out, my CPU cooler is woefully inadequate to handle 32 threads running at once, which was the cause of my crashes. Turning the render threads down to 8 keeps the temps low enough to finish the render.

In a way, it's a relief because the alternative was a problem with my $900 GPU. I've ordered a bigger CPU cooler and well reviewed heatsink paste which will hopefully fix it. If not, I'm going to have to go water-cooled, which I really don't want to do.

Comments
  • 2
    Which CPU and cooler do you have, and which new cooler did you order?
  • 2
    Fun facts...

    It might not be the temp - have you validated that the CPU really overheats?

    It could be a voltage problem, an IO problem (your OS needs a few resources of its own to manage resources) or something else entirely.

    One should be very cautious nowadays.

    I have seen the wildest things happen, and usually the easiest conclusion was the wrong one.

    Unless we're talking about smoke coming out of the machine and burned plastic smell.
  • 0
    @Fast-Nop

    Original cooler: https://newegg.com/cooler-master-hy...

    CPU: https://newegg.com/intel-core-i9-13...

    New cooler: https://newegg.com/p/...

    I'm open to the idea that I'm not properly applying the thermal paste, but there are no dead spots, so I think I'm doing it right.
  • 1
    @IntrusionCM Using GCC, the temp immediately spikes to the max on the graph (100C). But it's probably much higher. It usually crashes after that. When I turn the threads down, it only goes up to 95C.
  • 6
    @cuddlyogre

    Out of curiosity... What CPU and did u peel the plastic off the heat pad if the CPU had a heat pad instead of thermal paste...

    Cause even a boxed cooler should never reach this ... Ever. Unless there is something wrong.

    And no, not meant in an offensive way. We all had brain farts assembling sometimes....
  • 2
    @IntrusionCM I've removed and cleaned both the CPU and cooler a couple times so I'm 99% certain I did. I'll check again when my new cooler comes in. I would love if it were that simple because it will save me a nice chunk of change.

    The fan's RPMs are in the advertised range, but it's not very loud, so I wonder if it is moving enough air.
  • 1
    @cuddlyogre did u check if the mainboard maybe has a special setting for voltage regulation for fans, or if you maybe used a regulated separate pin connector on the mainboard?

    Sometimes it's well hidden. SuperMicro mainboards e.g. have a fan voltage regulation we almost always disable...

    Sounds great at first, but in our case NVMes got very unhappy as the air stream didn't have enough pressure to cool down the PCIe NVMes.
  • 0
    @cuddlyogre Yeah, the original one doesn't look right for a 13900K, and the new one should do better. However, it depends on the power level that you set for the CPU.

    At full blast, no air cooler will cope with that, but the performance gain is minimal because already in stock config, the CPU is way beyond its sweet spot, and prolonged all-core loads will probably see throttling even with the new cooler.

    However, the system should not crash, just throttle, that's the normal behaviour. I'd also check the RAM - one common mistake is using four sticks and expecting the same speed as with two.
  • 1
    @Fast-Nop >one common mistake is using four sticks and expecting the same speed as with two

    Could you please clarify? I have 4 sticks totaling 128GB.
  • 0
    @cuddlyogre Because you have four mechanical slots, but only two channels in desktop CPUs. With four sticks, each channel in the RAM controller has to deal with the electrical load of two sticks. So the advertised speed of RAM and CPU refers to the one-stick-per-channel setup.

    Intel's spec is vague and only gives "up to" speeds, but that also refers to the best case of one stick per channel.

    AMD is more open about that. Ryzen 7000 lists DDR5-5200 for two sticks and DDR5-3600 for four sticks. Or DDR4-3200 vs DDR4-2667 with Ryzen 5000.
  • 1
    @Fast-Nop is the amount of physical sticks that important anyway? Two single rank sticks in the same channel should be quite similar to a single dual rank stick, at least from the memory controller's perspective. With two dual rank sticks, you probably get some performance penalty that outweighs benefits from rank interleaving.
  • 1
    @electrineer Yes, it is.

    Data from AMD's website for 7800X3D: 2x1R DDR5-5200, 2x2R DDR5-5200, 4x1R DDR5-3600, 4x2R DDR5-3600.

    For 5800X3D: 2x1R DDR4-3200, 2x2R DDR4-3200, 4x1R DDR4-2933, 4x2R DDR4-2667.

    As you can see, single / dual rank only plays a role in 4x2 vs 4x1 config, and only on the older Ryzen 5000. But in all cases, 4x1 is listed slower than 2x2.
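
    A minimal sketch of the figures quoted above as a lookup table, which makes it easy to check whether a configured speed is inside the listed spec. The function names and the (CPU, sticks, ranks) key layout are illustrative assumptions; the speeds are only the ones cited in this thread, not a full spec sheet.

```python
# Listed memory speeds in MT/s, exactly as quoted in this thread.
# Key layout (cpu, number_of_sticks, ranks_per_stick) is an assumption
# made for this sketch.
LISTED_SPEED = {
    ("7800X3D", 2, 1): 5200,
    ("7800X3D", 2, 2): 5200,
    ("7800X3D", 4, 1): 3600,
    ("7800X3D", 4, 2): 3600,
    ("5800X3D", 2, 1): 3200,
    ("5800X3D", 2, 2): 3200,
    ("5800X3D", 4, 1): 2933,
    ("5800X3D", 4, 2): 2667,
}


def listed_speed(cpu: str, sticks: int, ranks: int) -> int:
    """Return the listed speed for a stick/rank configuration."""
    return LISTED_SPEED[(cpu, sticks, ranks)]


def is_within_spec(cpu: str, sticks: int, ranks: int, configured: int) -> bool:
    """True if the configured speed does not exceed the listed speed."""
    return configured <= listed_speed(cpu, sticks, ranks)
```

    For example, `is_within_spec("5800X3D", 4, 1, 3200)` is false, because four single-rank sticks are only listed at DDR4-2933.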
  • 0
    @Fast-Nop It's been a while, so I'm a bit rusty. Are you saying it's better to have only one RAM stick per channel?

    When I have only one stick in, my machine never boots. But, I've never tried one in each channel.
  • 0
    @Fast-Nop could be my brain melting down...

    But aren't you confusing things here?

    Number of sticks != number of channels.

    Afaik 4 is always the way to go.

    (EDIT: 4 single rank of course)
  • 0
    @cuddlyogre Yes, and if you number the slot closest to the CPU as 1, the one furthest away as 4, then the typically recommended configs are:

    - One stick in slot 2.

    - One stick in slot 2, one stick in slot 4.

    - All four slots occupied.

    Booting with only one stick should normally work, but you may need to do a bios reset.
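
    The slot recommendations above can be sketched as a small check. This assumes the same numbering (slot 1 closest to the CPU, slot 4 furthest away); the names are hypothetical, and a specific board's manual may recommend something different.

```python
# Typically recommended populations, keyed by number of sticks.
# Slot 1 is closest to the CPU, slot 4 is furthest away.
RECOMMENDED = {
    1: {2},
    2: {2, 4},
    4: {1, 2, 3, 4},
}


def is_recommended(populated_slots) -> bool:
    """True if the populated slots match one of the typical layouts."""
    slots = set(populated_slots)
    return RECOMMENDED.get(len(slots)) == slots
```

    So `is_recommended([2, 4])` passes, while `is_recommended([1, 3])` does not.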
  • 0
    @IntrusionCM I haven't built a new computer since 2012, so I basically have no idea what I'm talking about. That also may be part of the overheating problem I'm having.
  • 1
    @IntrusionCM Yes, desktop CPUs always have two channels, but desktop mobos have four slots. For best performance, two sticks is recommended, i.e. one stick per channel. See also AMD's specs above.

    The reason is that the signal traces on mobos are daisy-chained, so the signal first goes to slot 1, then the same lines continue to slot 3. Same with slots 2 and 4.

    Now, if you populate only slots 2 and 4, you put the RAM cleanly at the end of the chain, all good. But if you populate only 1 and 3, you have wires hanging on the bus after the RAM, leading to reflections.

    And if you populate all four slots, then you have unequal signal distance within the same logical channel, which deteriorates the signals. Which is also why 4x1R is much worse than 2x2R.
  • 1
    @Fast-Nop

    Wasn't there quad channel available in some mobos / consumer PCs?

    I definitely know that some consumer ZEN ran better with 4 single rank RAM vs 2.

    Very hard topic, small details matter.

    I rarely build consumer PCs nowadays. Server market has had e.g. quad channel for a long time so the rules are very different.
  • 0
    @IntrusionCM I don't remember quad channel in consumer boards because there were no CPUs with quad channel RAM controllers AFAIK.

    That is, unless you count Threadripper as consumer, those have four or eight channels, depending on the model.

    With the performance, I am referring to what the CPU actually guarantees. If you go, say, 4x1R at DDR4-3200, then that's already out of spec, and if you're willing to do that, you could also go DDR4-3600 in 2x2R. The stability headroom is just higher with only two sticks, that follows from the spec.
  • 0
    @cuddlyogre The short takeaway is: if you are using all four slots, then the advertised speed of your RAM may not be entirely stable. You could reduce the RAM speed in the bios, starting with XMP entirely disabled, and repeat your tests with your current cooler.

    If it still gets to 100°C all-core and throttles, but no BSOD anymore, then you're probably running your RAM too fast.

    Oh, and check whether there's a bios update for your mobo - that often improves RAM support and stability.
  • 1
    @Fast-Nop This is my motherboard. It's dual channel.

    https://newegg.com/p/...

    It has 4 sticks of this RAM.

    https://newegg.com/corsair-64gb-288...

    Does your advice still apply?

    I'll give this all a shot when I get home.
  • 0
    @cuddlyogre That's four sticks of the same RAM, but sold as two kits with two sticks each, not one kit with four sticks. There might be differences between kits (not within kits).

    DDR5-5200 is fine for a 13900K, it has "up to" 5600 guaranteed, but Intel doesn't say what the guaranteed speed with four sticks is.

    Actually, did you even activate XMP in the bios? Because if not, it doesn't even run at 5200 so that disabling XMP won't bring any new results.
  • 1
    @Fast-Nop Oh boy, I don't even know. I'll have to check when I get home.
  • 4
    Omg there's a bunch of bent pins. I'm surprised it even booted. This is going to be an expensive afternoon. I'll let you know how it goes.
  • 0
    @cuddlyogre

    If the CPU could talk:

    Surprised Motherfucker? I'm still kicking it. Let me burn down the fucking remaining shit! Mwahahhahahaha
  • 1
    Good lord what a mess.

    The amount of money I burned today will haunt me until I die.

    I got an Asus motherboard because that's all that was left and I need this computer for work. $400 out the window.

    If you have DDR5, you (probably) can't use all 4 RAM slots, or your computer will never boot. Meaning I have 64GB of wasted RAM that gets to look pretty on my desk. It only took 8 hours to figure that out. $200 out the window.

    I needed a cooler that could handle my renders so I went with an AIO. It looks cool and was actually easier to install than the one I had before. $200 out the window.

    All in all, I think it's running a lot faster though.

    I'm glad I have a credit card...
  • 1
    Rog Strix more like BSOD all the time am I right?

    I hope they let me return it.
  • 0
    @cuddlyogre Using all four slots should work, but you may need to reduce the RAM speed in the bios. You can look up the mobo QVL that lists specific RAM kits.

    Don't have issues with my ROG Strix mobo, but that's AM4 (i.e. AMD), DDR4-3600, and 2x16GB. Slightly out of spec because the CPU supports only up to DDR4-3200 officially, but it's rock stable.

    Did you try to update your bios? Even with brand new boards, the bios is usually outdated by the time it reaches the end customer. Asus' bios release notes mention improved stability all the time.
  • 1
    @Fast-Nop Oh believe me, I tried everything. My RAM is on the list - 2x32gb. I have two kits. I updated, limited clock speed, tried XMP on and off. Reseated every RAM stick several times, nada

    If I put anything in the first slots, I get qcode 55 - no RAM installed. That happens to almost everyone that uses this board.

    If I put them in the second slots, it will boot about 50% of the time and bsod with some nonsense code and give me a qcode 31 - RAM installed - repeatedly or more 55.

    It's supposed to be something related to DDR5, but my now deceased Gigabyte board had zero issues.

    It's a very common topic on various forums.

    I'm just going to reorder another Gigabyte board since it had no problem with my RAM, but I'll experiment with just using 2 slots to see what kind of performance boost I get.

    I'm going to try to return the Asus board this morning and I've cancelled the other heatsink order, so fingers crossed I'm only out the money for the new board and cooler.
  • 0
    13900k at max load under 90 deg C here. The only cooler I've tried that can pull it off reliably is DeepCool LT 720 (we have three machines with 13900k+LT720). It runs this chip at 315W according to hwinfo in Cinebench.

    Tbh I usually gate it to 250W or lower, not much perf difference but way nicer for the system.
  • 1
    @Fast-Nop @cuddlyogre DDR4 128GB was pretty fiddly even with 1 kit of 4 sticks. We went through several kits and two mobos for one of the machines.

    DDR5 is even more sensitive - one of the machines is stuck on 64GB because the 128gb kit we bought didn't work and we're just waiting for prices to drop now. And 2 kits of 2 sticks is basically not going to work, the timings are too tight. I'd be happy with ddr5-5200 with 4 sticks, never seen a stable ddr5-6000+ system with 4 sticks (consumer platform ofc).

    If it's any consolation, you should see the memory latency vs memory capacity tradeoff for your application. If your working set is small and fits in memory and your application is memory bound, then better RAM timings will (might) help. Otherwise, it's a waste - if your application is not memory bound anyway, then faster will not help, and if your working set is large, then larger mem is (generally) better than faster mem because you'll avoid paging to disk which is painfully slow.
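
    A rough sketch of that tradeoff as a decision rule. The function name, parameters, and return labels are assumptions for illustration, and real workloads are rarely this clear-cut.

```python
def ram_priority(working_set_gb: float, installed_gb: float,
                 memory_bound: bool) -> str:
    """Suggest what to spend on, following the reasoning above."""
    if working_set_gb > installed_gb:
        # The working set spills to disk, so more RAM beats faster RAM.
        return "capacity"
    if memory_bound:
        # Fits in RAM and is limited by memory: better timings might help.
        return "speed"
    # Fits in RAM and is not memory bound: faster RAM will not help.
    return "neither"
```

    For a 100GB working set on a 64GB machine this returns "capacity", regardless of whether the application is memory bound.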
  • 1
    @RememberMe That's why I stay on AM4 with DDR4-3600-CL16 2x16GB. With the 3D cache of a 5800X3D, faster RAM would be pointless. I don't trust DDR5 or AM5 for now, given the general industry trend of shelling out immature products at premium prices.
  • 2
    @Fast-Nop yeah, that's a good strategy (although again that's your particular mix of working set size and throughput/latency/occupancy requirements). I needed the absolute fastest mem throughput for one set of workloads, and large capacity and occupancy for another set, so the current small 64GB DDR5 system and bigger 128GB DDR4 systems worked out perfectly in price/perf. Sadly the usage patterns of both are too large for the big X3D caches to matter much, the L2/L3 cache is basically just acting as a prefetch/shmem buffer, so we needed to invest in RAM.

    Although, DDR5 can go to 48GB/stick now (maybe even 64GB/stick in future? Not sure) so there will eventually be a capacity benefit too.