Some info on crashing Ryzen CPUs and what to do about it.
There are several ways to reduce the CPU power consumption;
Reduce the clock frequency.
On a multi-core system this can be done on a per-core basis.
Reduce the voltage.
This also makes the core slower!
The voltage should be reduced after slowing down the clock,
and increased before increasing the clock frequency.
Switch off cores completely.
This is known as 'C6' and can be a source of trouble.
A Ryzen may be fine doing nothing at all, or working very hard running dozens
of virtual machines, but crash under varying load, such as on a desktop machine
or light server.
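The ordering rule for voltage and clock changes described above can be sketched as a small planning function. This is only an illustration: the name 'transition_steps' and the step labels 'set_ratio' and 'set_voltage' are hypothetical, and nothing here touches real hardware. It only returns the order in which the two changes would have to be made.

```python
def transition_steps(cur_ratio, cur_vcore, new_ratio, new_vcore):
    """Return the ordered steps for moving between two operating points.

    Hypothetical helper: the step names are labels for illustration only.
    """
    steps = []
    if new_ratio < cur_ratio:
        # Slowing down: lower the clock first, then it is safe to drop the voltage.
        steps.append(("set_ratio", new_ratio))
        if new_vcore != cur_vcore:
            steps.append(("set_voltage", new_vcore))
    else:
        # Speeding up (or same clock): change the voltage first, then the clock.
        if new_vcore != cur_vcore:
            steps.append(("set_voltage", new_vcore))
        if new_ratio != cur_ratio:
            steps.append(("set_ratio", new_ratio))
    return steps


# Example with two operating points (ratio, vCore): 32.00 at 1.2375 V and 15.50 at 0.9 V.
for step in transition_steps(32.0, 1.2375, 15.5, 0.9):
    print(step)
```

Going the other way (from 15.50 to 32.00) the same function returns the voltage step first.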
Disabling C6 can be done in more than one way.
Some motherboards may need a BIOS upgrade before this feature is
available.
Below is the output of 'zenstates.py -l' (more about this later);
~# zenstates.py -l
P0 - Enabled - FID = 80 - DID = 8 - VID = 32 - Ratio = 32.00 - vCore = 1.23750
P1 - Enabled - FID = 8C - DID = A - VID = 50 - Ratio = 28.00 - vCore = 1.05000
P2 - Enabled - FID = 7C - DID = 10 - VID = 68 - Ratio = 15.50 - vCore = 0.90000
P3 - Disabled
P4 - Disabled
P5 - Disabled
P6 - Disabled
P7 - Disabled
C6 State - Package - Enabled
C6 State - Core - Disabled
In theory, 'C6 State - Package - Enabled' should be OK because the package (which contains the cores) never gets slower than its fastest core.
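The FID, DID and VID values in the listing are hexadecimal, and the Ratio and vCore columns appear to be derived from them. The sketch below uses two formulas that are consistent with every P-state shown on this page: Ratio = 2 * FID / DID and vCore = 1.55 - 0.00625 * VID. Treat them as an assumption that fits these listings, not as a statement about every Ryzen model. The function name 'decode_pstate' is made up for this example.

```python
def decode_pstate(fid_hex, did_hex, vid_hex):
    """Convert the hex FID/DID/VID fields of a P-state into (ratio, vcore).

    Assumed formulas (they match the listings on this page):
      ratio = 2 * FID / DID
      vcore = 1.55 - 0.00625 * VID
    """
    fid = int(fid_hex, 16)
    did = int(did_hex, 16)
    vid = int(vid_hex, 16)
    ratio = 2.0 * fid / did
    vcore = 1.55 - 0.00625 * vid
    return ratio, vcore


# The three enabled P-states from the listing above.
pstates = [("P0", "80", "8", "32"), ("P1", "8C", "A", "50"), ("P2", "7C", "10", "68")]
for name, fid, did, vid in pstates:
    ratio, vcore = decode_pstate(fid, did, vid)
    print(f"{name} - Ratio = {ratio:.2f} - vCore = {vcore:.5f}")
```

For P0 this gives 2 * 128 / 8 = 32 and 1.55 - 0.00625 * 50 = 1.2375, which agrees with the listing.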
The BIOS may not have a 'disable C6' option.
Fortunately there is a Python script called 'zenstates.py' which can do
this for you.
For this script to work, you need a kernel module called 'msr' loaded (modprobe msr). You can load it automatically on boot with an entry in a .conf file in '/etc/modules-load.d/'. For instance, add an entry to '/etc/modules-load.d/local.conf' or give it its own file, e.g. '/etc/modules-load.d/loadmsr.conf';
msr
After loading msr, you should have msr devices in /dev/cpu/*/;
~$ ll /dev/cpu/*/msr
0 crw------- 1 root root 202,  0 2024-02-12 12:08 /dev/cpu/0/msr
0 crw------- 1 root root 202, 10 2024-02-12 12:08 /dev/cpu/10/msr
0 crw------- 1 root root 202, 11 2024-02-12 12:08 /dev/cpu/11/msr
0 crw------- 1 root root 202,  1 2024-02-12 12:08 /dev/cpu/1/msr
0 crw------- 1 root root 202,  2 2024-02-12 12:08 /dev/cpu/2/msr
0 crw------- 1 root root 202,  3 2024-02-12 12:08 /dev/cpu/3/msr
0 crw------- 1 root root 202,  4 2024-02-12 12:08 /dev/cpu/4/msr
0 crw------- 1 root root 202,  5 2024-02-12 12:08 /dev/cpu/5/msr
0 crw------- 1 root root 202,  6 2024-02-12 12:08 /dev/cpu/6/msr
0 crw------- 1 root root 202,  7 2024-02-12 12:08 /dev/cpu/7/msr
0 crw------- 1 root root 202,  8 2024-02-12 12:08 /dev/cpu/8/msr
0 crw------- 1 root root 202,  9 2024-02-12 12:08 /dev/cpu/9/msr
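If you want to check the msr devices from Python instead of the shell, the sketch below lists them in numeric CPU order and reads one register. It relies on the documented behaviour of the Linux msr driver: reading 8 bytes at a file offset equal to the register number returns that register. The function names 'list_msr_devices' and 'read_msr' are made up for this example, and reading a real device needs root.

```python
import glob
import os
import struct


def list_msr_devices(root="/dev/cpu"):
    """Return the msr device paths below root, sorted by CPU number."""
    def cpu_name(path):
        # /dev/cpu/10/msr -> "10"
        return os.path.basename(os.path.dirname(path))

    paths = [p for p in glob.glob(os.path.join(root, "*", "msr"))
             if cpu_name(p).isdigit()]
    return sorted(paths, key=lambda p: int(cpu_name(p)))


def read_msr(address, path="/dev/cpu/0/msr"):
    """Read one 64-bit register: 8 bytes at offset 'address' of the device."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at offset {address:#x}")
    # x86 is little-endian, so the register is an unsigned 64-bit LE value.
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    for device in list_msr_devices():
        print(device)
```

Sorting by number avoids the 0, 10, 11, 1, 2 order seen in the 'll' output above.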
You can download zenstates.py from GitHub: ZenStates-Linux. Download a zip file by clicking on the green 'code' symbol near the top right (you may need JavaScript enabled for this), and then unzip the file. Or clone it with git;
git clone https://github.com/r4m0n/ZenStates-Linux.git
For this script to work you need 'python3' and 'pypy3-lib'. A symlink from
python3 to python is also required. Debian has a package 'python-is-python3'
which creates this for you.
If all is well you should now be able to query the CPU with
'zenstates.py -l' (as shown above), and 'zenstates.py --c6-disable'
disables C6.
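To check the result from a script rather than by eye, the 'C6 State' lines of the listing can be parsed. The sketch below assumes the output format shown on this page and that 'zenstates.py' is on the PATH and run as root. The function names 'parse_c6_states' and 'query_c6_states' are made up for this example.

```python
import re
import subprocess


def parse_c6_states(listing):
    """Map each 'C6 State - <name> - Enabled|Disabled' line to a boolean."""
    states = {}
    for match in re.finditer(r"C6 State - (\w+) - (Enabled|Disabled)", listing):
        states[match.group(1)] = (match.group(2) == "Enabled")
    return states


def query_c6_states(command="zenstates.py"):
    """Run '<command> -l' and return the parsed C6 states (needs root)."""
    result = subprocess.run([command, "-l"], capture_output=True,
                            text=True, check=True)
    return parse_c6_states(result.stdout)


if __name__ == "__main__":
    states = query_c6_states()
    if any(states.values()):
        print("C6 is still enabled for:",
              ", ".join(name for name, on in states.items() if on))
    else:
        print("C6 is disabled")
```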
There is also an installer which creates the systemd config to run the
above script on boot, which is also on GitHub:
disable-c6
If directory 'disable-c6/lib/ZenStates-Linux/' doesn't contain 'zenstates.py',
copy it from 'zenstates/ZenStates-Linux/zenstates.py' to
'disable-c6/lib/ZenStates-Linux/'. Next run the install script in
'disable-c6/'. This copies 'zenstates.py' to '/usr/local/bin/' and copies
'disable-c6.service' to '/usr/local/lib/systemd/system/'. It also creates
the necessary systemd symlinks.
With 'systemd-analyze' you can make a nice SVG of the boot sequence;
systemd-analyze plot > startup_order.svg
Below is an excerpt from this image;
[Image: excerpt from startup_order.svg]
Running the script seems to 'confuse' the kernel a bit though. An excerpt from syslog:
'Feb 12 18:48:37 pc7 kernel: [ 12.803157] msr: Write to unrecognized MSR
0xc0010292 by python3 (pid: 638). Please report to x86@kernel.org'.
But C6 is disabled;
~# zenstates.py -l
P0 - Enabled - FID = 80 - DID = 8 - VID = 58 - Ratio = 32.00 - vCore = 1.00000
P1 - Enabled - FID = 8C - DID = A - VID = 6C - Ratio = 28.00 - vCore = 0.87500
P2 - Enabled - FID = 7C - DID = 10 - VID = 7A - Ratio = 15.50 - vCore = 0.78750
P3 - Disabled
P4 - Disabled
P5 - Disabled
P6 - Disabled
P7 - Disabled
C6 State - Package - Disabled
C6 State - Core - Disabled
Note: This is a different CPU from the one in the example above, hence the different voltages.
If you don't use systemd, but for instance SysV or Insserv, you need a
start script for zenstates.py.
'start' zenstates.py in single user mode (rcS), after mounting the local
file systems and loading the modules.
You may want to wait for syslogd as well. The image above can be a source
of inspiration.