Hello,
we've ordered an 80-core 2.6 GHz dev kit. While testing the frequency, I measured only 2.3 GHz. I looked at the various cpufreq settings for the CPPC driver, tried to enable "boost" (not supported here), etc., but found no way to go beyond 2.3 GHz. I noticed that cpufreq would accept any frequency between 1.0 and 2.6 GHz, so I tried all of them in 2 MHz increments and measured the effective one. I observed 50 MHz steps. The day after, I found the machine running at 2.6 GHz without knowing why! I ran the same tests, and this time the frequency was growing in 81.25 MHz steps and was no longer capped at 2.3 GHz. In the first case I counted 27 frequency bins from 1000 to 2300 MHz, and in the second case 20 frequency bins from 1056 to 2600 MHz. That was a bit puzzling, so I took this opportunity to take a few extra measurements and to reboot. The machine rebooted at 2.3 GHz again, and I only managed to make it reach 2.6 GHz the day after, again!
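As a sanity check on those bin counts, here is a minimal sketch of the arithmetic in shell. The helper name count_bins is made up for illustration, and the 1056.25 MHz lower bound is an assumption: it is what 2600 - 19 x 81.25 gives, and it rounds to the 1056 MHz reported above.

```shell
# count_bins LO HI STEP: number of evenly spaced frequency bins from LO to HI
# (all values in MHz). Hypothetical helper, for illustration only.
count_bins() {
    awk -v lo="$1" -v hi="$2" -v step="$3" \
        'BEGIN { printf "%d\n", int((hi - lo) / step + 0.5) + 1 }'
}

count_bins 1000 2300 50          # capped run: 50 MHz steps
count_bins 1056.25 2600 81.25    # uncapped run: 81.25 MHz steps (assumed lower bound)
```

This prints 27 and 20, which is consistent with the number of bins counted in each run.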
I managed to make it boot at 2.6 by disabling both CPPC and LPI in the BIOS ACPI settings (just blacklisting cppc_cpufreq has no effect). However this time, after I use it for a while, it goes back to 2.3 and stays there.
So there's really something wrong here. Also I noticed that when it's running at 2.3 GHz, it's not perfectly smooth, there is a little bit of noise in the measurement, as if the machine was throttling a little bit, or was enabling a very wide spread spectrum.
For me it's particularly annoying because we've ordered this machine to optimize software for various multi-core scenarios, which involves regularly rebooting to change the NUMA config (1, 2 or 4 nodes). Seeing it not reach its full performance at boot, or suddenly drop in capacity after some time, is quite problematic for performance testing. I'm also building on the machine, and seeing build times increase by 13% for no apparent reason doesn't help.
BTW I installed Ubuntu 22.04 with kernel 5.15.0-76. But given that it's observable even without cppc_cpufreq, I'm pretty sure now that the kernel is irrelevant.
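In case someone wants to compare with their own machine, here is a small sketch that dumps the standard cpufreq policy attributes from sysfs. The function name dump_policy and the optional directory argument are just for illustration; the attribute names are the usual cpufreq ones, with frequencies reported in kHz, and some may be missing depending on the driver in use.

```shell
# dump_policy [DIR]: print "name: value" for the usual cpufreq policy files.
# DIR defaults to policy0; hypothetical helper, for illustration only.
dump_policy() {
    dir="${1:-/sys/devices/system/cpu/cpufreq/policy0}"
    for name in scaling_driver scaling_governor \
                cpuinfo_min_freq cpuinfo_max_freq \
                scaling_min_freq scaling_max_freq scaling_cur_freq; do
        if [ -r "$dir/$name" ]; then
            printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
        else
            # Not every driver exposes every attribute
            printf '%s: (not available)\n' "$name"
        fi
    done
}

dump_policy
```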
I'd like to know if this matches anything others have observed, and if there's a known workaround. I verified that the CPU wasn't particularly hot. In case that helps, I noted the following info from the BIOS:
Board AVA Developer Platform
SCP FW Version 2.06
SCP FW Build 20220308
MMC FW Version 02.04
MCU FW Version NA
CPU Ampere(R) Altra(R) Processor
CPU Clock 2600MHz
PCP Clock 1450MHz
L1I CACHE 64KB
L1D CACHE 64KB
L2 CACHE 1MB
SOC Clock 2000MHz
Sys Clock 400MHz
AHB Clock 200MHz
It really feels odd that when it's slow the frequency steps are not that sharp anymore. I'm attaching a graph I've made below.
The script to produce this is ultra-simple:
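# sweep scaling_max_freq from 1.0 GHz up to (but not including) 2.7 GHz, in kHz
f=1000000
while [ $f -lt 2700000 ]; do
    # cap the policy at the requested frequency
    echo ${f} | sudo tee /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq >/dev/null
    # print the requested frequency in MHz (${f%???} drops the last 3 digits)
    # followed by the frequency measured on CPU 0
    echo ${f%???} $(taskset -c 0 ~/mhz/mhz -c -i 1 0 $((f/2)))
    # next step: +2 MHz
    f=$((f+2000))
done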
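To count the bins from that output without reading a graph, the lines can be collapsed into one line per plateau. This is only a sketch: it assumes each output line has exactly two columns, the requested and the measured frequency, both in MHz, and the name plateaus and the 10 MHz default tolerance are arbitrary choices of mine.

```shell
# plateaus [TOL]: read "requested measured" pairs (MHz) on stdin and print
# "index requested measured" each time the measured value moves by more than
# TOL MHz (default 10) from the start of the current plateau.
# Hypothetical helper, for illustration only.
plateaus() {
    awk -v tol="${1:-10}" '
        NF >= 2 {
            d = $2 - last
            if (d < 0) d = -d
            if (!seen || d > tol) {
                seen = 1
                last = $2
                printf "%d %s %s\n", ++n, $1, $2
            }
        }'
}
```

With the loop saved in a file (say sweep.sh, a made-up name), "sh sweep.sh | plateaus 10" would list one line per frequency bin. The tolerance has to stay well below the step size (50 MHz here) while still absorbing the measurement noise seen at 2.3 GHz.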
Oh, and the utility used to measure the frequency is "mhz":
git clone https://github.com/wtarreau/mhz
cd mhz
make
./mhz -c 5
2299.361
2299.333
2299.448
2299.361
2299.333
Thanks for any idea!
Willy