While I'm waiting for my Ampere Altra dev kit to arrive:
The CPU cooler looks small. How well does it work for you? How loud is the fan?
Are there any non-water alternatives? Are there any compatible 3U coolers available anywhere?
https://www.coolserver.com.cn/en/product_view_442_313.html
But I'm still waiting for this product; the manufacturer said it has been out of stock for a long time.
Surprisingly, the cooler is not as noisy as I'd have expected given its size vs. the TDP.
Sometimes it switches on and off every few seconds, which is annoying, but otherwise the noise is fine.
I don't know if the cooling is sufficient, as I haven't found a way to get the CPU temperature yet (sensors-detect doesn't find any sensors, and I don't see any in /sys/class/thermal either). According to lscpu, I do reach and sustain "CPU(s) scaling MHz: 100%" at high CPU load (cpupower frequency-info shows 2.8 GHz), and the fan gets noisy, though the metal block on the CPU still feels warm, not hot, to the touch (unlike one of the blocks on the voltage regulators).
So the cooler that comes with the AADK apparently is fine for the 96 core Ampere Altra Max.
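For anyone who wants to check the "100%" figure per core rather than as an lscpu average, here is a minimal Python sketch. It assumes the kernel exposes the generic cpufreq sysfs files scaling_cur_freq and cpuinfo_max_freq (both in kHz) under /sys/devices/system/cpu; whether thermal throttling on the Altra actually shows up there is an assumption I have not verified. The function names freq_fractions and below are made up for this example.

```python
from pathlib import Path


def freq_fractions(root="/sys/devices/system/cpu"):
    """Map CPU number -> current frequency as a fraction of its maximum.

    Assumes the generic cpufreq sysfs layout; CPUs without the two
    files are skipped.
    """
    fractions = {}
    for cpufreq in Path(root).glob("cpu[0-9]*/cpufreq"):
        cur = cpufreq / "scaling_cur_freq"
        top = cpufreq / "cpuinfo_max_freq"
        if not (cur.is_file() and top.is_file()):
            continue
        cpu = int(cpufreq.parent.name[3:])  # "cpu17" -> 17
        fractions[cpu] = int(cur.read_text()) / int(top.read_text())
    return fractions


def below(fractions, threshold=0.95):
    """Return the sorted CPU numbers running below the given fraction."""
    return sorted(cpu for cpu, f in fractions.items() if f < threshold)
```

Calling below(freq_fractions()) while the load is running would list the cores that are not at full clock. The 0.95 threshold is arbitrary.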
I'm waiting for the fan adapter to connect to the module, so in the meantime I'm using a Noctua fan controller connected to a header on the carrier.
"sensors-detect" doesn't detect any either, but just running 'sensors' reports several temperatures:
root@amp:/home/bcran# sensors
nvme-pci-50400
Adapter: PCI adapter
Composite: +30.9°C (low = -273.1°C, high = +81.8°C)
(crit = +84.8°C)
Sensor 1: +30.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +37.9°C (low = -273.1°C, high = +65261.8°C)
nvme-pci-d0100
Adapter: PCI adapter
Composite: +34.9°C (low = -273.1°C, high = +71.8°C)
(crit = +74.8°C)
Sensor 1: +47.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +36.9°C (low = -273.1°C, high = +65261.8°C)
apm_xgene-isa-0000
Adapter: ISA adapter
SoC Temperature: +38.0°C
CPU power: 8.52 W
IO power: 19.04 W
nvme-pci-0100
Adapter: PCI adapter
Composite: +36.9°C (low = -273.1°C, high = +79.8°C)
(crit = +86.8°C)
Sensor 1: +36.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +34.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 3: +33.9°C (low = -273.1°C, high = +65261.8°C)
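The same values can probably be read without lm-sensors at all. Below is a minimal Python sketch that walks /sys/class/hwmon and picks out one chip by name. It assumes the kernel's generic hwmon naming (tempN_input in millidegrees Celsius, powerN_input in microwatts, optional *_label files) and that the chip's name file reads "apm_xgene", which I am inferring from the "apm_xgene-isa-0000" line above. The function name read_hwmon is made up for this example.

```python
from pathlib import Path

# Divisors per the generic hwmon convention:
# temperatures are in millidegrees Celsius, power in microwatts.
SCALE = {"temp": 1000.0, "power": 1_000_000.0}


def read_hwmon(chip="apm_xgene", root="/sys/class/hwmon"):
    """Return {label: value} for one hwmon chip, in degrees C and watts.

    The default chip name is an assumption taken from the sensors output.
    """
    readings = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name = dev / "name"
        if not name.is_file() or name.read_text().strip() != chip:
            continue
        for inp in sorted(dev.glob("*_input")):
            channel = inp.name[: -len("_input")]  # e.g. "temp1"
            kind = channel.rstrip("0123456789")   # e.g. "temp"
            if kind not in SCALE:
                continue  # ignore fans, voltages, etc.
            label_file = dev / f"{channel}_label"
            # Fall back to the channel name when there is no label file.
            label = label_file.read_text().strip() if label_file.is_file() else channel
            readings[label] = int(inp.read_text()) / SCALE[kind]
    return readings
```

On a machine where the assumptions hold, print(read_hwmon()) should give a small dictionary with the SoC temperature and the two power readings; an empty dictionary means no chip with that name was found.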
True, similar here, with "sensors". I also have one sensor per NVMe SSD, and also one for my NIC.
Apparently, "apm_xgene-isa-0000" is the CPU sensor. Under high system load for a dozen minutes, that temperature gets to 90°C for me, with CPU power consumption at 120 W and I/O at 30 W.
apm_xgene-isa-0000
Adapter: ISA adapter
SoC Temperature: +86.0°C
CPU power: 116.32 W
IO power: 29.61 W
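To log these numbers during a long load test, the text output of sensors can be parsed directly. This is a minimal Python sketch, assuming the output looks like the blocks quoted in this thread (chip name on its own line, then "label: value unit" lines); the function name parse_chip is made up for this example, and other lm-sensors versions may format things differently.

```python
import re

# Matches lines such as "SoC Temperature: +86.0°C" or "CPU power: 116.32 W".
READING = re.compile(r"^\s*(.+?):\s*([+-]?\d+(?:\.\d+)?)\s*(°C|W)")


def parse_chip(text, chip="apm_xgene-isa-0000"):
    """Extract {label: value} for one chip from `sensors` text output."""
    readings = {}
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == chip:
            in_block = True
            continue
        if not in_block:
            continue
        if ":" not in stripped:
            if stripped.startswith("("):
                continue  # continuation line such as "(crit = +84.8°C)"
            break  # blank line or next chip header ends the block
        match = READING.match(stripped)
        if match:  # non-numeric lines like "Adapter: ISA adapter" are skipped
            readings[match.group(1)] = float(match.group(2))
    return readings
```

The text could come from subprocess.run(["sensors"], capture_output=True, text=True).stdout, called in a loop with a sleep between samples.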
@Philipp Krause Yes, the stock cooler may struggle to handle the heat with more than 64 cores. If you want to use an air cooler, a 4U one is probably the better choice.
On my system, I'm currently using an Alphacool Eisbear Pro Aurora 240 water cooler with the Ampere bracket, same as in this article: https://www.ipi.wiki/community/forum/topic/79192/compatible-coolers
My Q80-30 now sits at 35 degrees when idle, and at 52 degrees under 100% load.
@Philipp Krause That seems too hot to me, and it is just 5 degrees below the throttling temperature according to the Altra Max datasheet. I'm not sure whether you'd want to change the CPU cooler or increase airflow through the case.
My case is 3U, so a 4U cooler wouldn't fit. Also, I don't know of any place to buy an air cooler that would fit here, so I can't really change the cooler. For now, the case is sitting on the bench with the top open, not yet moved into the rack, so "improving airflow through the case" is not yet possible.
Still, 90°C is a lot, especially since we are in autumn now; I guess that means throttling in summer, unless I manage to improve the cooling.
My impression was that the case fans were not running at full speed, even when the CPU was at 90°C. I have two, connected to the FAN1 and FAN2 headers. Still, they cool the power regulators at the front very well; the cooler on them is always cool. By contrast, the voltage regulators at the back, with the CPU cooler in between, always feel hot. Also, even with the CPU at 90°C, the CPU cooler didn't feel painfully hot when touched, so I wonder if heat transfer from the CPU to the cooler could be the bottleneck. Maybe I should replace the thermal pad that came with the AADK with thermal paste?
@Philipp Krause if you look at this discussion, the thermal paste could possibly be improved.
https://www.ipi.wiki/community/forum/topic/79192/compatible-coolers
I remember someone doing some experiments using IBM Power 9 CPUs. Basically, thermal paste was much better than common thermal pads, and indium thermal pads were slightly better than pastes. Personally, I also use thermal paste on Power 9 CPUs at wattages where an indium pad is officially recommended, and it seems fine so far.
So next week, I might try replacing the pad from the AADK with paste, and see if that improves cooling.
Thermal paste vs. pad doesn't make much of a difference for me.
With the case closed, the CPU temperature goes to 105°C.
IMO, the current CPU cooler is not very well suited for a rack case: it blows downwards, and the shape of the cooler and the placement of the RAM then make that air move left and right. But rack airflow is typically front to back. The lack of space forcing me to put the AADK into a short 3U case probably doesn't help either.
Replacing that 60 mm rear fan with two stronger ones (effectively quadrupling air throughput there) helped a bit. With those, I now get the open-case temperatures with a closed case, too.
But I also found a workload that gets CPU power to 160W, and then the CPU goes to 105°C again.