Hello,
I've been spending several days trying to figure out why network performance was so low on the dev kit (Q80-26). I noticed that as little as 1 Gbps of traffic over the LAN port was enough to keep several cores busy in ksoftirqd, which made no sense. Today I needed to run some tests with a 100G NIC, so I started to play with it, only to hit exactly the same performance limitation as on the on-board port: ~1 Mpps in each direction, no more, with possibly all cores at full throttle.
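For reference, a quick way to spot this symptom with standard tools (these are generic commands, not the exact ones from my tests):

  # per-CPU softirq time; %soft pinned high on several cores is the tell
  mpstat -P ALL 1
  # or watch the NET_RX counters climb
  watch -d -n1 cat /proc/softirqs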
The kernel was up to date for Ubuntu 22 (5.15.0-78), and, as usual on Ubuntu, the perf tool doesn't work, so I rebuilt it and could see that almost all the CPU time was spent on an "msr daif, x3" instruction in the kernel, in arm_smmu_cmdq_issue_cmdlist(). I tried to re-enable bypass by passing "arm_smmu_v3.disable_bypass=n" to the kernel, but it had no effect. I suspected it was being ignored by 5.15, so I upgraded to 6.2 (which lacks support for the Intel ice board that is present in 5.15); I also had to switch from the Intel board to a Mellanox ConnectX-6. It faced exactly the same problem and started spewing AER errors due to ASPM. I disabled ASPM as well, which calmed all the messages, but performance remained abysmal at exactly the same level.

Finally I found that arm64 supports "iommu.passthrough=1", and that was it: now I'm doing 9.2 Mpps in both directions! Oh, and pulling 25 Gbps of HTTP traffic out of it now takes 1.2% of CPU, i.e. just one core. Much better :-)
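By the way, if you want to verify which mode the IOMMU actually ended up in after booting, each group exposes its domain type in sysfs (a generic check, not something I captured at the time):

  # "identity" means passthrough; "DMA" or "DMA-FQ" means translation is active
  cat /sys/kernel/iommu_groups/*/type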
For those interested, and who have to deal with PCIe devices (mainly network), just be aware that you'll absolutely need to enable IOMMU passthrough or the device will basically be unusable. Even the NVMe SSDs now deliver 2.6 GB/s versus ~550 MB/s before! To save time for those experimenting, here's the grub cmdline I'm using (in /etc/default/grub on Ubuntu 22):
GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off net.ifnames=0 biosdevname=0 arm_smmu_v3.disable_bypass=n pcie_aspm=off iommu.passthrough=1"
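On Ubuntu, don't forget to regenerate the grub config and reboot after editing that file (the standard procedure):

  sudo update-grub
  sudo reboot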
Hoping it helps others facing similar trouble...