In the Linux x86_64 kernel, microarchitecture-specific optimizations are common, and both Intel and AMD CPU families rely on various performance tricks. The ARM64 Linux kernel maintainers, by contrast, are opposed to introducing new microarchitecture-specific optimizations for new ARM processors.

Ampere Computing has sent out a set of four patches providing optimizations for its new AmpereOne server processors. The company found that these high-core-count ARM server processors can benefit from aggressive prefetching when using 4K page sizes. In sequential read performance tests with HugeTLB or tmpfs, the reported improvement was "up to 1.3~1.4x".

While these improvements are exciting for AmpereOne performance on Linux, it currently appears that this work will not be merged into the mainline Linux kernel.
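To illustrate what software prefetching in a copy routine means in general, here is a minimal user-space sketch in C. It is not the Ampere patch, which targets the arm64 kernel's copy routines. The function name `copy_with_prefetch`, the 64-byte cache-line size, and the prefetch distance are all assumptions chosen for illustration, and `__builtin_prefetch` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64      /* assumed cache-line size in bytes */
#define PREFETCH_AHEAD 4   /* hypothetical prefetch distance, in cache lines */

/* Copy n bytes from src to dst, hinting the CPU to load data
 * a few cache lines ahead of the position being copied. */
void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t off = 0;

    while (n - off >= CACHE_LINE) {
        size_t ahead = off + (size_t)PREFETCH_AHEAD * CACHE_LINE;

        /* Only issue the hint while the target is still inside the buffer. */
        if (ahead < n)
            __builtin_prefetch(s + ahead, 0 /* read */, 0 /* low temporal locality */);

        memcpy(d + off, s + off, CACHE_LINE);
        off += CACHE_LINE;
    }

    /* Copy whatever is left after the last full cache line. */
    memcpy(d + off, s + off, n - off);
}
```

Whether such hints help depends entirely on the microarchitecture: on a CPU with a capable hardware prefetcher they may do nothing or even hurt, which is the maintenance concern the arm64 maintainers raise.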

Well-known ARM Linux kernel developer Will Deacon shared his view of the AmpereOne performance patches:

"We tend to shy away from microarchitecture-specific optimizations in arm64 kernels because these optimizations are very difficult to maintain, difficult to test properly, often result in bloat, and add additional barriers to updating our library routines.

Granted, we have some help for Thunder-X1 in copy_page() (disguised as ARM64_HAS_NO_HW_PREFETCH), but frankly, that machine needs all the help it can get.

Therefore, I don't really expect merging; modern CPUs should do a better job of copying data. This is copy_to_user(), not rocket science."

Arm's Mark Rutland agreed with Deacon and also supported removing the Thunder-X1-specific optimization. Kernel developer Marc Zyngier concurred and is already working on a patch to remove the Thunder-X1-specific code.

To keep the ARM64 Linux kernel code maintainable and avoid unnecessary complexity, the maintainers are not pursuing CPU- or microarchitecture-specific optimizations. It will be worth watching whether any ARM-focused Linux distributions carry such patches themselves, or whether an AmpereOne-optimized Linux distribution emerges. Given Ampere's focus on high-performance, energy-efficient ARM Linux servers that compete with AMD EPYC and Intel Xeon, the company likely does not want to leave any performance on the table.