Arm’s new Cortex-A320 is its first ultra-efficient CPU based on the Armv9 architecture, designed for IoT and artificial intelligence applications. Compared with the Cortex-A520, it is more than 50% more efficient, a gain achieved through microarchitectural changes such as narrow fetch and decode data paths, densely banked L1 caches, and a reduced-port integer register file.
The Cortex-A320 also delivers 30% higher scalar performance than the previous-generation Cortex-A35, thanks to efficient branch predictors and prefetchers and an improved memory system.
The Cortex-A320 is a single-issue, in-order CPU with a 32-bit instruction fetch and an 8-stage pipeline. It scales from single-core to quad-core configurations and uses the DSU-120T, a streamlined DynamIQ Shared Unit (DSU) that supports Cortex-A320-only clusters.
The Cortex-A320 supports up to 64KB of L1 cache and up to 512KB of L2 cache, and provides a 256-bit AMBA 5 AXI interface to external memory. The L2 cache and L2 TLB can be shared between Cortex-A320 CPUs. The vector processing unit, which implements the NEON and SVE2 SIMD (Single Instruction, Multiple Data) technologies, is either private to the core in a single-core complex or shared between cores in dual-core and quad-core implementations.
The Cortex-A320 targets the artificial intelligence sector as well as the IoT market, bringing Armv9 architectural improvements to its NEON and SVE2 vector processing. Its ML processing capability is ten times higher than that of the Cortex-A35 and six times higher than that of the widely used Cortex-A53. The Arm Cortex-A320 supports new data types such as BF16, along with enhanced dot product and matrix multiplication instructions, making it the most efficient Cortex-A CPU for ML applications.
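To make the BF16 data type more concrete, here is a minimal Python sketch of how a 32-bit float can be rounded to BF16 (1 sign bit, 8 exponent bits, 7 mantissa bits) and used in a dot product. The function names `to_bf16_bits`, `from_bf16_bits` and `bf16_dot` are hypothetical and chosen for illustration. This is a software emulation under simplifying assumptions (no NaN handling, accumulation in Python's double-precision float), not a description of how the Cortex-A320 instructions behave internally.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Round a value to BF16 and return its 16-bit pattern.

    BF16 keeps the upper 16 bits of an IEEE 754 float32, so it has the
    same exponent range as float32 but only 7 mantissa bits.
    Assumption: NaN inputs are not handled specially in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being discarded.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF


def from_bf16_bits(h: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def bf16_dot(a, b) -> float:
    """Dot product with inputs rounded to BF16 and a wider accumulator.

    Assumption: the accumulator here is a Python float (double precision);
    hardware BF16 dot-product instructions accumulate into float32.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc += from_bf16_bits(to_bf16_bits(x)) * from_bf16_bits(to_bf16_bits(y))
    return acc
```

The appeal of BF16 for ML workloads is that each operand takes half the storage and bandwidth of float32 while keeping its dynamic range, at the cost of precision: in this sketch, `1.00390625` (1 + 2^-8) rounds back to exactly `1.0`.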