Hands-on test: Apple M3 Ultra runs the full DeepSeek R1 faster than eight A100s

Recently, the Bilibili uploader "although but Zhang Heihei" shared a video showing test results for a top-spec Apple M3 Ultra running the full 671-billion-parameter DeepSeek R1 model. It ran faster than eight A100 graphics cards at a much lower cost.

Running the 671-billion-parameter DeepSeek R1 model usually requires a professional-grade server equipped with six to eight A100s. The total price easily exceeds one million yuan, which puts it out of reach for almost all ordinary users.

With a top-spec M3 Ultra, however, a single Mac Studio achieves similar performance, which makes it extremely cost-effective.

The test results show that, when running the DeepSeek R1 model, eight A100 graphics cards reach 16.41 tokens/s, while the top-spec M3 Ultra reaches 15.78 tokens/s in GGUF format.

After switching to the MLX format, which can take advantage of unified memory, the M3 Ultra's speed rose to 19.17 tokens/s, surpassing the eight A100 graphics cards. In addition, when the M3 Ultra ran the 671-billion-parameter DeepSeek V3 model, it reached 19.66 tokens/s.
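To put those figures side by side, the short Python sketch below computes the relative differences from the three DeepSeek R1 numbers quoted above. The dictionary labels and the `percent_change` helper are illustrative names, not something taken from the video.

```python
# Throughput figures quoted in the article, in tokens per second.
results = {
    "8x A100": 16.41,
    "M3 Ultra (GGUF)": 15.78,
    "M3 Ultra (MLX)": 19.17,
}


def percent_change(new: float, old: float) -> float:
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0


baseline = results["8x A100"]
for name, tps in results.items():
    delta = percent_change(tps, baseline)
    print(f"{name:16s} {tps:6.2f} tokens/s  {delta:+6.1f}% vs 8x A100")

# Gain from switching formats on the same M3 Ultra hardware.
mlx_gain = percent_change(results["M3 Ultra (MLX)"], results["M3 Ultra (GGUF)"])
print(f"MLX vs GGUF on M3 Ultra: {mlx_gain:+.1f}%")
```

By this arithmetic, the GGUF run is about 4% slower than the eight A100s, the MLX run is about 17% faster, and moving from GGUF to MLX is worth roughly 21% on the same machine.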

However, this does not mean the M3 Ultra beats the A100 in every scenario. Single-user inference on a single model depends mainly on memory bandwidth and capacity, so it does not exercise the A100's full compute potential. In multi-user inference and large-model training, the M3 Ultra cannot compete with the A100.
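The role of memory capacity can be illustrated with a rough estimate of how much memory the weights of a 671-billion-parameter model occupy at different precisions, compared with the 512 GB of unified memory on the tested machine. This is only a sketch: the article does not say which precision the test used, so the bit widths below are assumptions, and the estimate ignores the KV cache, activations, and operating-system overhead.

```python
PARAMS = 671e9            # parameter count quoted in the article
UNIFIED_MEMORY_GB = 512   # unified memory of the tested Mac Studio


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


# Hypothetical precisions; the article does not state which one was used.
for bits in (16, 8, 4):
    size_gb = weight_footprint_gb(PARAMS, bits)
    verdict = "fits" if size_gb < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{bits:2d}-bit weights: {size_gb:7.1f} GB -> {verdict} in {UNIFIED_MEMORY_GB} GB")
```

Under these assumptions, only the 4-bit case (about 335.5 GB) leaves the weights comfortably inside 512 GB; the 8-bit and 16-bit cases exceed it.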

In addition, the M3 Ultra performed well in large language model inference speed tests. On Llama 3.1 70B, Gemma 2 27B, and Qwen2.5 14B, it was clearly faster than other M-series chips. Compared with the M2 Ultra, it was 13%, 34%, and 18% faster, respectively.

The top-spec M3 Ultra tested here has 512 GB of unified memory and costs 74,249 yuan in total. Users who do not need to run models this large can choose a smaller unified memory configuration to save money.
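The cost-effectiveness claim can be made concrete by dividing each price by the single-user throughput reported for DeepSeek R1. In the sketch below, the A100 server price is taken as exactly one million yuan, which the article gives only as a lower bound, so the real gap would be at least this large. The helper name `yuan_per_token_per_second` is illustrative.

```python
MAC_STUDIO_YUAN = 74_249       # price quoted in the article
A100_SERVER_YUAN = 1_000_000   # assumption: the article only says "exceeds one million yuan"

MAC_TPS = 19.17                # M3 Ultra, MLX format
A100_TPS = 16.41               # eight A100s


def yuan_per_token_per_second(price_yuan: float, tokens_per_second: float) -> float:
    """Purchase price divided by single-user decoding throughput."""
    return price_yuan / tokens_per_second


mac_cost = yuan_per_token_per_second(MAC_STUDIO_YUAN, MAC_TPS)
server_cost = yuan_per_token_per_second(A100_SERVER_YUAN, A100_TPS)

print(f"Mac Studio : {mac_cost:10.0f} yuan per token/s")
print(f"A100 server: {server_cost:10.0f} yuan per token/s (lower bound)")
print(f"Ratio      : {server_cost / mac_cost:10.1f}x")
```

This covers single-user inference only. It says nothing about multi-user serving or training, where the article notes that the A100 remains far ahead.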