The open-source DeepSeek-V3 and R1 series of large AI models have demonstrated excellent performance in multilingual understanding and complex reasoning tasks. They have not only advanced the adoption and development of AI technology but also contributed significantly to the open-source community. Major technology companies have begun to support and deploy DeepSeek, and Chinese domestic hardware vendors are also accelerating their support.
Moore Threads, a Chinese developer of full-featured GPUs, has quickly deployed efficient inference services for the DeepSeek distilled models, enabling more developers to build AI applications on its full-featured GPUs.
Try it online:
https://playground.mthreads.com
In addition, users can deploy inference for the DeepSeek-R1 distilled models on Moore Threads MTT S80 and MTT S4000 graphics cards.
In fact, as early as January 28, a Bilibili uploader had already done this by hand on a Moore Threads MTT S80:
https://www.bilibili.com/video/BV18YfQYEEs2
The distilled models provided by DeepSeek transfer the capabilities of the large models to smaller, more efficient versions, making high-performance inference possible on domestic GPUs.
Building on its self-developed full-featured GPUs, Moore Threads quickly deployed inference services for the DeepSeek distilled models using a dual-engine approach that combines an open-source framework with its in-house engine.
Open-source framework adaptation:
Using the open-source Ollama framework, Moore Threads deployed the DeepSeek-R1-Distill-Qwen-7B distilled model, which performed well on a variety of Chinese-language tasks, verifying the versatility and CUDA compatibility of Moore Threads' self-developed full-featured GPUs.
In-house engine acceleration:
Moore Threads' independently developed high-performance inference engine combines hardware-software co-optimization with customized operator acceleration and memory management, significantly improving the model's computing efficiency and resource utilization.
This engine not only runs the DeepSeek distilled models efficiently but also lays the technical groundwork for deploying more large models in the future.
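For readers who want to try the Ollama route described above, the following is a minimal sketch of querying a locally running Ollama server from Python. It is not Moore Threads' own setup: the server address (Ollama's default local port) and the model tag `deepseek-r1:7b` are assumptions about how the distilled 7B model was pulled, and any GPU-specific installation steps are outside its scope.

```python
import json
import urllib.request

# Assumptions (not from the article): Ollama is listening on its default
# local port, and the distilled 7B model was pulled under this tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "deepseek-r1:7b"


def build_generate_request(prompt, model=MODEL_TAG, url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=120):
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the reply is one JSON object whose "response"
    # field holds the full generated text.
    return body["response"]
```

With the server running and the model pulled, calling `generate("Introduce yourself in Chinese.")` should return the model's reply as a string. Building the request in a separate function keeps the payload easy to inspect without a live server.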
Finally, Moore Threads is about to open up its self-designed KUAE GPU intelligent computing cluster, which will fully support distributed deployment of the DeepSeek-V3 and R1 models and the new generation of distilled models.
The KUAE cluster integrates advanced inference technology and a distributed computing framework to ensure efficient, stable operation of large models and help developers bring their applications into production quickly.