Apple reveals its AI model training strategy: from large-scale web crawling to undisclosed licensing deals and synthetic content

At WWDC, Apple focused on Liquid Glass, the new visual design language coming to its operating systems, and also announced the next generation of its AI foundation models, which run both on device and in the cloud. After the conference, Apple published a detailed technical report that helps users and the tech community better understand its AI strategy and how its models are trained and optimized. In the report, Apple stresses that privacy and efficiency were central to how it trained the models.

Although Apple is not regarded as a leader in today's AI race, it has published a detailed report on its foundation models, titled "Apple Intelligence Foundation Language Models Tech Report 2025", which gives an in-depth look at the key elements of its latest models. The report covers nearly everything, from model architecture to pre-training, post-training, and fine-tuning. It also describes the technical changes Apple made to make the models more efficient while protecting user privacy.

Apple had already said that its on-device model, which is available to developers, has about 3 billion parameters, but until now it had shared few details about how the model is structured. According to the report, the model is split into two blocks to improve efficiency. The first, Block 1, contains over 60% of the model's core building blocks, its transformer layers, and does the main work of understanding the input and generating responses.

The second, Block 2, is lighter because two memory-intensive components, the key and value projections, have been removed from its layers; instead, Block 2 reuses the keys and values already computed by Block 1. This design let Apple cut the model's memory footprint by about 38% and also speed up its response time. Apple has long looked for ways to improve on-device performance: a few years ago, its researchers explored running models larger than a device's available memory. Although Apple did not ultimately adopt that approach, it has kept looking for ways to work around hardware limitations and related challenges.
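A back-of-the-envelope sketch shows where a saving of this size can come from. It assumes, as a simplification, that only Block 1 layers store keys and values in a cache and that Block 2 layers reuse that cache. The 32-layer model, the 20/12 layer split (62.5%/37.5%), the head count, head size and sequence length are hypothetical numbers chosen for illustration, not Apple's published configuration.

```python
def kv_cache_bytes(layers_with_cache: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """KV-cache size: one key and one value vector per head, per token, per cached layer."""
    return 2 * layers_with_cache * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, for illustration only (not Apple's real numbers).
TOTAL_LAYERS = 32
BLOCK1_LAYERS = 20                             # 62.5% of layers keep their own K/V projections
BLOCK2_LAYERS = TOTAL_LAYERS - BLOCK1_LAYERS   # 37.5% reuse Block 1's cache instead
KV_HEADS, HEAD_DIM, SEQ_LEN = 8, 128, 4096

baseline = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)  # every layer caches
shared = kv_cache_bytes(BLOCK1_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)   # only Block 1 caches
saving = 1 - shared / baseline

print(f"baseline: {baseline / 2**20:.0f} MiB, shared: {shared / 2**20:.0f} MiB, saving: {saving:.1%}")
# prints "baseline: 512 MiB, shared: 320 MiB, saving: 37.5%"
```

Under this simplification the saving equals the fraction of layers that no longer keep their own cache, so a 62.5/37.5 split yields 37.5%, in line with the roughly 38% reduction described above.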

On the server side, Apple uses a custom architecture for its Private Cloud Compute system. Called Parallel-Track Mixture-of-Experts (PT-MoE), it is, simply put, a way of breaking a large AI model into smaller specialized parts called experts. Because the model is partitioned into experts, it does not need to run in full for every request; only the experts relevant to the current task are activated, which saves compute and improves efficiency.
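A toy sketch of top-k routing, the general mechanism behind mixture-of-experts layers, may make this concrete. It is a generic textbook version in plain Python, not Apple's PT-MoE implementation; the function names, the dot-product router and the `top_k=2` default are assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token: Vector, gate_weights: List[Vector],
                experts: List[Expert], top_k: int = 2) -> Vector:
    """Send one token to its top_k experts and mix their outputs."""
    # Router score per expert: dot product of the token with that expert's gate vector.
    scores = [sum(t * w for t, w in zip(token, gate)) for gate in gate_weights]
    # Indices of the top_k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Mixing weights: softmax over the chosen experts' scores only.
    mix = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for weight, i in zip(mix, chosen):
        expert_out = experts[i](token)  # only the selected experts are evaluated
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Usage: four hypothetical experts that scale their input and record when they run.
calls: List[int] = []


def make_expert(idx: int, scale: float) -> Expert:
    def expert(x: Vector) -> Vector:
        calls.append(idx)
        return [scale * v for v in x]
    return expert


experts = [make_expert(i, float(i + 1)) for i in range(4)]
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
result = moe_forward([2.0, 1.0], gates, experts, top_k=2)
print(sorted(calls))  # [0, 1]: experts 2 and 3 never run
```

Here the router scores the token against all four experts, but only the two best-matching ones do any computation, which is the source of the savings the paragraph describes.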

Apple also designed a new Transformer architecture, the Parallel Track Transformer, which splits the model into multiple tracks that run independently and coordinate only at key synchronization points. Because the tracks do not have to synchronize across the whole model at every step, the design avoids system-wide latency bottlenecks. Apple has also addressed one of Apple Intelligence's biggest pain points: limited language support.
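The toy below illustrates the general idea of tracks that only communicate at synchronization points. It is a hypothetical sketch, not Apple's Parallel Track Transformer: the "layers" are plain functions, and merging the tracks by averaging every `sync_every` layers is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Layer = Callable[[List[float]], List[float]]


def run_parallel_tracks(x: List[float], tracks: List[List[Layer]],
                        sync_every: int) -> Tuple[List[float], int]:
    """Advance each track independently; merge the tracks only every `sync_every` layers.

    Returns the final merged state and the number of synchronization points used.
    """
    depth = len(tracks[0])
    if depth % sync_every != 0:
        raise ValueError("track depth must be a multiple of sync_every")
    states = [list(x) for _ in tracks]  # every track starts from the same input
    syncs = 0
    for layer_idx in range(depth):
        # No cross-track communication here: each track applies its own layer.
        states = [track[layer_idx](s) for track, s in zip(tracks, states)]
        if (layer_idx + 1) % sync_every == 0:
            # Synchronization point: average the track states and give every track the result.
            merged = [sum(vals) / len(vals) for vals in zip(*states)]
            states = [list(merged) for _ in states]
            syncs += 1
    return states[0], syncs


def add(c: float) -> Layer:
    """A stand-in 'layer' that adds a constant to every element."""
    return lambda v: [a + c for a in v]


# Two hypothetical tracks of 8 layers each, synchronizing every 4 layers.
tracks = [[add(1.0)] * 8, [add(3.0)] * 8]
out, syncs = run_parallel_tracks([0.0], tracks, sync_every=4)
print(out, syncs)  # [16.0] 2
```

With 8 layers per track and `sync_every=4`, the tracks exchange information twice instead of after every layer; fewer synchronization points mean less time spent waiting on coordination between tracks.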

With the new models, Apple has significantly improved multilingual capabilities. It raised the share of non-English data in training from 8% to 30%, including both real and AI-generated content, which improves the models' understanding of a wider range of languages. This should make features such as Writing Tools work better. To train the new models, Apple relied heavily on web data collected by Applebot, its in-house web crawler, which it also used for previous models. Notably, Apple says it respects publishers' choices: if a website opts out of crawling through its robots.txt file, its content is not used.
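Website opt-outs of this kind are conventionally expressed in a site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module to show how a crawler can check such rules before fetching a page; the robots.txt content, the second user-agent name and the URLs are hypothetical, and this is not Applebot's actual code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it blocks Applebot entirely,
# while other crawlers may fetch everything except /private/.
ROBOTS_TXT = """\
User-agent: Applebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("Applebot", "https://example.com/article"))        # False
print(allowed("SomeOtherBot", "https://example.com/article"))    # True
print(allowed("SomeOtherBot", "https://example.com/private/x"))  # False
```

A crawler that honors opt-outs simply skips any URL for which this check returns `False`, so the page never enters the training corpus.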

Apple uses a variety of data sources to train its models, relying primarily on public web data. It filters out low-quality or irrelevant content to focus on useful, relevant data. Apple also licenses content from publishers, although it did not reveal the names of the media companies involved. In addition, it uses smaller models to generate synthetic data, particularly for image-text tasks, code, and instruction following, to improve fine-tuning.
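The article does not spell out Apple's filtering rules, so the following is only a generic illustration of the kind of cleanup commonly applied to crawled text: dropping very short pages, dropping pages that match a blocklist, and removing exact duplicates. The thresholds, the blocklist and the `filter_documents` helper are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_documents(docs: Iterable[str], min_words: int = 5,
                     blocklist: Tuple[str, ...] = ("lorem ipsum",)) -> List[str]:
    """Keep documents that are long enough, contain no blocklisted phrase, and are not duplicates."""
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short to be useful training text
        if any(phrase in norm for phrase in blocklist):
            continue  # placeholder or boilerplate text
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        kept.append(doc)
    return kept


docs = [
    "Click here",
    "The new model splits its layers into two blocks.",
    "The new  model splits its layers into two  blocks.",
    "Lorem ipsum dolor sit amet, consectetur adipiscing.",
]
kept = filter_documents(docs)
print(kept)  # ['The new model splits its layers into two blocks.']
```

Real pipelines typically add learned quality classifiers and near-duplicate detection on top of simple rules like these; the point here is only the overall shape of a filter pass.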

The same multi-source approach extends to visual data: Apple draws on more than 10 billion image-caption pairs, including screenshots and handwritten notes, and uses its own models to generate richer captions. Together, these training methods help Apple build smarter, more capable models. Apple's approach is clear: a balancing act that keeps the system powerful and versatile without compromising its core value, privacy.