Apple has watched the generative AI frenzy calmly over the past year while quietly building up its ecosystem foundation, and it is moving step by step toward the release of the AiPhone next year. In October, Apple's machine learning research team released a multimodal large model called Ferret, which the team reports understands spatial relationships more accurately than GPT-4V. The model has recently been open-sourced. Simply put, if you point to any object anywhere in a picture, no matter how small, Ferret can describe it clearly.


This sensitivity to space matters for Apple's upcoming Vision Pro, which combines perception of real and virtual scenes with large language models. The paper is unassuming, and its authors are all Chinese, but it has drawn growing attention in the industry recently, and it reports state-of-the-art (SOTA) results in spatial understanding.


"Ferret" model architecture (Source: "Ferret: Refer and Ground Anything Anywhere at Any Granularity")

Apple also recently released MLX, an open source framework for training and deploying large models on Apple silicon such as the M3 chip. This means developers can build applications that support large models on a Mac laptop.

Apple also published "LLM in a Flash", which uses flash memory to work around the shortage of memory (DRAM) when large models are deployed on mobile phones.

I personally like this paper very much. Yes, Apple, long known for working in secret, has quietly begun publishing papers as well.

The paper addresses a small but critical problem: how to deploy a large model within very limited memory, such as on a mobile phone, while keeping inference fast and not draining the battery.

Its biggest strength is that it does not treat algorithms in isolation. Instead, it proposes methods that combine algorithms with a deep understanding of the hardware. Its starting point is pure consumer thinking: a large model in the distant cloud is frighteningly big and expensive, and only by putting it in the user's pocket can consumers actually experience generative AI.

DRAM is too small to hold a model with billions of parameters. The model can be stored in flash memory instead, but flash bandwidth is limited. To minimize the traffic from flash memory to DRAM, Apple introduces two techniques, windowing and row-column bundling (see the paper for details).
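To make the two ideas concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the class `SlidingWindowCache`, the function `bundle`, and the idea of passing in "active neuron ids" per token are all hypothetical names chosen for this example. Windowing is modeled as keeping in DRAM only the neurons used by the last few tokens, so each new token reads from flash only what is not already resident. Row-column bundling is modeled as storing column i of one feed-forward projection next to row i of the other, so a single contiguous read fetches both.

```python
from collections import deque


class SlidingWindowCache:
    """Toy model of windowing (hypothetical helper, not Apple's code).

    DRAM holds only the neurons used by the last `window` tokens.
    """

    def __init__(self, window):
        self.window = window
        self.history = deque()  # one set of active neuron ids per recent token
        self.resident = {}      # neuron id -> number of window tokens using it

    def step(self, active):
        """Process one token; return the neuron ids that must be read from flash."""
        active = set(active)
        to_load = {n for n in active if n not in self.resident}
        self.history.append(active)
        for n in active:
            self.resident[n] = self.resident.get(n, 0) + 1
        if len(self.history) > self.window:
            # The oldest token leaves the window; drop neurons nobody else uses.
            for n in self.history.popleft():
                self.resident[n] -= 1
                if self.resident[n] == 0:
                    del self.resident[n]
        return to_load


def bundle(up_columns, down_rows):
    """Toy row-column bundling: place column i of the up projection next to
    row i of the down projection so that one contiguous read fetches both."""
    return [list(col) + list(row) for col, row in zip(up_columns, down_rows)]


cache = SlidingWindowCache(window=2)
print(sorted(cache.step([1, 2, 3])))  # first token: everything comes from flash
print(sorted(cache.step([2, 3, 4])))  # only neuron 4 is new
print(sorted(cache.step([3, 4, 5])))  # only neuron 5 is new
print(bundle([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The point of the sketch is the shape of the saving: when consecutive tokens activate overlapping neurons, the per-token read from flash shrinks to the difference, and bundling turns two small reads per neuron into one larger one.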

Tested on two models, OPT 6.7B and Falcon 7B, the results are striking. Models up to twice the size of the available DRAM can be run, and compared with naive loading on the CPU and GPU, inference is 4 to 5 times and 20 to 25 times faster, respectively.
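A rough back-of-the-envelope calculation shows what "twice the available DRAM" means for these two models. The sketch below assumes 16-bit weights (2 bytes per parameter), which the article does not state, and takes the 2x ratio at face value; the function names are made up for this example.

```python
def weight_footprint_gb(params_billion, bytes_per_param=2):
    """Approximate size of the weights in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption.
    """
    return params_billion * bytes_per_param


def min_dram_gb(params_billion, model_to_dram_ratio=2.0, bytes_per_param=2):
    """DRAM needed if the model may be `model_to_dram_ratio` times larger than DRAM."""
    return weight_footprint_gb(params_billion, bytes_per_param) / model_to_dram_ratio


for name, size in [("OPT 6.7B", 6.7), ("Falcon 7B", 7.0)]:
    print(f"{name}: weights ~{weight_footprint_gb(size):.1f} GB, "
          f"DRAM needed ~{min_dram_gb(size):.1f} GB")
```

Under these assumptions, a model of about 7 billion parameters needs roughly 14 GB for its weights alone, so running it with about half that much DRAM is what makes phone-class devices plausible targets.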

Aligning the chip, the operating system and the system design, and building an inference cost model around them, is something only Apple can do.

Recently, a series of small models with only billions of parameters has surprised the industry with its performance. For example, Mistral and Phi-2 can challenge large models with tens of billions of parameters, and Google has launched Gemini Nano, a model with 1.8 billion parameters that runs directly on Pixel phones. Qualcomm's Snapdragon chips can run models with tens of billions of parameters. Samsung may launch a Galaxy S24 phone with generative AI gaming capabilities in early 2024.

There is also PowerInfer, an inference engine released by a team at Shanghai Jiao Tong University. Together, these efforts clear away one obstacle after another to deploying large models on end-user devices, especially mobile phones and laptops, and suggest that 2024 will bring a wave of consumer electronics shipping with large models on board.

These "small" models share a common feature: they are trained on high-quality, "textbook-level" data. The archives of mainstream news media can undoubtedly provide good language training material.

Apple is negotiating with major media organizations in the United States to license their archives for about US$50 million in order to train its own large models, which may be used in Siri's conversational service.

Consider the quality of these training corpora: Condé Nast's Vogue and The New Yorker, NBC News, and IAC's People, The Daily Beast, and Better Homes & Gardens, among others. They include a wealth of polished text and images on fashion and lifestyle, as well as selected news articles, images and videos.

But other mainstream media outlets showed little interest. Past cooperation between print media and social media brought them little benefit, and they also worry about the legal disputes that could arise once the news in their archives is handed to Apple to train large models.

Apple's approach is considered more above-board. Other AI companies and technology giants used other people's content first and negotiated licenses only after being caught, which has triggered a number of lawsuits.

Because of its emphasis on privacy, Apple is reluctant to scrape data directly from the Internet, and it does not allow itself to collect data on its customers.

So, what will Apple do in 2024?

The first thing everyone expects is a Siri voice assistant powered by generative AI next year, which would be the biggest highlight of the iPhone 16 and iOS 18 in the fall of 2024. Some Apple fans joke that it will be called SiriGPT. But this is just the tip of the iceberg.

Giants usually look for new technologies that strengthen their existing core capabilities and differentiate their existing products. On the surface, Apple seems to be resting on its laurels, moving slowly on generative AI and adopting a conservative follower strategy. Or can Apple really use its ability to integrate chips, operating systems and large models, together with its product design, to create the best AI product experience? The answer is probably the latter.

Over the past few years, Apple has acquired dozens of AI start-ups and folded them into its products, services and ecosystem. There are just no star start-ups among them.

Apple acts without talking about it. Analysts have long noted that Apple's investment in AI is no smaller than Microsoft's: it has spent tens of billions of dollars building infrastructure for generative AI application development. It was rumored earlier that Apple was secretly training its own large model, Ajax, also known as AppleGPT, which was said to be on par with GPT-3.5 at the time.

But a series of recent Apple machine learning research results shows that Apple is going its own way. The closed-source large model camp, represented by Microsoft, Google, Amazon, NVIDIA and OpenAI, has already reaped the first harvest around models, cloud and computing power.

What Apple values is the huge ecosystem formed by its 2 billion devices and their users. A new iPhone experience built on generative AI, with large models supporting every application, gives Apple a consumer market opportunity that other giants will find hard to take away. In the name of privacy protection, Apple controls how third-party applications access one another, which also makes the App Store a gold mine for its future AI applications.

Is Apple slow? The hallucination problems of generative AI, along with regulation, privacy protection and copyright disputes, lead Apple to think more deeply and more carefully. Apple can afford this confidence because its integrated design across chips, operating systems, applications, products and manufacturing will ultimately deliver a depth of product experience that competitors may still find hard to match.

The "iPhone moment" set off by OpenAI has been proclaimed several times, but the next Apple will still be Apple.

Reference papers:

https://arxiv.org/pdf/2312.11514.pdf

https://arxiv.org/pdf/2310.07704.pdf

Author/Zhou Jiangong
