Researchers from Apple and Columbia University quietly released an open source multimodal LLM in October, a research release called "Ferret" that can be queried using regions of an image. It appeared on GitHub largely unnoticed, with no announcement or promotion. Ferret's code was released alongside Ferret-Bench on October 30, and a checkpoint release followed on December 14.
Although it drew little attention at first, the release became a big deal for artificial intelligence researchers on Saturday, VentureBeat reported. Bart DeWitte, who runs an AI-based medicine nonprofit, posted about the "missed" release on X, calling it "proof of Apple's commitment to impactful AI research."
Ferret's open source release is under a non-commercial license, so it cannot be commercialized in its current state. However, there's always a chance it could be used in a future Apple product or service in some way.
Apple AI/ML research scientist Gan Zhe explained Ferret's purpose in an October tweet, describing it as a system that can "reference and position anything, anywhere, at any granularity" in an image. It can do so using a region of any shape within the image.
Simply put, the model analyzes the area drawn on the image, determines the elements within it that are useful to the user's query, and identifies them, drawing a bounding box around the detected elements. It can then use the identified elements as part of the query and respond in a typical manner.
For example, if a user highlights an animal in an image and asks the LLM what it is, the LLM can identify the species and work out whether the user is referring to a single animal within a group. It can then provide further responses using the context of other items detected in the image.
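The region-to-elements step described above can be pictured with a small sketch. This is not Ferret's code or API. It is an illustrative assumption that the user's free-form region arrives as a boolean mask and that some detector has already produced labeled boxes. The names `mask_to_box`, `boxes_overlap`, and `elements_in_region` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Mask = List[List[bool]]          # mask[y][x] is True where the user drew
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tightest box around all True cells, or None if empty."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two inclusive boxes share at least one cell."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def elements_in_region(region: Mask, detections: Dict[str, Box]) -> List[str]:
    """List the labels of detections whose box overlaps the drawn region.

    `detections` is hypothetical input: label -> box from some detector.
    """
    region_box = mask_to_box(region)
    if region_box is None:
        return []
    return [label for label, box in detections.items()
            if boxes_overlap(region_box, box)]


# A hand-drawn, irregular region on a 5x4 image (hypothetical data).
T, F = True, False
region = [
    [F, T, F, F, F],
    [F, T, T, F, F],
    [F, F, T, F, F],
    [F, F, F, F, F],
]
detections = {"ferret": (2, 1, 4, 3), "bowl": (4, 0, 4, 0)}
selected = elements_in_region(region, detections)
```

Here `selected` contains only the label whose box intersects the drawn area. A real model reasons over image features rather than box overlap, so this only illustrates the idea of mapping an arbitrary shape to the elements it covers.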
The release matters to researchers because it signals that Apple wants to be more open about its AI work, rather than maintaining the secretive stance it has taken in the past.
Infrastructure is also an issue for Apple, because while it is working to increase the number of AI servers it has, it may not yet have enough scale to compete with the likes of ChatGPT. While Apple could partner with other companies to expand its capabilities, the other path is to do what it just did and release an open source model.
An interesting observation can be found in the information posted on GitHub. Reddit's r/Apple found that Ferret was "trained on 8 A100 GPUs and 80GB of memory." Given Apple's history of limited support for NVIDIA GPUs, this is considered a rare acknowledgment of the GPU manufacturer.