Google, the Alphabet subsidiary, actually invented much of the technology behind the current AI craze, but the popularity of its products has lagged significantly. Google hopes to change that with the highly anticipated release of Gemini, the "largest and most capable artificial intelligence model" the company has built to date.
Since OpenAI's huge success last year with its conversational chatbot ChatGPT, a growing number of companies have been experimenting with generative AI, a technology that can automate tasks such as coding, summarizing reports or drafting marketing campaigns based on user requests. At a presentation ahead of the product's launch on December 6, Google emphasized that Gemini is the most flexible model it has ever built, as it comes in versions of different sizes, including one that can run directly on a smartphone. That flexibility sets it apart from competitors.
Gemini is a family of models designed to underpin a range of generative AI applications. It comes in three versions: Gemini Ultra, Gemini Pro and Gemini Nano. Eli Collins, vice president of product at Google DeepMind, said this range of sizes means Gemini "can run on everything from mobile devices to large data centers."
“We’ve long wanted to build a new generation of artificial intelligence models inspired by the way people understand and interact with the world—one that feels more like a helpful collaborator than an intelligent piece of software,” Collins said in a phone interview. “Gemini brings us one step closer to that vision.”
Ahead of the model's release, the company tested Gemini on a series of standard industry benchmarks. It said Gemini Pro outperformed OpenAI's GPT-3.5 on six of eight tests, while Gemini Ultra outperformed GPT-4, the latest version of OpenAI's general-purpose model, on seven of eight benchmarks covering general language understanding, reasoning, mathematics and coding. Meanwhile, Google estimates that its latest generative AI system for interpreting and generating program code, AlphaCode 2, performs better than 85% of participants in competitive programming contests. The company plans to release a technical report detailing Gemini's model architecture, training process and evaluation.
Starting December 6, Android developers who want to build Gemini-powered apps for smartphones and tablets can register to use Gemini Nano, the version of the model that runs directly on such devices. Google also said it will immediately enable Gemini on its flagship Pixel 8 Pro phone, powering new generative AI features such as summarizing the key points of recorded phone calls. Next week, Google will make Gemini Pro available to cloud customers through its Vertex AI and AI Studio platforms.
Gemini Ultra, the largest version of Google's artificial intelligence model, will initially be available in an early access program for developers and enterprise companies, with details about the program to be announced next week. This version will be widely rolled out to the public early next year.
Gemini also integrates with a large number of Google's apps and services through Bard, the company's conversational chatbot and ChatGPT competitor. Bard previously ran on PaLM 2, a large language model the company announced at its annual developer conference in May.
Google has been under pressure over the past year both to reinvent its core search business and to contend with the rise of generative artificial intelligence programs. Although the company has long been regarded as a pioneer in AI research, some have criticized its management for being slow to bring AI products to market, especially after the success of ChatGPT and the image generator DALL-E. Since OpenAI released GPT-4 in March, Google has been working to reaffirm its leadership in the field, including injecting the new technology into its mature search business.
Gemini is the company's answer to this market pressure. Google says the AI model is "natively multimodal," meaning it is pre-trained from the start to handle both text- and image-based prompts from users. For example, in a video demonstration, Google showed how a parent could help a child with homework by uploading a photo of a math problem together with the child's handwritten solution steps on scratch paper.
In the demo video, Applebaum, a software engineer at Google, said: "Gemini can not only solve these problems, it can also read the answers, understand which ones are right and which are wrong, and explain the concepts that need further clarification." The company also said that its "Search Generative Experience," an experimental version of its search engine built with generative AI, will incorporate Gemini next year.
Still, company representatives cautioned that Gemini remains prone to "hallucinations," the false or fabricated information that generative AI can produce. Collins called the phenomenon "an unsolved research question." The demo video the company showed to reporters was pre-recorded.
Collins said Gemini "has had the most comprehensive safety evaluation of any AI model at Google." To assess Gemini's safety, Google subjected the model to adversarial testing, in which testers imitate a bad actor and feed the program prompts designed to exploit it, he said. The tests included RealToxicityPrompts, a benchmark developed by the Allen Institute for Artificial Intelligence containing more than 100,000 prompts pulled from the web, which helps AI researchers probe large language models for hate speech and political bias.
Google also stressed that the tool will be fast. Gemini runs on a new underlying supercomputer architecture and newer processing chips, allowing it to perform faster than earlier, smaller models, the company said. Google is using a new version of its internally designed cloud chip, the Cloud Tensor Processing Unit (TPU), which can train existing models 2.8 times faster than its predecessor. Amin Vahdat, Google's vice president of machine learning, said the approach gives Google "a new look at future standard AI infrastructure." He added that the company will still use third-party AI chips to run its Gemini models.
Gemini will be integrated into Bard, the generative AI chatbot Google launched in March, giving it access to the company's most popular services, including Gmail, Maps, Docs and YouTube. The rollout will happen in two phases: starting December 6, Bard will be powered by Gemini Pro, enabling more advanced reasoning, planning, understanding and other capabilities. It will be available in English in 170 countries and regions, though notably not in Europe or the UK, where the company said it is consulting with local regulators.
Early next year, the company plans to release Bard Advanced, powered by the more capable Gemini Ultra model. Google says it will soon launch a trusted beta program to refine Bard Advanced ahead of its wider public rollout. Sissie Hsiao, Google's vice president in charge of Bard, said: "With Gemini, Bard is getting its biggest and best upgrade yet, one that will open up new ways for people to create, interact and collaborate."