Times have changed? The largest and most powerful Google model to date is here. On December 6, local time, Google CEO Sundar Pichai announced the official launch of Gemini 1.0.
The newly released Gemini is a natively multimodal large model.
Google's ChatGPT-like application Bard has now been upgraded to Gemini Pro, gaining more advanced reasoning, planning and understanding capabilities while remaining free. Google expects to launch "Bard Advanced", which will use Gemini Ultra, early next year.
This is the biggest update since Bard came out.
Since the release of ChatGPT, we have been very curious about the capabilities of Gemini, the rival model Google has been promising. This large model was rumored as early as March this year and was announced as "coming soon" at the I/O conference in May.
According to people familiar with the matter, Gemini is said to have trillions of parameters and to have been trained with five times the computing power used for GPT-4. However, the official release of Gemini appears to have been delayed repeatedly for various reasons.
To compete with OpenAI and Microsoft, Google decisively switched from PaLM 2 to Gemini, and even merged Google Brain and DeepMind in April this year. Gemini was built by the newly formed Google DeepMind, which combined the strengths of the two laboratories to tackle key problems.
This shows Google's all-or-nothing mentality in the large-model arms race.
So, can Gemini really surprise us? Beyond achieving the best results on various benchmarks and even surpassing human experts, one moment at the press conference stood out. When a reporter asked whether Gemini has new capabilities compared with previous large models, Eli Collins, vice president of product at Google DeepMind, replied: “I suspect it does,” indicating that Google is still working to understand the full capabilities of Gemini Ultra.
The following is a statement from Google CEO Pichai:
Every technological change is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the AI shift we are seeing now will be the most profound shift in our lifetimes, far greater than the previous shifts to mobile or the web. Artificial intelligence has the potential to create opportunities for people around the world, from the everyday to the extraordinary. It will usher in a new wave of innovation and economic progress and drive knowledge, learning, creativity and productivity at an unprecedented scale.
This excites me: the opportunity to make artificial intelligence helpful to everyone, everywhere.
We're nearly eight years into our journey as an AI-first company, and the pace of progress is only accelerating: Millions of people are now using generative AI in our products to do things they couldn't do a year ago, from finding answers to more complex problems to using new tools to collaborate and create. At the same time, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing using our AI tools.
This is incredible momentum, yet we’ve only begun to scratch the surface of what’s possible.
We are doing this work boldly and responsibly. This means being ambitious in our research, pursuing capabilities that can bring huge benefits to people and society, while building safeguards and working with governments and experts to address the risks of AI becoming more powerful. We will continue to invest in the best tools, foundational models, and infrastructure and bring them into our products and beyond, guided by our AI principles.
Google's large model Gemini is officially released
Google DeepMind CEO and co-founder Demis Hassabis officially launched the large model Gemini on behalf of the Gemini team.
Hassabis said that Google has long wanted to build a new generation of large AI models. In his view, what AI brings to people is no longer just intelligent software, but something more useful and intuitive: an expert helper or assistant.
Today, Google's large model Gemini finally makes its debut as the most powerful and versatile model the company has ever built. Gemini is the result of large-scale collaboration between teams across Google, including researchers from Google Research.
Of particular note, Gemini is a multimodal large model, meaning it can generalize and seamlessly understand, manipulate, and combine different types of information, including text, code, audio, images, and video.
Google said that Gemini is also their most flexible model to date and can run efficiently on multiple types of platforms such as data centers and mobile devices. The SOTA capabilities provided by Gemini will significantly enhance the way developers and enterprise customers build and scale AI.
Currently, Gemini 1.0 comes in three sizes:
Gemini Ultra: the largest and most capable model, for highly complex tasks;
Gemini Pro: the best model for scaling across a wide range of tasks;
Gemini Nano: the most efficient model for on-device tasks.
Google rigorously tested the Gemini models and evaluated their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra exceeds the current SOTA results on 30 of the 32 academic benchmarks widely used in large language model development.
In addition, Gemini Ultra scored 90.0% on MMLU (Massive Multitask Language Understanding), making it the first model to outperform human experts on this benchmark. MMLU covers 57 subjects, including mathematics, physics, history, law, medicine and ethics, and tests both the knowledge and the problem-solving ability of large models.
Google's new approach to the MMLU benchmark lets Gemini use its reasoning capabilities to think more carefully before answering difficult questions, which significantly improves performance compared with answering on first impression.
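The "think before answering" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it follows the general pattern of sampling several reasoning chains, accepting the majority answer when the consensus is strong, and otherwise falling back to a single direct answer. The function names, the sample count k and the threshold are hypothetical, and the two callables are toy stand-ins for calls to a real model.

```python
from collections import Counter
from typing import Callable, List


def uncertainty_routed_answer(
    sample_with_reasoning: Callable[[], str],
    greedy_answer: Callable[[], str],
    k: int = 8,
    threshold: float = 0.6,
) -> str:
    """Return the majority answer of k reasoning samples if enough of them
    agree; otherwise fall back to the direct (greedy) answer."""
    answers: List[str] = [sample_with_reasoning() for _ in range(k)]
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes / k >= threshold:
        return top_answer
    return greedy_answer()


def make_sampler(outputs: List[str]) -> Callable[[], str]:
    """Toy stand-in for a model: replays a fixed list of sampled answers."""
    iterator = iter(outputs)
    return lambda: next(iterator)


# Six of eight samples agree on "B", so the consensus is accepted.
confident = uncertainty_routed_answer(make_sampler(list("BBBBBBAC")), lambda: "D")
# The samples are split four ways, so the greedy answer "D" is used instead.
uncertain = uncertainty_routed_answer(make_sampler(list("ABCDABCD")), lambda: "D")
print(confident, uncertain)
```

In practice the threshold would be tuned on held-out data; the value 0.6 here is arbitrary.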
Gemini outperforms GPT-4 in most benchmarks.
For more details, please view the detailed test report: https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
On the new MMMU benchmark, Gemini Ultra also achieved the best result, 59.4%. This benchmark consists of multimodal tasks that require deliberate reasoning.
In the image benchmarks, Gemini Ultra did not rely on an OCR system to extract text from images, which highlights Gemini's native multimodal capabilities and gives an early sign of its more complex reasoning abilities.
Next generation all-round capability upgrade
Gemini is designed to support multi-modality natively, pre-trained on different modalities from the beginning, and then fine-tuned with additional multi-modal data to improve effectiveness. As a result, Gemini is able to seamlessly understand and reason about a variety of inputs, far better than existing multi-modal models, and its capabilities are among the strongest in almost every domain.
Complex reasoning ability
Gemini 1.0 has sophisticated multimodal reasoning capabilities that help it make sense of complex written and visual information. This makes it particularly good at uncovering hard-to-discern knowledge in massive amounts of data. Gemini 1.0 can extract insights from hundreds of thousands of documents by reading, filtering and understanding information, which can help deliver new breakthroughs at high speed in many fields, from science to finance.
Simultaneously understand information in text, images, audio and more modalities
Gemini 1.0 was trained to recognize and understand text, images, audio and more at the same time, so it can grasp the details of its input more fully and answer questions about complex topics. As such, it is particularly good at reasoning about problems in complex subjects such as mathematics and physics.
In one example, a teacher draws a physics problem about a skier coming down a slope, and a student proposes a solution for the skier's speed at the bottom of the slope. Using Gemini's multimodal reasoning capabilities, the model can read the messy handwriting, correctly understand the problem statement, convert both the problem and the solution into mathematical formulas, identify the specific reasoning step where the student went wrong, and then provide the correct solution.
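For a sense of the physics involved, here is a minimal sketch of the standard calculation, assuming a frictionless slope and a skier who starts from rest. The 40 m height and 80 m slope length are hypothetical values chosen for illustration and do not come from this article. The mistake shown, plugging the slope length into the formula instead of the vertical height, is likewise just one plausible example of a student error.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def speed_at_bottom(height_m: float) -> float:
    """Energy conservation: m*g*h = 0.5*m*v**2, so v = sqrt(2*g*h)."""
    return math.sqrt(2 * G * height_m)


height = 40.0        # hypothetical vertical drop in metres
slope_length = 80.0  # hypothetical distance along the slope in metres

correct = speed_at_bottom(height)
mistaken = speed_at_bottom(slope_length)  # wrong: slope length used as height
print(f"correct: {correct:.1f} m/s, mistaken: {mistaken:.1f} m/s")
```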
Advanced coding
Gemini can understand, explain and generate high-quality code in popular programming languages such as Python, Java, C++ and Go. Its ability to work across languages and reason about complex information makes it one of the world's leading foundation models for coding.
Gemini Ultra performs well on several coding benchmarks, including HumanEval, an important industry standard for evaluating performance on coding tasks, and Natural2Code, an internal Google dataset that uses author-generated source code rather than web-based information.
Gemini can also serve as the engine for more advanced coding systems. Two years ago, Google launched AlphaCode, the first AI code generation system to reach a competitive level in programming competitions.
Using a specialized version of Gemini, Google created AlphaCode 2, a more advanced code generation system that excels at competitive programming problems that go beyond coding and involve complex mathematics and theoretical computer science.
Evaluated on the same platform as the original AlphaCode, AlphaCode 2 showed a huge improvement, solving almost twice as many problems.
Dedicated TPU training
Google trained Gemini 1.0 at scale on AI-optimized infrastructure using its in-house Tensor Processing Units (TPUs) v4 and v5e, and designed it to be its most reliable and scalable model to train and its most efficient to serve.
On TPUs, Gemini runs significantly faster than earlier, smaller, less capable models. These custom-designed AI accelerators are at the heart of Google's AI products, which serve billions of users through Search, YouTube, Gmail, Google Maps, Google Play and Android. They also help companies around the world train large-scale AI models cost-effectively.
Today, Google also released its most powerful, efficient and scalable TPU system to date, Cloud TPU v5p, which is designed for training cutting-edge AI models. The new generation of TPUs will accelerate the development of Gemini, help developers and enterprise customers train large-scale generative AI models faster, and bring new products and features to customers sooner.
A row of Cloud TPU v5p AI accelerator supercomputers in a Google data center.
Google products will be upgraded across the board
Starting today, Google will add Gemini to its products. Bard, for example, will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and other tasks. This is also Bard's biggest upgrade since its launch.
The upgraded Bard will be available in English in more than 170 countries, and will expand to more modalities and support more languages in the near future.
Google is also bringing Gemini to Pixel. Pixel 8 Pro will be the first smartphone to run Gemini Nano.
Pixel 8 Pro uses Gemini Nano in the Recorder app to summarize meeting audio, even without a network connection.
Over the next few months, Gemini will gradually appear in more Google products and services, including Search, Ads, Chrome and Duet AI.
Google said it has been experimenting with Gemini in Search, where it makes the Search Generative Experience (SGE) faster for users, cutting latency by 40% while also improving quality.
User guide and future plans
Finally, how do developers use Gemini?
Starting December 13, developers and enterprise customers can access Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.
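As a rough idea of what a call to Gemini Pro might look like, the sketch below builds a text-only request using nothing but the Python standard library. The endpoint path, the "gemini-pro" model name and the request body shape are assumptions based on the public REST interface and are not taken from this article, so check the official API reference before relying on them. The helper name build_request is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint root and model name; verify against the official docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_ROOT}/models/{MODEL}:generateContent?key={api_key}"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("Explain multimodal models in one sentence.", "YOUR_API_KEY")
print(request.full_url)

# Sending the request needs a real API key and network access:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["candidates"][0]["content"]["parts"][0]["text"])
```

Building the request separately from sending it keeps the example runnable offline; an API key can be created in Google AI Studio.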
Starting with Pixel 8 Pro devices, Android developers can also build with Gemini Nano through AICore. Android AICore is a new system service in Android 14 that handles model management, runtimes, safety features and more, simplifying the work of integrating AI into applications.
AICore supports low-rank adaptation (LoRA) fine-tuning with Gemini Nano. This powerful technique lets application developers create small LoRA adapters from their own training data. AICore loads the LoRA adapter, producing a large language model fine-tuned for the application's own use cases.
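To show why LoRA adapters are small, here is a minimal pure-Python sketch of the general technique. It is not AICore's API, which this article does not show. The base weight matrix W stays frozen, and the adapter consists of two small matrices B and A whose product is added to W with a scaling factor. All names, sizes and values below are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weights(w: Matrix, b: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return the adapted weights W + (alpha / rank) * (B @ A)."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


random.seed(0)
d_out, d_in, rank = 8, 8, 2
# Frozen base weights (d_out x d_in); never updated during fine-tuning.
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable adapter matrices: A is rank x d_in, B is d_out x rank.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero, so W is unchanged

adapted = lora_weights(w, b, a, alpha=4.0, rank=rank)
full_params = d_out * d_in               # parameters in the full matrix
adapter_params = rank * (d_out + d_in)   # parameters in the adapter
print(adapted == w, full_params, adapter_params)
```

The saving is modest for this toy 8 x 8 matrix, but for the large weight matrices of a real language model a low-rank adapter has only a small fraction of the original parameters, which is what makes per-application adapters practical on a phone.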
In addition, Google revealed that Gemini Ultra will be released soon and outlined Bard's next upgrade.
Gemini Ultra is currently undergoing trust and safety checks, including red-teaming by trusted external parties, and is being further refined with fine-tuning and reinforcement learning from human feedback (RLHF).
As part of this process, Google will first make Gemini Ultra available to select customers, developers, partners, and safety and responsibility experts for early experimentation and feedback, and then roll it out to developers and enterprise customers early next year.
Gemini Ultra is Google's largest and most powerful model, designed for highly complex tasks. Ordinary users will first experience Gemini Ultra through Bard Advanced, which Google will launch early next year.
Google said it will work to extend Gemini's capabilities in the future, including advances in planning and memory, as well as a larger context window so the model can process more information and give better responses.
Blog link: https://blog.google/technology/ai/google-gemini-ai/#scalable-efficient