Along with its Gemini generative AI model, Google this morning released AlphaCode2, an improved version of AlphaCode, the code-generating system that Google's DeepMind lab introduced about a year ago. AlphaCode2 is powered by Gemini, or more precisely by a variant of it (Gemini Pro) fine-tuned on programming competition data. Google says that on at least one benchmark, AlphaCode2 far outperforms its predecessor.

According to Google, AlphaCode2, which can program in languages including Python, Java, C++ and Go, performed better than about 85% of competitors on average across a subset of contests hosted on Codeforces, a competitive programming platform. By comparison, the original AlphaCode outperformed only about 50% of competitors on the same subset.

"We selected 12 recent contests with more than 8,000 participants, either from division 2 or the harder division '1+2'. This makes for a total of 77 problems," reads AlphaCode2's technical white paper. "AlphaCode2 solves 43% of these problems within 10 attempts, nearly twice as many as the original AlphaCode (25%)."
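To make the "solved within 10 attempts" metric concrete: 43% of 77 problems is roughly 33 problems, versus roughly 19 for the original AlphaCode at 25%. The sketch below shows one plausible way to compute such a solve rate, assuming each problem has an ordered list of pass/fail verdicts for its submissions. The function name `solve_rate_at_k` and the data layout are hypothetical, and the toy verdicts are made up for illustration; they are not AlphaCode2 results.

```python
from typing import Sequence


def solve_rate_at_k(verdicts: Sequence[Sequence[bool]], k: int = 10) -> float:
    """Fraction of problems where at least one of the first k submissions passed.

    verdicts[i][j] is True if the j-th submission for problem i was accepted.
    This layout is a hypothetical illustration, not DeepMind's evaluation code.
    """
    if not verdicts:
        return 0.0
    # A problem counts as solved if any of its first k attempts was accepted.
    solved = sum(1 for attempts in verdicts if any(attempts[:k]))
    return solved / len(verdicts)


# Made-up verdicts for four toy problems.
toy = [
    [False, False, True],   # solved on attempt 3
    [False] * 12 + [True],  # solved only on attempt 13, so not counted at k=10
    [True],                 # solved on the first attempt
    [False, False],         # never solved
]
print(solve_rate_at_k(toy, k=10))  # 0.5
```

Capping the count at the first k attempts is what makes the metric comparable across systems: a model that keeps submitting until something passes would otherwise look artificially strong.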

AlphaCode2 can understand difficult programming problems involving "complex" mathematics and computer science theory. In a pre-recorded video, DeepMind research scientist Rémi Leblond explained that AlphaCode2 can understand and apply dynamic programming, among other fairly complex techniques.

"AlphaCode2 knows not only when to properly implement this strategy, but also where to use it," Leblond said. That is notable, considering that programming problems requiring dynamic programming were a major stumbling block for the original AlphaCode.
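For readers unfamiliar with the technique, here is a generic textbook example of dynamic programming (not code produced by or taken from AlphaCode2): finding the minimum number of coins that sum to a target amount. The problem is broken into overlapping subproblems ("fewest coins for every smaller amount"), each solved once and reused. The function name `min_coins` is chosen for illustration.

```python
def min_coins(coins: list[int], amount: int) -> int:
    """Fewest coins from `coins` summing to `amount`, or -1 if impossible."""
    INF = float("inf")
    # best[a] holds the fewest coins needed to make amount a (INF = unreachable).
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for c in coins:
            # Reuse the already-solved subproblem best[a - c].
            if c <= a and best[a - c] + 1 < best[a]:
                best[a] = best[a - c] + 1
    return -1 if best[amount] == INF else best[amount]


print(min_coins([1, 3, 4], 6))  # 2  (3 + 3)
```

A greedy approach that always takes the largest coin would pick 4 + 1 + 1 (three coins) here; recognizing that a problem needs this kind of subproblem reuse instead of a greedy shortcut is the "knowing where to use it" part Leblond describes.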

"[AlphaCode2] needs to show some level of understanding, some level of reasoning and designing of code solutions before it can actually get to implementing it to solve [a] coding problem. And it does all that on problems it's never seen before," Leblond said.

AlphaCode2 solves problems by first using a family of "policy models" to generate a large number of code samples for each problem. Code samples that do not fit the problem description are filtered out, and a clustering algorithm groups "semantically similar code samples" to avoid redundancy. Finally, a scoring model within AlphaCode2 picks the best candidate from each of the 10 largest "clusters" of code samples; together, these candidates make up AlphaCode2's answer to the problem.
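The filter, cluster and select stages can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepMind's implementation: candidates are plain Python functions, "fitting the problem description" is approximated by passing the statement's example tests, "semantically similar" is approximated by producing identical outputs on extra probe inputs (the way the original AlphaCode clustered programs), and `score` is a stand-in for the learned scoring model. All names (`select_candidates`, `probe_inputs`, `score`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Sequence

# Hypothetical: a candidate program maps an integer input to an integer output.
Program = Callable[[int], int]


def select_candidates(
    samples: Sequence[Program],
    examples: Sequence[tuple[int, int]],
    probe_inputs: Sequence[int],
    score: Callable[[Program], float],
    k: int = 10,
) -> list[Program]:
    # 1. Filter: drop samples that fail the example tests in the problem statement.
    survivors = [p for p in samples if all(p(x) == y for x, y in examples)]

    # 2. Cluster: samples with identical outputs on the probe inputs are
    #    treated as behaviorally equivalent (an assumed similarity measure).
    clusters: dict[tuple[int, ...], list[Program]] = defaultdict(list)
    for p in survivors:
        signature = tuple(p(x) for x in probe_inputs)
        clusters[signature].append(p)

    # 3. Keep the k largest clusters.
    largest = sorted(clusters.values(), key=len, reverse=True)[:k]

    # 4. From each cluster, submit the candidate the scoring model rates highest.
    return [max(cluster, key=score) for cluster in largest]


# Toy problem: "return x squared". The candidates are made up for illustration.
candidates = [
    lambda x: x * x,
    lambda x: x ** 2,
    lambda x: abs(x) * abs(x),
    lambda x: x + x,  # passes the example (2 -> 4) but disagrees on other inputs
    lambda x: x * 3,  # fails the example, so it is filtered out
]
picked = select_candidates(
    candidates,
    examples=[(2, 4)],
    probe_inputs=[0, 1, 3, -5],
    score=lambda p: 0.0,  # placeholder for a learned scoring model
)
print([p(7) for p in picked])  # [49, 14]
```

Clustering before scoring means the final submissions cover distinct behaviors rather than ten near-copies of one idea, and large clusters are a rough signal that many independent samples converged on the same answer.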

Of course, no AI model is flawless, and AlphaCode2 is no exception. According to the white paper, AlphaCode2 requires a lot of trial and error, is too costly to operate at scale, and relies heavily on being able to filter out obviously bad code samples. The white paper speculates that moving to a more capable version of Gemini, such as Gemini Ultra, might mitigate some of these issues.

Eli Collins, vice president of product at DeepMind, hinted at this possibility during a briefing.

"What excites me most about the latest results is that when programmers collaborate with [AlphaCode2 powered by] Gemini by defining certain properties for the code to follow, [the model's] performance gets even better," Collins said. "In the future, we will see programmers leveraging highly capable AI models as collaborative tools that assist with the entire software development process, from reasoning about problems to assisting with implementation."