Nature: DeepMind’s large model breaks through 60-year-old mathematical problems and the solution exceeds human cognition

Using large models to solve problems that have troubled mathematicians for more than 60 years, the latest results of Google DeepMind are published in Nature.Pushmeet Kohli, one of the authors and vice president of research at Google DeepMind, said:This scheme will not be in the training data, and it will not even be known to humans before.

This technology is called FunSearch, where Fun is the abbreviation of function.

Leverage large models to solve long-standing scientific challenges, generating new information that is verifiable and valuable* that did not exist before.

In the news interpretation accompanying the Nature paper, the person in charge of DeepMind stated that "the way we use large models is as a creativity engine."

This is the first time anyone has shown that systems based on large models can surpass the cognition of mathematicians and computer scientists.

Not only is it new, it's more effective than anything else that exists today.

In response to this achievement, some netizens lamented:

If this is true, it is the most important discovery of mankind since the fire.

So, what problems does FunSearch solve?

Find better solutions to NP-hard problems

DeepMind specifically demonstrates two types of problems, both of which are NP-hard problems.

According to the academic community, there is no and there may never be an algorithm that can find exact solutions to NP-hard problems in polynomial time under all circumstances.

Faced with such problems, researchers usually look for approximate solutions or efficient algorithms suitable for specific situations.

Specific to FunSearch, the first type of NP-hard problem it solves is the Capset problem, which is a type of upper limit set problem. Its description is as follows:

There are equidistant n points in each dimension of an n-dimensional space (n^n in total, for example, 3 dimensions are 3*3*3). Find as many points as possible to form a set. It is required that any 3 points in the set are not collinear. How many points are there at most in such a set?

If it seems a bit difficult to understand, you might as well learn about the predecessor of the Capset problem-a card game invented by geneticist Marsha Falco in the 1970s.

There are a total of 81 cards in this card game. Each card has 1 to 3 color patterns. The colors, shapes and shadows of the patterns in the same card are exactly the same.

This deck has a total of 3 colors, 3 shapes and 3 shades. Including the number of patterns, there are a total of 3*3*3*3=81 cards. Players need to turn over some cards to find the special combination of 3 cards.

If the specific way of this "special combination" is expressed in discrete geometric form, we get the Capset problem.

The Capset problem was also born in the 1970s, proposed by Oxford University mathematician Ron Graham, but the first important results did not appear until the 1990s.

In 2007, Terence Tao mentioned in a blog post that this was his favorite open-ended mathematics problem.

Before the emergence of FunSearch, the most significant breakthrough in the Capset problem was proposed by American mathematician Jordan Ellenberg and Dutch mathematician Dion Gijswijt in 2016.

Through the polynomial method, Ellenberg and Gijswijt reduced the supremum bound of the solution to this type of problem to 2.756^n when n>6 (the maximum set can be found accurately when n≤6).

Also when n>6, the newer number of the lower bound is 2.218^n, proposed by Fred Tyrrell, a PhD student at the University of Bristol in 2022.

But this lower bound only exists in theory - when n=8, there are only 496 points in the largest set that humans can construct, and according to Tyrrell's conclusion, the number of points should be no less than 585.7.

FunSearch expanded the set size to 512 points - although there is still a gap with the theoretical value, it is still regarded as the most significant breakthrough on this issue in 20 years.

At the same time, the lower bound of the Capset set size was also raised to 2.2202^n by FunSearch.

The second category is online boxing problems:

Suppose there is a set of standard containers with a capacity C and a sequence of n items (the size of the items does not exceed C). These items arrive in a certain order.

"Online" means that the operator cannot see all the items in advance, but must immediately decide which container to load the items into when they arrive.

The ultimate goal is to keep the number of containers used as small as possible.

The online binning problem has attracted extensive research since the 1970s, and the earliest can be traced back to the layout problem studied by Gauss in 1831.

After nearly 200 years of research, there is still no mature theory and effective numerical calculation method.

Traditionally commonly used greedy algorithms include FirstFit and BestFit:

FirstFit means placing each item into the first box that can accommodate it. BestFit puts each item into the box that can accommodate it and has the smallest remaining space in the box.

FunSearch proposed a new algorithm, which significantly reduced the number of containers used in both OR and Weibull test data sets.

Especially when the number of items in the test set reaches 100,000, the solution found by FunSearch consumes only 0.03% more containers than the theoretical lower bound.

(The data in the table below represent the difference from the theoretical lower bound. The smaller the number, the better the performance)

So, how is FunSearch implemented?

Search for "program" instead of "answers"

On the whole, FunSearch's workflow is an iterative process, and the core is to search for programs that can solve the problem, rather than the answer to the question itself.

Search is exactly what DeepMind has been exploring since AlphaGo.

Co-founder Shane Legg once explained in an interview:

Where did the key "Step 37" in AlphaGo's defeat of Lee Sedol come from? Not from human play data, but from a search of probability space.

Current large models just imitate and mix different training data. To generate real creativity and surpass the current architecture, it needs to be combined with search.

Returning to the latest achievement FunSearch, there is a program library in the system. At each iteration, the system will search for the initial program and input the large model (PaLM2 is used for the experiment, other codes are also compatible as long as they support it).

The large model is built on this basis to generate new programs and handed over to the automatic evaluation system. The program with the highest score will be added to the program library, thereby realizing a self-circulation.

Among them, the evaluation system generates test cases based on user questions and then determines whether the output of the candidate program is correct.

Depending on the complexity, methods of determining correctness include directly inspecting the output value or calling related functions.

At the same time, the evaluation system is also equipped with fault-tolerant logic to avoid timeout and other problems from affecting the overall process.

Finally, the system will give an overall score based on the behavior of the candidate program on these test cases, providing a basis for result generation and subsequent program library updates.

Jordan Ellenberg of the University of Wisconsin-Madison, co-author of the paper, believes that an important feature of FunSearch is that people can see the successful solutions generated by AI and learn from them, which is completely different from the previous black box model of AI.

The most exciting thing for me is building new models of human-machine collaboration, which I hope to use not as a replacement for human mathematicians, but as a force multiplier.