After 2023, the year of generative AI, what breakthroughs will AI make in 2024? Industry leaders predict that even if GPT-5 is released, LLMs will remain limited by nature, and even basic AGI will not be achieved in 2024.


2023 was the well-deserved ‘Year of Generative AI’. What breakthroughs will AI technology make in 2024?

NVIDIA senior scientist Jim Fan said that 2024 will be the year of video. Although robotics and embodied agents are just getting started, he expects video AI to have its breakthrough moment in the next 12 months.


OpenAI co-founder Greg Brockman predicts that 2024 will be a breakthrough year for AI’s capabilities, safety, and potential impact.

Over the longer term, of course, it will look like just another year on the exponential curve, one that every later year will improve on.


In the new year, will artificial intelligence still shine as brilliantly as it did in 2023?

2024 Predictions from AI Leaders

Meta researcher Martin Signoux made eight predictions for AI in 2024, and even Yann LeCun strongly agreed with them.


First, AI smart glasses will become popular. With the rise of multimodal technology, leading AI companies will redouble their efforts to develop AI wearables. What form factor could be better suited to hosting an AI assistant than a pair of glasses?



ChatGPT will not be to AI assistants what Google is to search. ChatGPT began to shine in 2023, but Bard, Claude, Llama, Mistral and thousands of derivative products followed one after another.

As productization continues to advance, ChatGPT will no longer be the only reference standard in this field, and its valuation will also face revisions.



Goodbye large language models (LLMs), hello large multimodal models (LMMs). LMMs will keep emerging and will displace LLMs in debates about multimodal evaluation, multimodal safety, multimodal this, and multimodal that. Furthermore, LMMs are a stepping stone towards truly general AI assistants.



There will be no major breakthroughs, but there will be improvements on every front. New models (such as GPT-5) will not bring a real breakthrough: LLMs are still limited by nature and prone to hallucinations. We won't see any leap that makes them reliable enough to 'solve basic AGI' in 2024.

Improvements in RAG, data wrangling, fine-tuning, quantization, and so on will make LLMs powerful and useful enough for many use cases, driving adoption of various services across industries.


Small language models (SLMs) are already emerging, and cost-efficiency and sustainability considerations will accelerate this trend. Quantization techniques will also improve greatly, driving a wave of on-device integration for consumer services.
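To make the quantization idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustration under simple assumptions (one scale per tensor, a hypothetical list of weights), not the scheme any particular model or library uses; the function names are made up for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The round trip is lossy: each weight is off by at most about half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, max_error)
```

Storing one small integer per weight plus a single scale is what shrinks the memory footprint enough for on-device use; the price is the rounding error measured above.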


An open-source model will beat GPT-4, and the debate between open and closed models will gradually fade. Looking back at the energy and progress of the open-source community over the past 12 months, it's clear that open-source models will soon close the performance gap.


Benchmarks remain a difficult problem. No set of benchmarks, rankings, or evaluation tools can be a one-stop shop for model evaluation. Instead, we will see a series of improvements (such as HELM) and new initiatives (such as GAIA), especially in multimodality.


Existential risks will not generate much discussion compared with existing risks. While x-risks dominated the headlines in 2023, public discussion will focus more on existing risks and on controversies related to bias, fake news, user safety, election integrity, and more.


Lightning AI founder William Falcon’s predictions for 2024 are:

- A 1B model will outperform a 70B model.

- Deploying models on CPUs will be almost free compared with using API services.

- Data quality will improve performance by 10x.

- A combination of open-source models will beat the best private models.

- Compilers will make models (training and inference) at least 80% faster.

- Legislation will support content creators, not model developers.


Jerry Liu, founder of the open-source tool platform LlamaIndex, predicted:

- RAG will continue to be a big focus.

- Every AI engineer will still need a strong software engineering foundation.

- Vector databases will start to develop SQL-like interfaces and support multimodality.

- Multimodal models will see more use in document processing (but first, computational cost and latency need to come down).

- GPT-4-level capabilities will become available in open source, and faster and cheaper.

- If that happens, the development of intelligent agents will flourish again.

- Prompts will be as important as before, but prompt engineering will matter less.
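Since RAG features in several of these predictions, a toy sketch of the retrieve-then-prompt pattern may help. It assumes a bag-of-words similarity in place of a learned embedding model and does not call LlamaIndex or any real LLM; all names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents):
    """Put the retrieved context in front of the question before it is sent to a model."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "RT-2 is a model for robot control",
    "DreamerV3 learns to mine diamonds in Minecraft",
    "FunSearch searches for solutions to mathematical problems",
]
print(build_prompt("which model learns to mine diamonds", docs))
```

The point of the pattern is that the model answers from retrieved text rather than from memory alone, which is why improvements in retrieval quality translate directly into more useful answers.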


ChatGPT ranked first in the world by visits in 2023

Over the past year, AI has become ubiquitous and has even redefined entire industries.

WriterbuddyAI, an online content writing company, used SEMrush, a well-known tool in the SEO industry, to research 3,000+ AI tools by crawling AI tool data.

It found that from September 2022 to August 2023, the top 50 AI tools drew a staggering 24 billion-plus visits, with traffic growing by an average of 236.3 million visits per month.

Of these, ChatGPT alone drew 14 billion visits, about 60% of the analyzed traffic.


Here are the key findings from the report:

- The 50 AI tools analyzed grew 10.7-fold, adding an average of 236.3 million visits per month.

- Over the past 12 months, AI applications received an average of 2 billion visits per month. Over the past six months, average monthly visits surged to 3.3 billion.

- Visits to ChatGPT, CharacterAI and Google Bard increased by 1.8 billion, 463.4 million and 68 million respectively.


- The most visited AI chatbot: ChatGPT holds a commanding lead, accounting for 76.31% of total visits in the AI chatbot category. CharacterAI ranks second with 19.86% of visits.

- Craiyon, Midjourney, and Quillbot saw the largest traffic declines.


- The United States contributed 5.5 billion visits, accounting for 22.62% of total visits, while European countries contributed a combined 3.9 billion visits.

- AI chatbots are the most popular category of tool, with 19.1 billion visits.

- More than 63% of AI tool users access them via mobile devices. There is also a gender gap: 69.5% of users are male and 30.5% are female.
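The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; because the report gives rounded values (and 'more than 24 billion' is a lower bound), the results should be read as approximate.

```python
# Figures quoted in the report, in billions of visits; all are rounded.
total_visits = 24.0      # "more than 24 billion", so effectively a lower bound
chatgpt_visits = 14.0
us_visits = 5.5
chatbot_visits = 19.1


def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# ChatGPT's share of the top-50 traffic, which the report rounds to 60%.
chatgpt_share = share(chatgpt_visits, total_visits)

# Total traffic implied by the US figure (5.5 billion visits = 22.62% of the total).
implied_total = us_visits / 0.2262

# ChatGPT visits implied by its 76.31% share of the chatbot category.
chatgpt_in_chatbots = chatbot_visits * 0.7631

# Average monthly visits over the 12-month window, to compare with "2 billion per month".
monthly_average = total_visits / 12

print(f"ChatGPT share of analyzed traffic: {chatgpt_share:.1f}%")
print(f"Total implied by US share: {implied_total:.1f} billion")
print(f"ChatGPT visits implied by chatbot share: {chatgpt_in_chatbots:.1f} billion")
print(f"Average monthly visits: {monthly_average:.1f} billion")
```

The figures agree with one another to within rounding, which suggests the '60%' in the summary is a rounded-up version of a share closer to 58%.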


Beyond the globally popular ChatGPT, these technologies from 2023 were also impressive

2023 has drawn to a close, and the keyword of the year was undoubtedly ‘generative AI’.

The launch of ChatGPT at the end of 2022 and the release of GPT-4 in March 2023 showed the world how widely usable large language models had become, making 2023 the year of generative AI for text, audio and video.

Besides the year’s ‘darling’ ChatGPT, the strengths of other companies should not be overlooked, such as the company that released the first open source language model. Several new AI startups also stood out, including Mistral, which at the end of the year released Mixtral 8x7B, the best open source language model available at the time.

In addition, there are the following impressive technologies.

Stanford Town and RoboCat

‘Stanford Town’ showed an impressive application of LLMs, which had already proven themselves on text and coding tasks.

The team created a sandbox environment inspired by The Sims, in which 25 AI agents, each with their own profession and personality, can interact independently.


The agents demonstrated believable individual and emergent social behaviors, including making plans and attending a Valentine's Day party. This work shows how LLM-based agents can interact with each other and produce interesting results.

This idea has been adopted by other research and open source projects, such as Auto-GPT and BabyAGI, and OpenAI has greatly simplified it through its Assistants API.


Foundation models such as GPT-4 have also been applied to robotics with some success, as in Google's RT-2 and RoboCat.


RT-2 is an AI model for robot control that can learn from both robot data and web data. The model can process text and image inputs and draw on its extensive web knowledge to perform tasks for which it has not been explicitly trained.

In more than 6,000 robot trials, RT-2 was nearly twice as successful as its predecessor on tasks it had not been trained on.

RoboCat, on the other hand, is an AI that generates training data to improve robot control.

Other companies' technologies, such as NVIDIA's multimodal VIMA model, also apply foundation models to robotics.


DreamerV3 and FunSearch

In the field of reinforcement learning, researchers have also achieved many important results.

One example is DreamerV3, which handles very different problems without any tuning.

DreamerV3 learned how to mine diamonds in Minecraft without human demonstrations.


Earlier in 2023, DeepMind also presented AdA (Adaptive Agent), a foundation model for reinforcement learning.

AdA follows the classic foundation-model recipe and is trained on a large number of tasks with large amounts of data. AdA is significant because it shows that scaling in reinforcement learning can make models perform better on other tasks.


Deep learning is playing an increasing role in various scientific fields.

DeepMind developed AlphaTensor, a system that discovers new algorithms for fast matrix multiplication.

At the same time, the latest version of DeepMind's AlphaFold protein structure prediction system overcomes many weaknesses of the previous version and opens up new possibilities for computational structure prediction.


Additionally, Google DeepMind demonstrated FunSearch, the first use of code generation language models combined with evolutionary search algorithms to find previously unknown solutions to mathematical problems.


OthelloGPT, Q-Star and the AI Act

2023 was also the year of AI regulation and the year of warnings about the existential risks of AI.

This trend will undoubtedly stimulate research that helps humans better understand the inner workings of LLMs.

There were some interesting papers during this period, such as OthelloGPT, Microsoft's paper on the ‘sparks of AGI’ in GPT-4, and Google's paper on ‘grokking’ in large models.


The field of prompt engineering also provides insights into LLMs.

François Chollet describes prompt engineering as a search for the right ‘vector program’, and work such as Promptbreeder suggests that prompting may become more automated in the future.


At the end of the year, rumors about Q-Star spread, along with fears about AI, the AGI hype, and the OpenAI boardroom drama, which reversed course several times in just a few days.


In 2024, perhaps we will see less speculation and more negotiations.

To what extent is it legitimate to use data for AI training? The recent lawsuit filed by the New York Times against OpenAI has sparked widespread public discussion.


Similar debates will also play out in the EU. At the end of 2023, the EU reached a political agreement on the EU Artificial Intelligence Act. The details of the act will be settled in 2024 and will have a significant impact on the European AI market.

2024 AI Outlook

After experiencing an explosive year in 2023, what progress will be made in the field of artificial intelligence in 2024?

Needless to say, in this new year, we will still see leading AI applied in many new and creative ways, driving progress throughout the industry.

AI copilots take the stage: the era of intelligent agents is coming

In 2023, OpenAI released GPTs, Assistants and other tools at its first developer conference, Microsoft rebranded its products as Copilot, and intelligent agents took off.

These tools are already starting to make an impact in industry after industry, but what we’ve seen so far is nothing compared to what’s to come.

Earlier, the ReAct paper published by teams from Princeton and Google showed how large models can learn to use tools effectively, and it spurred a great deal of research in this area.

Companies including OpenAI and Anthropic have spent a year tweaking their models to make better use of the technology.

Examples include OpenAI’s function calling and Anthropic’s XML support in Claude.
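The core of tool use can be sketched without any vendor API: the model emits a structured description of the call it wants, and ordinary code parses it and runs the matching function. The example below is a simplified illustration of that pattern, not OpenAI's or Anthropic's actual interface; the tool names, the JSON shape, and the hard-coded 'model output' are all assumptions made for this sketch.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned answer instead of calling a real service."""
    return f"Sunny in {city}"


def add(a: float, b: float) -> float:
    """Hypothetical tool: adds two numbers."""
    return a + b


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather, "add": add}


def dispatch(tool_call_json: str):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# A stand-in for what a model might emit; no real model is called here.
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(dispatch(model_output))
```

In a full agent loop, the tool's result would be fed back to the model as a new message so it can decide on the next step, which is the reason-then-act cycle that ReAct describes.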


Some research institutions also train large models specialized for tool use, such as Berkeley's Gorilla LLM.

In addition, open source libraries such as LangChain and Rivet make it much easier to create intelligent agents.

As you can see, AI agents are easier and cheaper to develop than ever before. They harness human ingenuity while deeply connecting to the data that matters most to users and companies.

In 2024, we will see the arrival of the ‘Age of Agents’, the beginning of a new direction in meeting needs and interacting with technology through software.

Multimodal large models break through visual barriers

ChatGPT's ability to understand and express natural human language is a breakthrough feature that attracts users and developers.

However, 2024 will see AI vision potentially becoming even more important and far-reaching.

Words are powerful, but images, videos, and audio can convey information and emotion in a more focused way. Spatial representation of ideas is a very powerful tool for communicating complex concepts simply.

LLMs can be trained not only on text but also on visual data, which makes their multimodal capabilities more pronounced.

We have already seen the development of wearable devices such as the Ai Pin and Apple Vision Pro, which are expected to help us in our daily lives.

For example, they can provide contextual information about the person you are communicating with, visual cues related to the job, or real-time suggestions for completing tasks.


Where will innovation go, and how fast? It’s hard to say yet, but the ability to interpret images and videos and react instantly to physical changes in the environment adds an extremely important dimension to how intelligently AI can help humans.

AI manipulation reaches dangerous levels

While the AI boom has brought sweeping changes to many fields, it has also shown us how AI-generated misinformation can disrupt our lives.

Never in human history has the ability to influence and manipulate people at scale with AI been so powerful, nor so pervasive.

Artificial intelligence has made it almost impossible to discern ‘real’ social interactions and content, as images and even videos can be easily generated.

The coming year is likely to see a surge in AI manipulation, from automated blackmail and fraud to the spread of conspiracy theories.

All in all, AI will bring many incredible things to the world in 2024, but it will also challenge us in new ways.


Predictions from netizens

Discussion of this topic has also made the Zhihu hot list.


Zhihu user ‘Leader Xiaobai’ predicts that model performance will break through further in 2024, and that matching today's GPT-4 may require only the inference resources of a 7B model.

As deployment costs drop significantly, 2024 may be the breakout year for AI agents and produce a hit product.

A multimodal model that dominates the field may also emerge.

The first AI movie is also expected to appear in 2024.


A PhD student in the Department of Automation at Tsinghua University predicts that ‘multimodal large models will make further breakthroughs, and image and video generation capabilities will improve further. More human jobs, especially those that require some creativity, will be replaced. The emergent capabilities of large models in some fields will become more prominent, showing more creative behaviors.’


AI architect ‘Chunyang CYang’ expects 2024 to be the year when applications of large AI models are deployed in earnest.

Throughout 2023, although large models were popular, very few products actually shipped, and those that did focused on shallow applications such as rewriting copy.

But now, there are many creative products in the field of large models being launched, and we can look forward to a wave of them.


Programmer @小五哥 predicted:

Large language models will run computation and inference on mobile phones; agents will do more practical things on people's behalf; and most exciting of all, humanoid robots are likely to help us do the washing, mop the floor, cook, and clean our rooms!