Tsinghua University puts 8 ChatGPTs in a game of Werewolf: scheming and disguise are all part of the play

Beyond video games, AI has now learned Werewolf, the party game humans treat as a "social artifact". Eight ChatGPTs "sit" together and vividly play five roles, just like real people. This latest experiment in simulating human society was carried out jointly by Tsinghua University and Zhongguancun Laboratory.

From Stanford Town to Tsinghua Game Company, using AI to simulate human society has long been a hot research topic in academia.

If Tsinghua Game Company simulated office workers on the job, then their after-hours social life has now been simulated by AI as well.

In this Werewolf game played by 8 ChatGPTs, the disguise and trust, leadership and confrontation of the real world all come through vividly.

Even without human teaching, AI discovered many game skills through its own exploration.

All of this is achieved through prompt design alone, without adjusting the model's parameters.

So, what are the wonderful scenes in this "Werewolf World"? Let’s take a look at it together.

Strategies and skills can be mastered without being taught

Before showing the dialogues among these 8 ChatGPTs, let us first explain the game configuration: two villagers, two werewolves, one guard, one witch and one seer (also called the prophet), for seven players in total, plus one "god" who acts as the moderator.
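To make the configuration concrete, here is a minimal sketch of dealing the seven player roles. The names `ROLE_COUNTS` and `assign_roles` are hypothetical and not taken from the paper; the moderator is left out because it does not hold a player role, and the seer/prophet role is called `seer`.

```python
import random
from collections import Counter

# Role counts as described in the article: seven players in total.
# (The moderator, or "god", is not dealt a role here.)
ROLE_COUNTS = {"villager": 2, "werewolf": 2, "guard": 1, "witch": 1, "seer": 1}


def assign_roles(seed=None):
    """Shuffle the roles and deal one to each player number (1 to 7)."""
    roles = [role for role, count in ROLE_COUNTS.items() for _ in range(count)]
    random.Random(seed).shuffle(roles)
    return {player: role for player, role in enumerate(roles, start=1)}


assignment = assign_roles(seed=0)
for player, role in assignment.items():
    print(f"Player {player}: {role}")
```

Passing a fixed `seed` makes the deal reproducible, which is convenient when comparing runs.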

During the experiment, the researchers discovered that ChatGPT used strategies that were not explicitly mentioned in the game instructions and prompts.

Impressive: it taught itself without any instruction.

Specifically, the conversations among the seven ChatGPT players reflect the trust, camouflage, confrontation, and leadership seen in human games.

First, let’s talk about trust.

The researchers define trust as believing that other players share the same goals as oneself and working together with them toward those goals.

Specific manifestations include proactively sharing information that is detrimental to oneself, or joining forces with other players to accuse someone of being an enemy.

The researchers observed how trust relationships changed over time during the game.

In the picture below, the yellow circle indicates that the player numbered on the left trusts the player numbered above, and the dotted circle represents the disappearance of the trust relationship.

Next, let's look at confrontation, that is, actions taken against the opposing camp, such as werewolves attacking others at night or players accusing others of being werewolves during the day.

On one day of the game, player No. 1 (a werewolf) called for voting out No. 5 (a villager), but No. 3 (the guard) refused to go along.

Seeing the plot fail, the werewolf decided to kill No. 5 directly that night, but guard No. 3 chose to protect this villager.

From this we can see that these ChatGPTs will not blindly follow what other players do, but will make independent judgments based on existing information.

In addition to cooperation and confrontation, disguise is also an essential skill in the Werewolf game, and it is the key to victory.

For example, on the day after a peaceful night (a night on which nobody died), Werewolf No. 1 pretended to be innocent.

Besides passing oneself off as a good person, disguise can also serve a player's own little schemes. For example, let's look at the seer's speech.

The seer mentioned seeing werewolves talking, but in fact werewolves do not speak at night.

According to the authors, their evaluation shows that this is not a hallucination by ChatGPT but a deliberate move.

Finally, let’s talk about leadership.

Although the environment designed by the research team has no leadership role to compete for, players can still gain control over how the game unfolds.

For example, the two werewolves, No. 1 and No. 4, try to steer the discussion and get the other players to follow their lead.

Presumably the aim is to create openings by catching the others off guard.

It seems these ChatGPTs really do play well.

So, how did the research team get these ChatGPTs to play Werewolf?

Let ChatGPT sum up its own experience

The research team's method for improving the ChatGPT players' performance has four key components: valuable information V, selected questions Q, the reflection mechanism R, and chain-of-thought reasoning C.

The results of the ablation experiments show that Q and C have the greatest impact on how reasonable the players' speech is (as judged by humans).

The prompt is designed around these components. Of course, the rules of the game must be introduced first, so the final structure is as follows:

Game rules and role settings; chat records, valuable information, and reflections on experience; suggestions given to ChatGPT based on experience; and a chain-of-thought prompt.

It is not hard to see from this that collecting historical information and summarizing experience from it is an important step. So how should these experiences be summarized?
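The prompt structure above can be sketched as a small assembly function. This is only an illustration under assumptions: the function name `build_prompt`, the section titles, and the sample strings are hypothetical and are not the wording used in the paper.

```python
def build_prompt(rules, role, chat_history, valuable_info, reflection, suggestion):
    """Join the prompt parts in the order the article lists them.

    Empty parts (for example, no suggestion before any experience exists)
    are skipped.
    """
    sections = [
        ("Game rules and role", f"{rules}\nYou are the {role}."),
        ("Chat records", "\n".join(chat_history)),
        ("Valuable information", "\n".join(valuable_info)),
        ("Reflection", reflection),
        ("Suggestion from past experience", suggestion),
        ("Reasoning", "Think step by step before giving your final answer."),
    ]
    return "\n\n".join(f"[{title}]\n{body}" for title, body in sections if body)


prompt = build_prompt(
    rules="Werewolves kill at night; everyone votes during the day.",
    role="guard",
    chat_history=["Player 1: I think we should vote out player 5."],
    valuable_info=["Player 1 pushed hard against player 5."],
    reflection="Player 1 may be a werewolf; player 5 may be targeted tonight.",
    suggestion="",  # no experience collected yet, so this section is omitted
)
print(prompt)
```

Keeping each part as a separate titled section makes it easy to drop one of them, which is how an ablation over the components could be set up.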

At the end of each game round, the responses, reflections, and scores of all players are collected, with the scores determined by wins and losses.

In a new round of the game, players retrieve relevant experiences based on the current role's reflection and extract suggestions from them.

Specifically, using the scores attached to the experiences, the large model is asked to compare how they differ and to identify the good ones for use in subsequent reasoning.
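As a rough sketch of this experience loop, the code below stores (reflection, response, score) records, retrieves the ones most similar to the current reflection, and builds a comparison prompt for the model. Several details are assumptions made for illustration: the names `Experience`, `retrieve`, and `comparison_prompt` are hypothetical, word-overlap (Jaccard) similarity stands in for whatever retrieval method the authors use, a score of 1.0 for a win and 0.0 for a loss is assumed, and contrasting the highest- and lowest-scored records is just one simple way to have the model compare them.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    reflection: str  # what the player concluded about the situation
    response: str    # what the player then said or did
    score: float     # assumed here: 1.0 if the player's side won, else 0.0


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a real retrieval method."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(pool, current_reflection, k=3):
    """Return the k experiences whose reflections best match the current one."""
    ranked = sorted(pool, key=lambda e: jaccard(e.reflection, current_reflection),
                    reverse=True)
    return ranked[:k]


def comparison_prompt(experiences):
    """Ask the model to contrast a high-scored and a low-scored response."""
    ranked = sorted(experiences, key=lambda e: e.score)
    worst, best = ranked[0], ranked[-1]
    return (
        "Here are two past responses in a similar situation.\n"
        f"Higher score ({best.score}): {best.response}\n"
        f"Lower score ({worst.score}): {worst.response}\n"
        "Compare them and suggest what to do now."
    )


pool = [
    Experience("I am the guard and player 5 was accused", "Protect player 5 tonight", 1.0),
    Experience("I am the guard and player 5 was accused", "Protect myself tonight", 0.0),
    Experience("I am the witch with one potion left", "Save the potion", 1.0),
]
similar = retrieve(pool, "I am the guard and player 3 was accused", k=2)
print(comparison_prompt(similar))
```

The text returned by `comparison_prompt` would be sent to the model, and its answer used as the suggestion in the next prompt; no model parameters are changed at any point.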

In this way, ChatGPT can learn gaming skills without adjusting parameters.

However, while experience is important, too much is not necessarily a good thing.

The researchers found that when the amount of experience was too large, the winning rate of the non-wolf side actually decreased, and the game duration (number of days) also shortened.

I wonder what the result would be if we let these ChatGPTs compete with real people?

Paper address: https://arxiv.org/abs/2309.04658