Surely everyone has heard by now that “pretending” to tip ChatGPT can make it work harder. But do you know how much to offer?

Amusingly, someone actually ran an experiment on it. The method is simple and crude: use the same prompt with tip amounts ranging from $0.1 to $1 million, and try each amount 5 times.

The results turn out to be worth knowing. First, $10 is the best value for money, beating even $100.

Second, if you want better-quality answers, start at $10,000 and go up from there. It takes at least $100,000 to see a real effect on quality.

Finally, as for $0.1: don't even think about it. Quality goes down rather than up, so it's better to offer nothing at all. The AI can also tell when it's being fobbed off.

Some netizens quickly tried it for themselves and found that it does work.

Come and take a look.

Tipping ChatGPT: the amount is key

The fact that tipping can improve model performance was first discovered by a Twitter user:

The improvement shows up mainly in the length of the answers. This is not just padding the word count, though: the model really does analyze and answer the question in more detail.

If you ask ChatGPT directly "Can I give you a tip?", it will decline:

So offer the tip up front when you ask your question:

Can you help me xxxx? If the solution is good enough, I can tip xx yuan.

Remember: you can leave tipping unmentioned, but never say "I won't tip". Model performance will actually go backwards.

This made someone curious:

Are large models greedy? Does their performance keep improving the more you offer?

To settle the question, they decided to test it themselves.

Here, the author first proposes a hypothesis:

As the tip amount increases, the model's performance will improve linearly until it reaches a convergence point, after which it plateaus or declines.

The model used in the experiment is GPT-4 Turbo (API version).

The method is to have it write Python one-liners and check whether different tip amounts affect the quality of the output.

Quality here is assessed by the number of one-liners produced. The author also explicitly tells the model in the prompt: the more one-liners, the better the performance.

A total of 8 tip amounts were tested: $0.1, $1, $10... all the way up to $1 million.
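The eight amounts are consecutive powers of ten, so the test grid is easy to generate. Here is a minimal sketch. Note that `BASE_PROMPT`, `build_prompt` and the wording of the tip sentence are invented for illustration and are not the author's actual prompt:

```python
# Hypothetical task wording; the study's real prompt is not reproduced here.
BASE_PROMPT = "Write as many Python one-liners as you can. More one-liners is better."

# Eight tip amounts in USD: 0.1, 1, 10, ..., 1,000,000 (powers of ten).
TIP_AMOUNTS = [10 ** k for k in range(-1, 7)]


def build_prompt(tip=None):
    """Return the base prompt, optionally followed by a tip offer.

    tip=None gives the no-tip baseline prompt.
    """
    if tip is None:
        return BASE_PROMPT
    # The "," format adds thousands separators (1,000,000) and leaves 0.1 as 0.1.
    return f"{BASE_PROMPT} I'll tip ${tip:,} for a perfect answer."
```

Calling `build_prompt()` with no argument produces the baseline prompt, so the tipped and untipped runs differ only in the final sentence.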

To keep the results consistent and reliable, each amount was tested 5 times, each time alongside a no-tip run as the baseline, and the quality of the model's answers was recorded separately.

Specifically, two things were recorded: the number of valid lines of code generated, and the approximate number of tokens in the answer (roughly the response length divided by 4, a proxy for the amount of code in the response).

The higher these two figures, the better the model's performance.
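The two measurements can be sketched roughly as follows. This is only an illustration under stated assumptions: "response length" is taken to mean the character count, and the validity check (a line counts if it compiles on its own) is a guess, since the article does not say how the author judged a line to be valid. Both function names are hypothetical.

```python
def approx_tokens(response: str) -> float:
    """Rough token estimate: response length (assumed: characters) divided by 4."""
    return len(response) / 4


def count_valid_lines(response: str) -> int:
    """Count lines that compile on their own as Python.

    Assumed criterion for illustration only; not the author's documented method.
    """
    count = 0
    for line in response.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and pure comments
        try:
            # Syntax check only: the line is compiled, never executed.
            compile(stripped, "<one-liner>", "exec")
        except SyntaxError:
            continue  # prose, markdown fences, broken code
        count += 1
    return count
```

A syntax check like this is lenient: a bare word such as `hello` also compiles, so a stricter experiment would want a tighter filter.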

Putting the results together gives a chart like this:

The dotted line represents the baseline level, the solid line represents the actual performance, red represents the number of tokens, and blue represents the quality score.

The results deviate somewhat from the hypothesis:

Overall, both the red and blue lines rise as the tip amount increases, but upon closer inspection, this trend is not strictly consistent.

Starting at the $10,000 mark, the model's output tokens (the amount of code) begin to increase significantly. Answer quality also rises, but not in the same proportion.

This can also be seen in the vertical red error bars, which show the spread across the five runs and vary widely.
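For readers who want to see what such an error bar summarizes, here is a small sketch using Python's standard `statistics` module. The run values are made up for illustration and are not data from the study, and `summarize_runs` is a hypothetical helper name:

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Return (mean, sample standard deviation, standard error) for a list of runs."""
    avg = mean(runs)
    spread = stdev(runs)                    # spread of the individual runs
    std_error = spread / len(runs) ** 0.5   # uncertainty of the average itself
    return avg, spread, std_error


# Made-up one-liner counts for five runs at a single tip amount.
example_runs = [12, 30, 18, 25, 15]
avg, spread, std_error = summarize_runs(example_runs)
```

With only five runs per amount, the standard error shrinks slowly, which is why a large spread makes it hard to tell neighboring tip amounts apart.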

The author notes: this shows that the tip amount does correlate positively with the quality and length of the model's output, but the relationship is somewhat complicated and may be influenced by factors that are not immediately visible.

However, we can still see some obvious conclusions from it, such as:

(1) A tip of $0.1 is worse than no tip at all. Both the quality of the model's solutions and the length of its answers drop well below the baseline level (about -27%).

(Author: Models, like humans, feel as if they've been insulted.)

(2) The same goes for giving $1.

(3) The best bang for the buck is $10. The improvement it achieves is on par with that of $100,000.

(4) Surprisingly, beyond $10, amounts from $100 to $1,000 make little difference to the AI and are not even as effective as $10; they also fall below the baseline level.

(5) To improve model performance any further, you have to start at $10,000.

Even then, only the amount of code increases, and the quality is still nothing to write home about. For that, it takes at least $100,000.

(6) The best results come at the upper limit of this experiment: $1 million, an improvement of approximately 57%.
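The percentages quoted above are changes relative to the no-tip baseline. As a rough sketch of that arithmetic (the function name is hypothetical, and the scores in the comment are illustrative values chosen to match the quoted percentages, not figures from the study):

```python
def percent_change(score: float, baseline: float) -> float:
    """Relative change of score versus the no-tip baseline, in percent."""
    return (score - baseline) * 100 / baseline


# Illustrative only: with an assumed baseline of 100,
# a score of 73 is 27% below baseline and a score of 157 is 57% above it.
drop = percent_change(73, 100)
gain = percent_change(157, 100)
```

A negative value means the tip made things worse than offering nothing.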

Ahem, now I know how to tip the AI:

It's either 10 bucks, or tens of thousands and up, with 1 million and no upper limit (it's all pretend anyway).

However, someone (@宝玉 on Twitter) pointed out that 5 runs per amount is on the low side.

As it happens, the author says as much:

This is only a preliminary experiment and has limitations. The findings need to be verified further with more varied types of prompts.

So, this is just for reference~

By the way, some netizens offered a reminder:

So, everyone should tip within their means (doge).

Reference links:

[1] https://blog.finxter.com/impact-of-monetary-incentives-on-the-performance-of-gpt-4-turbo-an-experimental-analysis/

[2] https://twitter.com/dotey/status/1752843141403550192