Today is the first day of the Spring Festival travel rush. Has everyone got their tickets? How many working people set alarms to grab tickets the moment they go on sale, refresh the page over and over, and in the end still pay extra for a "ticket-grabbing acceleration package"? Over the past two days, the news that "12306 has applied for a patent to prevent automatic ticket grabbing" has become a trending topic and drawn the attention of many netizens.

So why can software actually get tickets that are in short supply? And how can automatic ticket grabbing be prevented? Today, let's look at it from a technical perspective.

What exactly happens when you buy tickets online?

The basic process of buying train tickets on 12306 is similar to buying things on e-commerce websites such as Taobao and JD.com. It can be roughly divided into several steps: login, query, selection, confirmation, and payment.

Login is a prerequisite for purchasing tickets. It verifies that users are who they claim to be and involves confirming personal information.

The principle is simple: the user enters a username and password, and the ticketing system checks its own user database to see whether they match. If they do, the user's identity is considered credible.
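The check described above can be sketched in a few lines. This is only an illustration, not how 12306 actually stores or verifies credentials: the in-memory `USERS` table, the function names, and the choice of salted PBKDF2 hashing are all assumptions made for the example.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "user database": username -> (salt, password hash).
USERS = {}

def _hash(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username: str, password: str) -> None:
    salt = os.urandom(16)
    USERS[username] = (salt, _hash(password, salt))

def verify_login(username: str, password: str) -> bool:
    record = USERS.get(username)
    if record is None:
        return False  # unknown username
    salt, stored = record
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, _hash(password, salt))
```

Storing a salted hash instead of the password itself is a common design choice: even if the database leaks, the original passwords are not directly exposed.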

At this stage, the risk is impersonation. An impersonator may try a large number of different passwords, or use passwords leaked from other websites, to pose as the real user.

The common solution is two-pronged: when a user enters the wrong password several times in a row, login is blocked for a period of time; and after the password is entered, an additional verification step asks the user to drag a puzzle piece into place or pick out the pictures that meet a requirement from a set of pictures.
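The first prong, locking an account after repeated failures, might look roughly like the sketch below. The threshold of 5 failures, the 15-minute lockout, and the `LoginGuard` class are assumptions for illustration; the article does not say what values 12306 uses.

```python
import time

MAX_FAILURES = 5            # assumed threshold
LOCKOUT_SECONDS = 15 * 60   # assumed lockout duration

class LoginGuard:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = {}      # username -> consecutive failure count
        self._locked_until = {}  # username -> time at which the lock expires

    def is_locked(self, username):
        return self._clock() < self._locked_until.get(username, float("-inf"))

    def record_attempt(self, username, success):
        if success:
            # A successful login resets the consecutive-failure counter.
            self._failures.pop(username, None)
            return
        count = self._failures.get(username, 0) + 1
        if count >= MAX_FAILURES:
            self._locked_until[username] = self._clock() + LOCKOUT_SECONDS
            count = 0
        self._failures[username] = count
```

The login handler would call `is_locked` before checking the password and refuse the attempt while the lock is active, which limits how many passwords an impersonator can try per hour.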

In the past, 12306 was criticized because its verification was too difficult: in the early days, the first-attempt pass rate of 12306's verification was a pitiful 8%. Of course, after years of continuous improvement, this problem has largely been solved.

Once the user's identity is confirmed, the rest goes smoothly. Users first check the remaining tickets for their departure and arrival stations and select the train they want; they then choose the passengers and seats, and pay for the tickets after confirming that everything is correct.

This process is really the same as handing your ID card to the staff at a station ticket office and having them issue the ticket for you, except that it is entirely self-service.

Of course, this process works fine when there are tickets to spare: everyone simply buys on a first-come, first-served basis. But when there are not enough tickets and everyone wants one, someone will inevitably turn to technical means.

Why is it possible to grab tickets through technical means?

The technical means in question is automated ticket grabbing. The basic idea of automated ticket grabbing is to let a computer simulate human behavior.

Personal use: If you develop an automated ticket-grabbing program only for yourself, it is relatively simple. First log in with your own information and pass the identity verification manually, then keep querying the desired train at a high frequency. When the data returned by a query shows remaining tickets, place the order immediately.

The key here is parsing the returned query results, and this is not difficult: the result is just a string of text, from which information is easy to extract. It is like standing at the ticket window and asking every two minutes whether there are tickets. The 12306 server is a machine, not a ticket clerk; as long as it has the computing capacity, it will answer every query and will not get annoyed at being asked again and again.
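To see why text results are so easy for a program to read, consider the small sketch below. The response format is entirely made up for illustration (the article does not describe the real 12306 response), as are the field names `train_no` and `second_class_left`.

```python
import json

# Hypothetical response text; the real 12306 format is not described here.
SAMPLE_RESPONSE = """
{"trains": [
  {"train_no": "G101", "second_class_left": 0},
  {"train_no": "G103", "second_class_left": 12}
]}
"""

def trains_with_tickets(response_text):
    # Parse the text and keep only trains that still have seats left.
    data = json.loads(response_text)
    return [t["train_no"] for t in data["trains"] if t["second_class_left"] > 0]

print(trains_with_tickets(SAMPLE_RESPONSE))  # prints ['G103']
```

A handful of lines is enough to turn the reply into a yes-or-no answer, which is exactly what makes plain-text results attractive to automated software.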

Multi-person use: If you have to grab tickets for many people, things get more troublesome. The hard part is logging in and passing verification on users' behalf. After all, it is still difficult for computers to recognize images and pass verification the way humans do; verification codes were invented precisely to make it hard for computers to impersonate people.

But a technical problem often has a technical solution. With the development of computer vision, breaking graphical verification codes is no longer difficult; they merely raise the technical threshold.

Automated ticket grabbing is therefore like a crowd blocking the ticket window: every few seconds, someone steps up and asks whether there are tickets, and they do not stop until they have bought tickets or the sales period has ended.

Automated ticket grabbing has several consequences: it is unfair to users who queue up to buy tickets the normal way; it wastes the computing resources of the 12306 servers, which may degrade the ticket-buying experience for other users; and it reduces everyone's happiness, since users who did not get tickets are naturally unhappy, and users who paid extra to grab tickets may not be happy either.

So naturally, there should be corresponding technical means to prevent automated ticket grabbing.

How to prevent automated ticket grabbing?

There are several basic ideas for making automated ticket grabbing technically harder.

1. Identify the behavior of automated ticket-grabbing software and find the machine scalpers behind it.

Specifically, the server's access records can be analyzed to pick out machines that query ticket information very frequently within a short period, and their access can be blocked. To get around this, automated ticket-grabbing software often changes IP addresses frequently, so this approach can serve only as a baseline defense.
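One simple way to spot overly frequent queries is a sliding-window counter per client, sketched below. The window length, the query limit, and the `QueryRateMonitor` class are assumptions for illustration, not details taken from 12306.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10.0   # assumed window length
MAX_QUERIES = 20        # assumed limit per window

class QueryRateMonitor:
    def __init__(self):
        # client key (for example an IP address) -> timestamps of recent queries
        self._history = defaultdict(deque)

    def allow(self, client, now):
        recent = self._history[client]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_QUERIES:
            return False  # too many queries in the window: reject
        recent.append(now)
        return True
```

Because the counter is keyed by client address, software that switches to a new IP address starts with an empty history, which is why this kind of check alone is not enough.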

2. Prevent automated ticket grabbing software from obtaining valid remaining ticket information.

As mentioned earlier, every time we query remaining tickets on 12306, a string of text is returned to the user's browser, and ticket-grabbing software parses this text to obtain the remaining-ticket information. If what is returned is not plain text, it becomes much more troublesome for the software to process. After all, computer vision is very different from human vision: what humans see at a glance is not easy for computers to recognize.

In November 2021, the Institute of Electronic Computing Technology of the China Academy of Railway Sciences applied for a patent titled "A method and system, equipment and storage medium to prevent automatic ticket grabbing", which adopts this idea. In this patent, the remaining-ticket information returned by a query is converted into a scalable vector graphic (SVG, Scalable Vector Graphics) and then sent back to the user's browser.

SVG is an interesting image format. It is an image, but it uses text to describe properties such as position, color, and line width; and it stores shapes as coordinates and paths rather than pixels, so it can be scaled to any size without distortion. These two features make it easy to draw programmatically and suitable for display on a monitor of any size.

These two features are very useful against automated ticket grabbing: the query returns images, so traditional ticket-grabbing software cannot extract the ticket text and naturally cannot grab tickets. Users buying tickets by hand can still read the ticket information in these images and simply click the train they want to continue the purchase.
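As a rough illustration of the idea (not the patent's actual rendering method, which this article does not detail), the sketch below draws a remaining-ticket count as seven-segment-style line strokes in SVG. The function name `number_to_svg` and the layout constants are assumptions; it handles non-negative integers only.

```python
# Endpoints (x1, y1, x2, y2) of each segment inside a 10x20 digit cell.
SEGMENTS = {
    "a": (0, 0, 10, 0),     # top
    "b": (10, 0, 10, 10),   # upper right
    "c": (10, 10, 10, 20),  # lower right
    "d": (0, 20, 10, 20),   # bottom
    "e": (0, 10, 0, 20),    # lower left
    "f": (0, 0, 0, 10),     # upper left
    "g": (0, 10, 10, 10),   # middle
}
# Which segments are lit for each digit.
DIGITS = {
    "0": "abcdef", "1": "bc", "2": "abdeg", "3": "abcdg", "4": "bcfg",
    "5": "acdfg", "6": "acdefg", "7": "abc", "8": "abcdefg", "9": "abcdfg",
}

def number_to_svg(number: int) -> str:
    text = str(number)
    lines = []
    for index, digit in enumerate(text):
        x_offset = 5 + index * 16  # each digit gets its own cell
        for name in DIGITS[digit]:
            x1, y1, x2, y2 = SEGMENTS[name]
            lines.append(
                f'<line x1="{x1 + x_offset}" y1="{y1 + 5}" '
                f'x2="{x2 + x_offset}" y2="{y2 + 5}" '
                f'stroke="black" stroke-width="2"/>'
            )
    width = 10 + 16 * len(text)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="30">'
        + "".join(lines)
        + "</svg>"
    )

print(number_to_svg(12))
```

The output contains only `<line>` elements and no readable number, so a program can no longer pick the count out with a simple text search. Since SVG is itself text, determined software could still reconstruct the digits from the geometry; the technique raises the cost rather than making extraction impossible.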

The patent mentioned above also proposes a clever verification method: behavioral verification based on text reasoning, implemented by combining parts of characters. Users must pass this additional behavioral verification before purchasing tickets.

Specifically, it works like this: first, a few Chinese characters are randomly selected, converted into SVG images, and split into upper and lower halves. Then the upper halves of all these characters are shown together with the lower half of one of them. Finally, the user is asked to find the correct combination; verification passes only when a correct Chinese character is formed.


Only Figure 5 is correct. Image source: the patent specification cited above.
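The server-side logic of such a challenge can be sketched as follows. This covers only the bookkeeping (which characters were chosen and which one the lower half belongs to); rendering and splitting the glyphs into SVG halves is omitted. The character pool, the function names, and the dictionary layout are all hypothetical.

```python
import secrets

# Hypothetical pool of candidate characters.
CANDIDATES = "春节车票回家团圆平安"

def new_challenge(count=4):
    # Pick distinct characters, then secretly choose whose lower half to show.
    chars = secrets.SystemRandom().sample(CANDIDATES, count)
    answer = secrets.randbelow(count)
    return {
        "top_halves": chars,             # would be sent as upper-half images
        "bottom_half_of": chars[answer], # would be sent as one lower-half image
        "answer": answer,                # kept on the server only
    }

def check(challenge, chosen_index):
    # The user passes only by pairing the lower half with the right upper half.
    return chosen_index == challenge["answer"]
```

In a real system only the images would reach the browser, while `answer` stays on the server, so the client has to actually recognize which halves fit together.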

To pass this verification, automated ticket-grabbing software needs to "recognize characters": it must have a font library and compare candidates against it to find the correct combination, which undoubtedly makes automated ticket grabbing harder.

All in all, preventing automated ticket grabbing means building obstacles into the ticketing system: obstacles that are easy for humans but, for the time being, still difficult for computers.

After all, demand is a motivator, and computers' capabilities will keep increasing. As the technology evolves, there will be a continuing tug-of-war between automated ticket grabbing and its prevention.