vx-underground, a research group that claims to hold the world's largest collection of malware source code, recently posted on the social platform X that it currently stores about 30 terabytes (TB) of malware data. Soon after, Bernardo Quintero, founder of the online multi-engine virus scanning service VirusTotal, replied that the malware samples users have submitted to VirusTotal over the years total approximately 31 petabytes (PB). One petabyte is roughly one thousand terabytes (1,000 TB in decimal units, 1,024 TB in binary units), so VirusTotal's archive is on the order of a thousand times larger than vx-underground's, and both are far beyond what ordinary users can intuitively picture.

Cybersecurity companies, artificial intelligence researchers, and threat intelligence agencies generally treat malware sample libraries like these as essential base data for training detection models, tracing how attack techniques evolve, and analyzing new threats. But when data measured in terabytes and petabytes is reduced to abstract numbers, it is hard to find a physical-world reference for how "big" it really is. That prompted a rather vivid question: if all this data were stored on traditional hard drives and the drives were stacked one on top of another, how tall would these "malware banks" be? And how would they compare with real-world landmarks?

A TechCrunch reporter tried putting the question to an AI chatbot in the newsroom, but the answer was so far-fetched that he gave up on it. The editorial team then switched to the most direct method: taking out pen and paper and doing some rough "back of the napkin" calculations grounded in common sense. Because both vx-underground and VirusTotal describe their data volumes only approximately, in round terabytes and petabytes, the reporter took the same approximate approach.

To give readers an intuitive picture, the reporter assumed standard 1 TB 3.5-inch desktop mechanical hard drives. Drives of this type share essentially the same physical dimensions so that they fit a standard chassis, and each is about 1 inch (about 2.54 cm) tall. On that premise, only height matters when simulating drives stacked one on top of another. The article also ignores the difference between a drive's nominal capacity and its usable capacity, and calculates with the nominal 1 TB to keep the estimate simple.

According to an online storage-unit conversion tool, the roughly 30 TB of malware data that vx-underground claims is equivalent to the capacity of 30 hard drives of 1 TB each. Stacked from bottom to top, those 30 drives would stand about 30 inches tall, or about 2.5 feet (roughly 0.76 meters). The article's author used his own height for comparison: next to his 6 feet (about 1.83 meters), such a stack looks more like a small box sitting at his feet.
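The vx-underground estimate can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the article's simplifications (one nominal 1 TB drive per terabyte, each drive about 1 inch tall); the function and constant names are illustrative and do not come from the report.

```python
import math

# Assumptions taken from the article: nominal 1 TB drives, each about 1 inch tall.
DRIVE_CAPACITY_TB = 1
DRIVE_HEIGHT_IN = 1.0
INCHES_PER_FOOT = 12
METERS_PER_INCH = 0.0254


def stack_height(total_tb):
    """Return (drive_count, height_ft, height_m) for a vertical stack of drives."""
    drives = math.ceil(total_tb / DRIVE_CAPACITY_TB)
    inches = drives * DRIVE_HEIGHT_IN
    return drives, inches / INCHES_PER_FOOT, inches * METERS_PER_INCH


drives, feet, meters = stack_height(30)
print(f"{drives} drives, {feet:.1f} ft, {meters:.2f} m")
# prints: 30 drives, 2.5 ft, 0.76 m
```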

Turning to VirusTotal, the comparison jumps by orders of magnitude. At 1,024 TB per petabyte, the total of 31 PB requires approximately 31,744 hard drives of 1 TB each. Stacked vertically in the same way, at 1 inch per drive, this "data tower" would reach a theoretical height of approximately 2,645 feet (approximately 806 meters). Among the world's supertall buildings, that approaches the Burj Khalifa in Dubai, the world's tallest building, which stands approximately 2,722 feet (approximately 829 meters) tall.
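The figure of 31,744 drives implies a binary conversion of 1,024 TB per petabyte. The sketch below, which is an illustration rather than the reporter's actual worksheet, reproduces the VirusTotal numbers under that assumption and shows the decimal convention (1,000 TB per petabyte) for comparison; all names are hypothetical.

```python
INCHES_PER_FOOT = 12
METERS_PER_INCH = 0.0254
DRIVE_HEIGHT_IN = 1  # article's assumption: each 1 TB drive is about 1 inch tall


def tower(petabytes, tb_per_pb):
    """Return (drive_count, height_ft, height_m) for a stack of 1 TB drives."""
    drives = petabytes * tb_per_pb
    inches = drives * DRIVE_HEIGHT_IN
    return drives, inches / INCHES_PER_FOOT, inches * METERS_PER_INCH


binary = tower(31, 1024)   # binary convention, matches the article's 31,744 drives
decimal = tower(31, 1000)  # decimal convention, shown only for comparison

for label, (drives, feet, meters) in (("binary", binary), ("decimal", decimal)):
    print(f"{label}: {drives:,} drives, {feet:,.0f} ft, {meters:.0f} m")
# prints:
# binary: 31,744 drives, 2,645 ft, 806 m
# decimal: 31,000 drives, 2,583 ft, 787 m
```

Either convention gives a tower well over 2,500 feet tall, so the choice of conversion does not change the overall picture.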

In other words, if VirusTotal’s malware samples were viewed as a column built entirely of hard drives, it would stand less than 80 feet short of the Burj Khalifa, enough to rival the skyline of this "vertical city." The reporter also chose another iconic reference: the Eiffel Tower in Paris, which is approximately 1,083 feet (approximately 330 meters) tall. By the article's rough estimate, the malware samples VirusTotal has accumulated so far are roughly equivalent to a stack of hard drives as high as "two and a half Eiffel Towers."
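The two landmark comparisons can be checked directly from the heights quoted in the article. This is a rough sketch using those rounded figures; the variable names are illustrative.

```python
# Heights in feet, as quoted in the article (all approximate).
VIRUSTOTAL_TOWER_FT = 2645
BURJ_KHALIFA_FT = 2722
EIFFEL_TOWER_FT = 1083

gap_ft = BURJ_KHALIFA_FT - VIRUSTOTAL_TOWER_FT
eiffel_ratio = VIRUSTOTAL_TOWER_FT / EIFFEL_TOWER_FT

print(f"Shortfall versus Burj Khalifa: {gap_ft} ft")
print(f"Height in Eiffel Towers: {eiffel_ratio:.2f}")
# prints:
# Shortfall versus Burj Khalifa: 77 ft
# Height in Eiffel Towers: 2.44
```

The ratio of about 2.44 is what the article rounds to "two and a half."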

TechCrunch included an infographic in the report that lines up several reference objects horizontally, from tallest to shortest. From left to right: the approximately 2,722-foot-tall Burj Khalifa; the approximately 2,645-foot-tall VirusTotal "data tower" of hard drives; the approximately 1,792-foot-tall One World Trade Center; the approximately 1,083-foot-tall Eiffel Tower; the 6-foot-tall reporter himself; and a small stack of hard drives, only about 2.5 feet tall, representing vx-underground's 30 terabytes of data. The arrangement makes the huge gap in data size between the different "malware banks" easy to see.

The report concluded that these staggeringly large malware sample libraries are not only "necessities" for security research but also an indirect measure of the long shadow cast by today's cyber threat landscape. As security companies and researchers search, label, and model these piles of data, they are in effect racing against "invisible towers" to spot the clues to the next wave of attacks as early as possible.