Who will be the dragon slayer? At the start of 2024, the technology stocks that surged last year tumbled, yet Nvidia, the leader of the AI wave, shows no sign of slowing. Every chip company envies Nvidia's position. As the AI pie keeps growing, the hardware race is visibly becoming more crowded, with a large number of startups trying to win a slice of the budget now spent on Nvidia GPUs.

The media has identified 12 companies at the forefront of this competition. These startups are on average only five years old, and the largest amount raised by any of them is US$720 million. All of them are serious challengers to Nvidia.

Cerebras

Date of establishment: 2015

Application areas: training

Cerebras is famous for making giant chips. It was co-founded by Gary Lauterbach and Andrew Feldman, who had earlier co-founded SeaMicro, a maker of ultra-high-density servers that AMD acquired in 2012 for US$357 million.

Cerebras' main products are chips and systems built specifically for supercomputing tasks such as AI training. Its chips are roughly 56 times the size of an ordinary GPU.

Cerebras' customers are concentrated in national defense, academic laboratories and other institutions. The flagship product CS-2 supercomputing system has been deployed in the U.S. Department of Energy's Argonne National Laboratory, the Pittsburgh Supercomputing Center, the University of Edinburgh Supercomputing Center and other places.


However, although it has raised as much as US$700 million, Cerebras faces a hard fight to win commercial customers because of the dominance of Nvidia's GPUs and CUDA ecosystem.

In January, the company announced that it would cooperate with the Mayo Clinic, a top medical institution in the United States. The Mayo Clinic will use Cerebras' computing chips and software to develop proprietary AI models based on decades of anonymized medical records and data.

Some models will reportedly be able to read and write text, for example summarizing the most important parts of a new patient's medical record. Other models will analyze complex medical images or genomic data.

Cerebras CEO Andrew Feldman said this is a multi-year, multimillion-dollar agreement.

d-Matrix

Date of establishment: 2019

Application areas: inference

Founded in 2019, d-Matrix is developing specialized chips and software for running machine learning models. Its chip combines processing and memory, which are usually separate components on a chip.

d-Matrix's chips generate less heat and therefore need less cooling, which makes them more cost-effective than mainstream GPUs and CPUs. The company's CEO said that many companies want to build AI applications on large models, and that cost matters a great deal to them.

d-Matrix chooses to focus on inference, that is, running AI models, rather than training. The company believes that over time, models will get larger and more expensive to run. The company already has customers testing its chips and software, and plans to put them into commercial use in the first half of 2024.

Etched

Date of establishment: 2023

Application areas: inference

Etched was founded in June 2023 by two Harvard dropouts, Gavin Uberti and Chris Zhu. The company plans to produce an AI inference accelerator chip called Sohu, which it says will deliver 10 times the inference performance of Nvidia's H100. The company was valued at US$34 million shortly after its founding.


According to reports, Sohu takes the unusual approach of etching the transformer architecture directly into the silicon. Etched says this pushes performance to new levels, with Sohu running large models in simulation up to 140 times faster than traditional GPUs. Sohu is also said to support better coding through tree search, which can compare hundreds of responses in parallel, as well as multicast speculative decoding, which can generate new content in real time.

Etched's blog states that this architecture will allow trillion-parameter models to run with unparalleled efficiency. The system has a single core and a fully open source software stack, and is said to scale to models with 100 trillion parameters.

Extropic

Date of establishment: 2022

Application areas: inference & training

Extropic is the most mysterious of these startups. Its founder came from X, Google's "moonshot factory" division dedicated to exploring cutting-edge technology. According to reports, Extropic focuses on quantum computing and plans to develop a chip specifically for running large models, but it has not yet revealed any details about specific products.

At the end of last year, the company completed a US$14.1 million seed round.

According to the company's press release, the world's need for scalable, cost-effective and efficient computing is rising dramatically with generative artificial intelligence, and Extropic hopes to enable future computers to harness entropy as an asset, program themselves to learn, and operate with unprecedented efficiency:

Extropic's computing paradigm is built on the principles of thermodynamics and aims to seamlessly integrate generative artificial intelligence with the fundamental physics of the world. Our goal is to eventually embed generative artificial intelligence into physical processes and break through the efficiency limits dictated by physical laws in terms of space, time, and energy.

Groq

Date of establishment: 2016

Application areas: inference

Groq was founded in 2016 and is headquartered in Mountain View, California. The company's main product is the Language Processing Unit (LPU), which focuses on large-model inference.

The standout feature of the company's products is extremely fast generation, which keeps the end-user experience smooth. Users of consumer generative AI applications expect speed, and Groq's LPU paired with Meta's open source Llama 2 70B model can generate 300 words per second. The company says that a text as long as Shakespeare's "Hamlet" can be generated in 7 minutes, which is 75 times faster than an ordinary person can type.
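As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. The word count used for "Hamlet" (about 30,000 words) is an assumption, not a figure from the article, and the helper names are hypothetical. Under that assumption, 300 words per second would cover the text in well under the quoted 7 minutes, and the 7-minute and 75-times figures together imply a typing speed of roughly 57 words per minute.

```python
# Back-of-the-envelope check of the quoted Groq figures.
# ASSUMPTION: "Hamlet" is taken to be about 30,000 words long;
# the article itself does not give a word count.
HAMLET_WORDS = 30_000


def generation_time_seconds(word_count: float, words_per_second: float) -> float:
    """Time needed to generate word_count words at a constant rate."""
    return word_count / words_per_second


def implied_typing_wpm(word_count: float, minutes: float, speedup: float) -> float:
    """Typing speed (words per minute) implied by 'speedup times faster than typing'."""
    generation_wpm = word_count / minutes
    return generation_wpm / speedup


if __name__ == "__main__":
    seconds = generation_time_seconds(HAMLET_WORDS, 300)
    typing = implied_typing_wpm(HAMLET_WORDS, 7, 75)
    print(f"Time at 300 words/s: {seconds:.0f} s")
    print(f"Implied typing speed: {typing:.1f} words per minute")
```

Changing `HAMLET_WORDS` shows how sensitive both results are to the assumed length of the text.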

Groq co-founder and CEO Jonathan Ross believes that the cost of inference is becoming a problem for companies that build artificial intelligence into their products, because the cost of running models rises quickly as the number of customers using those products grows. According to the company, Groq LPU clusters will provide higher throughput, lower latency and lower cost for large-model inference than Nvidia GPUs.

In addition, because the supply of HBM3 memory and CoWoS packaging capacity is limited, Nvidia's current GPU output cannot fully meet customer demand. Groq's LPU is unusual in that it relies neither on HBM from Samsung or SK Hynix nor on TSMC's CoWoS packaging technology, so it does not face the same production bottlenecks as Nvidia.

Lightmatter

Date of establishment: 2017

Application areas: training & inference

Lightmatter uses light from lasers to transmit data between chips and server farms. The company was founded by MIT students using the school's patented technology.

According to co-founder and CEO Nicholas Harris, Lightmatter's products can cut data center energy costs by about 80% compared with chips from manufacturers such as Nvidia, AMD and Intel, which transmit data over cables.

MatX

Date of establishment: 2022

Application areas: not announced

MatX was founded by former Google employees. CEO Reiner Pope is one of the developers of the Google Pathways large model, and Chief Technology Officer Mike Gunter is one of the developers of Google TPU.

MatX is developing chips designed specifically for LLMs and text applications. The company says that, compared with Nvidia GPUs, its chips run faster and cost less, and can support a variety of artificial intelligence applications, including image generation.

MatX said it has backing from several venture capital firms but did not disclose the amount raised. It also said it has received "strong support from well-known large model developers" but did not name the companies.

Modular

Date of establishment: 2022

Application areas: inference; expanding into training this year

Modular focuses on creating a development platform and programming language for training and running large models. Users can work with a range of AI tools on the platform, including Google's open source TensorFlow and Meta's open source PyTorch.

The company believes that AI development today is hampered by overly complex and fragmented technology infrastructure, and Modular's mission is to remove the complexity of building and maintaining AI systems at scale.

Building and running artificial intelligence applications requires a lot of computing power. To control costs, a company may use several types of AI chips, but the software for these chips is often mutually incompatible. In particular, Nvidia's CUDA software for writing machine learning applications runs only on Nvidia's own chips, which essentially locks developers into its GPUs. CUDA's hold on its users is so strong that one computer vision startup reportedly needed two years to switch to a non-Nvidia chip.

Modular hopes to change that by developing a CUDA alternative that solves software compatibility problems across different chips and makes non-Nvidia chips easier to use.

RainAI

Date of establishment: 2017

Application areas: inference & fine-tuning

Training and inference on traditional GPUs are expensive. Part of the cost comes from the heat these chips generate when moving data between memory and processing components. GPUs therefore need continuous cooling, which raises data center power costs.

RainAI's NPU chip mimics the human brain by combining memory and processing functions. The company says the chip not only performs well in computing speed and energy efficiency, but can also customize or fine-tune artificial intelligence models in real time according to the surrounding environment. However, the company has not yet produced a finished product.

According to media reports, a letter of intent signed in 2019 shows that OpenAI planned to spend US$51 million on RainAI's NPU chips, to be used for training and deploying GPT models.

Sima.ai

Date of establishment: 2018

Application areas: inference

Sima.ai focuses on developing hardware and software for edge computing devices, which are used in scenarios such as aircraft, drones, automobiles and medical equipment, rather than data centers.

Company founder Krishna Rangasayee worked at chipmaker Xilinx for nearly two decades. In an earlier media interview, he said that many industries cannot use cloud-based AI services for a variety of reasons, and that Sima.ai will focus on serving these distributed edge computing devices.

For example, self-driving cars need to make decisions on the fly, and only built-in AI can meet their demanding latency requirements. And in industries like healthcare, companies may not want to send sensitive data to the cloud but rather keep it on the device.

In June 2023, Sima.ai said it had begun mass production of its first-generation edge artificial intelligence chips. The company said it is working with more than 50 customers in manufacturing, automotive and aviation.

Tenstorrent

Date of establishment: 2016

Application areas: training & inference

Tenstorrent was founded by three former AMD employees and is headquartered in Toronto, Canada.

Tenstorrent develops RISC-V and AI chips using heterogeneous, chiplet-based designs. It has so far developed two chips on a 12nm process, Grayskull and Wormhole, with FP8 computing power of up to 328 TFLOPS. The company's goal is to bring the price down to between one-fifth and one-tenth of that of GPUs with similar performance.

In 2021, Tenstorrent also launched DevCloud, which allows AI developers to run large models without purchasing hardware.

However, in recent years, perhaps feeling the pressure from hardware manufacturers such as Nvidia, Tenstorrent has shifted its focus to technology licensing and services.

TinyCorp

Date of establishment: 2022

Application areas: training & inference

TinyCorp was founded by George Hotz, the founder and former CEO of autonomous driving startup comma.ai. Its products will be built on an open source deep learning tool called tinygrad, which is said to help developers speed up the training and running of large language models.

Hotz believes that tinygrad can become a "strong competitor" to PyTorch, the deep learning framework that originated at Meta. But he has not yet revealed specific details about the product.
