When it comes to large language models (LLMs), size certainly matters, because it determines where a model can run. Stability AI, the company best known for its Stable Diffusion text-to-image generative AI technology, today released one of its smallest models to date: Stable LM 2 1.6B.
StableLM is a text generation LLM that Stability AI first launched in April 2023 with 3 billion and 7 billion parameter models. The new model is the second that Stability AI has released in 2024, following the debut of Stable Code 3B earlier this week.
The new StableLM model is compact yet capable. It is designed to lower the barrier to entry so that more developers can participate in the generative AI ecosystem, and it incorporates multilingual data in seven languages: English, Spanish, German, Italian, French, Portuguese and Dutch. The model draws on recent algorithmic advances in language modeling to strike what Stability AI hopes is an optimal balance between speed and performance.
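As a rough illustration of why parameter count affects where a model can run, the sketch below estimates the memory needed just to hold a model's weights at a few common numeric precisions. The parameter counts are the nominal figures quoted in this article; the precisions and the helper name `weight_memory_gb` are assumptions made for illustration, and the estimate ignores activations, caches and runtime overhead.

```python
# Nominal parameter counts (in billions) as quoted in the article.
MODELS = {
    "Stable LM 2 1.6B": 1.6,
    "StableLM 3B": 3.0,
    "StableLM 7B": 7.0,
}

# Bytes used to store one parameter at each precision (illustrative choices).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, size in MODELS.items():
        row = ", ".join(
            f"{dtype}: {weight_memory_gb(size, dtype):.1f} GB"
            for dtype in BYTES_PER_PARAM
        )
        print(f"{name:>18} -> {row}")
```

Under these assumptions, a 1.6 billion parameter model needs only a few gigabytes for its weights at 16-bit precision, which is the kind of footprint that fits on a single consumer device, whereas a 7 billion parameter model needs several times more.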
Carlos Riquelme, head of the language team at Stability AI, told VentureBeat: "In general, larger models trained with similar training recipes on similar data tend to perform better than smaller models. However, over time, as new models are able to implement better algorithms and train on more and higher quality data, we sometimes see recent smaller models outperform older larger models."
According to Stability AI, the model outperforms other small language models, most of them with fewer than 2 billion parameters, on most benchmarks; the comparison set includes Microsoft's Phi-2 (2.7 billion parameters), TinyLlama 1.1B and Falcon 1B. The new, smaller StableLM can even outperform some larger models, including Stability AI's own earlier StableLM 3B model.
Riquelme said: "Stable LM 2 1.6B performs better than some of the larger models trained a few months ago. Consider similar trends in computers, televisions or microchips, where they become smaller, thinner and better over time."
To be clear, the smaller Stable LM 2 1.6B does have drawbacks because of its size. As is typical of small, low-capacity language models, Stable LM 2 1.6B may exhibit common issues such as high hallucination rates or potentially toxic language.
Over the past few months, Stability AI has been working on smaller and more capable LLM options. In December 2023, it released the StableLM Zephyr 3B model, which offers more capability in a smaller size than the initial release in April.
The new Stable LM 2 model is trained on more data, including multilingual documents in six languages (Spanish, German, Italian, French, Portuguese and Dutch) in addition to English. Another interesting aspect highlighted by Riquelme is the order in which data is shown to the model during training. He noted that it may pay off to focus on different types of data at different stages of training.
Going a step further, Stability AI is making the new models available in pretrained and fine-tuned versions, as well as in a format the researchers describe as "...last model checkpoint before pretraining cooldown."
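The idea of emphasizing different data types at different training stages can be sketched as a staged sampling mixture. The article does not disclose the actual schedule, so the stage boundaries, source names and weights below are entirely hypothetical; the sketch only shows the general mechanism of changing sampling weights as training progresses.

```python
import random

# Hypothetical curriculum: each entry is (fraction of training at which the
# stage ends, sampling weights per data source). None of these values come
# from Stability AI; they are placeholders for illustration.
STAGES = [
    (0.6, {"web_english": 0.7, "multilingual": 0.2, "code": 0.1}),
    (0.9, {"web_english": 0.4, "multilingual": 0.4, "code": 0.2}),
    (1.0, {"web_english": 0.3, "multilingual": 0.3, "code": 0.4}),
]


def mixture_at(step: int, total_steps: int) -> dict:
    """Return the sampling weights in effect at a given training step."""
    progress = step / total_steps
    for stage_end, weights in STAGES:
        if progress < stage_end:
            return weights
    return STAGES[-1][1]  # at or past the end: keep the final mixture


def sample_source(step: int, total_steps: int, rng: random.Random) -> str:
    """Pick the data source for the next batch according to the current mixture."""
    weights = mixture_at(step, total_steps)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 700, 950):
        print(step, mixture_at(step, 1000), sample_source(step, 1000, rng))
```

In a real training pipeline the chosen source would determine which dataset the next batch is drawn from; here the function simply returns the source name.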
"Our goal is to provide individual developers with more tools and artifacts to innovate, adapt and build on existing models. Here, we provide a concrete, semi-finished model for people to use," said Riquelme.
During training, the model is updated sequentially and its performance improves. In that progression, the very first model knows nothing, while the last one has consumed, and ideally learned, most aspects of the data. At the same time, Riquelme said, models may become less malleable toward the end of training because they are forced to wrap up learning.
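One common way a "cooldown" appears in practice is as the final portion of a learning-rate schedule, where the rate is annealed toward zero. The sketch below assumes a simple warmup, constant, then linear-cooldown schedule and marks the last step before the cooldown begins, which is where a "pre-cooldown" checkpoint would be saved. The schedule shape, the fractions and the function names are assumptions for illustration, not the recipe Stability AI used.

```python
def learning_rate(step: int, total_steps: int, peak_lr: float = 1e-3,
                  warmup_frac: float = 0.02, cooldown_frac: float = 0.1) -> float:
    """Hypothetical schedule: linear warmup, constant plateau, linear cooldown."""
    warmup_steps = int(total_steps * warmup_frac)
    cooldown_steps = max(int(total_steps * cooldown_frac), 1)
    cooldown_start = total_steps - cooldown_steps

    if step < warmup_steps:
        # Ramp up from a small value to the peak rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < cooldown_start:
        # Plateau: the model is still learning at the full rate.
        return peak_lr
    # Cooldown: anneal linearly to zero by the final step.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / cooldown_steps


def pre_cooldown_checkpoint_step(total_steps: int, cooldown_frac: float = 0.1) -> int:
    """Last step trained at the plateau rate, i.e. just before cooldown starts."""
    cooldown_steps = max(int(total_steps * cooldown_frac), 1)
    return total_steps - cooldown_steps - 1


if __name__ == "__main__":
    total = 1000
    ckpt = pre_cooldown_checkpoint_step(total)
    print("save pre-cooldown checkpoint at step", ckpt)
    for step in (0, 500, ckpt, 950, total):
        print(step, learning_rate(step, total))
```

A checkpoint saved at that step still has a high learning rate ahead of it, which is one plausible reason it might adapt more readily to new tasks or datasets than the fully annealed final model.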
"We decided to make the model available in its current form before starting the final phase of training so that - hopefully - it would be easier to specialize it for other tasks or datasets that people might want to use," he said. "We're not sure if this will work well, but we really believe in people's ability to leverage new tools and models in amazing ways."