Stability AI released the StableLM family of open-source AI language models on Wednesday. Stability hopes StableLM will have the same catalyzing effect as its open-source image-synthesis model, Stable Diffusion, which launched in 2022. With refinement, StableLM could be used to build an open-source alternative to ChatGPT.
According to Stability, 3 billion and 7 billion parameter models for the alpha version of StableLM are available on GitHub.
Models with 15 billion and 65 billion parameters are planned to follow. The company is sharing the models under the Creative Commons BY-SA-4.0 license, which requires that adaptations credit the original creator and carry the same license.
London-based Stability AI Ltd. has positioned itself as an open-source competitor to OpenAI, which, despite its “open” name, releases few open-source models and keeps secret its neural network weights, the numbers that define the core functionality of an AI model.
In an introductory blog post, Stability says, “Language models will be the backbone of our digital economy, and we want everyone to have a say in how they are designed. Models like StableLM show how much we want AI technology that is open, easy to use, and helpful.”
StableLM is a large language model (LLM): like GPT-4, the LLM that powers the most capable version of ChatGPT, it generates text by predicting the next token (a word fragment) in a sequence. The process starts from text a person supplies, called a “prompt.” As a result, StableLM can compose text that reads as if a human wrote it, and it can write programs.
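The predict-one-token-at-a-time loop can be sketched with a toy model. This is an illustration of the general technique, not StableLM’s actual implementation: a real LLM computes next-token probabilities with a neural network over a vocabulary of tens of thousands of tokens, whereas the lookup table below is purely hypothetical.

```python
# Toy next-token "model": maps the most recent token to candidate
# next tokens with probabilities. A real LLM like StableLM computes
# such a distribution with a neural network at every step.
TOY_MODEL = {
    "the": [("cat", 0.6), ("dog", 0.4)],
    "cat": [("sat", 0.7), ("ran", 0.3)],
    "sat": [("down", 1.0)],
}

def generate(prompt: str, max_tokens: int = 3) -> str:
    """Extend the prompt one token at a time (greedy decoding)."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = TOY_MODEL.get(tokens[-1])
        if not candidates:  # no continuation known; stop generating
            break
        # Greedy decoding: always pick the highest-probability token.
        tokens.append(max(candidates, key=lambda c: c[1])[0])
    return " ".join(tokens)

print(generate("the"))  # follows the highest-probability path
```

Production systems usually sample from the distribution (with temperature, top-k, or nucleus sampling) rather than always taking the single most likely token, which is why the same prompt can yield different completions.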
Like other recent “small” LLMs such as Meta’s LLaMA, Stanford Alpaca, Cerebras-GPT, and Dolly 2.0, StableLM is claimed to achieve performance similar to OpenAI’s benchmark GPT-3 model while using far fewer parameters: 7 billion for StableLM versus 175 billion for GPT-3.
Parameters are the variables a language model learns from its training data. Fewer parameters make a model smaller and more efficient, which can make it easier to run on local devices such as smartphones and laptops. But achieving high performance with fewer parameters requires careful engineering, and that remains a significant challenge in AI.
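To see where billions of parameters come from, here is a rough back-of-the-envelope count for a standard decoder-only transformer. The formula and the example numbers are illustrative assumptions, not StableLM’s actual configuration:

```python
def transformer_params(d_model: int, n_layers: int, vocab: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Each block contributes roughly 12 * d_model**2 weights
    (attention projections plus the feed-forward network); the
    token-embedding matrix adds vocab * d_model. Biases and
    layer-norm parameters are ignored for simplicity.
    """
    per_block = 12 * d_model ** 2
    embeddings = vocab * d_model
    return n_layers * per_block + embeddings

# Hypothetical dimensions sized vaguely like a 7B-class model:
print(transformer_params(d_model=4096, n_layers=32, vocab=50_000))
# → 6647250944, i.e. roughly 6.6 billion parameters
```

The quadratic dependence on `d_model` is why shrinking the model width, rather than the layer count, is the fastest way to cut parameter counts, and why a 175-billion-parameter model like GPT-3 needs far more memory than a 7-billion-parameter one.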
“Our StableLM models can generate text and code and will power a wide range of downstream applications,” says Stability. “They show how, with the right training, small, efficient models can deliver high performance.”
Stability AI says that StableLM was trained on “a new experimental data set” built on The Pile, an open-source data set, but three times larger.
Stability says the “richness” of this data set, the details of which it plans to publish later, accounts for the model’s “surprisingly high performance” on conversational and coding tasks despite its smaller parameter sizes.
In our informal tests of a fine-tuned version of StableLM’s 7B model, built for dialog using the Alpaca method, it seemed to perform better than Meta’s raw 7B-parameter LLaMA model (in terms of producing the output you would expect from a given prompt), though not as well as GPT-3. Versions of StableLM with more parameters may prove more flexible and capable.
Stable Diffusion was created by researchers in the CompVis group at Ludwig Maximilian University of Munich; Stability funded and promoted its open-source launch in August of last year.
Stable Diffusion was an early open-source latent diffusion model capable of generating images from prompts, and its release kicked off a period of rapid growth in image-synthesis technology. It also drew strong pushback from artists and companies, some of whom have sued Stability AI. Stability’s move into language models could provoke similar reactions.
The 7-billion-parameter StableLM base model can be tested on Hugging Face, and the fine-tuned model on Replicate. Hugging Face also hosts a conversation-tuned version of StableLM with a dialog interface similar to ChatGPT’s.