
Small language models: Powerful tools in the AI tool belt

This bite-size alternative to large language models can deliver big benefits for the right business scenarios.

There seems to be no limit to what artificial intelligence (AI) can help people do. It can explain the best way to conduct your morning routine based on neuroscience, measure exactly four liters of water using three- and five-liter jugs, and teach a 5-year-old what blockchain is using LEGO pieces. But the vast body of knowledge used to do these things—billions, even trillions of parameters used to train large language models (LLMs)—can be overkill for many specific business scenarios.

An advertising agency, for instance, might just need a generative AI model that can perform a focused task, like analyzing consumer behavior so it can develop more highly targeted ads. A smaller model trained specifically on the agency’s consumer and ad performance data could be more effective than one that’s been trained on the entire Internet.

Similarly, a law firm analyzing two parties’ legal obligations in a contract dispute doesn’t need vast amounts of legal precedent—it just needs a model that can draft initial versions of motions, affidavits, and discovery requests.

A small language model (SLM) is a powerful arrow in the AI quiver for these use cases. SLMs are models trained on relatively small amounts of specific data, which makes them ideal for performing simpler or more specialized tasks. How small? A reasonable threshold for an SLM is fewer than 10 billion parameters, according to one Capgemini estimate. Compare that with LLMs, whose parameter counts reach into the tens of billions and even trillions. But the real distinction between an LLM and an SLM is less about exact size and “more a matter of whether it’s tailored to become better at a specific task through additional training,” says Johannes Hoffart, AI CTO at SAP.

Because of their small size and fine-tuning, SLMs require less processing power and lower memory. This means they’re faster, use less energy, can run on small devices, and may not require a public cloud connection.

What’s more, SLMs help democratize AI because they make AI models more affordable and easier to use. Training LLMs requires large, industrialized hardware, whereas SLMs can run on consumer-grade hardware, which makes them more accessible to small and midsize organizations.

Like LLMs, SLMs can understand natural language prompts and respond with natural language replies and are built using streamlined versions of the artificial neural networks found in LLMs. But SLMs are trained on focused datasets, making them very efficient at tasks like analyzing customer feedback, generating product descriptions, or handling specialized industry jargon. On the other hand, LLMs are more suited to applications that require orchestration of complex tasks involving advanced reasoning, data analysis, and an understanding of context.

“LLMs are like a starship. It’s very powerful and can go far, far away, but if you’re doing something very tactical and specific, that starship is way too powerful and you only need to travel one mile and not to the next galaxy,” says Neil Sahota, CEO of research firm ASCILabs and an AI advisor to the United Nations. “If speed and costs are concerns, SLMs are the better way to go.”

SLMs open up new opportunities for businesses, but also new complexity. Companies will need to determine the best portfolio of models for their scenario. To do this, they will need to ask important questions before incorporating SLMs into their AI strategy, including examining their use case, budget, performance requirements, and growth needs.

SLMs gain ground on LLMs

SLMs aren’t new, but they are increasingly hitting the market with impressive speed and processing power. Microsoft’s Phi-3-mini, for example, introduced in April 2024 with 3.8 billion parameters, performs as well as or better than models twice its size on the tasks it was specifically designed for, such as reasoning, language understanding, math, and code generation, according to benchmarks.

SLMs are becoming easier to procure, too. The big AI players offer today’s SLMs, such as Meta’s Llama 3, Microsoft’s Phi-4, and Google’s Gemma, through a service model. SLMs can also be found embedded in the large foundation model “families” available through Microsoft Azure, AWS, Google, OpenAI, Anthropic, Mistral, and Meta’s open-source releases.

“Today I see some companies wanting to start with the SLM first,” Sahota says. A large mortgage company he recently worked with, for instance, reasoned that if it started using an SLM in one small area, it could see how it works without a lot of liability. “If something goes wrong, they can manage that and not worry about a million people getting exposed or getting the wrong information.”

But don’t expect a significant shift from LLMs to SLMs, experts say. Rather, organizations are more likely to implement a portfolio of models, each selected to suit a specific scenario. “Some customers may only need small models, some will need big models, and many are going to want to combine both in a variety of ways,” says Luis Vargas, vice president of AI at Microsoft, in an article published on Microsoft’s website.

AI developers, in fact, often work through a pipeline of models, with data prompts moving through several types of large and small models to come up with an answer. The query might first go to an LLM, then to an SLM for classification, then back to the LLM to extract the information and generate a response. The LLM understands the best models to use for each purpose and in what order to query them.
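The routing logic of such a pipeline can be sketched in a few lines. The model functions below are hypothetical stand-ins, not a real vendor API; only the flow between a cheap classification step and a larger generation step is illustrated:

```python
# Sketch of a model pipeline: a cheap SLM call classifies the query,
# then an LLM generates the final response using that classification.
# Both model functions are hypothetical stand-ins, not a real vendor API.

def slm_classify(query: str) -> str:
    """Stand-in for a small, task-specific classification model."""
    q = query.lower()
    if "contract" in q:
        return "legal"
    if "campaign" in q or "ad copy" in q:
        return "marketing"
    return "general"

def llm_generate(query: str, category: str) -> str:
    """Stand-in for a large general-purpose model that drafts the answer."""
    return f"[{category}] response to: {query}"

def answer(query: str) -> str:
    category = slm_classify(query)        # step 1: cheap SLM classification
    return llm_generate(query, category)  # step 2: LLM generates the reply

print(answer("Summarize feedback on our latest ad campaign"))
```

In practice the classifier's output would select which specialized model or retrieval source to consult, rather than just tagging the response.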

At larger organizations, an LLM could be used for complex tasks—like developing a long-term business strategy that considers an array of macroeconomic policies, global effects, and broad-based themes—while multiple SLMs handle dozens of business-unit-specific tasks such as analyzing consumer feedback and social media posts to help guide the direction of new product development.

Comparing RAG to SLMs

SLMs also come into play in retrieval-augmented generation (RAG) strategies. RAG is a way to improve the relevancy of LLMs by retrieving contextually relevant, domain-specific data and using it to supplement responses to user prompts. SLMs can play a part in this process, but ultimately RAG and SLMs serve different purposes.

While RAG focuses on retrieving relevant information from external sources to enhance text generation, SLMs primarily use their internal knowledge base to understand and generate text. RAG can be thought of as a research assistant that gathers information, whereas an SLM is a knowledgeable expert that draws upon internal wisdom.

SLMs can be used in a RAG context because they can work with vector databases to perform some of the background work needed to supply LLMs with the data that makes the LLM’s responses more accurate and relevant, like generating embeddings or creating semantic representations. “You could use the LLM for that, but it would be super expensive,” Hoffart says.
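A toy illustration of that background work: embed documents, store the vectors, and retrieve the best match for a query. The bag-of-words "embedding" here is a deliberately simplistic stand-in for a real small embedding model, and an in-memory list stands in for the vector database:

```python
# Toy RAG background work an SLM could handle: embed documents,
# index them, and retrieve the most relevant one for a query.
# The bag-of-words "embedding" stands in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: word counts instead of a learned vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = [
    "Pump maintenance schedule for offshore platforms",
    "Contract terms and pricing agreements for enterprise clients",
]
index = [(d, embed(d)) for d in docs]  # the "vector database"

def retrieve(query: str) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

print(retrieve("pricing terms in the client contract"))
```

The retrieved document would then be passed to the LLM as context, which is the "augmentation" step of RAG.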

A multitude of SLM benefits

The sweet spot for SLMs tends to be narrow tasks in high-volume niche applications or in low-power environments, such as on smartphones or Internet of Things (IoT) gadgets, and for operations where data privacy is crucial. Key differentiators between SLMs and LLMs include:

No need to connect to the cloud. In the oil and gas industry, a field service engineer on an oil rig, offshore platform, or remote drilling site may not have high-bandwidth Internet access. With an SLM on their device, they could still use generative AI to query their field service manual to solve technical issues on a pump, processor, or sensor. Low computational requirements and local processing make this possible.

Can be used on a mobile device. A salesperson might want to access a generative AI model containing sensitive data at a client site. The SLM on the sales rep’s tablet could include customer-specific information—such as past interactions, the client’s preferences and pain points, contract terms, and pricing agreements—to come up with tailored recommendations for products or services that would be most relevant to that client. An SLM could provide those results without the lag and potential privacy concerns that come with sending data from a mobile device to the cloud.

Addresses privacy issues. The fact that data never needs to leave the device is a huge benefit for privacy. For instance, SLMs can assist clinicians in analyzing patient data, extracting relevant information, and generating diagnoses and treatment options. With an SLM, the patient’s personal information can be kept inside the organization.

The challenges of SLMs

SLMs may be a cost-effective alternative to LLMs, but they still have limitations. They don’t handle complex language well, they lose accuracy on complicated tasks, and they have a narrow scope of knowledge.

There are other trade-offs. While SLMs generally don’t cost a lot to run, costs could add up if multiple SLMs are in use. “If you have five models deployed and they’re each using GPUs and occupying space and electricity in the data center, that costs more versus having one huge model,” says Sean Kask, AI chief strategy officer at SAP. “Sure, the LLM uses a lot of electricity, but it’s being used for a lot of different things, and you can refine data for smaller, more specific queries through prompt engineering,” where users carefully craft and refine input instructions to guide the model toward the desired output.
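Prompt engineering, as described here, amounts to structuring the input rather than retraining the model. A minimal sketch of the idea, where the `build_prompt` helper and its fields are assumptions for illustration, not any vendor's API:

```python
# Illustrative prompt engineering: assembling a scoped prompt from
# reusable parts so a general model answers a specific query well.
# The build_prompt helper and its fields are assumptions for this sketch.

def build_prompt(question: str, role: str, constraints: list[str]) -> str:
    """Combine a role, constraints, and the question into one prompt."""
    lines = [f"You are {role}."]
    lines += [f"- {c}" for c in constraints]
    lines.append(f"Question: {question}")
    return "\n".join(lines)

prompt = build_prompt(
    "Summarize last quarter's data-center electricity costs by model.",
    role="a cost analyst for our AI infrastructure",
    constraints=["Answer in three bullet points.",
                 "Cite only figures from the attached report."],
)
print(prompt)
```

The refined prompt narrows a large model toward a small-model-sized task, which is the trade-off Kask describes.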

What’s more, SLMs present many of the same challenges as LLMs when it comes to governance and security. “You still need a risk and regulatory framework,” says Jim Rowan, head of AI at Deloitte Consulting LLP. “You need an AI policy because you don’t want business units using data and AI models without your knowledge. And you still have to set up guardrails because SLMs hallucinate too,” he adds.

SLMs also aren’t necessarily easier to manage than LLMs. Even though the big AI players offer versions of SLMs through a service model where they provide the underlying engine, “you still need people who know what the right data is. You need domain experts and a data scientist who can develop a good training strategy for the model,” Sahota says.

The training is the secret sauce. “AI doesn’t follow a set of instructions or a fixed path like traditional software does. It’s like a high-energy intern. It’s ready to do whatever you need it to do, however tedious or menial, but you have to teach it first how to do it,” he adds.

Factors to consider when choosing an SLM or LLM

Businesses will need to determine the best portfolio of models for their business needs. For any use case, consider these five factors:

  1. What business case are you solving for?
    A narrowly focused problem might benefit from an SLM. If the dataset is very small, controlled, and available, such as HR documents or product descriptions, it makes great sense to use an SLM. “But if it’s a large stack of constantly changing data or there’s lots of variability in it, such as current mortgage rates or daily geopolitical events, you probably want to go the LLM route,” Sahota says.

    Another question is where the model will be used. If it will be used in remote locations, like an oil field or a client site, then an SLM would be better.
  2. How fast do you need the model, and how much are you willing to invest in it?
    Commercial, off-the-shelf SLM platforms are fairly inexpensive—usually around US$100 a month for hosting and running the engine, according to Sahota. Some open-source SLMs, meanwhile, are free. “Infrastructure costs to maintain that model are pretty low, whereas an LLM can cost tens to hundreds of thousands of dollars,” he says.
  3. What kind of performance and accuracy are needed?
    SLMs can be very accurate about straightforward questions, like an employee making an inquiry into their current benefits. But if an employee says “I would like to pay a third mortgage; can I draw off my 401(k)?” they may get a more generic answer because it’s not specific to their employee benefits. An LLM might be better at handling this type of question, as it could include information on general HR and tax standards for 401(k) use.
  4. What are your growth needs?
    Businesses need to anticipate how big the SLM might get over time. “If you’re a retailer and you’re going to toss tens of thousands of products into the model over the next few years, that’s certainly an LLM,” Sahota says.
  5. Can the SLM be incorporated into a pipeline of models that can help produce the most accurate results?
    With some vendors’ AI systems, queries can go through a pipeline of models to generate a response. This is possible when the provider engages with an assortment of SLMs that fit the business problem you’re trying to solve and can assemble the models in a way that yields the best results, Kask says.
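The factors above could be folded into a rough decision helper. The field names and scoring rule below are illustrative assumptions, not a published methodology; factor 5 depends on the vendor's pipeline capabilities and is omitted:

```python
# Rough decision helper encoding factors 1-4 from the list above.
# Field names and the scoring rule are illustrative assumptions,
# not a published methodology.
from dataclasses import dataclass

@dataclass
class UseCase:
    narrow_task: bool           # factor 1: focused problem, stable dataset
    offline_or_edge: bool       # factor 1: remote location or on-device use
    low_budget: bool            # factor 2: cost and speed constrained
    simple_queries: bool        # factor 3: straightforward accuracy needs
    expects_rapid_growth: bool  # factor 4: dataset will balloon over time

def suggest_model(uc: UseCase) -> str:
    """Lean SLM when most signals point that way; growth needs trump all."""
    if uc.expects_rapid_growth:
        return "LLM"
    slm_signals = sum([uc.narrow_task, uc.offline_or_edge,
                       uc.low_budget, uc.simple_queries])
    return "SLM" if slm_signals >= 3 else "LLM"

print(suggest_model(UseCase(True, True, True, True, False)))  # → SLM
```

A real evaluation would weigh these factors against measured accuracy and cost, but the structure mirrors the checklist above.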

Creating the AI model portfolio

SLMs are a powerful tool in the AI tool belt, and they’re well suited to a variety of business use cases. As the number and type of available AI models continue to grow, businesses will need to understand the range of what’s available to assemble the best AI model portfolio for their needs.

“Choice is very important to your strategy,” Kask says. “Pick the model that’s right for you and for your embedded use case.”
