SMT Perspectives and Prospects: The AI Era, Part 3: LLMs, SLMs, and Foundation Models
Since the introduction of ChatGPT on Nov. 30, 2022, and GPT-4 on March 14, 2023, large language models (LLMs) have been in everyday news and conversations. LLMs represent a significant advancement in AI, with the potential to revolutionize multiple fields. This column offers a snapshot of LLMs from the user’s perspective.
As a subset of AI models, LLMs are designed to understand, process, and manipulate human language and generate human-like text through learning patterns and relationships. A model is trained on vast datasets, which allow it to recognize, translate, predict, and generate text or other content and perform a wide range of tasks related to natural language processing (NLP). The recent success of LLMs stems from the following:
- The introduction of transformer architectures
- Advances in computational power
- The availability and use of vast training data
LLMs’ underlying technology is based on deep learning, particularly neural networks, whose algorithms can handle a wide range of natural language tasks. The most common architecture for LLMs is the transformer model, introduced in the groundbreaking 2017 paper “Attention Is All You Need” by Vaswani, et al.¹
Transformer Architectures
Transformers can derive meaning from long text sequences, learning how different words or semantic components relate to one another and how likely they are to occur in proximity to each other.
The key components include attention mechanisms, which focus on different parts of the input sequence when generating output, and self-attention mechanisms, which allow the model to weigh the importance of each word in a sentence and understand context when making predictions. Feed-forward neural networks then process the attention outputs to produce the final predictions.
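As a rough illustration of the self-attention idea described above, the following Python sketch computes scaled dot-product attention for a toy sequence. The array sizes and random inputs are hypothetical; the point is simply that each token's output becomes a weighted mix of all tokens in the sequence.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy scaled dot-product attention (single head, no masking)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled to keep the softmax stable
    scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len)
    # Softmax over the key dimension: how strongly each token attends to the others
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors
    return weights @ V                        # (seq_len, d_k)

# Hypothetical 4-token sequence with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# In a real transformer, Q, K, and V come from learned linear projections of x
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```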
The architecture comprises an encoder-decoder structure. The encoder processes the input sequence and produces a set of continuous representations (embeddings), while the decoder takes the encoder's output and generates the final prediction, e.g., a translated sentence or a continuation of text. Additionally, a multi-head attention mechanism improves the model's ability to focus simultaneously on different parts of the input sequence; multiple attention heads enhance the model's capacity to capture diverse linguistic patterns and relationships within the data. Transformer architecture also uses positional encoding to compensate for the lack of sequential processing and to maintain information about word order.
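The positional encoding mentioned above can be sketched in a few lines. The sinusoidal form below follows the approach of the original transformer paper; the sequence length and embedding size are chosen only for illustration.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings in the style of "Attention Is All You Need"."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# These vectors are added to the token embeddings so the model retains word order
pe = sinusoidal_positional_encoding(seq_len=16, d_model=8)
print(pe.shape)  # (16, 8)
```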
Transformer architecture facilitates effective pre-training on large datasets and subsequent fine-tuning for specific tasks, a key aspect of LLM development. Pre-training allows the model to learn general language patterns, while fine-tuning on task-specific datasets improves performance on those tasks. Many iterations are required for a model to reach the point where it can produce plausible results, and the mathematics and coding that go into creating and training generative AI models, particularly LLMs, can be incredibly time-intensive, costly, and complex.
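To make the pre-train-then-fine-tune idea concrete, here is a minimal PyTorch-style sketch of the fine-tuning step: a language model is updated on a small batch with the usual next-token cross-entropy loss. The tiny stand-in model, random token IDs, and hyperparameters are placeholders for illustration, not an actual recipe; in practice, the starting point would be a transformer loaded with pre-trained weights and a curated task dataset.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained language model; a real one would be a transformer
# with pre-trained weights rather than this toy embedding-plus-linear head.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        return self.head(self.embed(tokens))   # logits for next-token prediction

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical task-specific token IDs, shape (batch, sequence)
batch = torch.randint(0, 1000, (8, 32))
inputs, targets = batch[:, :-1], batch[:, 1:]   # predict each next token

for step in range(3):                           # a few illustrative updates
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.3f}")
```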
One of the unique advantages of transformer architecture is that it can handle input data in parallel. Parallel processing offers greater efficiency and scalability compared to other architectures, such as a recurrent neural network (RNN) or long short-term memory (LSTM), which process data sequentially.
LLMs
Based on the concept of transformer architecture, LLMs consist of intricate neural networks trained on large quantities of unlabeled text. An LLM breaks the text into words or phrases and assigns a number to each, then uses sophisticated computer chips and neural networks to find patterns in those pieces of text through mathematical formulas, learning to “guess” the next word in a sequence. Using NLP, the model can understand what’s being asked and reply. Because it uses mathematical formulas rather than text searching to generate responses, its output is not ready-made information waiting to be retrieved. Rather, the model uses billions or even trillions of numbers to calculate responses from scratch, producing new sequences of words on the fly. LLMs are computationally intensive, however, requiring high computing power and parallel hardware such as graphics processing units (GPUs).
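The “guess the next word” loop described above can be illustrated with a short greedy-decoding sketch. The tiny vocabulary and the random scoring function below are invented for the example (a real LLM would produce scores from billions of learned parameters); the point is that each new token is computed on the fly from the sequence so far rather than retrieved from stored text.

```python
import numpy as np

# Hypothetical tiny vocabulary; purely illustrative
vocab = ["the", "solder", "joint", "reflows", "cleanly", "."]
rng = np.random.default_rng(1)

def next_token_scores(token_ids):
    """Placeholder for the transformer forward pass over the sequence so far."""
    return rng.normal(size=len(vocab))

def generate(prompt_ids, max_new_tokens=4):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        scores = next_token_scores(ids)
        probs = np.exp(scores) / np.exp(scores).sum()  # softmax over the vocabulary
        ids.append(int(probs.argmax()))                # greedy: pick the likeliest token
    return ids

ids = generate([vocab.index("the"), vocab.index("solder")])
print(" ".join(vocab[i] for i in ids))
```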
LLMs are characterized by their large number of parameters, which act as the model's knowledge bank. Table 1² shows the relative number of parameters and the maximum sequence length of the progressive ChatGPT models: GPT-1, GPT-2, GPT-3, and GPT-4. These models can handle tasks such as generating text, translating, summarizing, answering questions, and analyzing sentiment. They can also be fine-tuned to undertake specific tasks.
How large are LLMs? There is no universally agreed figure. However, they are generally characterized by the number of parameters (billions or even trillions) and the size of the training data they are exposed to. Usually, LLMs have at least 1 petabyte of storage (the human brain stores about 2.5 petabytes of memory data).
This leads us to another related terminology: foundation models.
LLMs vs. Foundation Models
Foundation models are base models that provide a versatile “foundation” that can be fine-tuned and adapted for a wide range of applications, from language processing to image recognition. Foundation models are multimodal and can be trained on different data or modalities. In essence, LLMs are foundation models, but not all foundation models are LLMs.
LLMs vs. SLMs
Recently, “smaller” language models have come into vogue due to practical factors such as cost and readiness. So, what is considered a small language model (SLM)? In terms of size, there are no hard and fast rules. In general, LLMs typically have over 20 billion parameters; GPT-3, for example, has 175 billion, as shown in Table 1. SLMs range from roughly 500 million to 20 billion parameters.
LLMs are broad-spectrum models trained on massive datasets, excelling at deep reasoning, complex context handling, and extensive content generation. SLMs are more specialized, focusing on specific domains or tasks. They may exhibit less bias, cost less to run, respond faster, and are potentially more accurate (less prone to hallucination); accordingly, they can be put to work more readily.
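One reason cost and readiness favor smaller models is simple arithmetic: at 16-bit precision, each parameter occupies two bytes, so memory for the weights alone scales directly with parameter count. The back-of-the-envelope sketch below reuses the counts mentioned above; it ignores activations, optimizer state, and other overhead, so the figures are illustrative lower bounds.

```python
def approx_memory_gb(num_params, bytes_per_param=2):
    """Rough weights-only footprint at 16-bit precision (2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9

# Parameter counts from the text: GPT-3 (LLM) vs. the ends of the SLM range
for name, params in [("GPT-3", 175e9), ("20B SLM", 20e9), ("500M SLM", 500e6)]:
    print(f"{name}: ~{approx_memory_gb(params):.0f} GB of weights")
```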
Artificial intelligence is still nascent but continues to advance. It would not be surprising to see the new frontier offering another level of capabilities and accuracy with a new architecture.
References
- "Attention Is All You Need," by Ashish Vaswani, et al., Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS).
- Professional Development Course: “Artificial Intelligence: Opportunities, Challenges, and Possibilities,” by Jennie S. Hwang.
Appearances
Dr. Jennie Hwang will instruct a professional development course on “Artificial Intelligence—Opportunities, Challenges and Possibilities” at SMTA International 2024, Oct. 21 in Chicago. She will also deliver the keynote speech titled “Artificial Intelligence Era: Work, Life, Technology, Leadership, and Women!” at the Women’s Leadership Program on Oct. 21.
This column originally appeared in the October 2024 issue of SMT007 Magazine.