News

News Highlights

MacDermid Alpha Electronics Solutions Unveils Unified Global Website to Deepen Customer, Talent, and Stakeholder Engagement

Foxconn, TECO Announce Strategic Alliance Targeting AI Data Center Capabilities

Sypris Wins Awards to Support Subsea Cable Systems

More News
Books

Featured Books

Download

Download

Download
smt007 Magazine

Latest Issues

Current Issue
August 2025
Supply Chain Strategies

A successful brand is built on strong customer relationships—anchored by a well-orchestrated supply chain at its core. This month, we look at how managing your supply chain directly influences customer perception.

Flip Book PDF Download

July 2025
What's Your Sweet Spot?

Are you in a niche that’s growing or shrinking? Is it time to reassess and refocus? We spotlight companies thriving by redefining or reinforcing their niche. What are their insights?

Flip Book PDF Download

June 2025
Moving Forward With Confidence

In this issue, we focus on sales and quoting, workforce training, new IPC leadership in the U.S. and Canada, the effects of tariffs, CFX standards, and much more—all designed to provide perspective as you move through the cloud bank of today's shifting economic market.

Flip Book PDF Download
Articles

Article Highlights

I-Connect007 Editor’s Choice: Five Must-Reads for the Week

Solving the Toughest BGA Challenges in Electronics

Creating Connections in Mexico

More Articles
Columns

Latest Columns

SMT Perspectives and Prospects: Warren Buffett’s Perpetual Wisdom, Part 1

Standard of Excellence: Training Your Team to Excel in Customer Service

Smart Automation: What Industry 4.0 Means for Mid-sized Electronics Manufacturing

See all of our columnists
Links
Media kit

Media Kit - Choose Your Primary Marketing Focus:

Semidynamics Releases Tensor Unit Efficiency data for its new All-In-One AI IP

June 25, 2024 | Semidynamics

Estimated reading time: 4 minutes

Semidynamics, the European RISC-V custom core AI specialist, has announced Tensor Unit efficiency data for its ‘All-In-One’ AI IP running a LlaMA-2 7B-parameter Large Language Model (LLM).

Roger Espasa, Semidynamics’ CEO, explained, “The traditional AI design uses three separate computing elements: a CPU, a GPU (Graphical Processor Unit) and an NPU (Neural Processor Unit) connected through a bus. This traditional architecture requires DMA-intensive programming, which is error-prone, slow, and energy-hungry plus the challenge of having to integrate three different software stacks and architectures. In addition, NPUs are fixed-function hardware that cannot adapt to future AI algorithms yet-to-be-invented.

“In contrast, Semidynamics has re-invented AI architecture and integrates the three elements into a single, scalable processing element. We combine a RISC-V core, a Tensor Unit that handles matrix multiplication (playing the role of the NPU) and a Vector Unit that handles activation-like computations (playing the role of the GPU) into a fully integrated, all-in-one compute element, as shown in Figure 1. Our new architecture is DMA-free, uses a single software stack based on ONNX and RISC-V and offers direct, zero-latency connectivity between the three elements. The result is higher performance, lower power, better area and a much easier-to-program environment, lowering overall development costs. In addition, because the Tensor and Vector Units are under the direct control of a flexible CPU, we can deploy any existing or future AI algorithm, providing great protection to our customer’s investments.”

Figure 1: Comparison of traditional AI architecture to Semidynamics’ new All-In-One integrated solution

Large Language Models (LLMs) have emerged as a key element of AI applications. LLMs are computationally dominated by self-attention layers, shown in detail in Figure 2. These layers consist of five matrix multiplications (MatMul), a matrix Transpose and a SoftMax activation function, as shown in Figure 2. In Semidynamics’ All-In-One solution, the Tensor Unit (TU) takes care of matrix multiplication, whereas the Vector Unit (VU) can efficiently handle Transpose and SoftMax. Since the Tensor and Vector Units share the vector registers, expensive memory copies can be largely avoided. Hence, there is zero latency and zero energy spent in transferring data from the MatMul layers to the activation layers and vice versa. To keep the TU and the VU continuously busy, weights and inputs must be efficiently fetched from memory into the vector registers. To this end, Semidynamics’ Gazzillion™ Misses technology provides unprecedented ability to move data. By supporting a large number of in-flight cache misses, data can be fetched ahead-of-time yielding high resource utilization. Furthermore, Semidynamics’ custom tensor extension includes new vector instructions optimized for fetching and transposing 2D tiles, greatly improving tensor processing.

Figure 2: Attention Layer in LLM

Semidynamics has run the full LlaMA-2 7B-parameter model (BF16 weights) on its All-In-One element, using Semidynamics’ ONNX Run Time Execution Provider, and calculated the utilization of the Tensor Unit for all the MatMul layers in the model. The results are shown in Figure 3. The results are aggregated and presented organized by the A-tensor shape. There are a total of 6 different shapes in LlaMA-2, as shown in the x-axis labels in Figure 2. As it can be seen, utilization is above 80% for most shapes, in sharp contrast with other architectures. Results are collected in the most challenging conditions, i.e., with a batch of 1 and for the first-token computation. To complement this data, Figure 4 presents the Tensor Unit efficiency for large matrix sizes, to demonstrate the combined efficiency of the Tensor Unit and the Gazzillion™ technology. Figure 4 is annotated with the A+B matrix size. One can see that as the number of elements in the N, M, P dimensions of the matrix increase, the total size in MBs quickly exceeds any possible cache/scratchpad available. The noteworthy aspect of the chart is that the performance is stable slightly above 70%, irrespective of the total size of the matrices. This quite surprising result is thanks to the Gazzillion technology being capable of sustaining a high streaming data rate between main memory and the Tensor Unit.

Figure 3: LlaMA-2 Tensor Unit efficiency organized by Tensor-A shape

Figure 4: Tensor Unit utilization for 8-bit (left side) and 16-bit matrices (right side) for different matrix sizes

Espasa concluded, “Our new All-In-One AI IP not only delivers outstanding AI performance but is also so much easier to program as there is now just one software stack instead of three. Developers can use the RISC-V stack they already know and they do not have to worry about software-managed local SRAMs, or DMAs. Furthermore, Semidynamics provides an ONNX runtime optimized for the All-In-One AI IP, which allows programmers to easily run their ML models. Therefore, our solution represents a big step forward in programmer friendliness and ease-of-integration into new SOC designs. Our customers using All-In-One will be able to pass on to their customers, developers, and users all these benefits in the form of better and easier-to-program silicon.

“Moreover, our All-In-One design is completely resilient to future changes in AI/ML algorithms and workloads. This is a huge risk protection for customers starting a silicon project that will not hit the market for several years. Knowing that your AI IP will still be relevant when your silicon enters volume production is a unique advantage of our technology.”

Share on:

Testimonial

"The I-Connect007 team is outstanding—kind, responsive, and a true marketing partner. Their design team created fresh, eye-catching ads, and their editorial support polished our content to let our brand shine. Thank you all! "

Sweeney Ng - CEE PCB

Suggested Items

Real Time with... IPC APEX EXPO 2024: Advancements in Laser Depaneling with LPKF

04/24/2024 | Real Time with...IPC APEX EXPO
Jake Benz, LPKF sales manager for North America, discusses the company's advancements in laser depaneling. LPKF has introduced a green wavelength laser for processing rigid FR-4 circuit boards, bringing significant gains in processing speeds to market. The company transitioned from IR CO2 to UV wavelength due to heat and burning issues.

Altus: Leading CEM Invests in High Performance Laser Depaneling System to Future-Proof Production

02/23/2023 | Altus Group
Laser depaneling is one of the most advanced techniques for cutting and separating PCBs from the panel today and is changing how electronic components are manufactured.

Intel oneDNN AI Optimizations Enabled as Default in TensorFlow

05/26/2022 | Intel
In the latest release of TensorFlow 2.9, the performance improvements delivered by the Intel® oneAPI Deep Neural Network Library (oneDNN) are turned on by default.

ALPHA tensoRED Now Available in New Sizes

11/16/2017 | Alpha Assembly Solutions
Alpha Assembly Solutions has expanded the range of sizes available in the ALPHA tensoRED Master Tensioning Frame to now include all common printer sizes, as well as configuration options for full 29” printer openings.

Faster Big-Data Analysis

10/31/2017 | MIT
System for performing “tensor algebra” offers 100-fold speedups over previous software packages.

News Highlights

More News

Featured Books

Latest Issues

Supply Chain Strategies

What's Your Sweet Spot?

Moving Forward With Confidence

Article Highlights

More Articles

Latest Columns

See all of our columnists

Media Kit - Choose Your Primary Marketing Focus: