Practical Parallelism

July 3, 2017 | MIT

Estimated reading time: 4 minutes

The chips in most modern desktop computers have four “cores,” or processing units, which can run different computational tasks in parallel. But the chips of the future could have dozens or even hundreds of cores, and taking advantage of all that parallelism is a stiff challenge.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory have developed a new system that not only makes parallel programs run much more efficiently but also makes them easier to code.

In tests on a set of benchmark algorithms that are standard in the field, the researchers’ new system frequently enabled more than 10-fold speedups over existing systems that adopt the same parallelism strategy, with a maximum of 88-fold.

For instance, algorithms for solving an important problem called max flow have proven very difficult to parallelize. After decades of research, the best parallel implementation of one common max-flow algorithm achieves only an eightfold speedup when it’s run on 256 parallel processors. With the researchers’ new system, the improvement is 322-fold — and the program required only one-third as much code.

The new system, dubbed Fractal, achieves those speedups through a parallelism strategy known as speculative execution.

“In a conventional parallel program, you need to divide your work into tasks,” says Daniel Sanchez, an assistant professor of electrical engineering and computer science at MIT and senior author on the new paper. “But because these tasks are operating on shared data, you need to introduce some synchronization to ensure that the data dependencies that these tasks have are respected. From the mid-90s to the late 2000s, there were multiple waves of research in what we call speculative architectures. What these systems do is execute these different chunks in parallel, and if they detect a conflict, they abort and roll back one of them.”

Constantly aborting computations before they complete would not be a very efficient parallelization strategy. But for many applications, aborted computations are rare enough that they end up squandering less time than the complicated checks and updates required to synchronize tasks in more conventional parallel schemes. Last year, Sanchez’s group reported a system, called Swarm, that extended speculative parallelism to an important class of computational problems that involve searching data structures known as graphs.

Irreducible atoms

Research on speculative architectures, however, has often run aground on the problem of “atomicity.” Like all parallel architectures, speculative architectures require the programmer to divide programs into tasks that can run simultaneously. But with speculative architectures, each such task is “atomic,” meaning that it should seem to execute as a single whole. Typically, each atomic task is assigned to a separate processing unit, where it effectively runs in isolation.

Atomic tasks are often fairly substantial. The task of booking an airline flight online, for instance, consists of many separate operations, but they have to be treated as an atomic unit. It wouldn’t do, for instance, for the program to offer a plane seat to one customer and then offer it to another because the first customer hasn’t finished paying yet.

With speculative execution, large atomic tasks introduce two inefficiencies. The first is that, if the task has to abort, it might do so only after chewing up a lot of computational cycles. Aborting smaller tasks wastes less time.

The other is that a large atomic task may have internal subroutines that could be parallelized efficiently. But because the task is isolated on its own processing unit, those subroutines have to be executed serially, squandering opportunities for performance improvements.

Fractal — which Sanchez developed together with MIT graduate students Suvinay Subramanian, Mark Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, and Victor A. Ying, and with Joel Emer, a professor of the practice and senior distinguished research scientist at the chip manufacturer NVidia — solves both of these problems. The researchers, who are all with MIT’s Department of Electrical Engineering and Computer Science, describe the system in a paper they presented this week at the International Symposium on Computer Architecture.

With Fractal, a programmer adds a line of code to each subroutine within an atomic task that can be executed in parallel. This will typically increase the length of the serial version of a program by a few percent, whereas an implementation that explicitly synchronizes parallel tasks will often increase it by 300 or 400 percent. Circuits hardwired into the Fractal chip then handle the parallelization.

Time chains

The key to the system is a slight modification of a circuit already found in Swarm, the researchers’ earlier speculative-execution system. Swarm was designed to enforce some notion of sequential order in parallel programs. Every task executed in Swarm receives a time stamp, and if two tasks attempt to access the same memory location, the one with the later time stamp is aborted and re-executed.

Fractal, too assigns each atomic task its own time stamp. But if an atomic task has a parallelizable subroutine, the subroutine’s time stamp includes that of the task that spawned it. And if the subroutine, in turn, has a parallelizable subroutine, the second subroutine’s time stamp includes that of the first, and so on. In this way, the ordering of the subroutines preserves the ordering of the atomic tasks.

As tasks spawn subroutines that spawn subroutines and so on, the concatenated time stamps can become too long for the specialized circuits that store them. In those cases, however, Fractal simply moves the front of the time-stamp train into storage. This means that Fractal is always working only on the lowest-level, finest-grained tasks it has yet identified, avoiding the problem of aborting large, high-level atomic tasks.

Share on:

Testimonial

"Our marketing partnership with I-Connect007 is already delivering. Just a day after our press release went live, we received a direct inquiry about our updated products!"

Rachael Temple - Alltemated

Suggested Items

India’s Aerospace and Defence Engineered for Power, Driven by Electronics

09/16/2025 | Gaurab Majumdar, Global Electronics Association
With a defence budget of $82.05 billion (2025–26) and a massive $223 billion earmarked for aerospace and defence spending over the next decade, India is rapidly positioning itself as a major player in the global defence and aerospace market.

I-Connect007 Launches Advanced Electronics Packaging Digest

09/15/2025 | I-Connect007
I-Connect007 is pleased to announce the launch of Advanced Electronics Packaging Digest (AEPD), a new monthly digital newsletter dedicated to one of the most critical and rapidly evolving areas of electronics manufacturing: advanced packaging at the interconnect level.

VIDEOTON EAS's Bulgarian Subsidiary Expands Into Automotive Products

09/15/2025 | Videoton
VEAS Bulgaria, engaged in electronics manufacturing, has joined the ranks of VIDEOTON companies authorized to produce automotive products.

Variosystems Strengthens North American Presence with Southlake Relaunch 2025

09/15/2025 | Variosystems
Variosystems celebrated the relaunch of its U.S. facility in Southlake, Texas. After months of redesign and reorganization, the opening marked more than just the return to a modernized production site—it was a moment to reconnect with our teams, partners, and the local community.

Hanwha Aerospace to Collaborate with BAE Systems on Advanced Anti-jamming GPS for Guided Missiles

09/15/2025 | Hanwha
Hanwha Aerospace has signed a contract with BAE Systems to integrate next-generation, anti-jamming Global Positioning System (GPS) technology into Hanwha Aerospace’s Deep Strike Capability precision-guided weapon system.

News Highlights

More News

Featured Books

Article Highlights

More Articles

Latest Columns

See all of our columnists

Media Kit - Choose Your Primary Marketing Focus: