Artificial Intelligence Could Help Data Centers Run Far More Efficiently
August 23, 2019 | MIT | Estimated reading time: 6 minutes
One concern, however, is that some workload sequences are more difficult to process than others, because they have larger tasks or more complicated structures. Those will always take longer to process than simpler ones, so their reward signal will always be lower. But that doesn’t necessarily mean the system performed poorly: It could make good time on a challenging workload and still be slower than on an easier one. That variability in difficulty makes it hard for the model to tell which actions are good and which are not.
To address that, the researchers adapted a technique called “baselining.” This technique averages results over scenarios with a large number of variables and uses those averages as a baseline against which to compare future results. During training, they computed a baseline for every input sequence: they let the scheduler train on each workload sequence multiple times, and then took the average performance across all of the decisions made for the same input workload. That average is the baseline against which the model compares its decisions to determine whether they are good or bad. They refer to this new technique as “input-dependent baselining.”
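The idea can be illustrated with a minimal sketch. This is not the researchers' implementation; the function name, the workload labels, and the choice of negative completion time as the reward are all assumptions made for illustration. Each rollout is scored against the average reward of the rollouts that saw the same input workload, so a hard workload is no longer penalized simply for being hard.

```python
from collections import defaultdict
from statistics import mean


def input_dependent_advantages(rollouts):
    """Score each rollout against the mean reward for its own workload.

    `rollouts` is a list of (workload_id, reward) pairs, where several
    rollouts share each workload_id. All names here are hypothetical.
    """
    rewards_by_workload = defaultdict(list)
    for workload_id, reward in rollouts:
        rewards_by_workload[workload_id].append(reward)

    # One baseline per input workload: the average reward over its rollouts.
    baselines = {w: mean(rs) for w, rs in rewards_by_workload.items()}

    # Advantage = how much better (or worse) a rollout did than that average.
    return [(w, r - baselines[w]) for w, r in rollouts]


# Assumed reward: negative completion time, so higher is better.
rollouts = [
    ("easy", -10.0), ("easy", -12.0), ("easy", -14.0),
    ("hard", -100.0), ("hard", -90.0), ("hard", -110.0),
]
advantages = input_dependent_advantages(rollouts)
for workload_id, advantage in advantages:
    print(workload_id, advantage)
```

In this toy data, the "hard" rollout with reward -90 has a far lower raw reward than any "easy" rollout, yet it receives the largest advantage (+10) because it beat the average for its own workload.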
That innovation, the researchers say, is applicable to many different computer systems. “This is a general way to do reinforcement learning in environments where there’s this input process that affects the environment, and you want every training event to consider one sample of that input process,” he says. “Almost all computer systems deal with environments where things are constantly changing.”
Aditya Akella, a professor of computer science at the University of Wisconsin at Madison, whose group has designed several high-performance schedulers, found the MIT system could help further improve their own policies. “Decima can go a step further and find opportunities for [scheduling] optimization that are simply too onerous to realize via manual design/tuning processes,” Akella says. “The schedulers we designed achieved significant improvements over techniques used in production in terms of application performance and cluster efficiency, but there was still a gap with the ideal improvements we could possibly achieve. Decima shows that an RL-based approach can discover [policies] that help bridge the gap further. Decima improved on our techniques by [roughly] 30 percent, which came as a huge surprise.”
Right now, their model is trained on simulations that try to recreate incoming online traffic in real time. Next, the researchers hope to train the model on real-time traffic, which could potentially crash the servers. So, they’re currently developing a “safety net” that will stop their system when it’s about to cause a crash. “We think of it as training wheels,” Alizadeh says. “We want this system to continuously train, but it has certain training wheels that if it goes too far we can ensure it doesn’t fall over.”