MIT researchers have created a reinforcement learning (RL) system that can reduce the time needed to complete data-processing operations.
Decima is an RL-based scheduler that could potentially save firms millions when they run datacentres, as it can help to reduce the energy required to process each task.
Normally, scheduling algorithms are fine-tuned by humans. Decima aims to hand most of these decisions over to automation.
Hongzi Mao, a PhD student in the Department of Electrical Engineering and Computer Science (EECS), commented in a blog post: “If you have a way of doing trial and error using machines, they can try different ways of scheduling jobs and automatically figure out which strategy is better than others. That can improve the system performance automatically. And any slight improvement in utilization, even 1 percent, can save millions of dollars and a lot of energy in data centers.”
When a workload enters a datacentre, it can often be represented as a graph of nodes and edges. Every node is a task that requires computational power to complete, and the edges record which tasks depend on which. A scheduling algorithm is then tasked with assigning each node to a server.
Typically, developers deploy software in the form of a scheduling agent that makes these decisions and chooses the best fit for each node.
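To make the graph picture concrete, here is a minimal Python sketch of a conventional, hand-written scheduler of the kind Decima is meant to replace. It is not taken from the MIT work: the function name `schedule`, the four-task example job and the alphabetical tie-break are all assumptions chosen for illustration.

```python
from collections import defaultdict


def schedule(durations, edges, num_servers):
    """Greedy list scheduling of a task graph onto identical servers.

    durations: {task: run time}
    edges: [(parent, child)], meaning child cannot start before parent ends
    Returns ({task: (server, start, finish)}, makespan).
    """
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)

    server_free_at = [0.0] * num_servers
    placement = {}
    remaining = set(durations)
    while remaining:
        # A task is ready once every one of its parents has been placed.
        ready = sorted(t for t in remaining
                       if all(p in placement for p in parents[t]))
        if not ready:
            raise ValueError("the task graph contains a cycle")
        task = ready[0]  # hard-coded rule (assumption): alphabetical order
        # The task cannot start before its slowest parent has finished.
        earliest = max((placement[p][2] for p in parents[task]), default=0.0)
        # Pick the server on which the task can start soonest.
        server = min(range(num_servers),
                     key=lambda s: max(server_free_at[s], earliest))
        start = max(server_free_at[server], earliest)
        finish = start + durations[task]
        placement[task] = (server, start, finish)
        server_free_at[server] = finish
        remaining.remove(task)

    makespan = max(finish for _, _, finish in placement.values())
    return placement, makespan


# Hypothetical job: "a" and "b" feed "c", which feeds "d".
durations = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
placement, makespan = schedule(durations, edges, num_servers=2)
print(makespan)  # 6.0
```

The rule that picks which ready task goes next is fixed in the code. That fixed choice is the kind of decision a learning-based scheduler would instead adjust from experience.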
MIT state that their system’s “agent” is a: “Scheduling algorithm that leverages a graph neural network, commonly used to process graph-structured data. To come up with a graph neural network suitable for scheduling, they implemented a custom component that aggregates information across paths in the graph — such as quickly estimating how much computation is needed to complete a given part of the graph.”
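The idea of aggregating information along paths can be illustrated without any learning at all. The Python sketch below is a hand-written stand-in, not Decima's neural component: each task repeatedly takes a "message" from its children until every task knows the longest chain of work that still has to run after it starts. The function name and the example job are assumptions.

```python
from collections import defaultdict


def remaining_work(durations, edges):
    """For each task, the length of the longest path from it to the job's end.

    Values are passed from children to parents in rounds, loosely mirroring
    how a graph neural network passes messages along edges. Here the update
    is a fixed formula rather than a learned function.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    value = dict(durations)
    # On an acyclic graph, len(durations) rounds are always enough.
    for _ in range(len(durations)):
        updated = {
            task: durations[task]
            + max((value[c] for c in children[task]), default=0)
            for task in durations
        }
        if updated == value:  # nothing changed, so the values have settled
            break
        value = updated
    return value


# Hypothetical job: "a" and "b" feed "c", which feeds "d".
durations = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(remaining_work(durations, edges))
# {'a': 5, 'b': 6, 'c': 3, 'd': 2}
```

A scheduler could use such per-task summaries to prioritise the tasks that hold up the most downstream work. In the system described above, the summaries come from a trained network rather than a fixed formula.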
To train the RL system, the developers simulated many sequences of graphs that mimic how real-world workloads arrive at a datacentre. The RL agent then decided how best to allocate each node in a graph to a server. The agent receives a reward each time it reduces the time taken to complete a task. Because the system strives for the highest reward, it keeps improving its decisions, which helps to optimise the datacentre.
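The trial-and-error loop can be sketched with a far simpler learner than Decima's. The Python example below is an illustration only, under stated assumptions: it uses an epsilon-greedy bandit (a basic RL technique, not the method from the MIT paper), simulated jobs on a single server, three hypothetical ordering rules, and a reward equal to the negative mean completion time.

```python
import random

# Hypothetical candidate rules for ordering independent jobs on one server.
POLICIES = {
    "arrival_order": lambda jobs: list(jobs),
    "shortest_first": lambda jobs: sorted(jobs),
    "longest_first": lambda jobs: sorted(jobs, reverse=True),
}


def reward(order):
    """Negative mean completion time: finishing jobs sooner scores higher."""
    clock, total = 0.0, 0.0
    for duration in order:
        clock += duration  # time at which this job completes
        total += clock
    return -total / len(order)


def train(episodes=3000, epsilon=0.2, seed=0):
    """Epsilon-greedy trial and error over the candidate rules."""
    rng = random.Random(seed)
    names = list(POLICIES)
    estimate = {name: 0.0 for name in names}  # running mean reward per rule
    pulls = {name: 0 for name in names}
    for _ in range(episodes):
        # Simulated workload (assumption): 8 jobs lasting 1 to 10 time units.
        jobs = [rng.uniform(1, 10) for _ in range(8)]
        if rng.random() < epsilon:
            choice = rng.choice(names)             # explore
        else:
            choice = max(names, key=estimate.get)  # exploit best estimate
        r = reward(POLICIES[choice](jobs))
        pulls[choice] += 1
        estimate[choice] += (r - estimate[choice]) / pulls[choice]
    return max(names, key=estimate.get), estimate


best, estimates = train()
print(best, estimates)
```

Under these assumptions the learner should settle on running the shortest jobs first, since that ordering minimises mean completion time on a single server. No one tells it so; it only sees the reward after each simulated workload.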
The paper’s co-author Mohammad Alizadeh, an EECS professor and researcher in the Computer Science and Artificial Intelligence Laboratory (CSAIL), notes that the system is adaptive by nature and that its ability to learn and adjust gives it a huge advantage over traditional systems. “In existing systems, these are hard-coded parameters that you have to decide up front. Our system instead learns to tune its schedule policy characteristics, depending on the data center and workload,” Alizadeh states.