Deep Reinforcement Learning with Different Rewards for Scheduling in High-Performance Computing Systems

Abstract

Scheduling is a challenging task for high-performance computing systems since it involves complex allocations of various types of resources among jobs with different characteristics. Because incoming jobs often vary in their resource requests and may interact with other jobs, heuristic-based scheduling algorithms tend to be suboptimal and require a substantial amount of time to design and test under diverse conditions. As a result, reinforcement learning (RL) based approaches have been proposed to tackle various job scheduling challenges. We have also used deep neural networks to approximate the decisions of RL agents, since table-based RL agents do not scale to large problem sizes. The performance of RL agents, however, has proven to be notoriously unstable and sensitive to training hyperparameters and the reward signal. In this work, we study how different reward signals affect the RL agents’ performance. We trained RL agents with four different reward signals, and simulation results under Alibaba workloads showed that the trained RL agents improve performance for 60-65% of the jobset compared to two popular heuristics.