75. EFFICIENT TEMPERATURE MANAGEMENT IN MULTIPROCESSOR SYSTEMS
Department: Computer Science & Engineering
Faculty Advisor(s):
Tajana Simunic-Rosing
Primary Student
Name: Shervin Sharifi
Email: shsharif@ucsd.edu
Phone: 858-534-9892
Grad Year: 2011
Abstract
Thermal hot spots and temperature variations pose new challenges for reliability, performance, and cooling cost in deep-submicron systems-on-chip (SoCs) and in data centers. Conventional thermal management sacrifices performance to control temperature, slowing down or stalling processors once a critical temperature threshold is reached. In this work, we explore the benefits of low-overhead, temperature-aware task scheduling for multiprocessor systems. In addition to reactive strategies that respond to changes in on-die temperature, we use predictors to forecast future temperature and workload dynamics, and we propose proactive thermal management techniques for multiprocessor SoCs. On an UltraSPARC T1 chip, we demonstrate that proactive methods achieve significantly better thermal profiles than their reactive counterparts while avoiding the performance cost of slowing down or stalling processors.
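The difference between reactive and proactive task placement can be illustrated with a minimal sketch. It is not the scheduler or the predictor used in this work: the linear-extrapolation predictor, the function names, and the sample temperatures are all assumptions chosen only to show how a forecast can change the placement decision.

```python
from typing import Dict, List


def predict_next(history: List[float]) -> float:
    """Forecast the next temperature sample.

    Assumption: simple linear extrapolation from the last two samples,
    used here only as a stand-in for a real temperature predictor.
    """
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def reactive_choice(histories: Dict[int, List[float]]) -> int:
    """Pick the core that is coolest right now."""
    return min(histories, key=lambda core: histories[core][-1])


def proactive_choice(histories: Dict[int, List[float]]) -> int:
    """Pick the core that is forecast to be coolest at the next sample."""
    return min(histories, key=lambda core: predict_next(histories[core]))


# Hypothetical per-core temperature histories in degrees Celsius.
histories = {
    0: [60.0, 66.0, 72.0],  # currently cooler, but heating quickly
    1: [80.0, 77.0, 74.0],  # currently hotter, but cooling down
}

print("reactive picks core", reactive_choice(histories))
print("proactive picks core", proactive_choice(histories))
```

In this made-up scenario the reactive policy places the next task on core 0 because it is cooler at the moment, while the proactive policy prefers core 1 because core 0 is forecast to overtake it.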
Our thermal management techniques require monitoring system characteristics (i.e., temperature and workload dynamics) at runtime. We leverage the Continuous System Telemetry Harness (CSTH) to collect real-time feedback from hardware and software sensors. Temperature sensors are typically prone to inaccuracy, and some systems lack a sufficient number of on-die sensors. To address these issues, we propose two novel solutions. The first is a design-time technique for sensor allocation and placement that minimizes the number of sensors while maintaining the desired accuracy. The second is a method for indirect temperature sensing, which accurately estimates the temperature at desired locations on the die from the noisy readings of a limited number of thermal sensors.
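As a rough illustration of working with noisy sensor readings, the sketch below applies a scalar Kalman filter to a single sensor. This is a generic textbook technique, not necessarily the estimation method proposed here, and it covers only the denoising step: estimating temperature at unsensed die locations would additionally need a thermal model that this abstract does not describe. The function name, the random-walk temperature model, and all numeric values are assumptions.

```python
from typing import List


def kalman_1d(
    readings: List[float],
    meas_var: float,
    process_var: float,
    x0: float,
    p0: float,
) -> List[float]:
    """Filter noisy readings from one sensor with a scalar Kalman filter.

    Assumptions: temperature follows a random walk with variance
    `process_var` per step, and each reading carries independent noise
    with variance `meas_var`. `x0` and `p0` are the initial estimate
    and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in readings:
        p = p + process_var      # predict: uncertainty grows over time
        k = p / (p + meas_var)   # Kalman gain, between 0 and 1
        x = x + k * (z - x)      # update: move toward the new reading
        p = (1.0 - k) * p        # uncertainty shrinks after the update
        estimates.append(x)
    return estimates


# Hypothetical noisy readings in degrees Celsius.
noisy = [70.0, 72.0, 68.0, 71.0]
print(kalman_1d(noisy, meas_var=4.0, process_var=0.01, x0=70.0, p0=1.0))
```

A larger `meas_var` makes the filter trust each individual reading less and smooth more aggressively, while a larger `process_var` lets the estimate follow rapid temperature changes more closely.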