By Dr. Marc Snir, Director, Mathematics and Computer Science Division, Argonne National Laboratory
At the University of Illinois at Urbana Champaign, Dr. Snir is Michael Faiman and Saburo Muroga Professor in Computer Science.
Large-scale simulations have become in the last decades an essential tool in science and engineering: Simulations have replaced wind-tunnels because they are cheaper and faster; they have replaced combustion experiments because they enable us to peek at the center of a flame that is hard to observe in real experiments; they help us understand how the universe started and how a black hole behaves.The performance of top supercomputers has increased exponentially over the last decades. The current top system, deployed in China, contains the equivalent of 16,000 powerful server nodes and consumes 18 MW power. While this is an impressive system, it is dwarfed by the data centers of leading cloud providers: For example, the data centers of Amazon are estimated to contain 50,000 to 80,000 servers each; Amazon is said to have about 30 such centers. Have clouds dethroned supercomputers?
“Technologies that facilitate the integration of big computing with big data have the potential to significantly change industrial use of High- Performance Computing”There is much similarity between clouds and supercomputers: Both are built by connecting together tens of thousands of server nodes. However, clouds and supercomputers are typically optimized for different workloads. Supercomputers are large because any one job can utilize a sizeable fraction of the system. Clouds are large because larger data centers are more efficient and can better accommodate variations in a large number of small individual workloads. Supercomputing jobs often require interconnects with higher performance and the ability to access files in parallel. Therefore, they are typically fitted with more performing and more expensive interconnects and use a different I/O architecture. However, clouds are “good enough” for less demanding parallel workloads: i.e., computations that are “loosely coupled” and require less frequent interaction between the computing nodes. Clouds can replace small clusters in support of such workloads. In addition, technologies developed in support of cloud computing will be increasingly relevant to high-performance computing. For example, solutions for increased energy efficiency apply to both; the problem of monitoring the health of tens of thousands of servers and reacting quickly to failures is common. Container technologies, such as Docker can facilitate the porting of applications across platforms