
Sunday, 19 April 2015

IBM vs. Intel in Supercomputer Bout

The U.S. wants to one-up China in supercomputers and it's looking for a few good semiconductor architectures to help out.

The fastest supercomputer in the world is currently China's Tianhe-2, running at a peak of 55 petaflops on Intel Xeon and Xeon Phi processors. The Collaboration of Oak Ridge, Argonne and Lawrence Livermore (CORAL) project, financed by the U.S. Department of Energy (DOE), aims to one-up the Chinese with systems of up to 200 petaflops by 2018. The three systems, named Summit, Aurora and Sierra, respectively, also pit IBM/Nvidia and their graphics processing units (GPUs) against Intel/Cray's massively parallel x86 (Xeon Phi) architecture.

"Over 100 experts were involved in picking two different architectures as mandated by the CORAL request for proposals," the director of science for the National Center for Computational Sciences at Oak Ridge National Labs, Jack Wells, told EE Times. "Two locations — Oak Ridge and Livermore — were chosen to go with IBM, and Argonne was chosen to use Intel processors."


DOE's purpose in forcing the National Labs to choose two different architectures is something of a mystery, stated officially only as "in order to meet DOE's mission needs." Beyond the general rule of never single-sourcing anything important, the purpose was perhaps just what DOE said: to meet the very different needs of the three labs, which range from designing new semiconductor materials to simulating the explosive power of U.S. atomic bombs.

Oak Ridge National Laboratory (ORNL), for instance, today announced its intentions for Summit, with 13 Center for Accelerated Application Readiness (CAAR) projects chosen to run on its current AMD/Nvidia-based Titan supercomputer as a warm-up for Summit's IBM/Nvidia architecture. Summit will run 5 to 10 times faster than Titan, or about 135 to 270 petaflops.
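As a sanity check, those petaflop figures follow directly from Titan's peak of roughly 27 petaflops, a baseline we are assuming here since it is not stated in the announcement itself:

```python
# Back-of-the-envelope check of the projected Summit speedup.
# Assumes Titan's peak of roughly 27 petaflops (publicly reported,
# but not a number from ORNL's announcement).
titan_peak_pflops = 27.0

for speedup in (5, 10):
    print(f"{speedup}x Titan = {titan_peak_pflops * speedup:.0f} petaflops")
# -> 5x Titan = 135 petaflops
# -> 10x Titan = 270 petaflops
```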

"We got 100s of proposals for how to make best use of the Summit's CPU/GPU architecture," Wells told us. "But we narrowed it down to 13 which we hope to have ready to run on Summit when its installation is complete in 2018."

Of the 13 projects, only one directly involves searching for new semiconductor materials, specifically superconductors, led by Research Scientist Paul Kent at Oak Ridge National Laboratory. The other 12 are:

- Climate simulation: Research Scientist David Bader, Lawrence Livermore National Laboratory
- Relativistic chemistry: Professor Lucas Visscher, Free University of Amsterdam
- Astrophysics: Research Scientist Bronson Messer, Oak Ridge National Laboratory
- Plasma physics: Professor Zhihong Lin, University of California-Irvine
- Cosmology: Research Scientist Salman Habib, Argonne National Laboratory
- Electronic structure: Professor Poul Jørgensen, Aarhus University
- Biophysics: Professor Klaus Schulten, University of Illinois at Urbana-Champaign
- Nuclear physics applications: Research Scientist Gaute Hagen, Oak Ridge National Laboratory
- Computational chemistry: Research Scientist Karol Kowalski, Pacific Northwest National Laboratory
- Combustion engineering: Research Scientist Joseph Oefelein, Sandia National Laboratories
- Seismology: Professor Jeroen Tromp, Princeton University
- Plasma physics: Professor Choong-Seock Chang, Princeton Plasma Physics Laboratory

All of them will be prepared on the AMD/Nvidia Titan for the IBM/Nvidia Summit, which will be based on the "data centric" principles of the OpenPOWER Foundation's approach to handling "big data."

"With the world generating more than 2.5 billion gigabytes of data every day, a holistic approach to performance, across things like memory, bandwidth, and data movement, is necessary to ensure today’s supercomputers are not missing vital pieces of information infiltrating through the big data pipeline. Through this holistic perspective, IBM with members of the OpenPOWER Foundation are creating data centric systems that will solve these data challenges and minimize data movement within the supercomputer to radically reduce data-movement induced latency," Herb Schultz, IBM manager for Technical Computing and Big Data & Analytics told EE Times.

In fact, IBM just finished installing perhaps the most beautiful supercomputer in the world, housed in a former Barcelona church, to handle Spain's big data. Summit may not be as beautiful, but each of its nodes will pack multiple IBM POWER9 processors and multiple Nvidia Volta GPUs connected by high-speed NVLink, a large coherent memory of more than 512 GBytes, and an additional 800 GBytes of NVRAM configurable as either a burst buffer or as extended memory. Dual-rail Mellanox optical interconnects will be configured as a full, non-blocking fat tree, with a file system transferring 1 TByte per second to a 120-petabyte disk farm.
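Those node-local NVRAM numbers matter most for checkpointing: every node can drain its memory image into its own burst buffer in parallel, while all nodes must share the file system's aggregate bandwidth. A rough sketch in Python; the node count and NVRAM write speed are our assumptions, with only the 512-GByte, 800-GByte and 1-TByte/s figures taken from the specs above:

```python
# Rough checkpoint-time comparison for a Summit-like machine: node-local
# burst buffer (NVRAM) versus the shared parallel file system.
# From the article: 512 GBytes of memory and 800 GBytes of NVRAM per node
# (so a full memory image fits in NVRAM), 1 TByte/s file-system bandwidth.
# Assumptions (NOT from the article): 3,400 nodes, 5 GB/s NVRAM write speed.
mem_per_node = 512e9     # bytes of coherent memory per node (article)
fs_bandwidth = 1e12      # bytes/s aggregate to the disk farm (article)
nodes = 3400             # assumed node count
nvram_write_bw = 5e9     # assumed per-node NVRAM write bandwidth

# Every node writes its memory image to its own NVRAM at the same time:
t_burst = mem_per_node / nvram_write_bw
# Every node contends for the file system's shared aggregate bandwidth:
t_filesystem = (mem_per_node * nodes) / fs_bandwidth

print(f"burst buffer: ~{t_burst:.0f} s per checkpoint")
print(f"shared file system: ~{t_filesystem:.0f} s per checkpoint")
# -> burst buffer: ~102 s; shared file system: ~1741 s
```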


Due to security concerns surrounding Lawrence Livermore's stewardship of the U.S. nuclear arsenal, the lab is a little more secretive about the exact configuration of Sierra, but it does admit that the IBM POWER9-based supercomputer with Nvidia GPUs will top 200 petaflops. Sierra will also support DOE's Advanced Scientific Computing Research (ASCR) program and will target what it calls "non-recurring engineering" (NRE) research and development (code reuse).

Intel, on the other hand, is openly talking about its 50,000-node Xeon Phi-based Aurora supercomputer for Argonne National Laboratory, which will run at 180 petaflops. (The Aurora contract was actually awarded to Intel Federal LLC, a wholly owned subsidiary of Intel Corp., and also covers a second, smaller supercomputer called Theta, which will serve as an early production system for Aurora, providing 8.5 petaflops while requiring only 1.7 megawatts of power.) Aurora will use Cray's next-generation Shasta chassis and will be used to design more powerful, efficient and durable batteries and solar panels, according to Intel, as well as improved biofuels, more effective disease control, improved transportation systems, and more efficient and quieter engines and wind turbines.
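Those figures imply a couple of derived numbers worth noting: the average performance each Aurora node must deliver, and Theta's energy efficiency. A quick calculation from the article's own numbers (only the arithmetic is ours):

```python
# Derived figures from the article's Aurora/Theta numbers.
aurora_pflops = 180.0     # Aurora peak (article)
aurora_nodes = 50_000     # Aurora node count (article)
theta_pflops = 8.5        # Theta performance (article)
theta_megawatts = 1.7     # Theta power draw (article)

# Average performance each Aurora node must deliver:
per_node_tflops = aurora_pflops * 1e3 / aurora_nodes
# Theta's energy efficiency in gigaflops per watt:
theta_gflops_per_watt = theta_pflops * 1e6 / (theta_megawatts * 1e6)

print(f"Aurora: ~{per_node_tflops:.1f} teraflops per node")
print(f"Theta: ~{theta_gflops_per_watt:.0f} gigaflops per watt")
# -> Aurora: ~3.6 teraflops per node
# -> Theta: ~5 gigaflops per watt
```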

Communications will be handled by Intel's Omni-Path Fabric optical interconnect technology, alongside non-volatile memories and Intel's Lustre storage system and software. Theta, meanwhile, will be based on Cray's XC supercomputer chassis, with a similar but scaled-down memory and interconnect fabric.
