SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SC Conference - Activity Details



A Massively Parallel Adaptive Fast-Multipole Method on Heterogeneous Architectures

Authors:
Ilya Lashuk (Georgia Institute of Technology)
Aparna Chandramowlishwaran (Georgia Institute of Technology)
Harper Langston (Georgia Institute of Technology)
Tuan-Anh Nguyen (Georgia Institute of Technology)
Rahul Sampath (Georgia Institute of Technology)
Aashay Shringarpure (Georgia Institute of Technology)
Rich Vuduc (Georgia Institute of Technology)
Lexing Ying (University of Texas at Austin)
Denis Zorin (New York University)
George Biros (Georgia Institute of Technology)
Session:
Papers Session, Particle Methods
Time:
Tuesday, 03:30PM - 04:00PM
Room:
PB255
Abstract:
We present new scalable algorithms and a new implementation of our kernel-independent fast multipole method (Ying et al., ACM/IEEE SC '03), in which we employ both distributed-memory parallelism (via MPI) and shared-memory/streaming parallelism (via GPU acceleration) to rapidly evaluate two-body non-oscillatory potentials. On traditional CPU-only systems, our implementation scales well up to 30 billion unknowns on 65K cores (AMD/Cray-based Kraken system at NSF/NICS) for highly non-uniform point distributions. On GPU-enabled systems, we achieve a 30X speedup for problems of up to 256 million points on 256 GPUs (Lincoln at NSF/NCSA) over a comparable CPU-only implementation. We use a new MPI-based tree construction and partitioning, and a new reduction algorithm for the evaluation phase. For the sub-components of the evaluation phase, we use NVIDIA's CUDA framework to achieve excellent performance. Taken together, these components show promise for ultrascalable FMM in the petascale era and beyond.
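For readers unfamiliar with the problem, the following is a minimal sketch of the direct pairwise summation that a fast multipole method is designed to accelerate. It is not the authors' implementation. The function name `direct_potential` is hypothetical, and the 3D Laplace kernel 1/(4πr) is assumed here only as one common example of a non-oscillatory two-body potential.

```python
import math


def direct_potential(targets, sources, densities):
    """Evaluate potentials by direct summation (cost: len(targets) * len(sources)).

    Assumption: the kernel is the 3D Laplace kernel 1 / (4 * pi * r),
    chosen only as an illustrative non-oscillatory potential.
    """
    potentials = []
    for tx, ty, tz in targets:
        total = 0.0
        for (sx, sy, sz), q in zip(sources, densities):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:  # skip the singular self-interaction
                total += q / (4.0 * math.pi * r)
        potentials.append(total)
    return potentials


if __name__ == "__main__":
    sources = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    densities = [1.0, 1.0]
    print(direct_potential([(0.0, 0.0, 0.0)], sources, densities))
```

The cost of this loop grows with the product of the number of targets and sources, which is impractical at the problem sizes quoted in the abstract; an FMM instead approximates far-field interactions through a tree hierarchy.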
The full paper can be found in the ACM Digital Library and IEEE Computer Society