
Towards Efficient Short-Range Pair Interaction on Sunway Many-Core Architecture

  • Regular Paper
  • Published in: Journal of Computer Science and Technology

Abstract

The short-range pair interaction consumes most of the CPU time in molecular dynamics (MD) simulations. Its inherent computational sparsity makes it challenging to achieve a high-performance kernel on emerging many-core architectures. In this paper, we present a highly efficient short-range force kernel on Sunway, a novel many-core architecture with many unique features. The parallel efficiency of this algorithm on the Sunway many-core processor is strongly limited by poor data locality and write conflicts. To enhance data locality, we adopt a super-cluster-based neighbor list with a granularity chosen to fit in the local memory of the computing cores. In the absence of a low-overhead locking mechanism, a data-privatized force array is a more feasible way to avoid write conflicts, but it incurs a large data-reduction overhead. We adopt a dual-slice partitioning scheme for both hardware resources and computing tasks, which exploits on-chip data communication to reduce the data-reduction overhead and provide load balancing. Moreover, we exploit single-instruction multiple-data (SIMD) parallelism and perform instruction reordering in the force kernel on this many-core processor. Experimental results show that the optimized force kernel achieves a 226x speedup over the reference implementation and reaches 20% of the peak floating-point rate on the Sunway many-core processor.
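
The core trade-off described in the abstract, privatizing the force array per computing core to avoid write conflicts at the price of a later reduction, can be illustrated with a small, generic sketch. The code below is not the paper's Sunway kernel: it uses a plain Lennard-Jones pair loop and OpenMP threads as stand-ins for the Sunway computing cores, and all identifiers (lj_force_privatized, EPS, SIG, fpriv) are illustrative assumptions.

```c
/*
 * Minimal sketch (not the authors' Sunway code): a Lennard-Jones pair-force
 * loop that avoids write conflicts by giving each thread a private force
 * array, then reducing the private copies into the shared array.
 */
#include <stdlib.h>
#include <omp.h>

#define EPS 1.0   /* LJ epsilon (illustrative units) */
#define SIG 1.0   /* LJ sigma                        */

/* Accumulates forces into f for all i<j pairs within cutoff rc. */
void lj_force_privatized(int n, const double (*x)[3], double (*f)[3], double rc)
{
    double rc2 = rc * rc;
    int nthreads = omp_get_max_threads();
    /* one zeroed private force array per thread: the data-privatization step */
    double *fpriv = calloc((size_t)nthreads * n * 3, sizeof(double));
    if (!fpriv) return;

    #pragma omp parallel
    {
        double (*fp)[3] =
            (double (*)[3])(fpriv + (size_t)omp_get_thread_num() * n * 3);

        #pragma omp for schedule(dynamic, 16)
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dx = x[i][0] - x[j][0];
                double dy = x[i][1] - x[j][1];
                double dz = x[i][2] - x[j][2];
                double r2 = dx*dx + dy*dy + dz*dz;
                if (r2 >= rc2) continue;
                double s2 = SIG * SIG / r2;
                double s6 = s2 * s2 * s2;
                /* force magnitude over r for U = 4*eps*(s^12 - s^6) */
                double fr = 24.0 * EPS * s6 * (2.0 * s6 - 1.0) / r2;
                fp[i][0] += fr * dx; fp[i][1] += fr * dy; fp[i][2] += fr * dz;
                fp[j][0] -= fr * dx; fp[j][1] -= fr * dy; fp[j][2] -= fr * dz;
            }
        }

        /* reduction of the private copies: the overhead that the paper's
           dual-slice partitioning is designed to shrink */
        #pragma omp for
        for (int i = 0; i < n; i++)
            for (int t = 0; t < nthreads; t++) {
                double (*ft)[3] = (double (*)[3])(fpriv + (size_t)t * n * 3);
                f[i][0] += ft[i][0];
                f[i][1] += ft[i][1];
                f[i][2] += ft[i][2];
            }
    }
    free(fpriv);
}
```

On the Sunway processor the same pattern plays out in each computing core's small local memory, and the paper's dual-slice partitioning exploits on-chip data communication to make the final reduction considerably cheaper than the naive gather shown above.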



Acknowledgements

The authors would like to acknowledge the anonymous reviewers of this paper. Special thanks go to the National Supercomputing Center in Wuxi for providing the computational resources on the Sunway TaihuLight.

Author information

Corresponding author

Correspondence to Wen-Ting Han.

Supplementary Information

ESM 1 (PDF 758 kb)

About this article


Cite this article

Chen, JS., An, H., Han, WT. et al. Towards Efficient Short-Range Pair Interaction on Sunway Many-Core Architecture. J. Comput. Sci. Technol. 36, 123–139 (2021). https://doi.org/10.1007/s11390-020-9826-z

