BGP: CPMD

What is CPMD?

The CPMD code is a parallelized plane wave/pseudopotential implementation of Density Functional Theory, particularly designed for ab-initio molecular dynamics.

Obtaining CPMD

A license must be obtained from http://www.cpmd.org/cpmd_licence.html

Building CPMD for Blue Gene/P

A shell script is included which can create a base configuration file for the BG/P. The preferred Makefile uses MPI and OpenMP:

./mkconfig.sh IBM-BGP-SMP

An optimized version of the base configuration is available as Makefile.ibm-bgp-smp. The code must be run in SMP mode (one MPI task per node) with OMP_NUM_THREADS=4, so that the four threads use the four cores of each BG/P node.

Accessing CPMD on Intrepid

CPMD 3.15.1 is available on Intrepid to legitimate CPMD license holders. Requests for access to CPMD 3.15.1 must be sent directly to support@alcf.anl.gov. You will need to provide evidence of a CPMD license, such as the original registration e-mail.

Performance Notes

The information below was either gathered from direct communication with Alessandro Curioni or determined from benchmarks on a test input file. Successful scaling of CPMD for a given problem requires a basic understanding of the DFT parallelization scheme.

  • The CPMD manual recommends that Troullier-Martins pseudopotentials be used with the Kleinman-Bylander separation for large systems. This is also what most plane-wave (PW) codes use.
  • ALLTOALL SINGLE performs the FFT transpose in single precision, which substantially reduces the wall-clock time spent in the routine FFTCOM.
  • The default optimizer is ODIIS, which does not scale well. PCG MINIMIZE is preferred.
  • REAL SPACE WFN KEEP yields a modest performance improvement at the cost of substantially more memory. Use with caution.
  • Taskgroups is essentially band parallelization. Each 3D-FFT is divided into yz-planes, and with this parallelization alone one quickly runs out of planes to distribute over the processing elements (PEs). Instead of all processors working on the same 3D-FFT in sequence (band by band), Taskgroups splits the processors into groups, each working on its own group of bands. In theory, the largest acceptable value of Taskgroup equals the total number of bands, in which case each group of processors works on one band.
  • Taskgroups on the BG/P must be compatible with the partition dimensions. This ensures that PEs working on the same set of bands for the 3D-FFT are located in the same torus plane. For example, a request for 1024 nodes gives a partition of size 8x8x16. In that case, set Taskgroup to 16 and use ZYX ordering.
  • There appears to be no difference in performance between the Cartesian and standard Taskgroups communicators. Taskgroups settings that conflict with the partition topology were not tested.
  • DISTRIBUTE FNL ON saves memory by distributing the pseudopotential projectors.
  • DISTRIBUTE LINALG ON distributes the matrices used in the orthogonalization procedure. Use it with BLOCKSIZE STATES set to a value between 10 and 200 that divides the number of bands evenly.
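
The arithmetic behind the Taskgroup and BLOCKSIZE STATES notes above can be sketched in a few lines of Python. This is only an illustration, not part of CPMD: the function names are hypothetical, and the selection rules are assumptions drawn from the notes (one MPI task per node in SMP mode, a Taskgroup value that divides the number of MPI tasks and does not exceed the number of bands, and a block size between 10 and 200 that divides the number of bands). The band count of 256 is an invented example value.

```python
def divisors(n):
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def taskgroup_candidates(partition_dims, n_bands):
    """List candidate Taskgroup values for a BG/P partition.

    Assumptions (not taken from the CPMD source):
      * SMP mode, so the number of MPI tasks equals the number of nodes;
      * a candidate must divide the number of MPI tasks;
      * a candidate must not exceed the number of bands.
    Each candidate is paired with a flag that is True when the value equals
    one of the torus dimensions, as in the 8x8x16 -> 16 example.
    """
    n_tasks = 1
    for dim in partition_dims:
        n_tasks *= dim
    return [(d, d in partition_dims)
            for d in divisors(n_tasks) if d <= n_bands]


def blocksize_candidates(n_bands, low=10, high=200):
    """List block sizes in [low, high] that divide the number of bands."""
    return [d for d in divisors(n_bands) if low <= d <= high]


if __name__ == "__main__":
    dims = (8, 8, 16)   # 1024-node partition from the example above
    bands = 256         # hypothetical number of bands
    matching = [d for d, on_torus in taskgroup_candidates(dims, bands)
                if on_torus]
    print("Taskgroup values matching a torus dimension:", matching)
    print("BLOCKSIZE STATES candidates:", blocksize_candidates(bands))
```

For the 8x8x16 partition the flagged values include 16, the setting suggested in the notes. Which candidate actually performs best still has to be determined by benchmarking the input in question.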