AMBER Archive (2007)
Subject: Re: AMBER: PMEMD configuration and scaling
From: Robert Duke (rduke_at_email.unc.edu)
Date: Tue Oct 09 2007 - 08:15:34 CDT
 
 
 
 
Lars -
Thanks for the update.  I expect the worse-than-expected values you are seeing
for infiniband are due to either 1) the impact of a quad-core node on one
infiniband card (i.e., with quad core you are sending roughly twice the traffic
through one network interface card that you would send with a dual-cpu-per-node
configuration), 2) possibly remaining mpi issues - mvapich is what we have
tested in the past, or 3) possibly less high-end infiniband hardware than we
have tested.  The data I have for the JAC benchmark, running on dual-cpu
opteron nodes with really nice, very well maintained infiniband (this is
jacquard at NERSC), is:
 
Opteron Infiniband Cluster - JAC - NVE ensemble, PME, 23,558 atoms
#procs    nsec/day    scaling, %
     2       0.491       100
     4       0.947        96
     8       1.82         92
    16       3.22         82
    32       6.08         77
    64      10.05         64
    96      11.84         50
   128      12.00         38
 
Also nice to see the GB ethernet numbers.  Note that your calculation of the
% scaling on 24 infiniband cpu's has to be wrong.
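(Taking 2 cpus as the 100% baseline, scaling is just (throughput at N cpus) /
(throughput at 2 cpus) * (2/N) * 100.  With your infiniband numbers that works
out to (4320/587) * (2/24) * 100, roughly 61% at 24 cpu's, which fits the trend
between your 20- and 28-cpu points.)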
 
 Best Regards - Bob
 ----- Original Message ----- 
From: <Lars.Skjarven_at_biomed.uib.no>
 To: <amber_at_scripps.edu>
 Sent: Tuesday, October 09, 2007 6:05 AM
 Subject: AMBER: PMEMD configuration and scaling
 
 > Again, thank you Bob and Ross for your replies. It provided what I needed
 > to get this running, at least with Scali MPI. I learned yesterday that
 > MVAPICH2 was not ready on the cluster yet, so I am sticking with ScaMPI
 > for now. PMEMD now runs over infiniband with ScaMPI and the Intel
 > compilers on the Opteron cluster. Maybe this can be useful for someone
 > else as well, so I am posting the configuration file below. It serves
 > as a complement to the following post:
 > http://structbio.vanderbilt.edu/archives/amber-archive/2007/1569.php
 >
 > As Bob suggested, here are results for the JAC benchmark that ships with pmemd:
 >
 > With infiniband:
 > #CPU - ps/day - scaling %
 > 2 - 587 - 100
 > 4 - 1093 - 93
 > 8 - 1878 - 79
 > 12 - 2541 - 72
 > 16 - 2979 - 63
 > 20 - 3756 - 63
 > 24 - 4320 - 31
 > 28 - 4800 - 58
 > 32 - 5082 - 54
 >
 > Over GB ethernet (obviously very unstable):
 > #CPU - ps/day - scaling %
 > 2 - 587 - 100
 > 4 - 1107 - 94
 > 8 - 1694 - 72
 > 12 - 2009 - 57
 > 16 - 1093 - 23
 > 20 - 970 - 16
 > 24 - 2215 - 31
 > 28 - 2880 - 35
 > 32 - 939 - 10
 >
 > ### config.h for linux64_opteron, ifort, scampi, bintraj ###
 > MATH_DEFINES =
 > MATH_LIBS =
 > IFORT_RPATH = /site/intel/fce/9.1/lib:/site/intel/cce/9.1/lib:/opt/scali/lib64:/opt/scali/lib:/opt/gridengine/lib/lx26-amd64:/site/pathscale/lib/3.0/32:/site/pathscale/lib/3.0:/opt/gridengine/lib/lx26-amd64:/opt/globus/lib:/opt/lam/gnu/lib
 > MATH_DEFINES = -DMKL
 > MATH_LIBS = -L/site/intel/cmkl/8.1/lib/em64t -lmkl_em64t -lpthread
 > FFT_DEFINES = -DPUBFFT
 > FFT_INCLUDE =
 > FFT_LIBS =
 > NETCDF_HOME = /site/NetCDF
 > NETCDF_DEFINES = -DBINTRAJ
 > NETCDF_MOD = netcdf.mod
 > NETCDF_LIBS = $(NETCDF_HOME)/lib/libnetcdf.a
 > DIRFRC_DEFINES = -DDIRFRC_EFS -DDIRFRC_NOVEC
 > CPP = /lib/cpp
 > CPPFLAGS = -traditional -P
 > F90_DEFINES = -DFFTLOADBAL_2PROC
 >
 > F90 = ifort
 > MODULE_SUFFIX = mod
 > F90FLAGS = -c -auto
 > F90_OPT_DBG = -g -traceback
 > F90_OPT_LO =  -tpp7 -O0
 > F90_OPT_MED = -tpp7 -O2
 > F90_OPT_HI =  -tpp7 -xW -ip -O3
 > F90_OPT_DFLT =  $(F90_OPT_HI)
 >
 > CC = gcc
 > CFLAGS =
 >
 > LOAD = ifort
 > LOADFLAGS =
 > LOADLIBS = -limf -lsvml -Wl,-rpath=$(IFORT_RPATH)
 >
 > MPI_HOME = /opt/scali
 > MPI_DEFINES = -DMPI
 > MPI_INCLUDE = -I$(MPI_HOME)/include64
 > MPI_LIBDIR = $(MPI_HOME)/lib64
 > MPI_LIBS = -L$(MPI_LIBDIR) -lmpi -lfmpi
 > #####
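 > (For anyone wanting to reuse it: this config.h simply replaces the one that
 > ./configure generates in amber9/src/pmemd, and pmemd is then rebuilt by
 > running make in that directory.)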
 >
 >
 > On 10/7/07, Robert Duke <rduke_at_email.unc.edu> wrote:
 >
 >     Hi Lars,
 >     Okay, a library you are specifying in the link line is not being found
 >     where you said it was by the linker.  So you need to be sure that 1) you
 >     actually are linking to the files needed by the current version of
 >     mvapich, and 2) you have that location as the value for MPI_LIBDIR2.
 >     You can get that info for your mvapich by doing the following:
 >     1) First enter the command 'which mpif77' to see where the mpif77
 >     command currently in your path is.  If that looks like a likely location
 >     for an mvapich install, move to step 2; if not, you may want to talk to
 >     whoever installed the mpi s/w on the machine (you probably really want
 >     to do this anyway).
 >     2) Once you are sure you have the right mpif77, enter 'mpif77
 >     -link_info'.  This should give you the location and names of all the
 >     library files you need for mvapich, as installed on your machine, to
 >     run.
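 >     As a purely illustrative example (the paths will of course be different
 >     on your machine), the output looks something like:
 >
 >       $ which mpif77
 >       /usr/local/mvapich/bin/mpif77
 >       $ mpif77 -link_info
 >       ifort -L/usr/local/mvapich/lib -lmpich -L/usr/lib64/infiniband -lmtl_common -lvapi -lmosal -lmpga -lpthread
 >
 >     and it is the -L directories and -l libraries reported there that belong
 >     in MPI_LIBDIR, MPI_LIBDIR2, and MPI_LIBS in config.h.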
 >     I cover this, and a variety of other useful issues, in the README file
 >     under amber9/src/pmemd (this stuff is specifically in the section
 >     entitled "Beware of Nonstandard MPI Library Installations").  The
 >     problem we encounter here is that the other products we link to are not
 >     configuration-static, and library requirements may change; or, in many
 >     instances, the folks that installed mpi either did not put it in the
 >     recommended location, or, worse, actually changed the names of things to
 >     get around problems with multiple things with the same name (the classic
 >     case - changing the mpi compile script names to specify the compiler in
 >     use).  In an ideal world, I would do more autoconfigure.  In the real
 >     world, pmemd runs on everything from the biggest supercomputers around
 >     down to single cpu workstations (unlike a number of the other amber
 >     programs, I don't attempt to also do windows laptops; enough is enough),
 >     and a lot of these configurations are nonstandard, and there are even
 >     big headaches with the compute nodes being configured differently than
 >     the compile nodes.  So the bottom line is that if you know nothing about
 >     the hardware and system software, then the probability pmemd will
 >     install correctly and run well is small in any case (i.e., I want the
 >     installs being done by folks who do know the machines).
 >
 >     One final problem.  Once you have the right mpi implementation, it needs
 >     to have been built with knowledge of which fortran compiler will be used
 >     for building your application.  This is the case because the name
 >     mangling done by different fortran compilers is different (and even
 >     configurable), so when the compiler encounters a statement like 'call
 >     mpi_foo()' in the code, it may assume it is actually supposed to go
 >     looking for mpi_foo, or maybe mpi_foo_, or mpi_foo__, or several other
 >     patterns that are generated by different name mangling schemes (time to
 >     read the mpi documentation and the fortran compiler documentation).
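 >     (One quick sanity check, assuming a reasonably standard Linux install:
 >     run nm on the mpi library and grep for a well-known routine, e.g.
 >     'nm /usr/local/mvapich/lib/libmpich.a | grep -i mpi_init'.  The decorated
 >     names you see - MPI_INIT, mpi_init_, mpi_init__, or similar - tell you
 >     which mangling scheme the library was built for; the path here is just
 >     an example.)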
 >
 >     All these issues have been discussed by both Ross and myself at some
 >     length on the amber.scripps.edu webpage.  Ross is probably more
 >     up-to-date on current changes; I typically am evolving the pmemd
 >     algorithms and doing other things that relate to electrostatics methods,
 >     so I generally only look in detail at how everything has changed in the
 >     software and hardware environments in the six months or so in front of
 >     an amber release.
 >
 >     Hope this helps you to get the job done, and also helps to explain why
 >     it is not drop-dead easy.  Hopefully Russ Brown will have some specific
 >     info for you about Sun machines, and your sys admin person can make sure
 >     that you have the best mpi implementation for the job and that it is
 >     configured to use the correct interface (i.e., infiniband between the
 >     multicore nodes, and shared memory between the cores themselves).
 >
 >     Best Regards - Bob Duke
 >
 >     ----- Original Message -----
 >     From: <Lars.Skjarven_at_biomed.uib.no>
 >     To: <amber_at_scripps.edu>
 >     Sent: Sunday, October 07, 2007 6:40 AM
 >     Subject: AMBER: PMEMD configuration and scaling
 >
 >
 >     > Bob, Ross, thank you for your helpful replies. I will definitely get
 >     > back here with the jac benchmark results as Bob proposes. This is
 >     > amber9, yes.  Whether or not the scali mpi is set up to use the
 >     > infiniband, I have no idea, and I will definitely check that with the
 >     > tech on Monday.
 >     >
 >     > After your reply yesterday I used the day to try and compile it with
 >     > ifort and mvapich2 as you suggest. However, it results in the
 >     > following error:
 >     >
 >     > IPO link: can not find -lmtl_common
 >     > ifort: error: problem during multi-file optimization compilation (code 1)
 >     > make[1]: *** [pmemd] Error 1
 >     >
 >     > From the config.h file, the following is defined, which may cause
 >     > some trouble?
 >     > MPI_LIBS = -L$(MPI_LIBDIR) -lmpich -L$(MPI_LIBDIR2) -lmtl_common -lvapi -lmosal -lmpga -lpthread
 >     >
 >     > Using
 >     > - "Intel ifort compiler found; version information: Version 9.1"
 >     > - Intel MKL (under /site/intel/cmkl/8.1)
 >     > - NetCDF
 >     > - mvapich2 (/site/mvapich2)
 >     > - Infiniband libraries (/usr/lib64/infiniband)
 >     >
 >     > Hoping you can see something that will help me out. Thanks again.
 >     >
 >     > Lars
 >     >
 >     > On 10/6/07, Ross Walker < ross_at_rosswalker.co.uk> wrote:
 >     >
 >     >     Hi Lars,
 >     >
 >     >     I have never used scali MPI - first question - are you certain it
 >     >     is set up to use the infiniband interconnect and not going over
 >     >     gigabit ethernet?  Those numbers look to me like it's going over
 >     >     ethernet.
 >     >
 >     >     For infiniband I would recommend using MVAPICH / MVAPICH2 or VMI2
 >     >     - both compiled using the Intel compiler (yes, I know they are
 >     >     Opteron chips, but surprise surprise, the Intel compiler produces
 >     >     the fastest code on opterons in my experience) - and then compile
 >     >     PMEMD with the same compiler.
 >     >
 >     >     Make sure you run the MPI benchmarks with the mpi installation and
 >     >     check that you are getting ping-pong and random-ring latencies and
 >     >     bandwidths that match the specs of the infiniband - All-to-All
 >     >     tests etc. will also check that you don't have a flaky cable
 >     >     connection, which can be common with infiniband.
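 >     >     (For instance, if the Intel MPI Benchmarks happen to be installed
 >     >     on the cluster, something like 'mpirun -np 2 ./IMB-MPI1 PingPong'
 >     >     and 'mpirun -np 32 ./IMB-MPI1 Alltoall' will give you the latency,
 >     >     bandwidth and all-to-all numbers; over healthy infiniband you
 >     >     would expect latencies of a few microseconds and bandwidth close
 >     >     to the rated value.)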
 >     >
 >     >     Good luck.
 >     >     Ross
 >     >
 >     >     /\
 >     >     \/
 >     >     |\oss Walker
 >     >
 >     >     | HPC Consultant and Staff Scientist |
 >     >     | San Diego Supercomputer Center |
 >     >     | Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
 >     >     | http://www.rosswalker.co.uk | PGP Key available on request |
 >     >
 >     >     Note: Electronic Mail is not secure, has no guarantee of delivery,
 >     >     may not be read every day, and should not be used for urgent or
 >     >     sensitive issues.
 >     >
 >     >     > -----Original Message-----
 >     >     > From: owner-amber_at_scripps.edu
 >     >     > [mailto: owner-amber_at_scripps.edu] On Behalf Of
 >     >     > Lars.Skjarven_at_biomed.uib.no
 >     >     > Sent: Saturday, October 06, 2007 04:35
 >     >     > To: amber_at_scripps.edu
 >     >     > Subject: AMBER: PMEMD configuration and scaling
 >     >     >
 >     >     >
 >     >     > Dear Amber Users,
 >     >     >
 >     >     > We recently got access to a cluster consisting of Opteron
 >     >     > dual-cpu-dual-core (4 cores) SUN nodes with InfiniBand
 >     >     > interconnects. After what I have read about pmemd and scaling,
 >     >     > this hardware should be good enough to achieve relatively good
 >     >     > scaling up to at least 16-32 cpu's (correct?). However, my small
 >     >     > benchmark test yields a peak at 8 cpu's (two nodes):
 >     >     >
 >     >     > 2 cpus: 85 ps/day - 100%
 >     >     > 4 cpus: 140 ps/day - 81%
 >     >     > 8 cpus: 215 ps/day - 62%
 >     >     > 12 cpus: 164 ps/day - 31%
 >     >     > 16 cpus: 166 ps/day - 24%
 >     >     > 32 cpus: 111 ps/day - 8%
 >     >     >
 >     >     > This test was done using 400,000 atoms and a simulation of 20 ps.
 >     >     >
 >     >     > Is it possible that our configuration of pmemd can cause this
 >     >     > problem? If so, do you see any apparent flaws in the config.h
 >     >     > file below?
 >     >     >
 >     >     > In the config.h below we use ScaliMPI and ifort (./configure
 >     >     > linux64_opteron ifort mpi). We also have pathscale and portland
 >     >     > as available compilers; however, I never managed to build pmemd
 >     >     > using these.
 >     >     >
 >     >     > Any hints and tips will be highly appreciated.
 >     >     >
 >     >     > Best regards,
 >     >     > Lars Skjærven
 >     >     > University of Bergen, Norway
 >     >     >
 >     >     > ## config.h file ##
 >     >     > MATH_DEFINES =
 >     >     > MATH_LIBS =
 >     >     > IFORT_RPATH = /site/intel/fce/9.1/lib:/site/intel/cce/9.1/lib:/opt/scali/lib64:/opt/scali/lib:/opt/gridengine/lib/lx26-amd64:/site/pathscale/lib/3.0/32:/site/pathscale/lib/3.0:/opt/gridengine/lib/lx26-amd64:/opt/globus/lib:/opt/lam/gnu/lib
 >     >     > MATH_DEFINES = -DMKL
 >     >     > MATH_LIBS = -L/site/intel/cmkl/8.1/lib/em64t -lmkl_em64t -lpthread
 >     >     > FFT_DEFINES = -DPUBFFT
 >     >     > FFT_INCLUDE =
 >     >     > FFT_LIBS =
 >     >     > NETCDF_HOME = /site/NetCDF
 >     >     > NETCDF_DEFINES = -DBINTRAJ
 >     >     > NETCDF_MOD = netcdf.mod
 >     >     > NETCDF_LIBS = $(NETCDF_HOME)/lib/libnetcdf.a
 >     >     > DIRFRC_DEFINES = -DDIRFRC_EFS -DDIRFRC_NOVEC
 >     >     > CPP = /lib/cpp
 >     >     > CPPFLAGS = -traditional -P
 >     >     > F90_DEFINES = -DFFTLOADBAL_2PROC
 >     >     >
 >     >     > F90 = ifort
 >     >     > MODULE_SUFFIX = mod
 >     >     > F90FLAGS = -c -auto
 >     >     > F90_OPT_DBG = -g -traceback
 >     >     > F90_OPT_LO =  -tpp7 -O0
 >     >     > F90_OPT_MED = -tpp7 -O2
 >     >     > F90_OPT_HI =  -tpp7 -xW -ip -O3
 >     >     > F90_OPT_DFLT =  $(F90_OPT_HI)
 >     >     >
 >     >     > CC = gcc
 >     >     > CFLAGS =
 >     >     >
 >     >     > LOAD = ifort
 >     >     > LOADFLAGS = -L/opt/scali/lib64 -lmpi -lfmpi
 >     >     > LOADLIBS = -limf -lsvml -Wl,-rpath=$(IFORT_RPATH)
 >     >     > ## config.h ends ##
 >     >     >
 >     >     >
 >     >     >
 >     >     >
 
 -----------------------------------------------------------------------
The AMBER Mail Reflector
 To post, send mail to amber_at_scripps.edu
 To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
 
 
 