AMBER Archive (2003)

Subject: AMBER: PMEMD and sander from AMBER6 performances

From: Teletchéa Stéphane (steletch_at_biomedicale.univ-paris5.fr)
Date: Wed Jul 16 2003 - 18:37:34 CDT


Hi !

I've been improving our cluster's performance by adding more nodes, and
fortunately last week I got a copy of PMEMD, which multiplied the
performance by a further 1.75x!

Nice, but I'm not able to reproduce the published numbers...

--------------------------------------------------------------------

First, I should say that I used amber6 compiled with g77-2.96 from
RH7.1; mpich-1.2.5 and PMEMD were compiled with the latest icc/ifc7
from Intel, as instructed in the PMEMD documentation.

I would be grateful if you could explain why I cannot reach the same
performance on what appear to be the same configurations.

--------------------------------------------------------------------
With this in mind, here is what I get compared with, for example, the
IBM blade Xeon 2.4 GHz/gigabit (the closest to my system):

JAC benchmark                         sander6       pmemd
From PMEMD (2.4 GHz Xeons from IBM):  130 ps/day    230 ps/day
Mine (2.8 GHz Xeons from Alineos):    110 ps/day    209 ps/day

But my Xeons run at 2.8 GHz, so roughly speaking I should get
130*2.8/2.4=152 ps/day or 230*2.8/2.4=268 ps/day, certainly not LESS
than the 2.4 GHz machines!

Any explanation for this 30-40% drop?

The same for the Athlon:              sander6       pmemd
From MICRONPC (1.6 GHz Athlon):       62.6 ps/day   122 ps/day
Mine (half of the dual 1.2 GHz):      37.6 ps/day   67 ps/day

Again: I should get 62.6*1.2/1.6=47 ps/day or 122*1.2/1.6=91 ps/day.

Any explanation for this 25-35% drop?
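
The clock-scaling estimates above can be written out as a short Python
sketch. It assumes, as the message does, that throughput scales linearly
with clock frequency, which is only a rough approximation. The function
names are illustrative and are not part of AMBER or PMEMD; the numbers
are the ones quoted in this message.

```python
def scaled_throughput(ref_ps_per_day, ref_ghz, my_ghz):
    """Throughput expected if performance scaled linearly with clock speed."""
    return ref_ps_per_day * my_ghz / ref_ghz


def excess_percent(expected, measured):
    """How much higher the expected value is than the measured one, in percent."""
    return 100.0 * (expected / measured - 1.0)


# (label, published ps/day, published GHz, local GHz, measured ps/day)
cases = [
    ("xeon sander6", 130.0, 2.4, 2.8, 110.0),
    ("xeon pmemd", 230.0, 2.4, 2.8, 209.0),
    ("athlon sander6", 62.6, 1.6, 1.2, 37.6),
    ("athlon pmemd", 122.0, 1.6, 1.2, 67.0),
]

for label, ref_rate, ref_ghz, my_ghz, measured in cases:
    expected = scaled_throughput(ref_rate, ref_ghz, my_ghz)
    gap = excess_percent(expected, measured)
    print(f"{label:15s} expected {expected:6.1f} ps/day, "
          f"measured {measured:6.1f} ps/day, gap {gap:4.1f}%")
```

The gap here is expressed relative to the measured value, which appears
to be how the percentages in the message were estimated.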

The performance increase between sander6 and PMEMD is as described
(from 1.78 to 2.20 times faster!).

Scalability is poor on my system compared to what is published.

Any hints?

Maybe the NFS-mounted home directories?
I'm using PBS to handle the jobs; I've also tried launching the run
locally (on the node), but I get the same results.

All the relevant parameters are (hopefully) below.

I installed src.pmemd in the amber6 tree as indicated; did I miss a
step?

Sincerely yours,
Stéphane TELETCHEA

--------------------------------------------------------------------

The cluster is gigabit linked (with its own switch); home directories
are NFS-mounted on each node over a separate Fast Ethernet network (with
its own switch).

There are 2*4 Athlons at 1.2 GHz and 2*3 Xeons at 2.8 GHz, controlled by
one master (1.2 GHz AMD).

Here are my numbers for the JAC benchmark (input file at the bottom of
this mail), downloaded directly from the FTP site:

-----------------------------------------------------
-----------------------------------------------------
Relative performance analysis of sander6 vs pmemd
System : DHFR, also known as JAC
23558 atoms - 7182 molecules - Box : 64x64x64 Ang.
1000 steps of dynamics run - throughput is in ps/day.
-----------------------------------------------------
Note that this benchmark uses a 1 fs timestep, so
the 1000 steps cover 1 ps of trajectory.
------------------------------------------------------------------------
| Processor(s) | Clock  | SANDER6      | PMEMD*       | PMEMD*/sander6 |
------------------------------------------------------------------------
| 1 athlon(s)  | 1.2GHz | 37.6 (1x)    | 0 (est. 67)  | 1.78x          |
| 2 athlon(s)  | 1.2GHz | 68.6 (1.82x) | 122 (1.82x)  | 1.78x          |
| 4 athlon(s)  | 1.2GHz | 118 (3.14x)  | 216 (3.22x)  | 1.83x          |
| 6 athlon(s)  | 1.2GHz | 153 (4.07x)  | 299 (4.46x)  | 1.95x          |
| 8 athlon(s)  | 1.2GHz | 189 (5.03x)  | 365 (5.44x)  | 1.93x          |
---------------------------------------- PMEMD_p4 ----------------------
| 1 xeon(s)    | 2.8GHz | 63.7 (1x)    | 0 (est. 115) | 1.80x          |
| 2 xeon(s)    | 2.8GHz | 110 (1.73x)  | 209 (1.82x)  | 1.90x          |
| 4 xeon(s)    | 2.8GHz | 176 (2.76x)  | 348 (3.03x)  | 1.98x          |
| 6 xeon(s)    | 2.8GHz | 214 (3.36x)  | 470 (4.09x)  | 2.20x          |
------------------------------------------------------------------------
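
To put the scalability figures in the table on a common footing, here
is a small Python sketch that recomputes the speedups for the sander6
column and derives a parallel efficiency (speedup divided by processor
count). Parallel efficiency is not reported in this message; it is added
here only as a conventional way to read the same numbers, and the
function names are illustrative.

```python
def speedup(throughput, baseline):
    """Speedup relative to the single-processor throughput."""
    return throughput / baseline


def efficiency(throughput, baseline, nproc):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(throughput, baseline) / nproc


# sander6 throughput in ps/day, keyed by processor count (from the table)
athlon_sander6 = {1: 37.6, 2: 68.6, 4: 118.0, 6: 153.0, 8: 189.0}
xeon_sander6 = {1: 63.7, 2: 110.0, 4: 176.0, 6: 214.0}

for name, series in (("athlon", athlon_sander6), ("xeon", xeon_sander6)):
    base = series[1]
    for nproc in sorted(series):
        s = speedup(series[nproc], base)
        e = efficiency(series[nproc], base, nproc)
        print(f"{nproc} {name}(s): speedup {s:.2f}x, efficiency {100 * e:.0f}%")
```

The same two functions can be applied to the PMEMD* column by using the
estimated single-processor values (67 and 115 ps/day) as baselines.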

For the whole cluster (no PMEMD_p4) :

| Processor(s)     | SANDER6 | PMEMD* | PMEMD  | PMEMD/sander6 |
----------------------------------------------------------------
| 14 processors    | 280     | 649    | 700    | 2.32x / 2.5x  |
| speedup/1 athlon | 7.45x   | 9.69x  | 10.45x |               |
| speedup/1 xeon   | 4.40x   | 5.64x  | 6.09x  |               |
-----------------------------------------------------
PMEMD* indicates PMEMD has been compiled with the
option -DSLOW_NONBLOCKING_MPI
 
PMEMD_p4 indicates PMEMD has been compiled specifically for
taking advantage of P4 instructions.

PMEMD indicates PMEMD has been compiled WITHOUT the
option -DSLOW_NONBLOCKING_MPI

-----------------------------------------------------
-----------------------------------------------------

An AMD dual MP2800+ is about 5% slower than a dual Xeon 2.8 GHz with
Intel's compiler:

[root_at_master0 bin]# icid
OS information:
Red Hat Linux release 7.1 (Seawolf)
Kernel 2.4.20 on an i686
glibc-2.2.4-19
 
===========================================================
Support Package IDs for Intel(R) Compilers in
/opt/intel/compiler70/ia32/bin
Please use the following information when submitting customer support
requests.
 
C++ Support Package ID : l_cc_p_7.1.006-NCOM
Fortran Support Package ID: l_fc_p_7.1.008-NCOM
===========================================================
C++ & Fortran License Expiration Date: never expire
C++ & Fortran Support Services Expiration Date: never expire
 
All Installed Compiler Components on this OS:
intel-isubh7-7.1-6: Substitute Headers for Intel(R) C++ Compiler for
                    32-bit applications, Version 7.1
intel-ifc7-7.1-8: Intel(R) Fortran Compiler for 32-bit applications,
                  Version 7.1 Build 20030307Z
intel-icc7-7.1-6: Intel(R) C++ Compiler for 32-bit applications, Version
                  7.1 Build 20030307Z

------------------------------------------------------------------

Here is the input file for JAC; I've only changed the number of steps.

[stephane_at_master0 DM_Tcte300_H2O]$ more dn300K
 short md, nve ensemble
 &cntrl
   ntx=7, irest=1,
   ntc=2, ntf=2, tol=0.0000001,
   nstlim=1000,ntcm=1,nscm=1000,
   ntpr=50, ntwr=10000,
   dt=0.001, vlimit=10.0,
   cut=9.,
   ntt=0, temp0=300.,
 &end
 &ewald
  a=62.23, b=62.23, c=62.23,
  nfft1=64,nfft2=64,nfft3=64,
  skinnb=2.,
 &end
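
For reference, the relation between this input file and the ps/day
figures can be sketched in Python as follows. The sketch relies only on
nstlim being the number of MD steps and dt being the timestep in
picoseconds; the helper names are illustrative, and no wall-clock
timings are given in this message, so the second value printed is
derived from a quoted throughput rather than measured.

```python
SECONDS_PER_DAY = 86400.0


def simulated_ps(nstlim, dt_ps):
    """Length of trajectory covered by the run, in picoseconds."""
    return nstlim * dt_ps


def ps_per_day(nstlim, dt_ps, wall_seconds):
    """Throughput in ps/day for a run that took wall_seconds to complete."""
    return simulated_ps(nstlim, dt_ps) * SECONDS_PER_DAY / wall_seconds


def wall_seconds_for(nstlim, dt_ps, rate_ps_per_day):
    """Wall-clock time implied by a given throughput in ps/day."""
    return simulated_ps(nstlim, dt_ps) * SECONDS_PER_DAY / rate_ps_per_day


nstlim, dt = 1000, 0.001  # values from the input file above

print(f"trajectory length: {simulated_ps(nstlim, dt)} ps")
# Wall time implied by the 37.6 ps/day quoted for one Athlon with sander6
print(f"implied wall time: {wall_seconds_for(nstlim, dt, 37.6):.0f} s")
```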

-- 
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
Teletchéa Stéphane - CNRS UMR 8601
Lab. de chimie et biochimie pharmacologiques et toxicologiques
45 rue des Saints-Pères 75270 Paris cedex 06
tél : (33) - 1 42 86 20 86 - fax : (33) - 1 42 86 83 87
mél : steletch_at_biomedicale.univ-paris5.fr
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
