AMBER Archive (2003)

Subject: Re: AMBER: tru64 alpha

From: Robert Duke (rduke_at_email.unc.edu)
Date: Thu Oct 16 2003 - 20:27:27 CDT


Mu -
One additional point. Look at the nonsetup CPU secs and the nonsetup
wallclock secs at the bottom of the mdout file. If the wallclock time is
more than about 5% larger than the CPU time, the interconnect is not
working well. On the blades you will start seeing this sort of thing as
you go past 8 nodes, or if the gigabit ethernet is busy doing other
things, or if there are other jobs running on the nodes you are using
(you must have exclusive use of each node, and must be sure that your job
startup commands start 1 and only 1 process per node). You will also see
it if the interconnect hardware is flaky and dropping packets or
otherwise having to do resends due to noise, etc. (Our myrinet stuff has
recently been hopeless due to bad hardware, which our systems people are
trying to get a handle on; one symptom is that wallclock times go way up
while CPU times remain relatively reasonable.) Getting parallel hardware
to work right can be a real pain. I have recently gotten very fond of
big iron at major centers that are well managed (things like IBM SP4s,
HP AlphaServers, and SGIs at supercomputer centers, where they worry
every minute about what the interconnect is doing).
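If you want to script this check over a batch of runs, something along
these lines works (a rough Python sketch; the exact wording of the timing
labels at the bottom of the mdout varies between versions, so adjust the
patterns to match what your PMEMD actually prints):

    import re
    import sys

    def nonsetup_times(mdout_path):
        """Scrape (cpu_secs, wall_secs) from the timing section of an
        mdout file.  The label text below is a guess; edit the regexes
        to match your output."""
        cpu = wall = None
        with open(mdout_path) as f:
            for line in f:
                m = re.search(r'NonSetup CPU time:\s*([\d.]+)', line, re.I)
                if m:
                    cpu = float(m.group(1))
                m = re.search(r'NonSetup wall(?:clock)? time:\s*([\d.]+)',
                              line, re.I)
                if m:
                    wall = float(m.group(1))
        return cpu, wall

    if __name__ == '__main__':
        cpu, wall = nonsetup_times(sys.argv[1])
        if cpu is None or wall is None:
            sys.exit('timing lines not found - adjust the regexes')
        print('cpu %.1f s  wall %.1f s  wall/cpu %.3f' % (cpu, wall, wall / cpu))
        if wall / cpu > 1.05:
            print('wallclock exceeds cpu by more than 5%; suspect the interconnect')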
Regards - Bob
----- Original Message -----
From: "Mu Yuguang (Dr)" <YGMu_at_ntu.edu.sg>
To: <amber_at_scripps.edu>
Sent: Thursday, October 16, 2003 8:40 PM
Subject: RE: AMBER: tru64 alpha

> Thanks David, Bill and Rob for your helpful replies.
> I have now compiled PMEMD with a slightly modified machine file, using
> mpif90 and mpicc, and submitted it with the corresponding mpirun.
> It works well on one node with 4 CPUs, scaling up to 92%, but the
> scaling drops to 25% using 2 nodes with 8 CPUs.
> My system is an 18-mer duplex DNA, 56999 atoms in total, using PME.
>
> The inter-node connection should be a little better than Myrinet, and
> the MPI here is mpich-1.2.5.
> I am not sure whether the scaling failure is due to mpich or to
> something else.
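> (By scaling I mean parallel efficiency: if T1 is the wallclock time on
> 1 CPU and TN the time on N CPUs, speedup is T1/TN and efficiency is
> speedup/N. A small Python illustration of the arithmetic, with made-up
> timings chosen only to reproduce the percentages above:
>
>     def efficiency(t1, tn, n):
>         """Parallel efficiency: speedup on n CPUs divided by n."""
>         return (t1 / tn) / n
>
>     t1 = 1000.0                          # assumed 1-CPU time, seconds
>     print(efficiency(t1, t1 / 3.68, 4))  # 0.92, i.e. 92% on 4 CPUs
>     print(efficiency(t1, t1 / 2.0, 8))   # 0.25, i.e. 25% on 8 CPUs
> )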
>
>
> -----Original Message-----
> From: Bill Ross [mailto:ross_at_cgl.ucsf.edu]
> Sent: Wednesday, October 15, 2003 10:39 PM
> To: amber_at_scripps.edu
> Subject: RE: AMBER: tru64 alpha
>
> > FATAL dynamic memory allocation error in subroutine alloc_ew_dat_mem
> > Could not allocate ipairs array!
>
> In unix,
>
> % man ulimit
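>
> The likely cause of that allocation failure is a per-process
> data-segment limit set below what the ipairs array needs, not the
> machine actually running out of memory. From the shell, ulimit -d
> (sh/ksh) or limit datasize (csh) shows the current limit. A Python
> sketch of the same check, using the standard POSIX resource names:
>
>     import resource
>
>     # Soft/hard caps on the data segment, in bytes;
>     # resource.RLIM_INFINITY means no limit.
>     soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
>     print('datasize soft:', soft, 'hard:', hard)
>
>     # Raise the soft limit to the hard limit, the most an
>     # unprivileged process may request.
>     resource.setrlimit(resource.RLIMIT_DATA, (hard, hard))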
>
> Bill Ross
>

-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu