AMBER Archive (2004)
Subject: RE: AMBER: Questions about MD sampling
From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Tue Oct 12 2004 - 12:25:30 CDT
 
 
 
 
Dear Xin,
 >I do not know why the sander job crashed (Is there a way to check?).
 >I got a kind of error message "broken pipe". (I found it frequently happened
 >after the job had been running for up to three days). Seems something wrong
 >with the parallel. I run sander on a linux cluster with 16 dual-nodes
 >(AMD 1.6GHz). My MD system includes protein (580 residues) and up to 25000
 >waters.
 
 Are you using a queuing system like PBS? If you are, then it is possible that
 your job is being killed by the queue system once it reaches a specified CPU
 time limit. If this is the case, check with whoever set up your cluster and
 see if they can increase the time a job is allowed to run for. Alternatively,
 you could split your job into sections that each finish within the queue time
 limit and then submit the next one as each finishes.
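 Chaining segments is usually done by feeding each segment's restart file in
 as the next segment's starting coordinates. A minimal sketch (the filenames,
 segment count and processor count here are made-up examples; the -i/-o/-p/-c/-r/-x
 flags are the standard sander command-line options):

```python
# Sketch: build a chain of sander commands in which each segment restarts
# from the previous segment's restart file. All filenames are hypothetical.
def segment_commands(n_segments, prmtop="prmtop", first_coords="min.rst"):
    commands = []
    coords = first_coords
    for i in range(1, n_segments + 1):
        restart = f"md{i}.rst"
        commands.append(
            f"mpirun -np 32 $AMBERHOME/exe/sander -O -i md.in "
            f"-p {prmtop} -c {coords} -r {restart} "
            f"-o md{i}.out -x md{i}.mdcrd"
        )
        coords = restart  # the next segment starts from this restart file
    return commands

for cmd in segment_commands(4):
    print(cmd)
```

 Each segment then writes its own mdout and mdcrd file, which also sidesteps
 the 2GB file-size problem discussed below.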
 
 You might also be hitting a hard disk quota limit on your account which is
 preventing any further disk writes.
 
 Another thing to check is whether you are hitting the 'infamous' 2GB file
 limit. This plagues 32-bit systems such as AMD Athlons and Pentiums, though
 not Opterons or Itaniums (these are 64 bit). Essentially, once any of your
 output files reaches 2GB in size, every subsequent write will fail - this may
 manifest itself as the broken pipe error you are seeing. Check your mdcrd
 file to see if it is close to 2GB in size when your job fails. If it is, then
 you will have to either write to your mdcrd file less frequently or split
 your job into chunks, each writing to a separate mdcrd file. Note: it is
 possible to compile a version of sander that supports large files on 32-bit
 architectures, but that is significantly more involved than simply splitting
 a job into pieces.
 
 The mdcrd file size will be roughly:
 
 24N*S  bytes
 
 Where N is the number of atoms in your system and S is the number of frames
 in the mdcrd file (nstlim/ntwx). Each atom contributes three coordinates of
 8 characters each per frame; newlines and any periodic box lines add a few
 percent on top of this.
 
 Note that the mdcrd files are ASCII format and so compress very well (6 to 7
 times) using something like gzip. Thus, before you move on to the next
 segment of a job, you probably want to compress the previous segment's mdcrd
 file to save space.
 
 >The 2 ns of MD would take almost 2 weeks in general (without competition).
 >Is it normal?
 >(I am using "mpirun -np 32 $AMBERHOME/exe/sander ....."). I feel it is
 >kind of slow. Maybe something wrong with the parallel setting, or maybe I
 >need to find an optimized number of processors (I heard NOT the more the
 >faster)?
 
 This is a question for which there are many, many answers... Parallel
 performance can depend on a large number of factors: the speed of the
 individual CPUs, the interconnect speed, the amount of memory available on
 each node, the MPI implementation, the network buffer sizes, the size of
 your system, the options you have selected, the size of your cutoff, etc.
 The scaling can also differ from cluster to cluster, since slower CPUs will
 probably scale better than fast ones, because they put less strain on a slow
 interconnect.
 
 The best option would be to first try running the JAC benchmark on 1 CPU to
 see how your system compares to ours:
 http://amber.scripps.edu/amber8.bench1.html. This will tell you whether
 sander is running as it should. Then I would try running your system first
 on 1 CPU (just 500 steps will do) and then on 2, 4, 8, 16 and 32 CPUs, and
 see how the timings compare. If you have a slow interconnect (e.g. gigabit
 ethernet) then you may find that your calculation actually tops out at
 around 8 or 16 CPUs and that going to 32 actually causes the calculation to
 slow down, since the code spends all its time communicating and not much
 time actually calculating. There are no hard and fast rules here; the best
 thing you can do is try it out and see what the optimal value is.
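 Once you have the wall-clock timings from those trial runs, parallel
 efficiency (ideal time divided by actual time) makes the comparison
 concrete. A sketch - the timings below are invented purely for illustration
 of the kind of roll-off you might see on a slow interconnect:

```python
def efficiency(timings):
    # timings: {ncpus: wall-clock seconds for the same fixed number of steps}
    base = timings[1]  # the 1-CPU reference time
    # Perfect scaling on n CPUs would take base/n seconds, so efficiency
    # is base / (n * actual_time); 1.0 means ideal speedup.
    return {n: base / (n * t) for n, t in sorted(timings.items())}

# Hypothetical 500-step timings on a gigabit-ethernet cluster
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0, 16: 110.0, 32: 95.0}
for n, eff in efficiency(timings).items():
    print(f"{n:2d} CPUs: {eff:.0%} efficient")
```

 With numbers like these, the job barely gets faster beyond 16 CPUs even
 though it still occupies twice the hardware - exactly the "topping out"
 behaviour described above.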
 
 All the best
 Ross
 
 /\
 \/
 |\oss Walker
 
 | Department of Molecular Biology TPC15 |
| The Scripps Research Institute |
 | Tel:- +1 858 784 8889 | EMail:- ross_at_rosswalker.co.uk |
 | http://www.rosswalker.co.uk/ | PGP Key available on request |
 
  
  
 -----------------------------------------------------------------------
The AMBER Mail Reflector
 To post, send mail to amber_at_scripps.edu
 To unsubscribe, send "unsubscribe amber" to majordomo_at_scripps.edu
 
 
 