AMBER Archive (2006)

Subject: AMBER: AMBER parallel run bombs

From: Rahaman, Asif (asif-rahaman_at_uiowa.edu)
Date: Mon Dec 04 2006 - 18:29:53 CST


Dear All,
 
I am trying to run the MPI (parallel) version of AMBER. My system has a total of 9600 atoms and uses periodic boundary conditions (PBC). I get the following error when I try to run it:
_____________________________________________________________
running /usr/local/amber9/exe/sander.MPI on 4 LINUX ch_gm processors
Program binary is: /usr/local/amber9/exe/sander.MPI
Machines file is /home/rasif/fast_verylarge/mach
Shared memory for intra-nodes coms is enabled.
gm receive mode used: polling.
4 processes will be spawned:
        Process 0 (/usr/local/amber9/exe/sander.MPI "-O" "-i" "md.inp" "-p" "new1.top" "-c" "new1.crd" "-o" "md.out" "-r" "md.rst" "-x" "md.crd" ) on node001
        Process 1 (/usr/local/amber9/exe/sander.MPI "-O" "-i" "md.inp" "-p" "new1.top" "-c" "new1.crd" "-o" "md.out" "-r" "md.rst" "-x" "md.crd" ) on node001
        Process 2 (/usr/local/amber9/exe/sander.MPI "-O" "-i" "md.inp" "-p" "new1.top" "-c" "new1.crd" "-o" "md.out" "-r" "md.rst" "-x" "md.crd" ) on node001
        Process 3 (/usr/local/amber9/exe/sander.MPI "-O" "-i" "md.inp" "-p" "new1.top" "-c" "new1.crd" "-o" "md.out" "-r" "md.rst" "-x" "md.crd" ) on node001
Open a socket on head...
Got a first socket opened on port 44225.
Shared memory file: /tmp/gmpi_shmem-3027994:[0-9]*.tmp
MPI Id 0 is using gm port 2, board 0 (MAC 0060dd47b81f).
MPI Id 1 is using gm port 4, board 0 (MAC 0060dd47b81f).
MPI Id 3 is using gm port 5, board 0 (MAC 0060dd47b81f).
MPI Id 2 is using gm port 6, board 0 (MAC 0060dd47b81f).
Received data from all 4 MPI processes.
Sending mapping to MPI Id 0.
Sending mapping to MPI Id 1.
Sending mapping to MPI Id 2.
Sending mapping to MPI Id 3.
Data sent to all processes.
Received valid abort message !
Reap remote processes:
 * NB pairs 154 799894 exceeds capacity ( 800000) 2
     SIZE OF NONBOND LIST = 800000
 SANDER BOMB in subroutine nonbond_list
 Non bond list overflow!
 check MAXPR in locmem.f
--------------------------------------------------------------
 
As you can see, I am trying to run a 4-processor job, and it seems that the nonbonded list in sander is overflowing. I have changed MAXINT in /src/anal/sizes.h. There is no sizes.h file in /src/sander, and I could not find any place in locmem.f where I can assign a value to MAXPR. It seems to me that in locmem.f MAXPR is assigned or calculated as [# of atoms (9600) * (cut + scnb)**3 / 3 = 3200000], and that for four processors it becomes 800000 per process.
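
To make the arithmetic in this estimate concrete, here is a minimal Python sketch. It only reproduces the formula as quoted in this message and is not taken from locmem.f. The function name estimate_maxpr and the value list_radius = 10.0 are assumptions for illustration: the individual cutoff terms are not given here, and 10 is simply the value of (cut + scnb) that yields the quoted total of 3200000 for 9600 atoms.

```python
def estimate_maxpr(natom, list_radius, nproc=1):
    """Pair-list capacity per process, using the formula quoted in the post.

    natom       -- number of atoms in the system
    list_radius -- the sum written as (cut + scnb) in the post (assumed value)
    nproc       -- number of MPI processes sharing the list
    """
    total = natom * list_radius ** 3 / 3.0
    return int(total // nproc)


natom = 9600
list_radius = 10.0  # assumption: inferred from 9600 * r**3 / 3 = 3200000

total = estimate_maxpr(natom, list_radius)
per_proc = estimate_maxpr(natom, list_radius, nproc=4)

print("total capacity:      ", total)
print("capacity per process:", per_proc)
```

Under these assumptions the per-process figure comes out to 800000, which matches the "SIZE OF NONBOND LIST = 800000" line in the error output.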
 
Could anybody please let me know what I should do to make the parallel version work?
Or do I need to change the nonbond list? If so, what do I need to change, and where in sander should I make the change?
 
Thank you in advance.
 
With best regards, Asif