AMBER Archive (2008)
Subject: Fw: AMBER: Non bond list error
From: Robert Duke (rduke_at_email.unc.edu)
Date: Fri Oct 24 2008 - 08:20:29 CDT
----- Original Message -----
From: Robert Duke
To: neville_forlemu_at_yahoo.com
Sent: Friday, October 24, 2008 9:07 AM
Subject: Re: AMBER: Non bond list error
Hi Neville,
Okay, I am responding to this one to decode the error msg, but have read the later mails too.
FIRST, TO THE WHOLE LIST:
PLEASE, PLEASE, PLEASE, DO NOT DINK WITH THE PARTICLE MESH EWALD PARAMETERS UNLESS YOU KNOW WHAT YOU ARE DOING!!! (sorry for the drama, but this is a hot button for me).
This is different than the issue Ross will raise in a later mail about not messing with forcefield params unless you know what you are doing - that is bad too, but different.
Anyway, in a later email Neville shows he used a cutoff of 12 for the direct force calcs in a pme calc (cut = 12 in &cntrl). Okay, one MIGHT think that this will improve accuracy for the pme calc, but two points: 1) you have to consider the settings of other things like dsum_tol, fft gridsize (nfft1,2,3) and order before you know that it will actually help, and 2) you are increasing the cost of the direct force computation (which is probably in the neighborhood of 50-60% of the total cost) by a factor of about 3.4 here!!! (12**3/8**3 = 3.375 for the pair df calc, 14**3/10**3 = 2.744 for the nonbond pairlist build, which works on cutoff plus skin). THIS IS A BAD THING TO DO WITHOUT GOOD REASONS (maybe you need to justify more computers in the lab? ;-))
Now, all I can figure out for why somebody might be doing this is they are coming over from namd, and at least in the past, they have used a default cut of 12. BUT they were also using a respa calc which only does the reciprocal force fft calc every 4 steps, or some such. We DON'T use respa by default (or recommend it), and a long cutoff is a compensation for "near force" contamination by "longrange (reciprocal space) force" in the respa algorithm. In semi-English: the reason namd would use a longer cutoff is that they need it to keep from totally whacking the accuracy of a respa calc, and they use (or at least have a preferred benchmark of) a respa calc because it is how they run fastest. This all may have recently changed, but was the case a year or two ago I believe.
So I have seen two users with 12 angstrom cutoffs in a sander pme calc; there is little good reason to do this. Some folks, perhaps a lot, DO increase the cutoff to 9 in order to get better precision in the vdw term, and that may be justified (I am a little skeptical; the electrostatics dominate and the vdw error is pretty small at the default of 8). So using a 12 angstrom value for cut would improve vdw accuracy even further, but it is probably of no real value to do this.
IF you do want to do something like this, you should use pmemd, which has the option to set the vdw cutoff to a larger value than the electrostatics cutoff (use vdw_cutoff and es_cutoff in &cntrl instead of cut, and set vdw_cutoff > es_cutoff; some folks use 9.0 for vdw_cutoff and 7.0 or 8.0 for es_cutoff - if you use the lower es_cutoff, you may want to look at the pme error numbers just to be sure you are happy - ie, while you can speed things up maybe 10% doing this, I have not done extensive analysis of relative errors - off the cuff, the impact is small, but buyer beware any time you don't use the defaults).
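[Editorial sketch, not part of the original message.] The cost factors quoted above follow from the direct-space work scaling roughly with the cube of the cutoff radius. A minimal Python check is below; the function name `cost_ratio` is hypothetical, and the 2 angstrom skin is an assumption inferred from the 14**3/10**3 figure in the message rather than read from any input file.

```python
def cost_ratio(new_cut, ref_cut=8.0, skin=0.0):
    """Ratio of direct-space work, assuming cost scales with the sphere volume.

    new_cut, ref_cut: cutoffs in angstroms.
    skin: extra list-building margin in angstroms (assumed 2.0 for the pairlist).
    """
    return ((new_cut + skin) / (ref_cut + skin)) ** 3


# Pair force evaluation: cut = 12 versus the default of 8.
pair_ratio = cost_ratio(12.0)

# Pairlist build: the same cutoffs plus an assumed 2 angstrom skin.
list_ratio = cost_ratio(12.0, skin=2.0)

print(f"pair force cost ratio:    {pair_ratio:.3f}")
print(f"pairlist build cost ratio: {list_ratio:.3f}")
```

The same function gives about 1.42 for a cutoff of 9 versus 8, which is why a modest increase is far cheaper than going to 12.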
Okay, back to the sander list bomb. What this error message says (and it needs recrafting) is that you attempted to add 7104 pairs to a list that was already at 103754005, and that is bigger than the maximum allowed listsize for this calc of 103754298; the error occurred in process 0, the master. Sander unfortunately is not smart enough to resize the list if the number of pairs exceeds the current maximum. PMEMD, on the other hand, is smart enough to do this, and the only way you should get into problems with pmemd is if you use a very small processor count, machines with small memories, and a ridiculously large cutoff (and then you will die with an error about there not being enough memory on the machine).
So why would the pair count be above the estimate of the max? Well, a higher density than expected system seems the most likely reason, though really bad luck or a bug in the code is also possible. I would use the default cut, as Ross suggests later, and I would also consider using pmemd for my routine calcs of this sort. It is over twice as fast, and handles these sorts of error conditions in a much more friendly fashion, managing to keep running through little trials and tribulations like this (I used to do systems level work, and the ability to resize data structures based on demand is sort of critical there).
Best Regards - Bob Duke
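[Editorial sketch, not part of the original message.] The decoding of the sander message described above can be checked mechanically. The helper below is hypothetical (it is not part of AMBER); it assumes the four integers in the "NB pairs" line are, in order, the pairs being added, the current list size, the list capacity, and the process rank, which is the reading given in the reply.

```python
import re


def decode_nb_overflow(line):
    """Split a sander 'NB pairs ... exceeds capacity' line into named fields.

    Field meanings are an assumption taken from the explanation in this thread.
    """
    numbers = [int(tok) for tok in re.findall(r"-?\d+", line)]
    if len(numbers) != 4:
        raise ValueError("expected four integers in the NB pairs line")
    adding, current, capacity, rank = numbers
    needed = adding + current
    return {
        "adding": adding,
        "current": current,
        "capacity": capacity,
        "rank": rank,
        "needed": needed,
        # Positive shortfall means the list would overflow by this many entries.
        "shortfall": needed - capacity,
    }


msg = "* NB pairs 7104 103754005 exceeds capacity ( 103754298) 0"
info = decode_nb_overflow(msg)
print(info)
```

For the message in this thread the list would have needed 103761109 entries, 6811 more than the 103754298 that had been allocated.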
----- Original Message -----
From: neville forlemu
To: amber_at_scripps.edu ; rduke_at_email.unc.edu
Sent: Friday, October 24, 2008 12:48 AM
Subject: Re: AMBER: Non bond list error
Hello,
Could someone explain to me what this error means?
* NB pairs 7104 103754005 exceeds capacity ( 103754298) 0
SIZE OF NONBOND LIST = 103754298
SANDER BOMB in subroutine nonbond_list
Non bond list overflow!
check MAXPR in locmem.f
I am trying to run sander for energy minimization, but keep running into this problem.
Thanks
--- On Thu, 10/23/08, Robert Duke <rduke_at_email.unc.edu> wrote:
From: Robert Duke <rduke_at_email.unc.edu>
Subject: Re: AMBER: Non bond list error
To: amber_at_scripps.edu
Date: Thursday, October 23, 2008, 3:44 PM
Actually, I noticed I said "you overflowed the counter", and then show that you didn't... (oh, oops). So it is memory corruption. What I don't understand is why you are not dying with some sort of "out of memory" error from sander, associated with asking for more memory than is available. For pmemd, anywhere I allocate dynamic memory, I check for a success return code, so the way you should experience running out of memory there is to get an explicit error message. Because sander has a preallocated memory pool strategy, I suspect that other things are possible... Bottom line on all this - I think it is a good idea to not run more than roughly 100,000 atoms on a single processor, especially for sander. And if you run it on 4 processors but they all share the same limited physical memory, you may also hit trouble. I attached a graphic on pmemd memory requirements - a jpg so it should be widely viewable. My rule of thumb for pmemd is that 4 processors, each with 1 GB of actual physical memory, can handle up to 1 million atoms with the default 8 angstrom cutoff. Sander will take more. There are also buffer space considerations in an mpi application (within mpi itself, not in the app), that further muddy the waters, but following this guideline you should be safe.
Regards - Bob
----- Original Message -----
From: "Robert Duke" <rduke_at_email.unc.edu>
To: <amber_at_scripps.edu>
Sent: Thursday, October 23, 2008 4:14 PM
Subject: Re: AMBER: Non bond list error
> As Ross will tell you too:
> 1) Don't increase cut to 12, leave it at the default (of 8)
> 2) Run this on at least 4 processors using the MPI version of pmemd or sander (I know you are using sander here; pmemd requires less memory). Even higher processor counts will reduce your risk of memory overflow further.
> Your pairlist went negative because you incremented it past a 31 bit digit; with the commonly used integer format on computers these days (twos-complement), this results in a negative number (and is clearly an error condition). Is this memory usage reasonable for the size problem you have? Well, that cutoff plus skin will produce about 552 pairs per atom. If you had 1,000,000 atoms (and you are close), that would be 552,000,000 pairs. Not enough to overflow the list counter. BUT that is 552,000,000 pairs * 4 bytes per integer, means 2 GB in the nonbonded list alone. Most machines, you are pushing it to get much over 1.5 GB for the application (I have not looked recently, so that is off the top of my head). With true 32 bit executables, you are out of address space; with the newer 64 bit chips, you have bits to specify more than 2 GB of addresses, but you may not have enough actual memory. And remember that the pairlist is only part of your memory consumption. No resource is infinite on a computer...
> Regards - Bob Duke
> ----- Original Message -----
> From: "Wang,Ying" <wangying_at_ufl.edu>
> To: <amber_at_scripps.edu>
> Sent: Thursday, October 23, 2008 3:26 PM
> Subject: RE: AMBER: Non bond list error
>
>> Hi, Ross,
>>
>> Thanks a lot!
>>
>> My input file is as below:
>> 50ps MD with res
>> &cntrl
>> imin = 0,
>> irest = 0,
>> ntx = 1,
>> ntb = 1,
>> cut = 12,
>> ntr = 1,
>> ntc = 2,
>> ntf = 2,
>> tempi = 0.0,
>> temp0 = 300.0,
>> ntt = 3,
>> gamma_ln = 2.0,
>> nstlim = 50000, dt = 0.001
>> ntpr = 1000, ntwx = 1000, ntwr = 1000
>> nmropt=1
>> /
>> &wt TYPE='TEMP0', istep1=0, istep2=50000,
>> value1=0.1, value2=300.0, /
>> &wt TYPE='END' /
>> Keep system fixed with weak restraints
>> 20.0
>> RES 1 5076
>> END
>> END
>>
>> and the NPT is as below:
>>
>> NPT: 50ps MD
>> &cntrl
>> imin = 0, irest = 1, ntx = 7,
>> ntb = 2, pres0 = 1.0, ntp = 1,
>> taup = 2.0,
>> cut = 12, ntr = 1,
>> ntc = 2, ntf = 2,
>> tempi = 300.0, temp0 = 300.0,
>> ntt = 3, gamma_ln = 2.0,
>> nstlim = 50000, dt = 0.001,
>> ntpr = 1000, ntwx = 1000, ntwr = 1000
>> /
>> Keep fixed with weak restraints
>> 20.0
>> RES 217 954
>> END
>> Keep fixed with weak restraints
>> 20.0
>> RES 1909 2646
>> END
>> Keep fixed with weak restraints
>> 20.0
>> RES 3601 4338
>> END
>> res also
>> 5.0
>> RES 955 1692
>> END
>> res also
>> 5.0
>> RES 2647 3384
>> END
>> res also
>> 5.0
>> RES 4339 5076
>> END
>> END
>>
>> Thanks again!!!!!!!!!!!!!!!
>>
>> On Thu Oct 23 14:38:09 EDT 2008, Ross Walker <ross_at_rosswalker.co.uk> wrote:
>>
>>> Hi Wang,
>>>
>>> 800K atoms is pretty large and while sander / pmemd should support this size (I think 999,999 is the limit right now due to file formatting) you may run into problems that haven't been seen before.
>>>
>>> It's not obvious what is going wrong in your case but the numbers don't make any sense (a negative capacity!) which suggests either memory corruption through an array overflow or the number of pairs is larger than a signed integer and is overflowing. Even at 800K atoms you shouldn't have this many pairs though. Can you post your input file so we can take a look? I suspect you have cut set too high or perhaps are not running PME etc.
>>>
>>> All the best
>>> Ross
>>>
>>>> -----Original Message-----
>>>> From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On Behalf Of Wang,Ying
>>>> Sent: Thursday, October 23, 2008 10:20 AM
>>>> To: amber_at_scripps.edu
>>>> Subject: AMBER: Non bond list error
>>>>
>>>> Hi, Dear AMBERs,
>>>>
>>>> I meet a problem when I run a simulation of a system consist of 799889 atoms.
>>>>
>>>> * NB pairs 451 0 exceeds capacity ( -28510921) 7
>>>> SIZE OF NONBOND LIST = -28510921
>>>> SANDER BOMB in subroutine nonbond_list
>>>> Non bond list overflow!
>>>> check MAXPR in locmem.f
>>>>
>>>> Could anyone tell me what's happen?
>>>>
>>>> Thanks a lot!
>>>>
>>>> -----------------------------------------------------------------------
>>>> The AMBER Mail Reflector
>>>> To post, send mail to amber_at_scripps.edu
>>>> To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
>>>> to majordomo_at_scripps.edu
>>
>> --
>> Wang,Ying
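[Editorial sketch, not part of the original thread.] The pairlist arithmetic in the quoted 4:14 PM message can be reproduced with a rough Python estimate. The atom density of 0.096 atoms per cubic angstrom (roughly water-like) and the 2 angstrom skin are assumptions chosen here for illustration, not values stated in the thread; all function names are hypothetical.

```python
import math

ATOM_DENSITY = 0.096    # atoms per cubic angstrom; assumed, roughly water-like
INT32_MAX = 2**31 - 1   # largest value of a signed 32-bit integer


def pairs_per_atom(cut, skin=2.0, density=ATOM_DENSITY):
    """Estimate unique pairs per atom inside a sphere of radius cut + skin."""
    volume = 4.0 / 3.0 * math.pi * (cut + skin) ** 3
    # Each pair is shared by two atoms, so count half the neighbours.
    return 0.5 * density * volume


def wrap_int32(value):
    """Interpret an integer as a twos-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


n_atoms = 1_000_000
per_atom = pairs_per_atom(12.0)
total_pairs = per_atom * n_atoms
list_bytes = total_pairs * 4  # one 4-byte integer per list entry

print(f"pairs per atom: {per_atom:.0f}")
print(f"total pairs:    {total_pairs:.3e} (int32 max is {INT32_MAX})")
print(f"list memory:    {list_bytes / 1e9:.2f} GB")
print(f"2**31 stored as int32: {wrap_int32(2**31)}")
```

Under these assumptions the pair count stays below the 32-bit limit while the list alone needs a little over 2 GB, which matches the point made in the message: the counter does not overflow, but the memory request is already too large for many machines.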
-----------------------------------------------------------------------
The AMBER Mail Reflector
To post, send mail to amber_at_scripps.edu
To unsubscribe, send "unsubscribe amber" (in the *body* of the email)
to majordomo_at_scripps.edu