AMBER Archive (2008)
Subject: Re: AMBER: pmemd 10 output
From: Robert Duke (rduke_at_email.unc.edu)
I would consider trying mpich 1 or 2, which I do support, and which have been tested extensively with pmemd (if this is infiniband, then mvapich gives better performance than openmpi, I believe, based at least on one set of benchmarks I have seen). Given that this is random, it could be buffer problems somewhere, or heaven only knows, but it is not likely a matter of the combination of the output params. I would also look at pgi and see what, if anything, they may have done to fortran i/o, just in case. This could very easily be a subtle "linked to the wrong stuff" problem, but I am just throwing out wild guesses here.
You are basically exploring all the corners of the hardware space here - amd64, openmpi, pgi - all stuff not routinely tested with pmemd (either due to availability, or because, given options, I test and recommend use of the stuff that has the best performance and is most reliable). The thing that worries me about all this is that it suggests a memory stomp at some point on the file i/o buffer control variables, and I am wondering what else might be getting stomped on.
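[As a rough illustration of the machinery Bob is referring to - a minimal sketch, not pmemd source; the unit number, filename, and flush interval are made up:]
----------------- illustrative Fortran sketch -----------------
! Minimal sketch (not pmemd source): periodically flush a formatted
! output unit, the kind of behavior the mdout flush control governs.
! A stomp on the variables steering this logic could plausibly
! scramble or drop output records.
program flush_demo
   implicit none
   integer, parameter :: mdout = 17      ! hypothetical unit number
   integer :: step
   open(unit=mdout, file='mdout.demo', status='replace', action='write')
   do step = 1, 1000
      write(mdout, '(a,i8)') ' NSTEP = ', step
      if (mod(step, 100) == 0) flush(mdout)  ! flush every 100 steps
   end do
   close(mdout)
end program flush_demo
----------------------------------------------------------------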
Thanks, Bob, for the details.
I tested on 2 different machines. Not really different architectures, but different clusters (different generations of AMD64, 4 cores/node). The problem is reproducible. The time of appearance differs from run to run and it looks random.
What is weird is that I have never observed this with pmemd from AMBER 9, which made me think it might be a compilation issue (we compiled with pgi and openmpi), but of course that doesn't make too much sense. I could test without ntave (in the AMBER 9 runs I did not use ntave). I will also give it a try and modify mdout_flush_interval (something like the fragment below is what I mean). I'll let you know if something changes.
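[A sketch of the kind of test settings meant here; the values are just placeholders, other &cntrl variables omitted:]
----------------- example &cntrl fragment -----------------
 &cntrl
   ntave = 0,                  ! disable running averages
   mdout_flush_interval = 300, ! seconds between mdout flushes
 /
------------------------------------------------------------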
Best
Robert Duke wrote:
Dear Amber users,
Coming back to the pmemd 10 output problem I reported in the thread below, I did test different nodes (writing locally as well as via the network), with iwrap=1 and iwrap=0, and the problem is very reproducible. I get it every time I run pmemd 10, but not with sander.MPI 10 or amber 9. Attached is a sample of the output. This is very strange.
If anybody is able to explain this, I'd be very grateful for some suggestions (it could be a compilation issue). If there were a file system issue, why doesn't it happen with any other executable?
Best wishes
-----------------input script --------------------
Ross Walker wrote:
This really does look to me like an issue with your file system - I have never seen this from PMEMD myself and I can't see how you would end up with this situation - it looks more to me like you have some kind of malfunctioning raid device or something.
I have seen something similar to this on GPFS parallel file systems where one of the metadata servers had failed, such that you see only 4/5 of the striped data, for example. This can happen in both read and write mode, i.e., a perfectly good file on disk can be read by the user as being bad because of the striping issues, or alternatively, if the error occurs during a write, the data can get written to disk with chunks missing.
How reproducible is the problem? Can you try running it and writing to a local scratch disk on the master node instead of a network drive (if that is what you were doing) and see if the problem recurs?
All the best
Ross
From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On Behalf Of Vlad Cojocaru
Hi Ross,
Yes, at some point the ---- lines are truncated and the "check COM velocity" phrase overflows the data lines. VOLUME stops being printed, and towards 100000 steps I get lines where "check COM" appears after NSTEP ... and so on; the output gets really messy.
As for the input, I am well aware of the performance loss from running NVE this way. However, this was a test run in which I wanted to follow the pressure of the system. Unfortunately, ntp=0 does not allow that.
Best
Ross Walker wrote:
Hi Vlad,
I assume you mean the truncated --- lines, missing data and the missing carriage returns. This looks to me like a file system issue where your machine is actually not writing to disk properly. If this is over an NFS mount then I would run some serious stress tests on the system to make sure things are working properly.
Also, you may want to note that your input file is probably not optimal for performance. You have:
  ntp=1, taup=9999999, pres0=1.0, comp=44.6,
which is effectively the same as running constant volume with ntb=1. However, computationally it still runs NPT, which involves much more communication. This generally affects parallel scaling more than low processor count performance. Generally the performance goes as:
  NVE > NVT > NPT
and for thermostats:
  NTT=0 > NTT=1 >> NTT=3
Hence you are running an NVT calculation but paying the performance penalty for an NPT calculation.
All the best
Ross
-----Original Message-----
From: owner-amber_at_scripps.edu [mailto:owner-amber_at_scripps.edu] On Behalf Of Vlad Cojocaru
Sent: Friday, June 27, 2008 8:49 AM
To: AMBER list
Subject: AMBER: pmemd 10 output
Dear Amber users,
The pmemd of AMBER 10 produces some really strange looking output (see attached; the three dot lines between NSTEP=250 and NSTEP=56500 are there to indicate that I truncated the output). What is actually strange is that the output looks fine till NSTEP=57500. Only after that is the output messed up.
I haven't noticed this with any previous version of pmemd. Also not with sander.MPI from amber 10.
Thanks
vlad
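[A sketch of the two setups Ross contrasts - hypothetical fragments, other &cntrl variables omitted; ntb=2 is assumed alongside ntp=1, as constant pressure requires it:]
----------------- illustrative &cntrl comparison -----------------
what was run (NPT machinery, effectively fixed volume):
 &cntrl
   ntb=2, ntp=1, taup=9999999, pres0=1.0, comp=44.6,
 /
effectively equivalent, cheaper NVT setup (no pressure reported):
 &cntrl
   ntb=1, ntp=0,
 /
-------------------------------------------------------------------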
--
----------------------------------------------------------------------------
Dr. Vlad Cojocaru
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg
Tel: ++49-6221-533266
Fax: ++49-6221-533298
e-mail: Vlad.Cojocaru[at]eml-r.villa-bosch.de
http://projects.villa-bosch.de/mcm/people/cojocaru/
----------------------------------------------------------------------------