AMBER Archive (2009)

Subject: Re: [AMBER] Error in PMEMD run

From: Marek Malý (maly_at_sci.ujep.cz)
Date: Fri May 08 2009 - 20:36:49 CDT


OK, thanks again for everything, and I will keep my fingers crossed
regarding your idea to improve the scaling for ntt = 3.

   Best,

     Marek

On Sat, 09 May 2009 03:26:37 +0200, Robert Duke <rduke_at_email.unc.edu>
wrote:

> Hi Marek,
> The random seed needs to change with each restart of the simulation -
> i.e., you don't want to use the same sequence of random numbers twice,
> because that is effectively not random. I will defer to others, like
> Dave Case, to give you better pointers on langevin dynamics; if I make
> another contribution in this area, it will probably be to fix the poor
> scaling for the random number generation (seems worthwhile to me; just
> have to get to it). Anyway, good luck with your simulations, and
> hopefully the performance numbers you get with the standard benchmarks
> will give you a better idea of what is happening with your machines (I
> remain somewhat disturbed that the value at 32 nodes deteriorates so
> badly - I do suspect something else is going on there, but there are too
> many unknowns for me to be sure).
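>
> As a concrete illustration of the "new seed every restart" point (the file
> names, segment count, and seed scheme below are made up for the sketch, not
> anything you have to copy), a chained set of restarts might be driven like
> this:
>
>   #!/bin/bash
>   # run 5 chained MD segments, giving each restart its own Langevin seed (ig)
>   # so the same pseudo-random stream is never reused
>   prev=equil.rst
>   for i in 1 2 3 4 5; do
>     ig=$((10000 + i))     # any scheme that never repeats a seed is fine
>     sed "s/IG_PLACEHOLDER/$ig/" md.in.template > md_$i.in   # template holds ig=IG_PLACEHOLDER
>     mpiexec -np 16 pmemd -O -i md_$i.in -p prmtop -c $prev -o md_$i.out -r md_$i.rst -x md_$i.mdcrd
>     prev=md_$i.rst        # the next segment restarts from this one
>   done
>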
> Best Regards - Bob
>
> ----- Original Message ----- From: "Marek Malý" <maly_at_sci.ujep.cz>
> To: "AMBER Mailing List" <amber_at_ambermd.org>
> Sent: Friday, May 08, 2009 9:05 PM
> Subject: Re: [AMBER] Error in PMEMD run
>
>
> Hi Bob,
>
> It seems to me that you never sleep :))
>
> Anyway, thanks again for your quick-fire answer!
>
> I think I have already consumed plenty of your time, so let me just add a
> few last notes regarding the ntt setting, since it seems to me to be the
> most important thing here.
>
> OK, if I understand correctly, ntt = 1 could be pretty fine for explicit
> solvent (this is probably not true for implicit solvent; at least the
> Amber10 manual contains a warning about that).
>
> Regarding ntt = 3, gamma_ln, which defines the collision frequency, is
> probably also important. If I understand correctly, from the physical point
> of view ntt = 3 is probably the slightly more reliable choice, but only with
> a proper gamma_ln value (which could be system dependent). The important
> thing is to periodically change the random seed (the ig value). How often to
> change the ig parameter again probably depends on the gamma_ln setting (is
> there any formula which can give me a recommended frequency for changing ig
> as a function of gamma_ln?). If I understand correctly, the mentioned
> artifacts appear because of the finite period of the pseudo-random number
> generator used. OK, on the other hand, the cost of the perhaps more
> reliable/sophisticated thermostat (ntt = 3 versus ntt = 1) is worse scaling
> at 32, 64, 128 ... CPUs.
> Am I right ?
>
> OK, just one last question - only if you happen to know ...
>
> We are saying here that ntt = 3 could be more reliable than ntt = 1, but
> what are the criteria for this judgement? In other words, let's say that I
> have a molecular system XY and I would like to run some tests to learn which
> temperature control is the best for this system, purely from the physical
> reliability point of view (I don't care about CPU time now).
>
> So can you recommend some tests which could, for example, help me to
> estimate the optimal gamma_ln if I choose ntt = 3, or the optimal vrand if I
> choose ntt = 2? Tests which would finally help me to choose between ntt = 1,
> ntt = 2 with the optimal vrand, and ntt = 3 with the optimal gamma_ln?
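>
> Just so it is clear what I mean, the three variants I am choosing between
> would look something like this in the &cntrl namelist (the numbers are only
> placeholders, not recommendations):
>
>   ntt=1, tautp=1.0,               ! Berendsen weak coupling, coupling time tautp
>   ntt=2, vrand=1000,              ! Andersen-style velocity randomization every vrand steps
>   ntt=3, gamma_ln=2.0, ig=71277,  ! Langevin, collision frequency gamma_ln, random seed ig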
>
> If you don't know, please do not waste your time on it - I can probably find
> some answers in the articles cited in the Amber10 manual (e.g. [21], [22],
> [23], [24]).
>
> So thank you again for all !
>
> Best,
>
> Marek
>
> On Sat, 09 May 2009 02:01:23 +0200, Robert Duke <rduke_at_email.unc.edu>
> wrote:
>
>> Hi Marek,
>> I glanced at the dif's but I will let Ross or somebody more used to
>> looking at the strange things that may happen in the full suite comment
>> on them. If pmemd passed all its tests, then it should be good. At
>> 16 processors, I guess I am not greatly surprised that there are not
>> huge differences in performance - you expect things to be hitting you
>> more as you go to 32, 48, 64... So the biggest difference you see is the ntt 3
>> vs ntt 1, and that I would expect. Where you will see the cut make
>> more of a difference, honestly, is at relatively low processor count.
>> What happens is that the recip space and data distrib costs start
>> going up as you scale, while the direct space costs scale reasonably.
>> I think the run with less frequent trajectory output coming out slower is
>> a matter of your test times being too short. Also, is anything else running on
>> this cluster? Any chance, whatsoever, that there are other jobs
>> running on the actual nodes you are using? That also makes things sort
>> of poor on performance and unreliable. On ntt 3 vs ntt 1: well, I am
>> working with a bunch of guys that still use ntt 1. There are
>> theoretical objections that can be raised about the quality of results
>> with this thermostat. With ntt 2 or 3, if you don't change the random
>> seed at each restart, then your results can have serious artifacts
>> (another point of some contention). So all sorts of wild things were
>> happening, it seemed to me, when these thermostats were first
>> introduced (ntt 3 in particular), but they were reputed to equilibrate
>> temperature better. They probably do; you just have to be sure to use
>> a different random seed with each restart. I have steered clear of
>> them because all of our work went okay with the older ntt 1, because
>> there was this period of bad results, probably due to not resetting
>> the random seed, and finally, because if you really try to scale up,
>> the random number generation methods will start eating up more and
>> more of your time and keep you from scaling very well. I expect at 32
>> cpu it is more noticeable. It is not a huge effect probably until
>> 64-128+ or so, but that is an area that is interesting to me. So
>> that's the history; probably if you don't routinely want to run on a
>> ton of cpus and you change the seed religiously, there is virtue in ntt
>> 3, but many microseconds of simulation have been piled up with ntt 1 over
>> the last decade.
>> Bear in mind, I am more of a computer guy than an MD guy, though I am
>> trained in both computer science and the sciences; still my focus in
>> all this is more providing the tools so you all can do the
>> simulations, not in doing them myself.
>>
>> Okay, last point. Please just benchmark some with factor ix, and see
>> how what you get compares to what other folks are getting on their
>> clusters. So the goal here is to try to sort out if there are any
>> problems with your hardware or software in the performance area.
>> Without comparing something for which we have data elsewhere, we can't
>> really tell...
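>>
>> For what it is worth, the kind of comparison I mean is roughly this (the
>> paths and file names below are illustrative - check the benchmarks
>> directory that ships with Amber for the exact scripts):
>>
>>   cd $AMBERHOME/benchmarks/factor_ix
>>   mpiexec -np 16 pmemd -O -i mdin -p prmtop -c inpcrd -o mdout.16cpu
>>   grep -i "wall" mdout.16cpu    # take the wall-clock seconds and convert to nsec/day
>>
>> and then put those nsec/day numbers next to the published ones.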
>>
>> Best Regards - Bob
>>
>> ----- Original Message ----- From: "Marek Malý" <maly_at_sci.ujep.cz>
>> To: "AMBER Mailing List" <amber_at_ambermd.org>
>> Sent: Friday, May 08, 2009 7:37 PM
>> Subject: Re: [AMBER] Error in PMEMD run
>>
>>
>> Dear Bob,
>>
>> thanks a lot for your analysis !
>>
>> I made some tests (PMEMD only) regarding your hypothesis - just the same
>> short test as the previous ones, with the same input files, 1000 steps.
>>
>> In each additional test I changed just one parameter (from my original
>> configuration) to see its influence on CPU time. Regarding the node/CPU
>> setting, I have tested only one case: 2 nodes x 8 CPUs = a 16-processor job,
>> where I am using all 8 cores per node.
>>
>> my original setting : 85 s
>>
>> cut = 8 : 84 s
>> ntpr, ntwx = 1000 : 87 s ( strange but true :)) )
>> ntt = 1 : 78 s
>> ntt = 2, vrand = 1000 : 83 s
>> ntt = 3, gamma_ln = 0 : 82 s
>> temp0 = 300 : 87 s
>>
>> As you can see, there are only small changes compared to my original
>> setting, which is listed below (in your last reply).
>>
>> Of course, the question is how the influence of the tested parameters
>> changes in other node/CPU configurations (4/8 CPU, 4/4 CPU ...) or in a
>> longer test, like the 5000 steps you recommended ...
>>
>> Anyway, in this short test I originally set ntpr and ntwx to 200, but of
>> course in a real simulation they are much bigger (5000).
>> Regarding ntt, it seems to me that you do not recommend ntt = 3 (at least
>> for explicit solvent), so what is your favourite choice for this type of
>> simulation?
>>
>> OK, and now back to the reliability question.
>>
>> I have run all the tests with my "ifort 11" compilation of Amber and the
>> 10.1.019 compilation of PMEMD, which just uses the new cc and MKL libs.
>>
>> Here are the results:
>>
>> #1 - AmberTools - I think OK
>> #2 - AmberSerial_MM - I think OK
>> #3 - AmberSerial_QMMM - I think OK
>>
>> (please see the attached files)
>>
>> #4 - AmberParallel_MM
>>
>> I ran it on a full node = 8 cores.
>>
>> Here is my script for running this test:
>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> #!/bin/bash
>> # NODES must be set to the number of hosts listed in ~/.mpd11.hosts
>> mpdboot -f ~/.mpd11.hosts -n $NODES
>> # the Amber test suite runs each parallel case through $DO_PARALLEL
>> export DO_PARALLEL="mpiexec -np 8"
>> make test.parallel.MM
>> mpdallexit
>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>
>>
>> A big part of the test passed without any problems, but after a while it
>> got stuck. I waited about 45 minutes; for that whole period all the
>> processors were 100% busy, as you can see from this "top" listing:
>>
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 472 mmaly 20 0 153m 14m 5672 R 101 0.1 43:25.90 sander.MPI
>> 468 mmaly 20 0 153m 14m 5668 R 100 0.1 46:12.26 sander.MPI
>> 469 mmaly 20 0 157m 16m 7728 R 100 0.1 45:38.51 sander.MPI
>> 470 mmaly 20 0 153m 14m 5672 R 100 0.1 46:12.23 sander.MPI
>> 467 mmaly 20 0 157m 16m 7736 R 100 0.1 45:57.42 sander.MPI
>> 473 mmaly 20 0 159m 16m 7728 R 100 0.1 46:03.65 sander.MPI
>> 466 mmaly 20 0 153m 14m 5664 R 99 0.1 45:57.73 sander.MPI
>> 471 mmaly 20 0 157m 16m 7748 R 90 0.1 45:24.29 sander.MPI
>> 512 mmaly 20 0 10740 1480 1032 R 0 0.0 0:01.18 top
>>
>>
>>
>> .........
>>
>> ==============================================================
>> cd PIMD/part_cmd_water/restart && ./Run.cmdyn
>> diffing cmd.out.save with cmd.out
>> PASSED
>> ==============================================================
>> cd PIMD/part_rpmd_water && ./Run.rpmd
>> diffing spcfw_rpmd.top.save with spcfw_rpmd.top
>> PASSED
>> ==============================================================
>> diffing spcfw_rpmd.xyz.save with spcfw_rpmd.xyz
>> PASSED
>> ==============================================================
>> diffing spcfw_rpmd.out.save with spcfw_rpmd.out
>> PASSED
>> ==============================================================
>> cd ti_mass/pent_LES_PIMD && ./Run.pentadiene
>> This test not set up for parallel
>> cannot run in parallel with #residues < #pes
>> make[1]: Leaving directory `/home/mmaly/_applications/amber/test'
>> cd PIMD/full_cmd_water/equilib && ./Run.full_cmd
>> Testing Centroid MD <<<< - HERE IT GOT STUCK
>>
>>
>>
>> So I had to kill this process, since I do not believe that this test should
>> take more than a few minutes on 8 CPUs ...
>> Anyway, the relevant TEST_FAILURES file was created (please see the attached
>> TEST_FAILURES_AMBER_PARALLEL_MM.diff).
>>
>>
>>
>> #5 - AmberParallel_QMMM
>>
>> This test crashed very early, as you can see in the listing below:
>>
>> export TESTsander=/opt/amber/exe/sander.MPI; make test.sander.QMMM
>> make[1]: Entering directory `/home/mmaly/_applications/amber/test'
>> cd qmmm2/xcrd_build_test/ && ./Run.oct_nma_imaged
>> diffing mdout.oct_nma_imaged.save with mdout.oct_nma_imaged
>> PASSED
>> ==============================================================
>> cd qmmm2/xcrd_build_test/ && ./Run.oct_nma_noimage
>> diffing mdout.oct_nma_noimage.save with mdout.oct_nma_noimage
>> PASSED
>> ==============================================================
>> cd qmmm2/xcrd_build_test/ && ./Run.ortho_qmewald0
>>
>> * NB pairs 145 185645 exceeds capacity ( 185750)
>> 3
>> SIZE OF NONBOND LIST = 185750
>> SANDER BOMB in subroutine nonbond_list
>> Non bond list overflow!
>> check MAXPR in locmem.f
>> [cli_3]: aborting job:
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3
>> rank 3 in job 3 enode11_56157 caused collective abort of all ranks
>> exit status of rank 3: return code 1
>> ./Run.ortho_qmewald0: Program error
>> make[1]: *** [test.sander.QMMM] Error 1
>> make[1]: Leaving directory `/home/mmaly/_applications/amber/test'
>> make: *** [test.sander.QMMM.MPI] Error
>>
>>
>> There is some problem with MAXPR, but as I learned (after looking at the
>> file locmem.f), this is not a typical constant but a variable which is
>> evaluated by the program itself - or am I wrong?
>>
>> Anyway, can I do something to prevent this error and get through the whole
>> AmberParallel_QMMM test?
>>
>>
>> #6 PMEMD test
>>
>> Absolutely no problems. Everything passed after a while and no TEST_FAILURES
>> file was created.
>>
>>
>> Bob, I would be very grateful if you could look at the attached files and at
>> least indicate :)) whether my installation seems to be reliable, or whether
>> it would be better to do a complete reinstallation using your recommended
>> ifort 10.1.021 ...
>>
>> Thank you very much in advance !
>>
>> Best,
>>
>> Marek
>>
>>
>>
>> On Fri, 08 May 2009 21:44:08 +0200, Robert Duke <rduke_at_email.unc.edu>
>> wrote:
>>
>>> Ah, now we are getting somewhere!
>>> A 60000 atom system - that is fine.
>>> Now, let's look at the mdin file you sent:
>>> heat ras-raf
>>> &cntrl
>>> imin=0,irest=1,ntx=5,
>>> nstlim=1000,dt=0.002,
>>> ntc=2,ntf=2,
>>> cut=10.0, ntb=2, ntp=1, taup=2.0,
>>> ntpr=200, ntwx=200,
>>> ntt=3, gamma_ln=2.0,
>>> temp0=310.0,
>>> /
>>>
>>> Here, things get interesting. Let's go through the potential problems
>>> in the order they occur:
>>>
>>> cut=10.0 - This is a really big cutoff for pme, generally unnecessary.
>>> The default cut is 8 angstrom; you will run roughly twice as slow for
>>> your direct space calcs with a cutoff this big. Not really a great idea
>>> (some folks go to 9 angstrom to get a longer vdw interaction; with pmemd
>>> you can actually just increase the vdw cutoff while leaving the
>>> electrostatic cut at 8 and get better performance). Now the other thing -
>>> if you are having trouble with scaling, larger cutoffs will slow you down
>>> even more because there is more information interchange.
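>>>
>>> (If you want to try that split-cutoff trick, this is roughly what I mean -
>>> please double-check the pmemd section of the manual for the exact keyword
>>> names of these pmemd-only &ewald options:
>>>
>>>   &ewald
>>>     es_cutoff = 8.0, vdw_cutoff = 9.0,
>>>   /
>>>
>>> i.e. keep the electrostatic direct-space cutoff at 8 angstrom and extend
>>> only the vdw cutoff, instead of setting cut=9 or cut=10 for everything.)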
>>>
>>> ntwx=200 - You are dumping a trajectory snapshot every 0.4 psec - this
>>> is not outrageous, but is probably also a bit of overkill. You could
>>> probably print every psec and be fine (ntwx=500). If your disk is at
>>> all slow, this will hurt. It sounded like what you were doing on the
>>> disks is okay, as long as there is not some screwy nfs mount issue
>>> (sounds like there is not).
>>>
>>> ntt=3 - AhHa! This is a langevin thermostat. There is a huge
>>> inefficiency here, associated with random number generation. I don't
>>> know how expensive it gets, but it does get expensive, and I view ntt 3
>>> as not a production tool for this reason. Others undoubtedly disagree,
>>> as lots of folks like this thermostat. BUT the way it is currently
>>> implemented, it really kills scaling.
>>>
>>> temp0=310 - Additional motion at higher temp. More list builds. Less
>>> efficient (but you are driving the dynamics further in less time).
>>> Probably a very small effect.
>>>
>>> nstlim=1000 - PMEMD is still adjusting the run parameters out to
>>> roughly
>>> step 4000. So for higher scaling stuff, I typically do about 5000
>>> steps
>>> minimum to see what is going on.
>>>
>>> - This stuff is at least some of the reason you are not scaling as well
>>> as one might hope... The devil is in the details, and he can be a real
>>> pain...
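>>>
>>> Just to pull those suggestions together, a benchmark-oriented variant of
>>> your mdin might look roughly like this (illustrative only - I switched to
>>> ntt=1 here purely to take the random-number cost out of the timing
>>> comparison, not as a recommendation for your production runs):
>>>
>>>   short benchmark run
>>>   &cntrl
>>>     imin=0, irest=1, ntx=5,
>>>     nstlim=5000, dt=0.002,
>>>     ntc=2, ntf=2,
>>>     cut=8.0, ntb=2, ntp=1, taup=2.0,
>>>     ntpr=500, ntwx=500,
>>>     ntt=1, tautp=1.0,
>>>     temp0=310.0,
>>>   /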
>>>
>>> Best Regards - Bob
>>>
>>> ----- Original Message ----- From: "Marek Malý" <maly_at_sci.ujep.cz>
>>> To: "AMBER Mailing List" <amber_at_ambermd.org>
>>> Sent: Friday, May 08, 2009 3:10 PM
>>> Subject: Re: [AMBER] Error in PMEMD run
>>>
>>>
>>> Hi Bob,
>>>
>>> My testing system is composed of a generation-4 PPI dendrimer + explicit
>>> water; the total number of atoms is about 60000.
>>>
>>> Here are the input files for testing:
>>>
>>> http://physics.ujep.cz/~mmaly/MySystem/
>>>
>>> I know it is not a big system, but for a benchmark on 16-32 CPUs it should
>>> be OK, I think - or am I wrong?
>>>
>>> For testing I used just 1000 steps from the equilibration phase (NPT
>>> simulation, see equil_DEN_PPIp_D.in).
>>>
>>> Regarding the disk question:
>>>
>>> Each node has its own local hard drive (SATA, 250 GB), so I run my jobs
>>> from the first node listed in the relevant .mpd.hosts file.
>>>
>>> Let's say I want to run my job on 2 nodes (for example 11 and 12): I go to
>>> the local disk of node 11 and run the job from there.
>>>
>>> These local disks are not shared yet.
>>>
>>> Regarding MPI, we are using Intel MPI (currently version 3.2.0.011).
>>>
>>> here are my config commands for compilation of parallel Amber/PMEMD:
>>>
>>> ./configure_amber -intelmpi ifort (Parallel Amber)
>>>
>>> ./configure linux_em64t ifort intelmpi (PMEMD)
>>>
>>> We have 14 nodes in total; each node = 2 x quad-core Intel Xeon 5365
>>> (3.00 GHz) = 8 cores. The nodes are connected using Cisco InfiniBand.
>>>
>>> So that is all I can say about my testing system and our cluster.
>>>
>>> Thanks for your time !
>>>
>>> Best,
>>>
>>> Marek
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 08 May 2009 20:24:35 +0200, Robert Duke <rduke_at_email.unc.edu>
>>> wrote:
>>>
>>>> Yes, Ross makes points I was planning on making next. We need to know
>>>> your benchmark. You should be running something like JAC, or even
>>>> better yet, factor ix, from the benchmarks suite. Then you should
>>>> convert your times to nsec/day and compare to some of the published
>>>> values at www.ambermd.org to have a clue as to just how good or bad
>>>> you are doing. Once you have a reasonable benchmark (not too small,
>>>> balanced i/o, not asking for extra features that are known not to
>>>> scale, etc etc), then we can look for other problems. Given a GOOD
>>>> infiniband setup (high bandwidth, configured correctly, balance
>>>> between pci express and the infiniband hca's, well-scaled infiniband
>>>> switch layout, no noise from loose cables, etc etc etc), then the
>>>> next
>>>> likely source of grief is the disk. Are you all perhaps using an
>>>> nfs-mounted volume, and even worse, one volume, not a parallel file
>>>> system, being written to by multiple running jobs? Bad idea.
>>>> Parallel jobs will hang like crazy waiting for the master to do disk
>>>> i/o. Is mpi really set up correctly? The only way you know is if
>>>> the
>>>> setup has passed other benchmarks (I typically tell by comparison of
>>>> pmemd on the candidate system to other systems, but believe me, mpi
>>>> can really be screwed up pretty easily). Which mpi? OpenMPI is
>>>> known
>>>> to be bad with infiniband (I don't know if it is actually "good" with
>>>> anything). Intel mpi is supposed to be good, but I have never tried
>>>> to jump through all the configuration hoops. MVAPICH is pretty
>>>> standard; once again, though, because I don't admin a system of this
>>>> type, I have no idea how hard it is to get everything right. I am
>>>> really sorry you are having so much "fun" with all this; I know it
>>>> must be frustrating, but there is a reason bigger clusters get run by
>>>> staff. By the way, how big is the cluster?
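>>>> (For the nsec/day conversion, if it helps:
>>>> ns/day = nstlim * dt[ps] * 86400 / (1000 * wall-clock seconds).
>>>> As a made-up example, a 5000-step run at dt = 0.002 ps is 0.01 ns of
>>>> dynamics; if it took 400 s of wall time, that is 0.01 * 86400 / 400 =
>>>> 2.16 ns/day. The 400 s here is only to show the arithmetic.)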
>>>> Best Regards - Bob
>>>> ----- Original Message ----- From: "Ross Walker"
>>>> <ross_at_rosswalker.co.uk>
>>>> To: "'AMBER Mailing List'" <amber_at_ambermd.org>
>>>> Sent: Friday, May 08, 2009 2:11 PM
>>>> Subject: RE: [AMBER] Error in PMEMD run
>>>>
>>>>
>>>> Hi Marek,
>>>>
>>>> I don't think I've seen anywhere what the actual simulation you are
>>>> running
>>>> is. This will have a huge effect on parallel scalability. With
>>>> infiniband
>>>> and a 'reasonable' system size you should easily be able to get
>>>> beyond 2
>>>> nodes. Here are some numbers for the JAC NVE benchmark from the suite
>>>> provided on http://ambermd.org/amber10.bench1.html
>>>>
>>>> This is for NCSA Abe, which is dual quad-core Clovertown (E5345, 2.33 GHz,
>>>> so very similar to your setup) and uses SDR InfiniBand.
>>>>
>>>> Using all 8 processors per node (time for benchmark in seconds):
>>>> 8 ppn 8 cpu 364.09
>>>> 8 ppn 16 cpu 202.65
>>>> 8 ppn 24 cpu 155.12
>>>> 8 ppn 32 cpu 123.63
>>>> 8 ppn 64 cpu 111.82
>>>> 8 ppn 96 cpu 91.87
>>>>
>>>> Using 4 processors per node (2 per socket):
>>>> 4 ppn 8 cpu 317.07
>>>> 4 ppn 16 cpu 178.95
>>>> 4 ppn 24 cpu 134.10
>>>> 4 ppn 32 cpu 105.25
>>>> 4 ppn 64 cpu 83.28
>>>> 4 ppn 96 cpu 67.73
>>>>
>>>> As you can see, it is still scaling at 96 cpus (24 nodes at 4 threads per
>>>> node). So I think you must either be running a system too small to be
>>>> expected to scale in parallel, or there is something very wrong with the
>>>> setup of your computer.
>>>>
>>>> All the best
>>>> Ross
>>>>
>>>>> -----Original Message-----
>>>>> From: amber-bounces_at_ambermd.org [mailto:amber-bounces_at_ambermd.org] On
>>>>> Behalf Of Marek Malý
>>>>> Sent: Friday, May 08, 2009 10:58 AM
>>>>> To: AMBER Mailing List
>>>>> Subject: Re: [AMBER] Error in PMEMD run
>>>>>
>>>>> Hi Gustavo,
>>>>>
>>>>> thanks for your suggestion, but we have only 14 nodes in our cluster
>>>>> (each node = 2 x quad-core Xeon 5365 (3.00 GHz) = 8 cores per node,
>>>>> connected with Cisco InfiniBand).
>>>>>
>>>>> If I allocate 8 nodes and use just 2 CPUs per node for one of my jobs, it
>>>>> means that 8 x 6 = 48 cores will be wasted. In that case I am sure my
>>>>> colleagues will kill me :)) Moreover, I do not expect that the
>>>>> 8-node/2-CPU combination will have significantly better performance than
>>>>> 2-node/8-CPU, at least in the case of PMEMD.
>>>>>
>>>>> But anyway, thank you for your opinion/experience!
>>>>>
>>>>> Best,
>>>>>
>>>>> Marek
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 08 May 2009 19:28:35 +0200, Gustavo Seabra
>>>>> <gustavo.seabra_at_gmail.com> wrote:
>>>>>
>>>>> >> the best performance I have obtained was with a combination of 4
>>>>> >> nodes and 4 CPUs (out of 8) per node.
>>>>> >
>>>>> > I don't know exactly what you have in your system, but I gather you
>>>>> > are using 8-core nodes, and that you got the best performance by
>>>>> > leaving 4 cores idle. Is that correct?
>>>>> >
>>>>> > In this case, I would suggest that you go a bit further and also test
>>>>> > using only 1 or 2 cores per node, i.e., leaving the remaining 6-7
>>>>> > cores idle. So, for 16 MPI processes, try allocating 16 or 8 nodes.
>>>>> > (I didn't see this case in your tests.)
>>>>> >
>>>>> > AFAIK, the 8-core nodes are arranged as 2 four-core sockets, and the
>>>>> > communication between cores, which is already bad within the 4 cores of
>>>>> > the same socket, gets even worse when you need to pass information
>>>>> > between two sockets. Depending on your system, if you send 2 processes
>>>>> > to the same node, it may put them both in the same socket or
>>>>> > automatically split them, one per socket. You may also be able to tell
>>>>> > it to make sure that the job gets split into 1 process per socket (look
>>>>> > into the mpirun flags) - see the sketch below. From the tests we have
>>>>> > run on that kind of machine, we get the best performance by leaving ALL
>>>>> > BUT ONE core idle in each socket.
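>>>>> >
>>>>> > (A sketch of what I mean, assuming Intel MPI - treat the exact variable
>>>>> > names and values as something to verify against your MPI documentation,
>>>>> > since other MPIs use different flags for per-socket placement:
>>>>> >
>>>>> >   export I_MPI_PIN=1
>>>>> >   export I_MPI_PIN_DOMAIN=socket   # pin one MPI process per socket
>>>>> >   mpiexec -np 16 pmemd -O -i mdin -p prmtop -c inpcrd -o mdout
>>>>> >
>>>>> > With one process per socket you avoid the intra-socket contention
>>>>> > described above.)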
>>>>> >
>>>>> > Gustavo.
>>>>> >
>>>>> > _______________________________________________
>>>>> > AMBER mailing list
>>>>> > AMBER_at_ambermd.org
>>>>> > http://lists.ambermd.org/mailman/listinfo/amber
>>>>> >
>>>>> >
>>>>>
>>>>> --
>>>>> This message was created by Opera's revolutionary e-mail client:
>>>>> http://www.opera.com/mail/
>>>>>
>>>>> _______________________________________________
>>>>> AMBER mailing list
>>>>> AMBER_at_ambermd.org
>>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>
>>>>
>>>> _______________________________________________
>>>> AMBER mailing list
>>>> AMBER_at_ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> AMBER mailing list
>>>> AMBER_at_ambermd.org
>>>> http://lists.ambermd.org/mailman/listinfo/amber
>>>>
>>>>
>>>>
>>>
>>
>

-- 
This message was created by Opera's revolutionary e-mail client:
http://www.opera.com/mail/

_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber