AMBER Archive (2005)
Subject: Re: AMBER: AMBER goes in a Loop
From: Robert Duke (rduke_at_email.unc.edu)
Folks -
In /etc/rc.d/rc.local put the two lines:
echo 1048576 > /proc/sys/net/core/rmem_max
echo 1048576 > /proc/sys/net/core/wmem_max
This way, every time you reboot there is a substantial chunk of memory dedicated to net buffers. Doing this of course requires root privileges.
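A minimal sketch of the boot-time setup described above (the path is for a RedHat-style system, and 1048576 bytes is the value used here, not a universal recommendation):

```shell
# Append to /etc/rc.d/rc.local (runs as root at boot):
# raise the kernel's maximum socket receive buffer for net-heavy MPI runs
echo 1048576 > /proc/sys/net/core/rmem_max

# Verify the setting took effect:
cat /proc/sys/net/core/rmem_max
```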
Then set P4_SOCKBUFSIZE (MPICH_SOCKET_BUFFER_SIZE for mpich 2) to something like 131072 in your .cshrc or wherever makes sense for you.
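For a csh user, the per-process socket buffer setting above would look like this in ~/.cshrc (which variable name applies depends on your mpich version, as noted):

```shell
# mpich 1.x (ch_p4 device):
setenv P4_SOCKBUFSIZE 131072
# mpich 2:
# setenv MPICH_SOCKET_BUFFER_SIZE 131072
```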
The critical point here is that you need enough memory set aside that a read and a write operation can be underway simultaneously in each mpi process, or things will deadlock. When you run mpich on dual-processor machines, the amount of net buffer space required increases (so you see above I am specifying 8 x as much memory in the kernel as in P4_SOCKBUFSIZE; I don't know what the minimum "overage" required to prevent deadlocks is, but this config works well for my machines).
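The sizing relationship described above is easy to sanity-check with shell arithmetic; the 8x ratio is this configuration's working margin, not a derived minimum:

```shell
# Kernel net buffer vs. per-socket MPICH buffer, from the settings above
rmem_max=1048576   # value echoed into /proc/sys/net/core/rmem_max
sockbuf=131072     # value of P4_SOCKBUFSIZE
echo $((rmem_max / sockbuf))   # prints 8: kernel buffer is 8x the socket buffer
```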
Now, with mpich you will also need a very large value for P4_GLOBMEMSIZE; I set my machines to something like 134217728 to be able to run the rt benchmark with sander; pmemd requires a fraction of this. When this is the problem, the run always dies with an obvious error message.
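As a concrete illustration (csh syntax assumed, matching the .cshrc suggestion above), the value quoted works out to 128 MiB:

```shell
# In ~/.cshrc:  setenv P4_GLOBMEMSIZE 134217728
# That value expressed in MiB:
echo $((134217728 / 1024 / 1024))   # prints 128
```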
Another point: these large buffer sizes DO improve mpich/gigabit ethernet performance significantly. There are also issues with being sure the right number of processes start on the right machines, and that the mpi i/o actually occurs over your server nics (you did buy expensive but faster server nics for your back end, didn't you, and you do have a separate local lan interconnecting the machines, right?). The only way I have found to get the right number of processes on the machines, using the right interconnects, is with a "process group file" where I can reference the interconnect - see the mpich doc. All these things make a huge difference for gigabit ethernet lan performance. I currently get the following throughput on 3.2 ghz dual-cpu p4's connected as described above, for factor ix const pressure (90906 atoms):
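A hedged sketch of what such a process group file can look like for mpich's ch_p4 device (hostnames, paths, and node count here are hypothetical; "node*-gige" stands for the hostname bound to each machine's back-end gigabit nic):

```
# pmemd.pg - 4 dual-cpu machines, 8 processes total
# "local 1" = the master process plus 1 more on the launching host
local 1
node1-gige 2 /usr/local/amber8/exe/pmemd
node2-gige 2 /usr/local/amber8/exe/pmemd
node3-gige 2 /usr/local/amber8/exe/pmemd
```

It would be launched with something like mpirun -p4pg pmemd.pg /usr/local/amber8/exe/pmemd - see the mpich documentation for the exact format.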
#proc psec/day
Note this is current in-development code, not pmemd 8. Basically you DON'T get linear scaling on something like factor ix with gigabit ethernet on these small systems because the distributed fft transposes are huge and overwhelm the interconnect bandwidth. The problem is not nearly as severe on shared-memory machines or real supercomputers (the 1-to-2 processor scaling drop is actually largely a cache sharing issue on these small machines, since you don't use the nics; once you go to 4 procs, though, you do use the nics).
Okay, I may or may not have posted this before; I don't remember. If I didn't, it was because these are machine-specific instructions that work with RedHat linux and probably a variety of other linuxes (but probably not all), and that work with mpich. So you may have to poke around for your specific machine. If you have a canned vendor setup - something from sgi or what have you - they probably get the base config correct; the grief comes when you take a generic system and put your own mpi(ch) on top of it. I have not looked at LAM, but there is no reason it would not also be susceptible to the problem. This sort of thing reflects a lack of deadlock-avoidance software down there somewhere.
Sorry if this is not at all your problem; in my case though, this is the source of rt benchmark hangs for sander 8 or pmemd 8.
Regards - Bob Duke
Hi David,
Yes, all the other benchmark tests, i.e. hb, jac, gb_alp, etc., run successfully on 8 processors. They also run on 1, 2 and 4 processors.
The problem is only with rt 8 processor run.
Imran
On 10/14/05, David A. Case <case_at_scripps.edu> wrote:
Does the system work for other benchmarks, e.g. "jac" or "hb"? I'm trying to
...dac
-----------------------------------------------------------------------