AMBER Archive (2008)
Subject: Re: AMBER: Performance issues on Ethernet clusters
From: Robert Duke (rduke_at_email.unc.edu)
There are lots of ways to get the purchase and setup of gigabit ethernet hardware and software wrong, and not many ways to get it right. The web page you mention is dated, as Dave says; Ross and I have put up "more recent" info, but even that is on the order of two to four years old. With the advent of multicore CPUs, the plain fact of the matter is that the interconnect is more and more the bottleneck (where the interconnect includes any ethernet switches, cables, network interface cards, and the PCI bus out to the NICs). You really have to buy the right hardware, set it up right, build and configure MPI correctly, set the system buffer parameters correctly, and build pmemd correctly. Then it will do what we say. In the past I used some Athlon boxes through a cheap switch, and it was always slower than a single processor - the only reason I used it at all was for testing. So CAREFULLY read my amber.scripps.edu web page entries at a minimum, and if you are not a linux and networking guru, find one.

Oh, and doing even a bit of other communication over the same net you are running MPI over - ANY NFS traffic on it can screw it up completely (and the fact that it is the default network interface probably means it is a dumb client NIC, not a server NIC, so it is probably slow to begin with). ANOTHER thing that will screw you up completely: running the master process on a node and having it write via NFS to some other machine, even over a separate net. This nicely stalls the master, because NFS is really not very fast, and when the master stalls, everybody else twiddles their thumbs. MD has substantial data volumes associated with it; you will never have the performance you would like to have... (but springing for Infiniband if you have 32 nodes would make a heck of a lot of sense, especially if by node you actually mean a multicore CPU).
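[Editorial note: as a minimal sketch of the "set the system buffer parameters correctly" step above, the following Python snippet reads the standard Linux TCP buffer sysctls (net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, net.ipv4.tcp_wmem) so you can compare them against larger values before tuning with sysctl. The suggested byte values are only illustrative assumptions for a gigabit MPI network, not settings recommended anywhere in this thread; adjust for your own NICs and switch.]

    from pathlib import Path

    # Illustrative targets only; min/default/max triples for the tcp_* keys.
    SUGGESTED = {
        "net/core/rmem_max": "4194304",             # max socket receive buffer (bytes)
        "net/core/wmem_max": "4194304",             # max socket send buffer (bytes)
        "net/ipv4/tcp_rmem": "4096 87380 4194304",  # TCP receive buffer min/default/max
        "net/ipv4/tcp_wmem": "4096 65536 4194304",  # TCP send buffer min/default/max
    }

    def current(key):
        # /proc/sys mirrors what `sysctl` reports; this is read-only.
        return Path("/proc/sys", key).read_text().strip()

    for key, target in SUGGESTED.items():
        name = key.replace("/", ".")
        print(f"{name:<22} current: {current(key):<26} suggested: {target}")

    print("apply as root, e.g.:  sysctl -w net.core.rmem_max=4194304")

[Run it on each compute node; if the current maxima are much smaller than the message sizes MPI pushes per step, raising them (and making the change persistent in /etc/sysctl.conf) is one of the configuration knobs referred to above.]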
Hi all,
Is there anything that can be done in the application/cluster configuration or build options (other than finding the cash for Infiniband)? I hope so, since it's hard to believe that all the descriptions of Ethernet-based clusters (including this one: http://amber.scripps.edu/cluster_info/index.html) are meaningless.
Thank you for any suggestions.
Sasha