AMBER Archive (2008)

Subject: Re: AMBER: SiCortex as an alternative to Ethernet and infiniband clusters

From: Sasha Buzko
Date: Thu Apr 17 2008 - 17:39:46 CDT

Thanks for the info, Kevin.
I guess we are going to stay with the cluster model, though, especially
since we already have the hardware running non-MPI jobs.



Kevin Abbey wrote:
> Hi,
> Have you considered the products offered by SiCortex?
> We have two of the development systems here at Rutgers. The model is
> called the Catapult, SC072.
> I was planning to make an attempt to compile Amber 10 on this
> architecture.
> I am sure I'll have questions when I begin with this since I have
> little experience with both SiCortex and Amber. I may end up waiting
> for someone else to do this since I have little time for this already.
> Kevin
> Kevin C. Abbey
> System Administrator
> Rutgers University - BioMaPS Institute
> Email:
> Hill Center - Room 268
> 110 Frelinghuysen Road
> Piscataway, NJ 08854
> Phone and Voice mail: 732-445-3288
> Wright-Rieman Laboratories Room 201
> 610 Taylor Rd.
> Piscataway, NJ 08854-8087
> Phone: 732-445-2069
> Fax: 732-445-5958
> Sasha Buzko wrote:
>> Thank you, Bob.
>> Yes, it looks like the network is going to be a hard thing to tweak
>> in our situation, and we'll end up going for an Infiniband
>> interconnect eventually (we actually have 50 4-core nodes).
>> Thanks again for the explanation.
>> Best regards,
>> Sasha
>> Robert Duke wrote:
>>> There are lots of ways to get the purchase and setup of gigabit
>>> ethernet hardware and software wrong, and not many ways to get it
>>> right. The web page you mention is dated, as Dave says; Ross and I
>>> have put up "more recent" info, but it is on the order of two to four
>>> years old. With the advent of multicore CPUs, the plain fact of
>>> the matter is that the interconnect is more and more the bottleneck
>>> (where the interconnect includes any ethernet switches, cables,
>>> network interface cards, and the pci bus out to the nic cards). You
>>> really have to buy the right hardware, set it up right, build and
>>> configure mpi correctly, set system buffer params up correctly, and
>>> build pmemd correctly. Then it will do what we say. In the past I
>>> used some athlon boxes through a cheap switch, and it was always
>>> slower than a single processor - the reason I used it at all was
>>> purely for testing. So CAREFULLY read my web page
>>> entries at a minimum, and if you are not a linux and networking
>>> guru, find one. Oh, and avoid even a bit of other communication
>>> over the net that you are running mpi over - ANY nfs over it can
>>> screw it up completely (and the fact it is the default network
>>> interface probably means it is a dumb client nic, not a server nic,
>>> so it is probably slow to begin with). ANOTHER thing that will
>>> screw you up completely - run the master process on a node, and have
>>> it write via NFS to some other machine via a net - even a separate
>>> one. This nicely stalls the master because NFS is really not very
>>> fast, and when the master stalls, everybody else twiddles their
>>> thumbs. MD has substantial data volumes associated with it; you
>>> will never have the performance you would like to have... (but
>>> springing for infiniband if you have 32 nodes would make a heck of a
>>> lot of sense, especially if by node, you actually mean a multicore cpu).
>>> Regards - Bob Duke
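
Bob's point about the master process stalling on NFS writes can be checked before a long run by comparing how fast a node can write to local disk versus the shared mount. The following is only a rough sketch under stated assumptions: the function name and the directories passed to it are hypothetical, it measures plain sequential file writes rather than anything pmemd or MPI does, and the numbers it gives are only useful for comparing one directory against another on the same node.

```python
import os
import tempfile
import time


def write_throughput_mb_per_s(directory, size_mb=64, chunk_mb=1):
    """Write size_mb of data to a temporary file in `directory`; return MB/s.

    fsync is called before the clock stops so the data has been pushed to
    the underlying storage (or NFS server), not just the page cache.
    """
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(size_mb // chunk_mb):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if the write failed.
        os.remove(path)
    return size_mb / elapsed


if __name__ == "__main__":
    # Hypothetical directory list: add the NFS-mounted output directory
    # next to a node-local scratch directory to compare the two.
    for label, directory in [("local", tempfile.gettempdir())]:
        rate = write_throughput_mb_per_s(directory, size_mb=16)
        print(f"{label}: {rate:.1f} MB/s")
```

If the shared directory turns out to be much slower than local scratch, that is consistent with the advice above to keep the master's output off NFS during the run.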
>>> ----- Original Message -----
>>> *From:* Sasha Buzko <>
>>> *To:* <>
>>> *Sent:* Thursday, April 17, 2008 2:20 PM
>>> *Subject:* AMBER: Performance issues on Ethernet clusters
>>> Hi all,
>>> I've just completed setting up pmemd with mpich2 to test on a
>>> cluster with gigabit Ethernet connections. As a test case, I
>>> used an example from an Amber tutorial (suggested by Ross,
>>> In my setup, using pmemd on up to 32 nodes gave no performance
>>> gain at all over a single 4-processor system. The best case I
>>> had was about 5% improvement when running 1 pmemd process per
>>> node on a 32 node subset of the cluster. There is other traffic
>>> across this private subnet, but it's minimal (another job
>>> running on the rest of the cluster only accesses NFS shares to
>>> write the results of a job with no constant data transfer). In
>>> all cases, cpu utilization ranged from 65% (1 process per node)
>>> to 15-20% (4 per node). With 4 processes per node, it took twice
>>> as long on 32 nodes than it did on a single box.
>>> Is there anything in the application/cluster configuration or
>>> build options that can be done (other than look for cash to get
>>> Infiniband)? I hope so, since it's hard to believe that all the
>>> descriptions of Ethernet-based clusters (including this one:
>>> are meaningless.
>>> Thank you for any suggestions.
>>> Sasha
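
To put the figures from the original post in perspective, the arithmetic below converts them into speedup and parallel efficiency relative to the single 4-processor box. It is a sketch under assumptions: "about 5% improvement" is read as 5% higher throughput, "twice as long" is taken literally, and the baseline time is normalized to 1.0 because no absolute timings were posted. The function names are made up for illustration.

```python
def relative_speedup(baseline_time, parallel_time):
    """Speedup of the parallel run relative to the baseline run."""
    return baseline_time / parallel_time


def parallel_efficiency(speedup, baseline_procs, parallel_procs):
    """Fraction of ideal scaling achieved going from baseline_procs to parallel_procs."""
    return speedup * baseline_procs / parallel_procs


# Baseline: one 4-processor box, wall-clock time normalized to 1.0.
BASELINE_PROCS = 4
BASELINE_TIME = 1.0

# Case 1: 1 process per node on 32 nodes, "about 5% improvement".
# Assumption: read as 5% higher throughput, i.e. time = 1 / 1.05.
s1 = relative_speedup(BASELINE_TIME, BASELINE_TIME / 1.05)
e1 = parallel_efficiency(s1, BASELINE_PROCS, 32)

# Case 2: 4 processes per node on 32 nodes (128 processes), "twice as long".
s2 = relative_speedup(BASELINE_TIME, 2.0 * BASELINE_TIME)
e2 = parallel_efficiency(s2, BASELINE_PROCS, 128)

print(f"32 processes:  speedup {s1:.2f}, efficiency {e1:.1%}")
print(f"128 processes: speedup {s2:.2f}, efficiency {e2:.1%}")
```

Under these assumptions the 32-process run uses roughly an eighth of its ideal capacity and the 128-process run under 2%, which fits the low cpu utilization reported in the post and the diagnosis that the interconnect is the bottleneck.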
