AMBER Archive (2009)

Subject: RE: [AMBER] Why..QM/MM

From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Sat Aug 08 2009 - 12:00:27 CDT


Hi Bill.

> Why? I ran a QM/MM minimization at the PM3 level on a metalloprotein.
> When I use only one node (8 CPUs), it runs without any
> problem. But when I request 4 nodes (4 x 8 = 32 CPUs), it stops after 150
> steps and gives no further response for more than a day.
> Could you tell me why? By the way, I am using sander.MPI from AMBER10 in
> both cases.

This could be for a number of reasons. The fact that it manages to do 150 steps fine suggests that it is more likely a problem with your hardware than with the code. Do you get any error messages at all? BTW, 32 CPUs is a LOT to be doing QM/MM on. What interconnect do you have? (Ethernet will NOT cut it.) Have you actually benchmarked the system on this number of CPUs? How well it performs in parallel depends on a LOT of factors, including how large the MM region is. I would be surprised if you see much benefit beyond 16 CPUs.
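One way to do the benchmarking suggested above is to time the same fixed number of steps at each CPU count and work out the speedup and parallel efficiency relative to the smallest run. The Python sketch below assumes you have already measured those wall-clock times yourself; the function name `parallel_efficiency` is hypothetical and not part of AMBER.

```python
from __future__ import annotations


def parallel_efficiency(timings: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Return {ncpu: (speedup, efficiency)} relative to the smallest CPU count.

    `timings` maps a CPU count to the wall-clock seconds taken for the SAME
    fixed number of minimization steps (user-measured; nothing here reads
    AMBER output).
    """
    base_n = min(timings)
    base_t = timings[base_n]
    result = {}
    for n in sorted(timings):
        speedup = base_t / timings[n]
        # Ideal speedup relative to the baseline run is n / base_n,
        # so efficiency = 1.0 means perfect scaling.
        efficiency = speedup / (n / base_n)
        result[n] = (speedup, efficiency)
    return result


if __name__ == "__main__":
    # Fill in your own measured times (seconds) before running.
    my_timings: dict[int, float] = {}
    if my_timings:
        for n, (s, e) in parallel_efficiency(my_timings).items():
            print(f"{n:4d} CPUs  speedup {s:5.2f}  efficiency {e:6.1%}")
```

If the efficiency drops well below 1.0 when going from 16 to 32 CPUs, the extra nodes are mostly adding communication cost, which is the situation the reply warns about.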

Is the problem you are seeing reproducible? Does it always hang at 150 steps? Can you set ntpr=1 and see which step it actually stops on? Does it ALWAYS stop on this same step? If it is intermittent, that suggests a hardware fault. If it is ALWAYS the same step, then it could be a code issue. BTW, if you run 100 steps at 1, 8, 16 and 32 CPUs, do you always get the same results? You should check this.
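With ntpr=1 set, the checks above (which step a run stops on, and whether runs at different CPU counts agree) can be scripted. The Python sketch below rests on an assumption about the output layout: that each printed minimization step in the sander output file appears as an `NSTEP  ENERGY ...` header line followed by a line whose first two fields are the step number and the energy. Please verify this against your own output files. The function names and the relative tolerance are hypothetical choices; energies are printed to limited precision, and parallel runs can legitimately differ in the last digits because floating-point sums are done in a different order, so only larger discrepancies should be treated as suspicious.

```python
from __future__ import annotations

import re

# Assumed layout: a header line starting with "NSTEP ... ENERGY", followed by
# a data line whose first two fields are the step number and the energy.
_HEADER = re.compile(r"^\s*NSTEP\s+ENERGY")


def min_energies(mdout_text: str) -> dict[int, float]:
    """Map step number -> energy for every step printed in the output text."""
    energies: dict[int, float] = {}
    lines = mdout_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if _HEADER.match(line):
            fields = lines[i + 1].split()
            if len(fields) >= 2:
                try:
                    energies[int(fields[0])] = float(fields[1])
                except ValueError:
                    pass  # next line is not a data line; skip it
    return energies


def last_step(energies: dict[int, float]) -> int | None:
    """The last step that was printed, i.e. roughly where a hung run stopped."""
    return max(energies) if energies else None


def compare_runs(runs: dict[int, dict[int, float]], rel_tol: float = 1e-4):
    """List (step, ncpu, energy, reference_energy) wherever a run disagrees
    with the run that used the smallest CPU count, on the steps both printed."""
    ref = runs[min(runs)]
    diffs = []
    for n, energies in sorted(runs.items()):
        for step in sorted(set(ref) & set(energies)):
            if abs(energies[step] - ref[step]) > rel_tol * max(1.0, abs(ref[step])):
                diffs.append((step, n, energies[step], ref[step]))
    return diffs
```

You would read each output file (for example one per CPU count, with file names of your choosing), pass its text to `min_energies`, then call `last_step` on each result and `compare_runs` on a dict keyed by CPU count.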

Good luck,
Ross

/\
\/
|\oss Walker

| Assistant Research Professor |
| San Diego Supercomputer Center |
| Tel: +1 858 822 0854 | EMail:- ross_at_rosswalker.co.uk |
| http://www.rosswalker.co.uk | PGP Key available on request |

Note: Electronic Mail is not secure, has no guarantee of delivery, may not be read every day, and should not be used for urgent or sensitive issues.

_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber