AMBER Archive (2002)

Subject: Beowulf killing jobs

From: Peter Gannett (pgannett_at_hsc.wvu.edu)
Date: Fri Sep 13 2002 - 16:03:19 CDT


Dear amber users:

I am just beginning to run jobs on a new Beowulf cluster and am having a strange problem. Jobs using 1 or 2 CPUs in total (1 CPU/node on 1 or 2 nodes, or 2 CPUs/node on 1 node) run just fine. But if I try to run with 4 CPUs (either 1 CPU/node on 4 nodes or 2 CPUs/node on 2 nodes), my jobs get randomly killed and I get an error message:

[pgannett@energy b_nomod_ss_prod]$ cat sample_1ppn_4no.e1086
=>> PBS: job killed: node 3 (node2) requested job die, code 1099
Killed by signal 15.
Killed by signal 15.
Killed by signal 15.

I did not kill the job myself. If it helps, I am submitting the jobs with qsub under the PBS scheduling system.

Has anyone had this problem? My sysadmin is not being very helpful and claims there must be something in my code (sander, version 7) doing this.

Thanks.
Pete Gannett