AMBER Archive (2002)
Subject: RE: Beowulf killing jobs
From: Ross Walker (ross_at_rosswalker.co.uk)
Date: Mon Sep 16 2002 - 09:46:45 CDT
Dear Peter
>I am just beginning to run jobs on a new Beowulf cluster and am having
>a strange problem.
>Jobs running with 1 cpu/node, 1 node and 2 nodes or
>2 cpu/node and 1 node run just fine.
>But, if I try to run with 4 CPUs
>[pgannett_at_energy b_nomod_ss_prod]$ cat sample_1ppn_4no.e1086
>=>> PBS: job killed: node 3 (node2) requested job die, code 1099
This sounds very much like a setting in your PBS system that is stopping
you from running 4 processes at once. Check whether there is a per-user
process limit on your cluster. You could also try running 4 copies of the
following and see whether they all complete or some get killed:
awk 'BEGIN {for(i=0;i<100000000;i++)for(j=0;j<100000000;j++);}'
echo "Process completed"
This will very quickly tell you if you can run 4 jobs concurrently.
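The test above can be wrapped in a small script along these lines. This is only a sketch: it assumes a POSIX shell with awk on the path, and the count N and the loop bounds are illustrative values chosen so each copy finishes in a short time (raise them if you want the processes to overlap for longer).

```shell
#!/bin/sh
# Sketch: start N CPU-bound awk processes at once and report how each one ends.
# N and the loop bounds are illustrative assumptions, not cluster settings.
N=4
pids=""
n=1
while [ "$n" -le "$N" ]; do
    # Busy loop with no output; increase the bounds for a longer-running test.
    awk 'BEGIN {for(i=0;i<3000;i++)for(j=0;j<3000;j++);}' &
    pids="$pids $!"
    n=$((n + 1))
done

completed=0
for pid in $pids; do
    # wait returns the exit status of that child; non-zero means it failed
    # or was terminated by a signal.
    if wait "$pid"; then
        echo "Process $pid completed"
        completed=$((completed + 1))
    else
        echo "Process $pid was killed or failed"
    fi
done
echo "$completed of $N processes completed"
```

Run interactively on one node, this only exercises that node's own limits; to test the queue itself, the same script would need to be submitted as the body of a PBS job. In bash, `ulimit -u` prints the per-user process limit for the current shell, which may be worth comparing between a login session and a batch job.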
Note that, in my experience, using PBS for MPI jobs can be a real pain. Is
there a facility for you to run the jobs without submitting them via a
PBS batch queue?
All the best
Ross.
/\
\/
|\oss Walker
| Imperial College of Science, Technology & Medicine |
| Department of Chemistry | Theoretical Division |
| Tel:- +44 20 759(45851) |
| EMail:- ross_at_rosswalker.co.uk | http://www.rosswalker.co.uk/ |
| PGP Key available on request |
|