AMBER Archive (2009)

Subject: [AMBER] Re: mpirun noticed that process rank 1 ... on signal 1 (Hangup).

From: Naser Alijabbari (na3m_at_virginia.edu)
Date: Tue Jul 14 2009 - 05:49:14 CDT


Sorry, the message was cut off. However, when I use the same configuration on
a new computer (Systemax model 981091, Intel Core quad), I get the following
error at non-reproducible intervals:
 NSTEP = 32000 TIME(PS) = 186.201 TEMP(K) = 289.61 PRESS = 0.0
 Etot = -28480.7941 EKtot = 7231.1811 EPtot = -35711.9752
 BOND = 286.8982 ANGLE = 774.4197 DIHED = 1115.8723
 1-4 NB = 373.9270 1-4 EEL = 5943.6287 VDWAALS = 3888.4780
 EELEC = -48095.1992 EHBOND = 0.0000 RESTRAINT = 0.0000
 Ewald error estimate: 0.3488E-03
 ------------------------------------------------------------------------------

 NSTEP = 33000 TIME(PS) = 187.201 TEMP(K) = 289.75 PRESS = 0.0
 Etot = -28482.0763 EKtot = 7234.8307 EPtot = -35716.9070
 BOND = 273.0706 ANGLE = 770.8814 DIHED = 1103.7325
 1-4 NB = 365.0504 1-4 EEL = 5948.9635 VDWAALS = 3906.6602
 EELEC = -48085.2656 EHBOND = 0.0000 RESTRAINT = 0.0000
 Ewald error estimate: 0.3295E-04
 ------------------------------------------------------------------------------

 NSTEP = 34000 TIME(PS) = 188.201 TEMP(K) = 292.64 PRESS = 0.0
 Etot = -28482.3229 EKtot = 7306.8466 EPtot = -35789.1694
 BOND = 298.3194 ANGLE = 773.0516 DIHED = 1106.2032
 1-4 NB = 378.5118 1-4 EEL = 5980.9467 VDWAALS = 4141.3781
 EELEC = -48467.5802 EHBOND = 0.0000 RESTRAINT = 0.0000
 Ewald error estimate: 0.6165E-05
 ------------------------------------------------------------------------------

==> nohup.out <==
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 31400 on node xxx.xxx.xx.xxx
exited on signal 1 (Hangup).
--------------------------------------------------------------------------
2 total processes killed (some possibly by mpirun during cleanup)

I have even run a 200000-step simulation without a hangup, but the problem
sometimes appears at random. I believe it is tied to my leaving the ssh
terminal: each time the error has occurred, I had left the session. I am
using Fedora 9.
Has anyone else seen this before?
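
One way to test the suspicion about the ssh terminal might be to start the job
in its own session, so that a hangup of the login terminal should not reach it.
The following is only a minimal sketch, assuming Python 3 on a POSIX system;
the helper name launch_detached, the log file name, and the mpirun command
line in the comment are hypothetical and not taken from the run above.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start cmd in a new session, detached from the controlling terminal.

    Assumption: the hangup comes from the ssh terminal closing. A process
    in its own session has no controlling terminal, so the SIGHUP sent on
    terminal hangup should not be delivered to it.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # never read from the ssh terminal
            stdout=log,                 # collect output in a file instead
            stderr=subprocess.STDOUT,
            start_new_session=True,     # child calls setsid() before exec
        )
    finally:
        log.close()  # the child keeps its own copy of the descriptor
    return proc


# Hypothetical usage; substitute the real mpirun command line:
# job = launch_detached(["mpirun", "-np", "2", "sander.MPI", "-O"], "md_run.log")
# print("started PID", job.pid)
```

If the run then survives logging out, that would support the terminal hangup
as the cause; if it still dies on signal 1, the cause is probably elsewhere.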
_______________________________________________
AMBER mailing list
AMBER_at_ambermd.org
http://lists.ambermd.org/mailman/listinfo/amber