From: Frank Steiner
Subject: [Freeipmi-devel] Re: Another FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Date: Mon, 05 Jul 2010 08:52:57 +0200
User-agent: Thunderbird 2.0.0.24 (X11/20100302)
Hi Al,
Albert Chu wrote:
> Hey Dave, Frank,
>
> As discussed in the previous thread, there was a corner case in the
> bmc-watchdog workaround I previously did. I then discovered another
> corner case w/ the workaround.
>
> There is a new beta here.
Sorry, I was away, but I'm going to test the new beta now. During my
absence the Sun X4100M2 machines produced two strange things:
1) bmc-watchdog: Get Watchdog Timer Error: No error message found for
command 25h, network function 06h, and completion code 80h. Please
report to <address@hidden>
2) The really bad thing: three of the X4100M2 machines were rebooted,
   apparently by the watchdog in reaction to a "bmc-watchdog -s -k" call.
   The timer runs for 15 minutes, and two independent instances reset
   the watchdog every 3 minutes. On all three machines I found this in
   the logs:
Jul 3 21:03:01 sunserver8 /usr/sbin/cron[11808]: (root) CMD (/usr/bin/bmc-reset)
Jul 3 21:03:04 sunserver8 pm-profiler: Power Button pressed, executing /sbin/shutdown -h now
Jul 3 21:03:04 sunserver8 shutdown[11853]: shutting down for system halt
The bmc-reset script just does this:
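#!/bin/sh
# Try up to 15 times, 3 seconds apart, to reset the BMC watchdog.
for name in $(seq 1 15)
do
    # -s -k means: reset if running. Could be that the timer was
    # stopped because the init script failed to set it up. We should
    # not start it then.
    output=$(/usr/sbin/bmc-watchdog -s -k 2>&1)
    exitstatus=$?
    if [ "$exitstatus" != "0" ]
    then
        sleep 3
    else
        exit 0
    fi
done
# All 15 attempts failed; without this the script would exit with
# the status of the last "sleep" (0) and look successful.
exit 1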
On every machine there were only 2-3 seconds between the cron entry and
the shutdown, so I guess the Sun's ILOM initiated the shutdown in response
to the "bmc-watchdog -s -k" command. The timer cannot have run down: I get
an email for every failed attempt to reset the watchdog, and I would have
received 3-4 of them during the 15 minutes the timer runs.
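To check whether the shutdown really follows the bmc-watchdog call itself, one option is to log a timestamp immediately before and after each attempt and compare it with the pm-profiler line in syslog. Below is a minimal POSIX sh sketch of that idea. The function name "retry_logged", the "RETRY_DELAY" variable and the log path in the usage line are hypothetical, not part of FreeIPMI; the helper only adds timing information around whatever command it is given and does not change what bmc-watchdog does.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of FreeIPMI): runs a command up
# to MAX times and prints a timestamped line before and after each attempt,
# so every bmc-watchdog call can be lined up against syslog entries such as
# the pm-profiler "Power Button pressed" message.
#
# Usage: retry_logged MAX COMMAND [ARGS...]
# RETRY_DELAY (seconds between attempts) is an assumed knob; default 3.
retry_logged() {
    max=$1
    shift
    delay=${RETRY_DELAY:-3}
    i=1
    while [ "$i" -le "$max" ]; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') attempt $i/$max: $*"
        # Capture combined stdout/stderr and the command's exit status.
        output=$("$@" 2>&1)
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') attempt $i/$max: exit $status $output"
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$max" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    # Every attempt failed; report failure to the caller.
    return 1
}
```

From cron it could stand in for the loop in the bmc-reset script, e.g. "retry_logged 15 /usr/sbin/bmc-watchdog -s -k >> /var/log/bmc-reset.log 2>&1" (the log file location is an assumption). If the "Power Button pressed" entry falls between an attempt's start line and its exit line, that points at the command rather than at an expired timer.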
Has anything like this been reported before?
Btw, Sun first refused to develop a firmware update for the X4100M2 because
it is EOL, but our 5-year support warranty forces them to do so ;-)
They are now developing a patch for a newer machine, since they stated that
the error exists in many of the SunFire machines, and will then backport it
to the X4100M2.
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Recursion can only be understood once you have understood recursion. *