freeipmi-devel



From: Gregor Dschung
Subject: Re: [Freeipmi-devel] Trouble w/ HP ProLiant and FreeIPMI (ipmi-sensors)
Date: Tue, 09 Oct 2007 17:25:21 +0200
User-agent: Thunderbird 1.5.0.12 (X11/20060911)

Hey Al,

here is the SDR cache. 'sdr-cache-p300slg01.10.136.17.128' is the file
for gtseval-ipmi; 'sdr-cache-p300slg01.10.136.17.170' is another cache
file, from an ipmi-sensors run that works fine.

I'm using FreeIPMI on a system with SUSE 10.1.
---------
p300slg01:/usr/local/src # uname -a
Linux p300slg01 2.6.16.27-0.9-smp #1 SMP Tue Feb 13 09:35:18 UTC 2007
i686 i686 i386 GNU/Linux
---------

In your test4 code, I had to change the following lines to compile without
errors:
common/src/pstdout.c
-243: fprintf(stderr, "Default stack size = %li bytes \n", mystacksize);
+243: fprintf(stderr, "Default stack size = %li bytes \n", (long)mystacksize);
+501: va_list vacpy;

---------

I've tested FreeIPMI locally again. I was wrong: it crashes too. I
guess I confused it with IPMItool, which runs fine locally but gives
warnings over the network. I don't know whether this helps you:
Locally:
address@hidden:~/ipmi/usr/bin> ./ipmitool -I open sensor
ACPI State       | 0x1        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
System Reset     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
POST Error       | na         | discrete   | na    | na        | na        | na        | na        | na        | na
Memory ECC       | na         | discrete   | na    | na        | na        | na        | na        | na        | na
PCI Error        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
Fan Error        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
Watchdog         | na         | discrete   | na    | na        | na        | na        | na        | na        | na
CPU Fan 1        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 2        | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 3        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 4        | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 5        | 9223.391   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 6        | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 7        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 8        | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 9        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU Fan 10       | 10426.441  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
System Fan 1     | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
System Fan 2     | 10900.371  | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
CPU0 Vcore       | 1.107      | Volts      | ok    | na        | 0.402     | 0.500     | 1.597     | 1.695     | na
CPU1 Vcore       | na         | Volts      | na    | na        | 0.402     | 0.500     | 1.597     | 1.695     | na
Standby 5V       | 4.969      | Volts      | ok    | na        | 4.263     | 4.528     | 5.527     | 5.792     | na
System 5V        | 4.851      | Volts      | ok    | na        | 4.263     | 4.528     | 5.527     | 5.792     | na
System 3.3V      | 3.234      | Volts      | ok    | na        | 2.822     | 2.999     | 3.675     | 3.851     | na
3V CMOS Sense    | 3.028      | Volts      | ok    | na        | 2.617     | 2.781     | na        | na        | na
CPU0 Therm Diode | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
CPU1 Therm Diode | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
CPU0 ThermDiode2 | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
CPU1 ThermDiode2 | na         | degrees C  | na    | na        | 10.000    | na        | 68.000    | 80.000    | 95.000
AMB Temp         | 29.000     | degrees C  | ok    | na        | 10.000    | na        | 30.000    | 45.000    | na
MultiBit ECC ER  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
VDD Power Fail   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
Reset            | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
Identify         | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
NMI              | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
CPU0 Therm-Trip  | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
CPU1 Therm-Trip  | na         | discrete   | na    | na        | na        | na        | na        | na        | na
CPU0 IERR        | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
CPU1 IERR        | na         | discrete   | na    | na        | na        | na        | na        | na        | na
CPU0 Prochot     | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
CPU1 Prochot     | na         | discrete   | na    | na        | na        | na        | na        | na        | na
CPU0 SocketOcc   | 0x1        | discrete   | 0x0280| na        | na        | na        | na        | na        | na
CPU1 SocketOcc   | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
CPU0 Dmn 0 Temp  | 45.000     | degrees C  | ok    | na        | na        | na        | na        | 85.000    | 95.000
CPU1 Dmn 0 Temp  | na         | degrees C  | na    | na        | na        | na        | na        | 85.000    | 95.000
CPU0 Dmn 1 Temp  | 46.000     | degrees C  | ok    | na        | na        | na        | na        | 85.000    | 95.000
CPU1 Dmn 1 Temp  | na         | degrees C  | na    | na        | na        | na        | na        | 85.000    | 95.000

Over an RMCP+ session:
[...]
System Reset     | 0x0        | discrete   | 0x0080| na        | na        | na        | na        | na        | na
Error reading sensor POST Error (#01)
Error reading sensor Memory ECC (#02)
Error reading sensor PCI Error (#03)
Error reading sensor Fan Error (#04)
Watchdog         | na         | discrete   | na    | na        | na        | na        | na        | na        | na
CPU Fan 1        | 9992.006   | RPM        | ok    | na        | na        | na        | 3996.803  | 3475.480  | na
[...]

The omitted lines are identical to the local output.
-----------

I've also called ipmi-sensors from an x86_64 machine to reach gtseval-ipmi,
and it crashes with the same error (second attachment).

So... Enough debugging for today.

Have a nice day,
Gregor

Al Chu wrote:
> Hey Gregor,
>
> Although it's unlikely your problem, I saw one other potential issue.
> So I added a fix in this slightly newer tar.gz.
>
> Thanks,
> Al
>
> On Mon, 2007-10-08 at 11:51 -0700, Al Chu wrote:
>> Hey Gregor,
>>
>> Here's another tar.gz.  Could you run ./configure with --enable-debug
>> and run with --debug again?  The gdb output confirms the line I believed
>> was causing the problem, but I still can't quite figure out how the
>> corruption is happening.  So I put in a lot more printfs.
>>
>> I do have at least two other suspicions that depend on your system.  So
>> do you think you could also send me the SDR from ~/.freeipmi/sdr-cache/
>> for me to analyze, and could you also tell me which Linux you are running
>> on the i386 box?  I'm wondering if you have some older distribution (b/c
>> it's i386) and it has slightly different threads behavior that I'm not
>> handling properly.
>>
>> Thanks,
>> Al
>>
>>
>> On Sun, 2007-10-07 at 12:12 +0200, Gregor Dschung wrote:
>>> Hi Al,
>>>
>>> I attach again the output of the call with --debug and the backtrace. It
>>> was the first time that I used gdb, so I hope I understood the tutorials
>>> :)
>>>
>>> At the moment I'm not able to run ipmi-sensors locally because I'm not
>>> root on "gtseval" (the host of gtseval-ipmi) and I have to wait until I get
>>> read/write rights for /dev/ipmi0 again. And it's the weekend ;)
>>>
>>> You are right, I'm running IPMItool and FreeIPMI on an i386. gtseval
>>> is a 64-bit system, so perhaps this is the reason it doesn't crash
>>> locally.
>>>
>>> Have a nice Sunday,
>>> Gregor
>>>
>>>
>>>> Hey Gregor,
>>>>
>>>> Can't see anything suspicious in the code.  Here's another tar.gz to which I
>>>> added a whole bunch of extra printfs to try to get more information.  Could
>>>> you run it again (./configure --enable-debug and run ipmi-sensors with
>>>> --debug again)?  Also, you mentioned that ipmi-sensors completes locally
>>>> without issue.  Is the number of sensors listed below (ending w/ CPU1 Dmn
>>>> 1 Temp) the same as the number of sensors listed when you run locally?
>>>>
>>>> Also, is a core dump being output by this crash?  Could you run gdb
>>>> against the core and get a backtrace?  That'd be a lot of help too.
>>>>
>>>> Thanks for helping me look into this,
>>>>
>>>> Al
>>>>
>>>>> Hi Al,
>>>>>
>>>>> thanks for your fast answer.
>>>>>
>>>>> I've tested your test version and it seems to be on the right track. It
>>>>> still crashes, but now I get sensor data :) :
>>>>>
>>>>> [...]
>>>>>
>>>>
>>>> --
>>>> Albert Chu
>>>> address@hidden
>>>> 925-422-5311
>>>> Computer Scientist
>>>> High Performance Systems Division
>>>> Lawrence Livermore National Laboratory
>>>>


-- 
Gregor Dschung
System Life Guard, HiWi

Fraunhofer-Institut für Techno-
und Wirtschaftsmathematik ITWM
Fraunhofer-Platz 1
D-67663 Kaiserslautern

E-Mail:   address@hidden
Internet: www.itwm.fraunhofer.de  

Attachment: sdr-cache.tar.bz2
Description: application/bzip

Attachment: ipmi-sensors.debug.tar.bz2
Description: application/bzip

Attachment: ipmi-sensors_x64.debug.tar.bz2
Description: application/bzip

