From: Al Chu
Subject: Re: [Freeipmi-devel] Trouble w/ HP ProLiant and FreeIPMI (ipmi-sensors)
Date: Wed, 10 Oct 2007 09:30:17 -0700
As a note to other developers: I've also added a few extra notes about
the -v and -vv options to the HEAD ipmi-sensors manpage.
Al
On Wed, 2007-10-10 at 09:26 -0700, Al Chu wrote:
> Hey Gregor,
>
> There is a subtlety here that I added extra documentation for in the
> FreeIPMI 0.5.0 manpage (I didn't backport to 0.4.X b/c didn't think it
> was that important, but maybe I should have). The ipmi-sensors numbers
> listed on the left are "record ids", not sensor numbers. If you use the
> verbose options on ipmi-sensors (-v or -vv), you can find the sensor
> numbers. As an example on my system:
>
> Record ID: 22
> Sensor Name: Fan5
> Group Name: Fan
> Sensor Number: 18
> Event/Reading Type Code: 1h
>
> you can see the sensor number and record id don't match up.
>
> I'm not 100% sure why record ids were chosen for input/output over sensor
> numbers in ipmi-sensors (the tool was originally created by others), but
> if I had to guess for some reasons why:
>
> - some sensors don't have sensor numbers. I notice multiple sensors w/
> sensor number 0x00 in the ipmitool output below. I would guess those
> sensors don't have a number so they just output 0x00.
>
> - record ids increase in value, while sensor numbers need not, so
> outputting record ids looks nicer, maybe? The output order in ipmitool
> also seems to be record id based, but they just output the sensor number
> instead of the record id.
>
> As an FYI, if you were wondering why sensors seem to be missing from
> ipmi-sensors: our default output does not list every sensor. Some are
> only retrievable via the verbose options.
>
> Hope that helps clarify things.
>
> Al
>
> On Wed, 2007-10-10 at 11:06 +0200, Gregor Dschung wrote:
> > Hey Al,
> >
> > mmmh.... now, I'm really confused. I thought, the sensor-id has to be 8
> > bit long?
> >
> > Also I'm confused about the different sensor-ids I'm getting with
> > ipmi-sensors (0.4.6.beta2) and `ipmitool sdr elist` (1.8.6). Sure,
> > ipmitool is giving me the sensor id as Hex and ipmi-sensors as a decimal
> > number... but the converted value should be the same?
> > I would like to set up a PEF-Table, but for that, I'll need the right
> > sensor-ids :-/
> >
> > Example 1:
> >
> > p300slg01:/usr/local/src # ipmitool -H gtseval-ipmi -U ADMIN -a sdr elist all
> > Password:
> > Hewlett-Packard | 00h | ok | 0.0 | Dynamic MC @ 20h
> > ACPI State | 20h | ok | 0.0 | S0/G0: working
> > System Reset | 21h | ok | 0.0 |
> > POST Error | 01h | ns | 0.0 | Disabled
> > Memory ECC | 02h | ns | 0.0 | Disabled
> > PCI Error | 03h | ns | 0.0 | Disabled
> > Fan Error | 04h | ns | 0.0 | Disabled
> > Watchdog | FEh | ns | 0.0 | Disabled
> > CPU Fan 1 | 31h | ok | 0.0 | 9592.33 RPM
> > CPU Fan 2 | 32h | ok | 0.0 | 10426.44 RPM
> > CPU Fan 3 | 33h | ok | 0.0 | 9992.01 RPM
> > CPU Fan 4 | 34h | ok | 0.0 | 10900.37 RPM
> > CPU Fan 5 | 35h | ok | 0.0 | 9592.33 RPM
> > CPU Fan 6 | 3Ch | ok | 0.0 | 10900.37 RPM
> > CPU Fan 7 | 3Dh | ok | 0.0 | 9992.01 RPM
> > CPU Fan 8 | 3Eh | ok | 0.0 | 10426.44 RPM
> > CPU Fan 9 | 3Fh | ok | 0.0 | 9592.33 RPM
> > CPU Fan 10 | 40h | ok | 0.0 | 10426.44 RPM
> > System Fan 1 | 41h | ok | 0.0 | 9992.01 RPM
> > System Fan 2 | 42h | ok | 0.0 | 10900.37 RPM
> > CPU0 Vcore | 3Ah | ok | 3.0 | 1.10 Volts
> > CPU1 Vcore | 3Bh | ns | 3.1 | No Reading
> > Standby 5V | 37h | ok | 0.0 | 4.97 Volts
> > System 5V | 36h | ok | 0.0 | 4.85 Volts
> > System 3.3V | 38h | ok | 0.0 | 3.23 Volts
> > 3V CMOS Sense | 39h | ok | 0.0 | 3.03 Volts
> > CPU0 Therm Diode | 43h | ns | 3.0 | Disabled
> > CPU1 Therm Diode | 44h | ns | 3.1 | Disabled
> > CPU0 ThermDiode2 | 52h | ns | 3.0 | Disabled
> > CPU1 ThermDiode2 | 53h | ns | 3.1 | Disabled
> > AMB Temp | 48h | ok | 0.0 | 29 degrees C
> > MultiBit ECC ER | 4Ah | ok | 0.0 | State Deasserted
> > VDD Power Fail | 4Ch | ok | 0.0 | State Deasserted
> > Reset | 4Dh | ok | 0.0 | State Deasserted
> > Identify | 4Eh | ok | 0.0 | State Deasserted
> > NMI | 50h | ok | 0.0 | State Deasserted
> > CPU0 Therm-Trip | 55h | ok | 3.0 | State Deasserted
> > CPU1 Therm-Trip | 56h | ns | 3.1 | No Reading
> > CPU0 IERR | 57h | ok | 3.0 | State Deasserted
> > CPU1 IERR | 58h | ns | 3.1 | No Reading
> > CPU0 Prochot | 59h | ok | 3.0 | Limit Not Exceeded
> > CPU1 Prochot | 5Ah | ns | 3.1 | No Reading
> > CPU0 SocketOcc | 5Bh | ok | 3.0 | Device Present
> > CPU1 SocketOcc | 5Ch | ok | 3.1 | Device Absent
> > CPU0 Dmn 0 Temp | 86h | ok | 3.0 | 45 degrees C
> > CPU1 Dmn 0 Temp | 89h | ns | 3.1 | No Reading
> > CPU0 Dmn 1 Temp | 8Ch | ok | 3.0 | 45 degrees C
> > CPU1 Dmn 1 Temp | 8Fh | ns | 3.1 | No Reading
> > FRU0 | 00h | ns | 0.0 | Logical FRU @00h
> > ----------
> > p300slg01:/usr/local/src # ipmi-sensors -h gtseval-ipmi -u ADMIN -P
> > Password:
> > 64: ACPI State (ACPI Power State): [S0/G0 "working"]
> > 112: System Reset (Module/Board): [OK]
> > 160: POST Error (System Firmware): [Unknown]
> > 208: Memory ECC (Memory): [Unknown]
> > 256: PCI Error (Critical Interrupt): [Unknown]
> > 304: Fan Error (Cooling Device): [Unknown]
> > 352: Watchdog (Watchdog 2): [Unknown]
> > 400: CPU Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 464: CPU Fan 2 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 528: CPU Fan 3 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 592: CPU Fan 4 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 656: CPU Fan 5 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > 720: CPU Fan 6 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 784: CPU Fan 7 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 848: CPU Fan 8 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 912: CPU Fan 9 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 976: CPU Fan 10 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > 1040: System Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > 1104: System Fan 2 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > 1168: CPU0 Vcore (Voltage): 1.10 V (0.40/1.70): [OK]
> > 1232: CPU1 Vcore (Voltage): 0.80 V (0.40/1.70): [OK]
> > 1296: Standby 5V (Voltage): 4.97 V (4.26/5.79): [OK]
> > 1360: System 5V (Voltage): 4.85 V (4.26/5.79): [OK]
> > 1424: System 3.3V (Voltage): 3.23 V (2.82/3.85): [OK]
> > 1488: 3V CMOS Sense (Voltage): 3.03 V (2.62/NA): [OK]
> > 1680: CPU0 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1744: CPU1 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1808: CPU0 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1872: CPU1 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > 1936: AMB Temp (Temperature): 29.00 C (10.00/50.00): [OK]
> > 2064: MultiBit ECC ER (Module/Board): [State Deasserted]
> > 2112: VDD Power Fail (Power Supply): [State Deasserted]
> > 2160: Reset (Button): [State Deasserted]
> > 2208: Identify (Button): [State Deasserted]
> > 2304: NMI (Button): [State Deasserted]
> > 2352: CPU0 Therm-Trip (Processor): [State Deasserted]
> > 2400: CPU1 Therm-Trip (Processor): [State Deasserted]
> > 2448: CPU0 IERR (Processor): [State Deasserted]
> > 2496: CPU1 IERR (Processor): [State Deasserted]
> > 2544: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> > 2592: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> > 2640: CPU0 SocketOcc (Processor): [Device Inserted/Device Present]
> > 2688: CPU1 SocketOcc (Processor): [Device Removed/Device Absent]
> > 2736: CPU0 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 2864: CPU1 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 3248: CPU0 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > 3440: CPU1 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> >
> > Example 2:
> > p300slg01:/usr/local/src # ipmitool -H gts00-ipmi -U ADMIN -a sdr elist all
> > Password:
> > pef | FDh | ns | 46.1 | Event-Only
> > watchdog | FEh | ns | 46.1 | Event-Only
> > KIM BMC | 00h | ok | 0.0 | Dynamic MC @ 20h
> > PLTFRM SECURITY | FCh | ns | 0.0 | Event-Only
> > CPU Temp 1 | 00h | ok | 3.0 | 22 degrees C
> > CPU Temp 2 | 01h | ok | 3.0 | 21 degrees C
> > CPU Temp 3 | 02h | ns | 3.1 | No Reading
> > CPU Temp 4 | 03h | ns | 3.1 | No Reading
> > Sys Temp | 04h | ok | 7.0 | 36 degrees C
> > CPU1 Vcore | 05h | ok | 3.0 | 1.19 Volts
> > CPU2 Vcore | 06h | ok | 3.1 | 1.21 Volts
> > 3.3V | 07h | ok | 7.0 | 3.34 Volts
> > 5V | 08h | ok | 7.0 | 4.99 Volts
> > 12V | 09h | ok | 7.0 | 11.52 Volts
> > -12V | 0Ah | ok | 7.0 | -12.30 Volts
> > 1.5V | 0Bh | ok | 7.0 | 1.47 Volts
> > 5VSB | 0Ch | ok | 7.0 | 4.92 Volts
> > VBAT | 0Dh | ok | 7.0 | 3.31 Volts
> > Fan1 | 0Eh | ok | 7.0 | 4400 RPM
> > Fan2 | 0Fh | lnr | 7.0 | 0 RPM
> > Fan3 | 10h | ok | 7.0 | 4400 RPM
> > Fan4 | 11h | lnr | 7.0 | 0 RPM
> > Fan5 | 12h | lnr | 7.0 | 0 RPM
> > Fan6 | 13h | lnr | 7.0 | 0 RPM
> > Fan7/CPU1 | 14h | lnr | 3.0 | 0 RPM
> > Fan8/CPU2 | 15h | lnr | 3.0 | 0 RPM
> > Intrusion | 44h | lnc | 23.1 | 0 unspecified
> > Power Supply | 16h | ok | 10.0 | 0 unspecified
> > CPU0 Internal E | 17h | ok | 3.0 | 0 unspecified
> > CPU1 Internal E | 18h | ok | 3.1 | 0 unspecified
> > CPU Overheat | 19h | ok | 3.0 | 0 unspecified
> > Thermal Trip0 | 1Ah | ok | 3.0 | 0 unspecified
> > Thermal Trip1 | 1Bh | ok | 3.1 | 0 unspecified
> > BIOS | 00h | ok | 0.0 |
> > --------
> > p300slg01:/usr/local/src # ipmi-sensors -h gts00-ipmi -u ADMIN -P
> > Password:
> > 4: CPU Temp 1 (Temperature): 22.00 C (NA/78.00): [OK]
> > 5: CPU Temp 2 (Temperature): 21.00 C (NA/78.00): [OK]
> > 6: CPU Temp 3 (Temperature): 0.00 C (NA/78.00): [OK]
> > 7: CPU Temp 4 (Temperature): 0.00 C (NA/78.00): [OK]
> > 8: Sys Temp (Temperature): 36.00 C (NA/78.00): [OK]
> > 9: CPU1 Vcore (Voltage): 1.20 V (1.06/1.63): [OK]
> > 10: CPU2 Vcore (Voltage): 1.21 V (1.06/1.63): [OK]
> > 11: 3.3V (Voltage): 3.34 V (2.93/3.66): [OK]
> > 12: 5V (Voltage): 4.99 V (4.44/5.54): [OK]
> > 13: 12V (Voltage): 11.52 V (10.56/13.44): [OK]
> > 14: -12V (Voltage): -12.30 V (-10.59/-13.40): [OK]
> > 15: 1.5V (Voltage): 1.47 V (1.31/1.68): [OK]
> > 16: 5VSB (Voltage): 4.92 V (4.44/5.54): [OK]
> > 17: VBAT (Voltage): 3.31 V (2.93/3.66): [OK]
> > 18: Fan1 (Fan): 4400.00 RPM (300.00/NA): [OK]
> > 19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 20: Fan3 (Fan): 4300.00 RPM (300.00/NA): [OK]
> > 21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 22: Fan5 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 23: Fan6 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
> > 26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
> > 27: Power Supply (Power Supply): [OK]
> > 28: CPU0 Internal E (Module/Board): [OK]
> > 29: CPU1 Internal E (Module/Board): [OK]
> > 30: CPU Overheat (Module/Board): [OK]
> > 31: Thermal Trip0 (Module/Board): [OK]
> > 32: Thermal Trip1 (Module/Board): [OK]
> > 33: BIOS (System Firmware): [Unknown]
> >
> >
> > I hope I only forgot something and that it's not a new bug.
> >
> > Regards,
> > Gregor
> >
> >
> > Gregor Dschung wrote:
> > > Hey Al,
> > >
> > > whoa!!!
> > >
> > > THAT is OpenSource :). We've mailed perhaps for a week (I guess it would
> > > have taken only about three days, if we had worked both in the same
> > > timezone ;) ). And now, the issue seems to be solved:
> > > -----------
> > > p300slg01:/usr/local/src # ipmi-sensors -h gtseval-ipmi -u admin -P
> > > Password:
> > > 64: ACPI State (ACPI Power State): [S0/G0 "working"]
> > > 112: System Reset (Module/Board): [OK]
> > > 160: POST Error (System Firmware): [Unknown]
> > > 208: Memory ECC (Memory): [Unknown]
> > > 256: PCI Error (Critical Interrupt): [Unknown]
> > > 304: Fan Error (Cooling Device): [Unknown]
> > > 352: Watchdog (Watchdog 2): [Unknown]
> > > 400: CPU Fan 1 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 464: CPU Fan 2 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 528: CPU Fan 3 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 592: CPU Fan 4 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 656: CPU Fan 5 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > > 720: CPU Fan 6 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 784: CPU Fan 7 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 848: CPU Fan 8 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 912: CPU Fan 9 (Fan): 9992.01 RPM (NA/3475.48): [OK]
> > > 976: CPU Fan 10 (Fan): 10426.44 RPM (NA/3475.48): [OK]
> > > 1040: System Fan 1 (Fan): 9592.33 RPM (NA/3475.48): [OK]
> > > 1104: System Fan 2 (Fan): 10900.37 RPM (NA/3475.48): [OK]
> > > 1168: CPU0 Vcore (Voltage): 1.11 V (0.40/1.70): [OK]
> > > 1232: CPU1 Vcore (Voltage): 0.80 V (0.40/1.70): [OK]
> > > 1296: Standby 5V (Voltage): 4.97 V (4.26/5.79): [OK]
> > > 1360: System 5V (Voltage): 4.85 V (4.26/5.79): [OK]
> > > 1424: System 3.3V (Voltage): 3.23 V (2.82/3.85): [OK]
> > > 1488: 3V CMOS Sense (Voltage): 3.03 V (2.62/NA): [OK]
> > > 1680: CPU0 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1744: CPU1 Therm Diode (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1808: CPU0 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1872: CPU1 ThermDiode2 (Temperature): 42.00 C (10.00/80.00): [OK]
> > > 1936: AMB Temp (Temperature): 29.00 C (10.00/50.00): [OK]
> > > 2064: MultiBit ECC ER (Module/Board): [State Deasserted]
> > > 2112: VDD Power Fail (Power Supply): [State Deasserted]
> > > 2160: Reset (Button): [State Deasserted]
> > > 2208: Identify (Button): [State Deasserted]
> > > 2304: NMI (Button): [State Deasserted]
> > > 2352: CPU0 Therm-Trip (Processor): [State Deasserted]
> > > 2400: CPU1 Therm-Trip (Processor): [State Deasserted]
> > > 2448: CPU0 IERR (Processor): [State Deasserted]
> > > 2496: CPU1 IERR (Processor): [State Deasserted]
> > > 2544: CPU0 Prochot (Temperature): [Limit Not Exceeded]
> > > 2592: CPU1 Prochot (Temperature): [Limit Not Exceeded]
> > > 2640: CPU0 SocketOcc (Processor): [Device Inserted/Device Present]
> > > 2688: CPU1 SocketOcc (Processor): [Device Removed/Device Absent]
> > > 2736: CPU0 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 2864: CPU1 Dmn 0 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 3248: CPU0 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > 3440: CPU1 Dmn 1 Temp (Temperature): 45.00 C (NA/85.00): [OK]
> > > -------------
> > >
> > > Thanks a lot for your help.
> > >
> > > Regards,
> > > Gregor
> > >
> > >
> > > Albert Chu wrote:
> > >> Hey Gregor,
> > >>
> > >> Doh! I forgot a patch. Here's the next likely FreeIPMI 0.4.6 release
> > >> :-)
> > >>
> > >> PLMK if it works.
> > >>
> > >> Thanks,
> > >> Al
> > >>
> > >>> Hey Gregor,
> > >>>
> > >>> Attached are two tar.gz files. One is a likely candidate for the
> > >>> FreeIPMI 0.4.6 release and another test tar.gz for debug info if
> > >>> something new goes wrong :-)
> > >>>
> > >>> PLMK how it works out. Thanks for all the debug help.
> > >>>
> > >>> Al
> > >>>
> > >>> On Tue, 2007-10-09 at 17:25 +0200, Gregor Dschung wrote:
> > >>>> Hey Al,
> > >>>>
> > >>>> here is the sdr-cache. 'sdr-cache-p300slg01.10.136.17.128' is the file
> > >>>> for gtseval-ipmi, 'sdr-cache-p300slg01.10.136.17.170' is another cache
> > >>>> file from a call of ipmi-sensors which works fine.
> > >>>>
> > >>>> I'm using FreeIPMI on a system with SUSE 10.1.
> > >>>> ---------
> > >>>> p300slg01:/usr/local/src # uname -a
> > >>>> Linux p300slg01 2.6.16.27-0.9-smp #1 SMP Tue Feb 13 09:35:18 UTC 2007 i686 i686 i386 GNU/Linux
> > >>>> ---------
> > >>>>
> > >>>> In your test4-code, I had to change the following lines to compile w/o
> > >>>> errors:
> > >>>> common/src/pstdout.c
> > >>>> -243: fprintf(stderr, "Default stack size = %li bytes \n", mystacksize);
> > >>>> +243: fprintf(stderr, "Default stack size = %li bytes \n", (long)mystacksize);
> > >>>> +501: va_list vacpy;
> > >>>>
> > >>>> ---------
> > >>>>
> > >>>> I've tested FreeIPMI locally again. I was wrong, it crashes, too. I
> > >>>> guess, I was confused with IPMItool, which runs fine locally but gives
> > >>>> warnings over the network. Don't know whether it helps you:
> > >>>> Locally:
> > >>>> address@hidden:~/ipmi/usr/bin> ./ipmitool -I open sensor
> > >>>> ACPI State | 0x1 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> System Reset | 0x0 | discrete | 0x0080 | na | na | na | na | na | na
> > >>>> POST Error | na | discrete | na | na | na | na | na | na | na
> > >>>> Memory ECC | na | discrete | na | na | na | na | na | na | na
> > >>>> PCI Error | na | discrete | na | na | na | na | na | na | na
> > >>>> Fan Error | na | discrete | na | na | na | na | na | na | na
> > >>>> Watchdog | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU Fan 1 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 2 | 10426.441 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 3 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 4 | 10426.441 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 5 | 9223.391 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 6 | 10900.371 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 7 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 8 | 10900.371 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 9 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU Fan 10 | 10426.441 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> System Fan 1 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> System Fan 2 | 10900.371 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> CPU0 Vcore | 1.107 | Volts | ok | na | 0.402 | 0.500 | 1.597 | 1.695 | na
> > >>>> CPU1 Vcore | na | Volts | na | na | 0.402 | 0.500 | 1.597 | 1.695 | na
> > >>>> Standby 5V | 4.969 | Volts | ok | na | 4.263 | 4.528 | 5.527 | 5.792 | na
> > >>>> System 5V | 4.851 | Volts | ok | na | 4.263 | 4.528 | 5.527 | 5.792 | na
> > >>>> System 3.3V | 3.234 | Volts | ok | na | 2.822 | 2.999 | 3.675 | 3.851 | na
> > >>>> 3V CMOS Sense | 3.028 | Volts | ok | na | 2.617 | 2.781 | na | na | na
> > >>>> CPU0 Therm Diode | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> CPU1 Therm Diode | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> CPU0 ThermDiode2 | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> CPU1 ThermDiode2 | na | degrees C | na | na | 10.000 | na | 68.000 | 80.000 | 95.000
> > >>>> AMB Temp | 29.000 | degrees C | ok | na | 10.000 | na | 30.000 | 45.000 | na
> > >>>> MultiBit ECC ER | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> VDD Power Fail | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> Reset | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> Identify | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> NMI | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> CPU0 Therm-Trip | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> CPU1 Therm-Trip | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU0 IERR | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> CPU1 IERR | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU0 Prochot | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> CPU1 Prochot | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU0 SocketOcc | 0x1 | discrete | 0x0280 | na | na | na | na | na | na
> > >>>> CPU1 SocketOcc | 0x0 | discrete | 0x0180 | na | na | na | na | na | na
> > >>>> CPU0 Dmn 0 Temp | 45.000 | degrees C | ok | na | na | na | na | 85.000 | 95.000
> > >>>> CPU1 Dmn 0 Temp | na | degrees C | na | na | na | na | na | 85.000 | 95.000
> > >>>> CPU0 Dmn 1 Temp | 46.000 | degrees C | ok | na | na | na | na | 85.000 | 95.000
> > >>>> CPU1 Dmn 1 Temp | na | degrees C | na | na | na | na | na | 85.000 | 95.000
> > >>>>
> > >>>> Over an RMCP+ session:
> > >>>> [...]
> > >>>> System Reset | 0x0 | discrete | 0x0080 | na | na | na | na | na | na
> > >>>> Error reading sensor POST Error (#01)
> > >>>> Error reading sensor Memory ECC (#02)
> > >>>> Error reading sensor PCI Error (#03)
> > >>>> Error reading sensor Fan Error (#04)
> > >>>> Watchdog | na | discrete | na | na | na | na | na | na | na
> > >>>> CPU Fan 1 | 9992.006 | RPM | ok | na | na | na | 3996.803 | 3475.480 | na
> > >>>> [...]
> > >>>>
> > >>>> The omitted lines ([...]) are identical to the local output.
> > >>>> -----------
> > >>>>
> > >>>> I've called ipmi-sensors from an x86_64 to reach gtseval-ipmi, too. And
> > >>>> it crashes with the same error (second attachment).
> > >>>>
> > >>>> So... Enough debugging for today.
> > >>>>
> > >>>> Have a nice day,
> > >>>> Gregor
> > >>>>
> > >>>> Al Chu wrote:
> > >>>>> Hey Gregor,
> > >>>>>
> > >>>>> Although it's unlikely your problem, I saw one other potential issue.
> > >>>>> So I added a fix in this slightly newer tar.gz.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Al
> > >>>>>
> > >>>>> On Mon, 2007-10-08 at 11:51 -0700, Al Chu wrote:
> > >>>>>> Hey Gregor,
> > >>>>>>
> > >>>>>> Here's another tar.gz. Could you run ./configure with --enable-debug
> > >>>>>> and run with --debug again? The gdb output confirms the line I believed
> > >>>>>> was causing the problem, but I still can't quite figure out how the
> > >>>>>> corruption is happening. So I put in a lot more printfs.
> > >>>>>>
> > >>>>>> I do have at least two other suspicions that depend on your system. So
> > >>>>>> do you think you could also send me the SDR from ~/.freeipmi/sdr-cache/
> > >>>>>> for me to analyze, and also could you tell me what Linux you are running
> > >>>>>> on the i386 box? I'm wondering if you have some older distribution (b/c
> > >>>>>> it's i386) and it has slightly different threads behavior that I'm not
> > >>>>>> handling properly.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Al
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sun, 2007-10-07 at 12:12 +0200, Gregor Dschung wrote:
> > >>>>>>> Hi Al,
> > >>>>>>>
> > >>>>>>> I attach again the output of the call with --debug and the backtrace. It
> > >>>>>>> was the first time that I used gdb, so I hope I understood the tutorials
> > >>>>>>> :)
> > >>>>>>>
> > >>>>>>> At the moment I'm not able to run ipmi-sensors locally, because I'm not
> > >>>>>>> root on "gtseval" (the host of gtseval-ipmi) and I have to wait until I get
> > >>>>>>> rw-rights for /dev/ipmi0 again. And it's the weekend ;)
> > >>>>>>>
> > >>>>>>> You are right, I'm running IPMItool and FreeIPMI on an i386.
> > >>>>>>> gtseval is a 64-bit system, so perhaps this is the reason for not crashing
> > >>>>>>> locally.
> > >>>>>>>
> > >>>>>>> Have a nice Sunday,
> > >>>>>>> Gregor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hey Gregor,
> > >>>>>>>>
> > >>>>>>>> Can't see anything suspicious in the code. Here's another tar.gz that I
> > >>>>>>>> added a whole bunch of extra printfs to, to try and give me more
> > >>>>>>>> information. Could you run again (./configure --enable-debug and run
> > >>>>>>>> ipmi-sensors with --debug again)? Also, you mentioned that ipmi-sensors
> > >>>>>>>> completes locally without issue. Is the number of sensors listed below
> > >>>>>>>> (ending w/ CPU1 Dmn 1 Temp) the same as the number of sensors listed
> > >>>>>>>> when you run locally?
> > >>>>>>>> Also, is a core dump being output by this crash? Could you run gdb
> > >>>>>>>> against the core and get a backtrace? That'd be a lot of help too.
> > >>>>>>>>
> > >>>>>>>> Thanks for helping me look into this,
> > >>>>>>>>
> > >>>>>>>> Al
> > >>>>>>>>
> > >>>>>>>>> Hi Al,
> > >>>>>>>>>
> > >>>>>>>>> thanks for your fast answer.
> > >>>>>>>>>
> > >>>>>>>>> I've tested your test-version and it seems to be on the right track. It
> > >>>>>>>>> still crashes, but now I get sensor-data :) :
> > >>>>>>>>>
> > >>>>>>>>> [...]
> > >>>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Albert Chu
> > >>>>>>>> address@hidden
> > >>>>>>>> 925-422-5311
> > >>>>>>>> Computer Scientist
> > >>>>>>>> High Performance Systems Division
> > >>>>>>>> Lawrence Livermore National Laboratory
> > >>>>>>>>
> > >>> --
> > >>> Albert Chu
> > >>> address@hidden
> > >>> 925-422-5311
> > >>> Computer Scientist
> > >>> High Performance Systems Division
> > >>> Lawrence Livermore National Laboratory
> > >>>
> > >
> >
> >
--
Albert Chu
address@hidden
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
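
P.S. For anyone who needs sensor numbers (e.g. for a PEF table) but only
has the two listings quoted above, here is a rough Python sketch that joins
`ipmitool sdr elist` output with default ipmi-sensors output by sensor name.
It is only an illustration, not part of FreeIPMI: the function names are made
up, the sample lines are copied from the gts00-ipmi output in this thread, and
matching by name is an assumption that fails if two sensors share a name.
The authoritative source is still `ipmi-sensors -v`.

```python
import re

# Sample lines copied from the gts00-ipmi output quoted in this thread.
IPMITOOL_ELIST = """\
Sys Temp | 04h | ok | 7.0 | 36 degrees C
Fan4 | 11h | lnr | 7.0 | 0 RPM
Fan5 | 12h | lnr | 7.0 | 0 RPM
"""

IPMI_SENSORS = """\
8: Sys Temp (Temperature): 36.00 C (NA/78.00): [OK]
21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
22: Fan5 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
"""


def parse_elist(text):
    """Map sensor name -> sensor number from `ipmitool sdr elist` lines.

    The second column is the sensor number in hex with an 'h' suffix.
    """
    numbers = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 2:
            continue
        match = re.fullmatch(r"([0-9A-Fa-f]{2})h", fields[1])
        if match:
            numbers[fields[0]] = int(match.group(1), 16)
    return numbers


def parse_ipmi_sensors(text):
    """Map sensor name -> record ID from default ipmi-sensors lines.

    The leading decimal number is the SDR record ID, not the sensor number.
    """
    records = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.+?)\s+\(", line)
        if match:
            records[match.group(2)] = int(match.group(1))
    return records


def join_by_name(elist_text, sensors_text):
    """Return name -> (record ID, sensor number) for names found in both."""
    numbers = parse_elist(elist_text)
    records = parse_ipmi_sensors(sensors_text)
    return {
        name: (records[name], numbers[name])
        for name in records
        if name in numbers
    }


mapping = join_by_name(IPMITOOL_ELIST, IPMI_SENSORS)
for name, (record_id, sensor_number) in sorted(mapping.items()):
    print(f"{name}: record ID {record_id}, "
          f"sensor number {sensor_number} (0x{sensor_number:02X})")
```

For Fan5 this pairs record ID 22 with sensor number 18 (12h), the same
pair shown in the verbose example above. It also shows why the large
values on the HP box (e.g. 3440) cannot be sensor numbers: a sensor number
is a single byte, while a record ID is a 16-bit value.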