From: Ryan Cox
Subject: [Freeipmi-users] Temperature sensors disabled or abnormally cold: Can alerts be sent?
Date: Thu, 30 Sep 2010 11:46:08 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100915 Lightning/1.0b1 Thunderbird/3.0.8

I'm trying to use pef-config (or any tool that will work) to have a Dell
PowerEdge M610 send alerts when a temperature sensor is disabled or has
seriously erroneous data, like reporting a processor to be at 5 degrees
C. Here is an example:
# ipmitool sdr type Temperature
Temp | 01h | ns | 3.1 | Disabled
Temp | 02h | ok | 3.2 | 5 degrees C
Ambient Temp | 08h | ok | 7.1 | 26 degrees C
IOH ThermTrip | 35h | ns | 7.1 | Disabled
From a good server:
# ipmitool sdr type Temperature
Temp | 01h | ok | 3.1 | 26 degrees C
Temp | 02h | ok | 3.2 | 34 degrees C
Ambient Temp | 08h | ok | 7.1 | 24 degrees C
IOH ThermTrip | 35h | ns | 7.1 | Disabled
The processor sensors are the ones I care about (3.1 and 3.2). A server
was affected by a power event (surge or sag, I'm not sure which) and the
processor temperature sensors are now having issues for some reason. The
CPUs are throttled as a result, and that is logged via syslog. We can
take care of the hardware issues just fine, but I am hoping to have our
servers notify us of such problems the same way they do for an ECC
threshold error (entry in the event log, SNMP trap sent, amber light). I
played around with pef-config for a while and can't figure out how to
make it alert when a sensor is disabled. I'm also not sure whether the
alert would only fire on a cold boot or similar, so I may have it
configured correctly and simply be unable to test it. The affected
servers are still in use until the user jobs on them finish, so I can't
reboot them until then.
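
To make the failure condition concrete, here is a minimal sketch of a
check over the ipmitool output shown above. It is only an illustration
of what "disabled or abnormally cold" means for the 3.x (processor)
entities, not a replacement for a PEF alert. The function name
suspect_cpu_sensors and the 10 degree C cutoff are assumptions made up
for this example, and the sample text is copied from the bad server's
output.

```python
# Sketch: flag processor temperature sensors that are disabled or
# implausibly cold, based on `ipmitool sdr type Temperature` output.
# The helper name and the cutoff below are hypothetical choices.

BAD_SERVER = """\
Temp | 01h | ns | 3.1 | Disabled
Temp | 02h | ok | 3.2 | 5 degrees C
Ambient Temp | 08h | ok | 7.1 | 26 degrees C
IOH ThermTrip | 35h | ns | 7.1 | Disabled
"""

MIN_PLAUSIBLE_C = 10  # assumed lower bound for a running CPU


def suspect_cpu_sensors(sdr_text, min_c=MIN_PLAUSIBLE_C):
    """Return (entity, reason) pairs for suspicious 3.x sensors."""
    problems = []
    for line in sdr_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 5:
            continue  # skip anything that is not a 5-column SDR row
        _name, _sensor_id, status, entity, reading = fields
        if not entity.startswith("3."):
            continue  # only the processor entities (3.1, 3.2) matter here
        if status == "ns" or reading == "Disabled":
            problems.append((entity, "disabled"))
        elif reading.endswith("degrees C"):
            temp = float(reading.split()[0])
            if temp < min_c:
                problems.append((entity, f"implausibly cold: {temp:g} C"))
    return problems


if __name__ == "__main__":
    for entity, reason in suspect_cpu_sensors(BAD_SERVER):
        print(f"sensor {entity}: {reason}")
```

In practice the text would come from running ipmitool on each node
(for example through subprocess) rather than from a pasted string.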
Here's an example of a config I was working with:
Section Event_Filter_9
## Possible values: Manufacturer_Pre_Configured/Software_Configurable/Reserved1/Reserved3
Filter_Type Manufacturer_Pre_Configured
## Possible values: Yes/No
Enable_Filter Yes
## Possible values: Yes/No
Event_Filter_Action_Alert Yes
## Possible values: Yes/No
Event_Filter_Action_Power_Off No
## Possible values: Yes/No
Event_Filter_Action_Reset No
## Possible values: Yes/No
Event_Filter_Action_Power_Cycle No
## Possible values: Yes/No
Event_Filter_Action_Oem No
## Possible values: Yes/No
Event_Filter_Action_Diagnostic_Interrupt No
## Possible values: Yes/No
Event_Filter_Action_Group_Control_Operation No
## Give a valid number
Alert_Policy_Number 1
## Give a valid number
Group_Control_Selector 0
## Possible values: Unspecified/Monitor/Information/OK/Non_Critical/Critical/Non_Recoverable
Event_Severity Critical
## Specify a hex Slave Address or Software ID from Event Message or 0xFF to Match Any
Generator_Id_Byte_1 0xFF
## Specify a hex Channel Number or LUN to match or 0xFF to Match Any
Generator_Id_Byte_2 0xFF
## Specify a Sensor Type, For options see the MAN page
Sensor_Type Temperature
## Specify a Sensor Number or 0xFF to Match Any
Sensor_Number 0xFF
## Specify a Event/Reading Type Number or 0xFF to Match Any
Event_Trigger 0xFF
## Give a valid number
Event_Data1_Offset_Mask 0x204
## Give a valid number
Event_Data1_AND_Mask 0x00
## Give a valid number
Event_Data1_Compare1 0xFF
## Give a valid number
Event_Data1_Compare2 0x00
## Give a valid number
Event_Data2_AND_Mask 0x00
## Give a valid number
Event_Data2_Compare1 0xFF
## Give a valid number
Event_Data2_Compare2 0x00
## Give a valid number
Event_Data3_AND_Mask 0x00
## Give a valid number
Event_Data3_Compare1 0xFF
## Give a valid number
Event_Data3_Compare2 0x00
EndSection
In this attempt, I was trying to have it essentially alert for
everything and then narrow it down from there.
A few things I'm unsure about: What is Event_Data1_Offset_Mask, and is it
set appropriately for what I want to do? (I used an existing temperature
policy from the blade as a template.) This is my first time working
with pef-config, so I'm a little confused by it, to be honest. I know
how to run checkout, diff, commit, etc., but am having trouble figuring
out what to put for some of the values.
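
As a way of reading the Event_Data1_Offset_Mask value, here is a small
sketch that treats it as a bit mask of event offsets, where bit N set
means "match event offset N". It assumes the filter is matching
threshold-based sensors (Event/Reading Type 0x01) and uses the
threshold offset names from the generic event table in the IPMI
specification; the function name decode_offset_mask is made up for
this example. Under those assumptions, 0x204 has bits 2 and 9 set.

```python
# Sketch: decode a PEF Event_Data1_Offset_Mask for threshold sensors.
# Assumes Event/Reading Type 0x01 (threshold); for other event/reading
# types the same bit positions refer to different events.

THRESHOLD_OFFSETS = [
    "Lower Non-critical - going low",       # offset 0
    "Lower Non-critical - going high",      # offset 1
    "Lower Critical - going low",           # offset 2
    "Lower Critical - going high",          # offset 3
    "Lower Non-recoverable - going low",    # offset 4
    "Lower Non-recoverable - going high",   # offset 5
    "Upper Non-critical - going low",       # offset 6
    "Upper Non-critical - going high",      # offset 7
    "Upper Critical - going low",           # offset 8
    "Upper Critical - going high",          # offset 9
    "Upper Non-recoverable - going low",    # offset 10
    "Upper Non-recoverable - going high",   # offset 11
]


def decode_offset_mask(mask):
    """List the threshold events whose offset bit is set in mask."""
    return [
        name
        for bit, name in enumerate(THRESHOLD_OFFSETS)
        if mask & (1 << bit)
    ]


if __name__ == "__main__":
    for name in decode_offset_mask(0x204):
        print(name)
```

Read this way, the mask copied from the template selects threshold
crossings only. Whether the BMC generates any event at all when a
sensor becomes disabled is a separate question that this sketch does
not answer.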
Any thoughts? Am I going about this the wrong way?
Thanks
--
Ryan Cox
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University