Multiple Power Supply Unit (PSU) Fan Failures on Sun Fire 3800-6800 Servers May Result in Platform Outage



Category: Availability
Release Phase: Resolved
Product: Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server
Bug Id: 6405762
Date of Workaround Release: 11-DEC-2006
Date of Resolved Release: 22-MAR-2007


Impact

Power supply fan failures on Sun Fire 3800-6800 servers can go undetected. If multiple PSU fans fail on the same platform, the result may be a platform outage.


Contributing Factors

This issue can occur on the following platforms:

  • Sun Fire 3800 with PSU p/n 300-1441 (A145) or 300-1529 (A145E)
  • Sun Fire 4800 with PSU p/n 300-1460 (A153)
  • Sun Fire 4810 with PSU p/n 300-1459 (A152)
  • Sun Fire 6800 with PSU p/n 300-1459 (A152)

Products not affected:

  • Sun Fire 4800 without the affected PSU (listed above)
  • Sun Fire 6800 without the affected PSU (listed above)
  • Sun Fire 4900/6900 Servers

PSU model numbers can be determined by one of the following methods:

Using the "showboards" command on the platform:

	sc> showboards
	Slot     Pwr Component Type                 State      Status     Domain
	----     --- --------------                 -----      ------     ------
	SSC0     On  System Controller              Main       Passed     -
	SSC1     On  Present                        Spare      -          -
	ID0      On  Sun Fire 4800 Centerplane      -          OK         -
	PS0      On  A153 Power Supply              -          OK         -
	PS1      On  A153 Power Supply              -          OK         -
	PS2      On  A153 Power Supply              -          OK         -
	 .
	 .
	 .

Reviewing the sc-extended Explorer data file showenvironment_-tv.out:

   $ grep "Power Supply" /<explorer_base_directory>/sc/<SC_Name>/showenvironment_-tv.out
   A152 Power Supply 0
   A152 Power Supply 1
   A152 Power Supply 2
   
Reviewing the prtfru data in Explorer data file prtfru_-x.out:
	
  $ grep "Power Supply" /<explorer_base_directory>/sc/<SC_Name>/prtfru_-x.out
  <Fru_Description value="Power Supply (A152)"/>
  <Fru_Description value="Power Supply (A152)"/>
  <Fru_Description value="Power Supply (A152)"/>
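
Where many console captures or Explorer files need checking, the model lookup can be scripted. The following is a minimal Python sketch, not a Sun-provided tool. It assumes input lines shaped like the outputs shown above, and the names `psu_models` and `affected_models` are hypothetical. The set of affected model codes is taken from the Contributing Factors list.

```python
import re

# PSU model codes listed as affected in this Sun Alert.
AFFECTED_MODELS = {"A145", "A145E", "A152", "A153"}

# Matches a model code such as A153 or A145E.
MODEL_RE = re.compile(r"\bA\d{3}E?\b")


def psu_models(text):
    """Return the model codes found on lines that mention 'Power Supply'."""
    models = []
    for line in text.splitlines():
        if "Power Supply" in line:
            models.extend(MODEL_RE.findall(line))
    return models


def affected_models(text):
    """Return the sorted affected model codes present in the text."""
    return sorted(set(psu_models(text)) & AFFECTED_MODELS)


if __name__ == "__main__":
    sample = (
        "PS0      On  A153 Power Supply              -          OK         -\n"
        '<Fru_Description value="Power Supply (A152)"/>\n'
    )
    print(affected_models(sample))
```

A model code appearing in the result means only that the PSU type is one of those named in this alert. Whether a given platform is affected still depends on the platform and PSU combinations listed above.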

 


Symptoms

The Power Supply Unit (PSU) does not report a fan failure directly through the System Controller Application (ScApp).

If a PSU fan stops or slows down, the normal forced flow of air through the PSU, from the front of the chassis to the back, is reduced and reversed but not halted completely:

Platform    Air flow, normal PSU    Air flow, fan-failed PSU
--------    --------------------    ------------------------
3800        blows air out           draws air in  (reduced flow)
4800        draws air in            blows air out (reduced flow)
6800        draws air in            blows air out (reduced flow)

 

To verify a failed fan on the 3800, 4810 and 6800, use a flashlight and look through the PSU vents to see whether the fan blades are turning. This is not possible on the 4800 because of the location of the PSU fan.

An alternative is to examine the air flow at the PSU vent by holding a piece of paper in front of the vent to determine air flow and its direction.

On the 4800 and 6800 PSU, the paper should be drawn into the air intake and held there when the PSU fan is operating normally.

On a 3800 PSU, the paper will be blown away from the air vent when the PSU fan is operating normally.

A failed fan is indicated when the air flow is significantly weaker than that of known-good power supplies on the same type of platform, or when the air flow has reversed from its normal direction.
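
The paper test can be recorded as a simple lookup against the air flow table. The sketch below is illustrative only. The function name `assess_airflow` and the "in"/"out" encoding are assumptions, and the 4810 is omitted because the table in this alert does not list it.

```python
# Normal air flow direction at the PSU vent, from the table in this alert.
# The 4810 does not appear in that table, so it is deliberately left out.
NORMAL_FLOW = {
    "3800": "out",  # blows air out
    "4800": "in",   # draws air in
    "6800": "in",   # draws air in
}


def assess_airflow(platform, observed_direction, reduced=False):
    """Classify a paper-test observation.

    observed_direction is "in" or "out". Set reduced=True when the flow is
    significantly weaker than on a known-good PSU of the same platform type.
    """
    normal = NORMAL_FLOW[platform]  # KeyError for platforms not in the table
    if observed_direction != normal or reduced:
        return "suspect fan failure"
    return "normal"


if __name__ == "__main__":
    print(assess_airflow("4800", "out"))
    print(assess_airflow("3800", "out"))
```

A "suspect fan failure" result is a prompt to compare against other power supplies on the same platform, not a confirmed diagnosis.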

When a fan in a PSU fails, the PSU continues to operate normally but runs at a higher temperature than the other PSUs because of the reduced and reversed air flow.

ScApp monitors PSU temperatures and reports warnings only if a temperature exceeds the warning or maximum threshold:

     - Warning threshold is 65 Degrees C.
     - Maximum threshold is 78 Degrees C.

In the case of a PSU fan failure, and depending on many variables, the PSU temperature may not rise high enough to trigger a ScApp warning, so the fan failure goes undetected.
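
The gap can be illustrated with a small threshold check. This is a sketch of the two thresholds quoted above, not ScApp's actual logic. The function name `threshold_status` is hypothetical, and reading "exceed" as strictly greater than the threshold is an assumption.

```python
WARNING_C = 65  # warning threshold from this alert, in degrees C
MAXIMUM_C = 78  # maximum threshold from this alert, in degrees C


def threshold_status(temp_c):
    """Return which threshold, if any, a PSU temperature exceeds.

    Assumption: "exceed" means strictly greater than the threshold.
    """
    if temp_c > MAXIMUM_C:
        return "over maximum"
    if temp_c > WARNING_C:
        return "over warning"
    return "no warning"


if __name__ == "__main__":
    for temp in (42, 66, 79):
        print(temp, threshold_status(temp))
```

Under these thresholds alone, a PSU running at 42 Degrees C (the value shown in the sample firmware message in this alert) returns "no warning", even though that reading may come from a PSU with a failed fan.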

If undetected PSU fan failures are allowed to accumulate within a platform over many months*, the platform can lose power with very little advance warning.

*Fan failure is largely due to bearing failure as the fans reach the end of bearing life, and the time between PSU fan failures within a single platform is likely to be months or years.

Important Note:

Patches 114526-08 and 114527-03 (or later) provide firmware which monitors power supply temperature and may provide a warning similar to the following:

         WARNING: PS2 temperature is elevated indicating it may have a failed cooling fan.
         PS2 48 VDC 0 Temp. 0 value: 42 Degrees C
         Contact Sun Support Services to check for PSU fan failure.

In some cases on the 4800 platform, the elevated temperature reported in this message can be normal.

On the 6800 platform with these firmware patches, the warning message may not identify the power supply with failed fans when the first fan failure occurs.
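
Saved console or log output can be scanned for this warning. The following Python sketch assumes the message text matches the sample above, which the alert describes only as "similar to" what the firmware prints, so the patterns may need adjusting. The function name `elevated_psus` is hypothetical.

```python
import re

# Patterns based on the sample message in this alert; the real text may vary.
WARN_RE = re.compile(r"WARNING: (PS\d+) temperature is elevated")
TEMP_RE = re.compile(r"(PS\d+) .*value: (\d+) Degrees C")


def elevated_psus(log_text):
    """Map each PSU named in an elevated-temperature warning to its reported
    temperature in degrees C, or None if no reading line was found."""
    found = {}
    for line in log_text.splitlines():
        warn = WARN_RE.search(line)
        if warn:
            found.setdefault(warn.group(1), None)
            continue
        temp = TEMP_RE.search(line)
        if temp and temp.group(1) in found:
            found[temp.group(1)] = int(temp.group(2))
    return found


if __name__ == "__main__":
    sample = (
        "WARNING: PS2 temperature is elevated indicating it may have a failed cooling fan.\n"
        "PS2 48 VDC 0 Temp. 0 value: 42 Degrees C\n"
        "Contact Sun Support Services to check for PSU fan failure.\n"
    )
    print(elevated_psus(sample))
```

Given the caveats above for the 4800 and 6800 platforms, a match is best treated as a reason to inspect the power supplies physically rather than as proof of which fan has failed.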


Workaround

Power supplies with failed fans should be replaced. To detect failed PSU fans before implementing the resolution below, physically inspect the power supplies for the symptoms described above.


Resolution

This issue is addressed on the following platforms:

  • Sun Fire 6800/4800/4810/3800 with firmware 5.19.7 (as delivered in patch 114526-08) or 5.20.2 (as delivered in patch 114527-03) or later
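
The version and patch requirements can be checked with a short comparison. The sketch below is illustrative. The function names are hypothetical, and the handling of "or later" is an assumption: 5.19.x releases are taken to need at least 5.19.7, and all other releases at least 5.20.2.

```python
# Minimum patch revisions named in this alert: 114526-08 and 114527-03.
MIN_PATCH_REV = {"114526": 8, "114527": 3}


def patch_has_fix(patch_id):
    """Return True if a patch ID such as '114526-08' meets the minimum revision."""
    base, _, rev = patch_id.partition("-")
    if base not in MIN_PATCH_REV or not rev.isdigit():
        return False
    return int(rev) >= MIN_PATCH_REV[base]


def firmware_has_fix(version):
    """Return True if a firmware version such as '5.19.7' carries the fix.

    Assumption: 5.19.x needs at least 5.19.7; anything else needs at least
    5.20.2. The alert itself only says "or later".
    """
    parts = tuple(int(p) for p in version.split("."))
    if parts[:2] == (5, 19):
        return parts >= (5, 19, 7)
    return parts >= (5, 20, 2)


if __name__ == "__main__":
    print(patch_has_fix("114526-08"), firmware_has_fix("5.19.6"))
```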



Modification History


Date: 22-MAR-2007
  • Updated Resolution section
  • State: Resolved

Date: 09-APR-2007
  • Updated CR list and Resolution section

Attachments
This solution has no attachment

 
 
Article Details
Article ID : 201071
Article Type : Sun Alert
Last reviewed : 2007-04-12
Audience : PUBLIC