Using the Reset Button on a Main System Controller May Cause Domain Outage



Category: Availability
Release Phase: Resolved
Product: Sun Fire 3800 Server
         Sun Fire 4800 Server
         Sun Fire 4810 Server
         Sun Fire 6800 Server
         Sun Fire E6900 Server
         Sun Fire E4900 Server
Bug Id: 4378797
Date of Workaround Release: 21-APR-2005
Date of Resolved Release: 12-NOV-2007


Impact

If the main System Controller (SC) on a Sun Fire 3800, 4800, 4810, E4900, 6800 or E6900 system is reset with the reset button while domains are running, the hardware configuration may change, causing the affected domains to perform a "fatal" reset. The domains then reset and act according to the "error-reset-recovery" OBP property, which may result in unexpected system outages while the domains recover.


Contributing Factors

This issue can occur on the following platforms if the reset button on the main SC is used while domains are running:

  • Sun Fire 3800, 4800, 4810, E4900, 6800, E6900 (without the recommended Sun Fire ScApp firmware update 5.12.6)

The Sun Fire System Controller (SC) periodically queries system ASICs (Application Specific Integrated Circuits) via JTAG buses to read configuration, monitor environmental state and change domain configuration. If the hardware reset button is used during one of these operations, the JTAG bus may be left in an undefined state. The resulting change in hardware configuration can trigger a fatal reset on affected active domains.

To determine the firmware version of the SCApp, use the "showsc" command from the platform shell as follows:

    SC> showsc
    SC: SSC0
    Main System Controller
    SC Failover: disabled
    Clock failover enabled.
    SC date: Thu Jun 01 12:59:45 CDT 2006
    SC uptime: 25 minutes 58 seconds
    ScApp version: 5.19.6 Build_01
    RTOS version: 45
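
The version check can also be scripted once the "showsc" output has been captured as text. The following is a minimal sketch, not a Sun-supplied tool: the function names are hypothetical, and it assumes that ScApp versions compare numerically field by field and that any version at or above 5.12.6 carries the fix.

```python
import re

# First ScApp release carrying the fix, per the Resolution section of this alert.
FIXED_VERSION = (5, 12, 6)


def parse_scapp_version(showsc_output):
    """Return the ScApp version as a tuple of ints, or None if it is not found."""
    match = re.search(r"ScApp version:\s*(\d+(?:\.\d+)*)", showsc_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_fix(showsc_output):
    """Assumption: versions order numerically, so >= 5.12.6 means the fix is present."""
    version = parse_scapp_version(showsc_output)
    if version is None:
        raise ValueError("no 'ScApp version:' line found in showsc output")
    return version >= FIXED_VERSION


# Sample text taken from the showsc output shown in this alert.
sample = """SC: SSC0
Main System Controller
SC uptime: 25 minutes 58 seconds
ScApp version: 5.19.6 Build_01
RTOS version: 45
"""
print(has_fix(sample))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "5.9" as newer than "5.12".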

Symptoms

Shortly after the main System Controller has been reset using the reset button, the domains within the system reboot with an error message similar to the following:

    ErrorMonitor: Domain A has a SYSTEM ERROR
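
If SC console or log output has been saved to a file, the affected domains can be picked out with a short script. This is only a sketch: the pattern is assumed from the sample message above (the alert says real messages are "similar"), and the function name is hypothetical.

```python
import re

# Pattern assumed from the sample message in this alert; real logs may differ.
SYSTEM_ERROR = re.compile(r"ErrorMonitor: Domain ([A-Z]) has a SYSTEM ERROR")


def domains_with_system_error(log_text):
    """Return the sorted, de-duplicated domain letters that reported a SYSTEM ERROR."""
    return sorted(set(SYSTEM_ERROR.findall(log_text)))


log_sample = (
    "ErrorMonitor: Domain A has a SYSTEM ERROR\n"
    "some unrelated console line\n"
    "ErrorMonitor: Domain A has a SYSTEM ERROR\n"
)
print(domains_with_system_error(log_sample))
```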

Workaround

If an SC becomes unresponsive, confirm connectivity via both the serial port and the network before using the reset button. If the SC appears to be hung:

  1. Confirm that the SC is actually hung by connecting to the serial port of the SC (with a known good cable).
  2. Press Enter a few times. If no prompt is returned, the SC is hung.
  3. If the SC is hung, halt all domains using the Solaris "init 0" command (or "shutdown").
  4. Reset the SC using the reset button, or power-cycle the whole chassis.

Note: Avoid using the reset button while domains are running whenever possible. Reset the SC either by following the steps above or via ScApp.


Resolution

This issue is addressed on the following platforms:

  • Sun Fire 3800, 4800, 4810, E4900, 6800, E6900 with Sun Fire ScApp firmware 5.12.6 (as delivered in patch 112127-02 or later)

Note: The patch above addresses the software issue for BugID 4378797. Even with the patch installed, avoid using the reset button while domains are running whenever possible.




Modification History


Date: 28-SEP-2005

29-Sep-2005:

  • Updated Relief/Workaround section

Date: 12-NOV-2007
  • Updated Contributing Factors and Resolution sections
  • State: Resolved



Attachments
This solution has no attachment

 
 
Article Details
Article ID : 200180
Article Type : Sun Alert
Last reviewed : 2007-11-12
Audience : PUBLIC