On Some Sun4v Platforms with Patch 125369-02 (or later), "showfaults" on the Service Processor May Display the Motherboard as the Faulty FRU Instead of the Actual Faulty FRU



Category :Availability
Release Phase :Resolved
Bug Id :6582853, 6606715, 6537307  
Product :Solaris 10 Operating System  
Date of Workaround Release :16-OCT-2007 
Date of Resolved Release :15-Sep-2008 



1. Impact

On some sun4v platforms with Solaris 10 (as described below), "showfaults" may display the motherboard as the FRU for a Predictive Self-Healing (PSH) diagnosis instead of the actual faulty FRU. The issue occurs only for a PSH (also known as FMA) diagnosis, not when the fault is diagnosed by POST.


2. Contributing Factors

This issue can occur on the following releases:

SPARC Platform

  • Solaris 10 with patch 125369-02 or later

In addition, this issue occurs only on the following subset of sun4v class platforms:

  • Sun Fire T1000/T2000
  • Netra T2000
  • Sun Blade T6300
  • Sun SPARC Enterprise T5120/T5220 (see Resolution section below)
  • Sun SPARC Enterprise T5140/T5240 (see Resolution section below)
  • Netra T5220  (see Resolution section below)
  • Sun Blade T6320 (see Resolution section below)


Notes:
- The issue will not be seen on the Netra CP3060 Board

- The SPARC Enterprise T5120/T5220 is shipped with the above patch 125369-02 and therefore is always vulnerable to this issue

- For the Sun Fire T1000/T2000, Netra T2000 and the Sun Blade T6300 to be vulnerable, the above patch must be installed in the Control Domain in an LDOMs configuration

- For the Sun SPARC Enterprise T5140/T5240, the issue does not occur for memory faults (SUN4V-8000-E2 and SUN4V-8000-EX)
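
The "125369-02 or later" condition above can be checked mechanically. The following is a minimal sketch only: it assumes the installed patch IDs have already been collected into a list of strings of the form NNNNNN-RR, and the function names are hypothetical. It does not check the platform or LDoms conditions described in the notes.

```python
import re

# Patch named in this alert; the two-digit suffix is the patch revision.
TRIGGER_PATCH = "125369-02"


def parse_patch_id(patch_id):
    """Split a patch ID such as '125369-02' into (base, revision) integers."""
    match = re.fullmatch(r"(\d{6})-(\d{2})", patch_id.strip())
    if match is None:
        raise ValueError(f"not a patch ID: {patch_id!r}")
    return int(match.group(1)), int(match.group(2))


def has_trigger_patch(installed_patch_ids, trigger=TRIGGER_PATCH):
    """Return True if the trigger patch is installed at its revision or later."""
    base, min_revision = parse_patch_id(trigger)
    for patch_id in installed_patch_ids:
        installed_base, installed_revision = parse_patch_id(patch_id)
        if installed_base == base and installed_revision >= min_revision:
            return True
    return False
```

How the list of installed patches is gathered on the host is left out of the sketch. A malformed entry raises ValueError rather than being skipped silently.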

3. Symptoms

This issue is seen in the output of the following service processor commands and utilities:

- ALOM: showfaults
- ALOM: showfru (the motherboard FRUID will be marked faulty)
- ILOM: show /SP/faultmgmt
- ILOM: "fault_state" property for the motherboard (/SYS/MB)
- ILOM web interface: "Fault Management" tab

Notes:

The issue with ALOM output applies to all affected sun4v platforms.

The issue with ILOM output applies to the T5120/T5220 and T5140/T5240 only.

On T5120/T5220, the FB-DIMM fault indicators do not operate if the PSH diagnosis has indicted a DIMM as the faulty FRU.

In a system with DIMMs or PCI-E adapters that have been faulted by PSH diagnosis on the host, the ALOM showfaults command displays the faulty FRU as the motherboard instead of the DIMM or PCI-E adapter.

This issue occurs for the following FMA Message-IDs (MSGID):

- SUN4V-8000-E2
- SUN4V-8000-DX
- SUN4-8000-4P
- SUN4-8000-A2
- SUN4-8000-75
- SUN4-8000-9J
- SUN4-8000-D4
- PCIEX-8000-0A
- PCIEX-8000-DJ
- PCIEX-8000-HS
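
When many service processor entries have to be reviewed, the list above can be turned into a simple lookup. This is an illustrative sketch, not part of any Sun tool; the set contents are copied from the list above and the function name is hypothetical.

```python
# Message-IDs listed in this alert for which the service processor
# may report the motherboard instead of the actual faulty FRU.
AFFECTED_MSGIDS = frozenset({
    "SUN4V-8000-E2",
    "SUN4V-8000-DX",
    "SUN4-8000-4P",
    "SUN4-8000-A2",
    "SUN4-8000-75",
    "SUN4-8000-9J",
    "SUN4-8000-D4",
    "PCIEX-8000-0A",
    "PCIEX-8000-DJ",
    "PCIEX-8000-HS",
})


def may_show_wrong_fru(msgid):
    """Return True if the MSGID is one of those listed in this alert."""
    return msgid.strip().upper() in AFFECTED_MSGIDS
```

A True result only means the FRU shown by the service processor should be confirmed on the host. It does not mean the displayed FRU is wrong.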

Example #1:

Illustrates the problem on T5120/T5220 (the actual fault is a DIMM fault):

sc> showfaults -v
Last POST Run: Jul 13 18:32:11 2007
Post Status: Passed all devices
   ID Time              FRU               Class    Fault
    0 Jul 13 19:31:34   /SYS/MB                    Host detected fault, MSGID:
SUN4V-8000-DX  UUID: 7b471945-ceef-eea0-c3ad-85ca140be5b2

Example #2:

Illustrates the problem on T1000/T2000 (the actual fault is a DIMM fault):

sc> showfaults -v
Last POST run: TUE AUG 07 10:44:12 2007
POST status: Passed all devices

    ID Time              FRU               Fault
     1 SEP 26 15:21:54   MB                Host detected fault, MSGID:
SUN4V-8000-E2  UUID: bfef53c1-b468-e4c8-9b29-b463cc24760a
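
The Message-ID and UUID in output like the two examples above can be extracted with a short script. The sketch below assumes the "Host detected fault, MSGID: ... UUID: ..." layout shown in the examples, including the wrap before the Message-ID; the function name is hypothetical and other firmware versions may format the entry differently.

```python
import re

# Matches one host-detected (PSH) entry. \s also matches the line break
# that appears between "MSGID:" and the Message-ID in the captures above.
HOST_FAULT = re.compile(
    r"Host detected fault, MSGID:\s*(?P<msgid>\S+)\s+"
    r"UUID:\s*(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def host_detected_faults(showfaults_text):
    """Return a list of (msgid, uuid) pairs found in 'showfaults -v' output."""
    return [
        (match.group("msgid"), match.group("uuid"))
        for match in HOST_FAULT.finditer(showfaults_text)
    ]
```

The FRU column is deliberately not parsed, because for these entries it is the field that may be wrong.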

 

4. Workaround

1. Use the Fault Management utilities on the host to identify the faulty FRU. Instructions for using these utilities can be found in the Predictive Self-Healing Knowledge Articles located at:

http://www.sun.com/msg/<MSGID>

where <MSGID> is the Message-ID displayed by the 'showfaults' command and the PSH console message displayed on the host.

2. If you do not have web access, the faulty FRU can be identified from the Solaris command:

# fmdump -v -u <UUID>

where <UUID> is the Event-ID displayed by the 'showfaults' command and the PSH console message displayed on the host.

a) Here is the 'fmdump' output, which identifies the correct FRU for the fault reported by showfaults in example #1 (T5120/T5220):

# fmdump -v -u 7b471945-ceef-eea0-c3ad-85ca140be5b2
TIME                 UUID                                 SUNW-MSG-ID
Jun 21 23:50:16.6635 7b471945-ceef-eea0-c3ad-85ca140be5b2 SUN4V-8000-DX
     95%  fault.memory.dimm
          Problem in: mem:///unum=MB/CMP0/BR1:CH1/D0/J1601
             Affects: mem:///unum=MB/CMP0/BR1:CH1/D0/J1601
                 FRU: hc://:serial=00CE01062351032B66:
                      part=371-2143-01 Rev01//motherboard=0/chip=0/branch=1/
                      dram-channel=1/dimm=0
            Location: MB/CMP0/BR1: CH1/D0/J1601

b) Here is the 'fmdump' output, which identifies the correct FRU for the fault reported by showfaults in example #2 (T1000/T2000):

# fmdump -v -u bfef53c1-b468-e4c8-9b29-b463cc24760a
TIME                 UUID                                 SUNW-MSG-ID
Sep 26 16:31:16.5027 bfef53c1-b468-e4c8-9b29-b463cc24760a SUN4V-8000-E2
     95%  fault.memory.bank
          Problem in: mem:///unum=MB/CMP0/CH0:R0/D1/J0701
             Affects: mem:///unum=MB/CMP0/CH0:R0/D1/J0701
                 FRU: hc://:serial=98064193:part=//motherboard=0/chip=0/
                      dram-channel=0/rank=0/dimm=1
            Location: MB/CMP0/CH0: R0/D1/J0701

     95%  fault.memory.bank
          Problem in: mem:///unum=MB/CMP0/CH0:R0/D0/J0601
             Affects: mem:///unum=MB/CMP0/CH0:R0/D0/J0601
                 FRU: hc://:serial=98064178:part=//motherboard=0/chip=0/
                      dram-channel=0/rank=0/dimm=0
            Location: MB/CMP0/CH0: R0/D0/J0601

3. Once the faulty FRU(s) have been replaced and the PSH fault has been cleared, the entry in 'showfaults' will be deleted and the fault recorded in the motherboard FRUID will be cleared.

If the entry in 'showfaults' does not clear automatically, it can be manually cleared with the ALOM command:

sc> clearfault <UUID>

where <UUID> is the Event-ID displayed by the 'showfaults' command and the PSH console message displayed on the host.
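
Steps 1 and 2 above can be sketched in a few lines: building the knowledge article URL from the Message-ID, and pulling the "Location:" lines out of saved 'fmdump -v -u <UUID>' output. This is a minimal sketch that assumes the output layout shown in step 2; both function names are hypothetical.

```python
import re

# One "Location:" line is printed per suspect in the fmdump -v output above.
LOCATION = re.compile(r"^\s*Location:\s*(.+?)\s*$", re.MULTILINE)


def knowledge_article_url(msgid):
    """Build the PSH knowledge article URL for a Message-ID (step 1)."""
    return f"http://www.sun.com/msg/{msgid}"


def fru_locations(fmdump_text):
    """Return the Location value of every suspect in the fmdump text (step 2)."""
    return LOCATION.findall(fmdump_text)
```

The Location values are returned exactly as printed, including the space after the colon seen in the captures, so they should be compared with care against labels from other sources.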

Notes:

a) If the FRU displayed by the Solaris 'fmdump' command is different from the FRU displayed by 'showfaults', then the FRU displayed by 'fmdump' is the correct faulty FRU.

b) In many cases, the faulty FRU is the motherboard so the output of ALOM 'showfaults' and 'fmdump' will agree.

c) If the fault was diagnosed by POST, the issue does not occur and the FRU displayed by the ALOM 'showfaults' command is correct.

- For the T1000 and T2000, a POST diagnosed fault includes the text string "deemed faulty and disabled" in the output of 'showfaults'.

- For the T5120 and T5220, a POST diagnosed fault includes the text string "Forced fail" in the output of 'showfaults'.

d) The Message-ID and Event-ID (UUID) displayed by the showfaults command are correct even though the faulty FRU may be incorrect.
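
Notes a) through c) amount to a small decision rule, sketched below. The function names are hypothetical, and as a simplification the sketch checks for both POST text strings on every platform instead of selecting one string per platform.

```python
# Text strings that, per note c), mark a POST-diagnosed fault.
POST_MARKERS = ("deemed faulty and disabled", "Forced fail")


def diagnosed_by_post(showfaults_entry):
    """Return True if the showfaults entry carries a POST marker string."""
    return any(marker in showfaults_entry for marker in POST_MARKERS)


def frus_to_trust(showfaults_entry, showfaults_fru, fmdump_frus):
    """Pick which FRU report to act on for one showfaults entry.

    POST-diagnosed entries are trusted as displayed. For host-detected
    (PSH) entries the FRUs reported by fmdump on the host take precedence;
    if none were supplied, an empty list is returned rather than a guess.
    """
    if diagnosed_by_post(showfaults_entry):
        return [showfaults_fru]
    return list(fmdump_frus)
```

An empty result for a host-detected entry means the fmdump output has not been collected yet, so the FRU should not be replaced on the strength of 'showfaults' alone.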


5. Resolution

This issue is addressed in the following releases:

SPARC Platform

  • Sun SPARC Enterprise T5120/T5220, Netra T5220, and Sun Blade T6320 with firmware patch 136932-01 (firmware revision 7.1.0.g) or later
  • Sun SPARC Enterprise T5140/T5240 with firmware patch 136936-02 (firmware revision 7.1.0.g) or later

Upgrading to Solaris 10 Update 5 or later is required for these platforms:
  • Sun Fire T1000/T2000
  • Netra T2000 
  • Sun Blade T6300 


This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.


Modification History

18-Jun-2008: Updated Contributing Factors and Resolution sections
15-Sep-2008: Updated Resolution section. Now Resolved.




Article Details
Article ID : 200664
Article Type : Sun Alert
Last reviewed : 2008-09-15
Audience : PUBLIC