Sun Fire 3800/4800/4810/6800, Sun Fire 12K/15K, Sun Fire V1280, and Netra 1280 Server Domains with 900MHz CPUs May Panic or Hang Due to Incorrect L2 SRAM Parameter Settings



Category :Availability
Release Phase :Resolved
Product :Sun Fire 12K Server
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire 15K Server
Sun Fire V1280 Server
Netra 1280 Server  
Bug Id :4808603, 4807422, 4809236  
Date of Workaround Release :31-JAN-2003 
Date of Resolved Release :17-MAR-2003 


Impact

Sun has identified an issue with the L2 SRAM parameter settings on Sun Fire 3800/4800/4810/6800, Sun Fire 12K/15K, Sun Fire V1280 and Netra 1280 systems. The incorrect settings may produce L2 SRAM errors, which can lead to domain panics or hangs.


Contributing Factors

This issue can occur on Sun Fire 3800/4800/4810/6800 systems that have 900MHz processors and run the following firmware releases:

  • Sun Fire 3800/4800/4810/6800 without firmware patch 112883-05 (firmware 5.14.4)
  • Sun Fire 3800/4800/4810/6800 without firmware patch 112494-08 (firmware 5.13.5)
  • Sun Fire 3800/4800/4810/6800 with any version of firmware patch 112127 (firmware 5.12.x)

This issue can occur on Sun Fire V1280 and Netra 1280 systems that have 900MHz processors and run the following firmware releases:

  • Sun Fire V1280 and Netra 1280 systems without firmware patch 113751-02 (firmware 5.13.0012)

This issue can occur on Sun Fire 12K/15K systems that have 900MHz processors and run HPOST from the following SMS releases:

  • Sun Fire 12K/15K with SMS 1.1
  • Sun Fire 12K/15K with SMS 1.2 without patch 112488-11
  • Sun Fire 12K/15K with SMS 1.3 without patch 114608-01
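The exposure conditions above can be captured as a small lookup table. The Python sketch below is a hypothetical checklist helper, not a Sun tool: the platform family and release-line labels (for example "SF3800-6800" and "5.14") are made up for illustration, and the installed patch string is assumed to be gathered by hand in the usual "ID-revision" form such as "112883-05". It only compares patch revisions; on Sun Fire 12K/15K the fix also needs the setkeyswitch standby/on step described under Resolution.

```python
from typing import Optional, Tuple

# Hypothetical labels for the platform families and release lines in this alert.
# Value: (patch ID, minimum revision) that resolves the issue, or None when the
# alert lists no fixing patch for that release line and an upgrade is required.
REQUIRED_PATCH = {
    ("SF3800-6800", "5.14"): ("112883", 5),
    ("SF3800-6800", "5.13"): ("112494", 8),
    ("SF3800-6800", "5.12"): None,       # any revision of 112127 is affected
    ("V1280/N1280", "5.13"): ("113751", 2),
    ("SF12K/15K", "SMS 1.1"): None,      # upgrade to SMS 1.2 or later
    ("SF12K/15K", "SMS 1.2"): ("112488", 11),
    ("SF12K/15K", "SMS 1.3"): ("114608", 1),
}


def parse_patch(patch: str) -> Tuple[str, int]:
    """Split a patch string such as '112883-05' into ('112883', 5)."""
    base, _, rev = patch.partition("-")
    return base, int(rev)


def is_exposed(family: str, line: str, cpu_mhz: int,
               installed_patch: Optional[str]) -> bool:
    """Return True if the combination matches a Contributing Factors entry."""
    if cpu_mhz != 900:
        # This alert only describes domains with 900MHz CPUs.
        return False
    if (family, line) not in REQUIRED_PATCH:
        raise ValueError(f"combination not covered by this alert: {family} {line}")
    required = REQUIRED_PATCH[(family, line)]
    if required is None or installed_patch is None:
        return True
    base, rev = parse_patch(installed_patch)
    req_base, req_rev = required
    # A different patch ID does not count as the fixing patch.
    return base != req_base or rev < req_rev
```

For example, is_exposed("SF3800-6800", "5.14", 900, "112883-04") returns True because revision 04 is below the required revision 05.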

Symptoms

When this issue is encountered, error messages containing one or more of the following character strings may appear:

UCC, UCU, EDC, EDU, WDC, WDU, CPC, CPU (not to be confused with references to a processor), Esynd 0x0071.
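To check existing message files for these strings, a whole-word text scan is usually enough. The Python sketch below is a hypothetical helper, not a Sun utility; it assumes plain-text log lines and matches the strings listed above plus the syndrome forms quoted elsewhere in this alert ("Esynd 0x0071", "ECC Syndrome: 0x071" and "Ecc_Syndrome[31:0] = 00000071"). Because "CPU" also appears in ordinary processor references, its hits still need manual review.

```python
import re

# Error-type strings listed in this alert. "CPU" is included, but it also
# appears in ordinary processor references, so its hits need manual review.
ERROR_TOKENS = ["UCC", "UCU", "EDC", "EDU", "WDC", "WDU", "CPC", "CPU"]

# Whole-word match so that, e.g., "UCCX" is not reported as "UCC".
TOKEN_RE = re.compile(r"\b(" + "|".join(ERROR_TOKENS) + r")\b")

# Syndrome 0x071 in the forms quoted in this alert.
SYNDROME_RE = re.compile(
    r"Esynd 0x0071|ECC Syndrome: 0x071|Ecc_Syndrome\[31:0\] = 00000071"
)


def scan(lines):
    """Return (line_number, matched_strings) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        found = TOKEN_RE.findall(line) + SYNDROME_RE.findall(line)
        if found:
            hits.append((lineno, found))
    return hits
```

Passing the lines of a message file (for example, from open(path).readlines()) to scan() lists every line worth a closer look, together with the strings that triggered it.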

The following descriptions cover the resulting system behavior and how the error is reflected in the message files.

A. Sun Fire 3800/4800/4810/6800, Sun Fire V1280, Netra 1280:

i. The system reports many single-bit errors from the same location, with one or a combination of the character strings listed above, and then panics or freezes.

ii. The system reports the detection of a double-bit error, with one or a combination of the character strings listed above, and in most cases reboots automatically.

iii. Solaris logs no messages; only the System Controller logs messages, containing the character string "ECC Syndrome: 0x071".

iv. Failures detected in POST:

(Seen during the test)

      {/N0/SB0/P2} Component under test: /N0/SB0/P2 E-Cache
      {/N0/SB0/P2}    E-Cache RAM Compare Error J6400
      {/N0/SB0/P2}    address  00000000.00000028
      {/N0/SB0/P2}    expected 55555555.55555555
      {/N0/SB0/P2}    observed 55455555.55555555

(Seen at the end of the basic CPU tests)

      {/N0/SB0/P2}      E-Cache DIMM J6400 failed
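The POST miscompare lines print the expected and observed values as two 32-bit hex words separated by a dot. XORing the two values isolates the bits that differ, which shows at a glance whether a failure is a single-bit or multi-bit miscompare. The Python sketch below is a hypothetical helper (parse_post_word and flipped_bits are not Sun tools) and assumes the high.low format shown in the POST output; it does not map bit positions to a particular SRAM device.

```python
def parse_post_word(text):
    """Convert a POST value such as '55555555.55555555' to a 64-bit int.

    Assumes the format shown in this alert: two 32-bit hex words,
    high word first, separated by a dot.
    """
    high, low = text.split(".")
    return (int(high, 16) << 32) | int(low, 16)


def flipped_bits(expected, observed):
    """Return the bit positions (0 = least significant) that differ."""
    diff = parse_post_word(expected) ^ parse_post_word(observed)
    return [bit for bit in range(64) if (diff >> bit) & 1]
```

For the miscompare above, flipped_bits("55555555.55555555", "55455555.55555555") returns [52], a single flipped bit. The Sun Fire 12K/15K POST output in section B uses the same format: 00000000.00010808 versus 00000000.00010c08 differs only in bit 10.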

B. Sun Fire 12K/15K:

i. The system reports the detection of a double-bit error, with one or a combination of the character strings listed above, and in most cases reboots automatically.

ii. Failures detected in POST:

     {SB03/P0} Component under test: /SB3/P0: E-Cache
     {SB03/P0}       E-Cache RAM Compare Error J4400
     {SB03/P0}       address  00000000.00010808
     {SB03/P0}       expected 00000000.00010808
     {SB03/P0}       observed 00000000.00010c08
     FAIL E$Dimm SB3/P0/E0: Failure indicated in CPU MBox Primary service FRU is Slot SB3.
     Proc SB3/P0: EpiBecacheR1_sc_tfunc(): Test FAILED
        or
     RECORDSTOP Detected for Slot SB17
     SDI EX17/S0  Master_Stop_Status0[31:0] = 50040108 MStop0[3]: SDI is Recordstopped
     SDI EX17/S0  Recordstop0[31:0]  = 04018400
                  Rstop0[16]: R    DARB texp request Recordstop (M)
                  Rstop0[26]: R 1E Slot0 asserted EccErr, enabled to cause Rstop (M)
     EPLD SB17 Ecc_Err:   Mask= F7  Err= 08  SDC reports EccErr
     SDC SB17  EccStatus[31:0] = 0000C041
               EccSt[15]: Safari port 0/1 Ecc error logged.
               Received by DXs from local Safari port 0, read operation.
     DX SB17/DX2  Ecc_Syndrome[31:0] = 00000071
     Syndr[ 8: 0]: P01 Data: 071: Probable Double-bit UE within a nibble
     Syndr[   15]: P01 Direction: 0: Safari port to DX (Incoming)

     NOTE: Error 071 is a "signal" of an Ecache Uncorrectable Error.
           ECC uncorrectable errors detected from Processor Port SB17/P0, no corresponding parity
           error in DXs or DCDSs. For multibit  errors, the lack of parity error is not
           sufficient to infer that the error originated in memory, it could be from the processor
           or DCDS/DX link.  The syndrome is a "signaling" UE that likely indicates an Ecache error.
           FAIL All Ecache on Port SB17/P0:  Rstop detected by DXs/SDC.
           Primary service FRU is Slot SB17.

iii. A DSMD rstop dump is created on the SC during system operation; when examined with redx/wfail, it exhibits the same signature as the POST RECORDSTOP shown above.


Workaround

The impact of this issue may be reduced by installing Solaris Kernel patches for the following releases:


Resolution

This issue is addressed in the following releases:

Sun Fire 3800/4800/4810/6800:

  • Firmware 5.14.4 (or later) with patch 112883-05
  • Firmware 5.13.5 (or later) with patch 112494-08

Sun Fire 3800/4800/4810/6800 platforms with firmware 5.12.x should be upgraded to a later version (5.13.x, 5.14.x) with the appropriate patch.

Sun Fire V1280 and Netra 1280:

  • Firmware 5.13.0012 (or later) with patch 113751-02

Sun Fire 12K/15K:

  • SMS 1.2 with patch 112488-11
  • SMS 1.3 with patch 114608-01

Note: All domains must undergo a setkeyswitch standby/on operation after the patch is applied. This will run HPOST at the default level and apply the fix.

Sun Fire 12K/15K platforms with SMS 1.1 should be upgraded to SMS 1.2 (or later) and have the appropriate patch applied.




Modification History


Date: 20-FEB-2003
  • Updated Synopsis
  • Updated Contributing Factors
  • Updated Resolution

Date: 17-MAR-2003
  • Updated Contributing Factors
  • Updated Resolution
  • Changed State to Resolved

Date: 20-MAR-2003
  • Updated Resolution

Date: 04-APR-2003
  • Updated Resolution



 
 
Login Required

You must login and have a valid contract to access Sun's Premium content which includes:

  • Sun Alerts
  • Bugs
  • Patches
  • Solutions
  • White Papers
  • Documentation
  • Support Knowledge

Login Required

You must login and have a valid contract to access Sun's contracted features

Access Legend:

(Login to access)   Sun Contracted Content
(Login to access)   Sun Contracted Feature

Please make use of SunSolve Feedback application by selecting the floating [+] to provide feedback about this specific document.

Search

Article Details
Article ID : 201236
Article Type : Sun Alert
Last reviewed : 2003-03-17
Audience : PUBLIC

Copyright Sun Microsystems, Inc.