Sun Enterprise Systems With Recently Manufactured UltraSPARC II MSRAM Modules May Experience CPU Failures

Category: Availability
Release Phase: Resolved
Product: Sun Enterprise 3500 Server, Sun Enterprise 4500 Server, Sun Enterprise 5500 Server, Sun Enterprise 6500 Server, Sun Enterprise 10000 Server
Bug Id: 4795832
Date of Workaround Release: 31-JAN-2003
Date of Resolved Release: 24-JUL-2003
Impact
Sun Enterprise 10000 and Enterprise 3500/4500/5500/6500 servers containing recently manufactured UltraSPARC II modules may experience CPU failures due to an MSRAM socket issue. When this issue occurs, systems may reboot or panic due to Uncorrectable Memory Errors.
Contributing Factors
This issue can occur on the following platforms:
- Enterprise 10000 servers with UltraSPARC II MSRAM CPU modules
- Enterprise 3500/4500/5500/6500 servers with UltraSPARC II MSRAM CPU modules
Notes:
1) Only modules which shipped between July 2002 and December 2002 are potentially affected. This includes, but is not limited to, modules with the following part numbers:
- 501-5814 400MHz/8MB
- 501-5815 400MHz/8MB
- 501-6009 400MHz/8MB
- 501-5816 464/466MHz/8MB
- 501-5798 464/466MHz/8MB
2) Enterprise 3500/4500/5500/6500 and Enterprise 10000 CPU modules which are affected by this issue will show symptoms within the first 120 days of operation, and typically within the first 10-90 days. Modules which have not shown symptoms within the stated time periods are not affected by this issue and will have a service life similar to those shipped before this issue arose.
3) This issue results from a manufacturing process variation for CPU module sockets which caused a small number of sockets to be manufactured outside the product design specification. The vast majority of sockets are still within design specification and will function as expected. Sun is not able to trace the out-of-specification sockets to specific CPU modules or date codes.
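The criteria in Notes 1 and 2 can be expressed as a small triage check. The following is a minimal sketch, not a Sun-supplied tool: the function name `module_status`, its parameters, and the returned labels are hypothetical, and it assumes the ship window covers July and December 2002 in full. Because the part number list is stated to be non-exhaustive, the sketch records the listed numbers for reference only and decides on ship date and days in operation.

```python
from datetime import date

# Part numbers listed in Note 1. The note says the list is not exhaustive,
# so this set is informational and is not used to rule a module out.
LISTED_PART_NUMBERS = {"501-5814", "501-5815", "501-6009", "501-5816", "501-5798"}

# Assumption: "between July 2002 and December 2002" includes both months in full.
SHIP_WINDOW_START = date(2002, 7, 1)
SHIP_WINDOW_END = date(2002, 12, 31)

# Note 2: affected modules show symptoms within the first 120 days of operation.
SYMPTOM_WINDOW_DAYS = 120


def module_status(ship_date, days_in_operation, symptoms_seen=False):
    """Classify a CPU module using the criteria from Notes 1 and 2 (hypothetical helper)."""
    if not (SHIP_WINDOW_START <= ship_date <= SHIP_WINDOW_END):
        return "outside ship window"
    if symptoms_seen:
        return "contact Sun Services"
    if days_in_operation > SYMPTOM_WINDOW_DAYS:
        # No symptoms within the stated period: not affected by this issue.
        return "not affected"
    return "potentially affected"
```

For example, `module_status(date(2002, 9, 15), 30)` reports a module that shipped inside the window and is still within its first 120 days of operation.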
Symptoms
When this issue occurs, the system may reboot with the following type of error:
p5 UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0203 UDBL.ESYND 0x03
p5 UDBL Syndrome 0x3 Memory Module Board 6 J3101 J3201 J3301
J3401 J3501 J3601 J3701 J3801
p5 unix: WARNING: [AFT1] errID 0x00027f15.a5d8bc56 Syndrome 0x3
indicates that this may not be a memory module problem
p5 unix: [AFT2] errID 0x00027f15.a5d8bc56 PA=0x00000003.3d5ade08
p5 E$tag 0x00000000.18c067ab E$State: Exclusive E$parity 0x0c
p5 unix: [AFT2] E$Data (0x00): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x08): 0x20202020.20202022 *Bad* PSYND=0x00ff
p5 unix: [AFT2] E$Data (0x10): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x18): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x20): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x28): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x30): 0x20202020.20202020
p5 unix: [AFT2] E$Data (0x38): 0x20202020.20202020
p5 unix: NOTICE: Scheduling clearing of error on page 0x00000003.3d5ac000
p5 unix: [AFT3] errID 0x00027f15.a5d8bc56 Above Error is in User Mode
p5 and is fatal: will reboot
p5 unix: WARNING: [AFT1] initiating reboot due to above error in pid 9744 (java)
Systems may also experience panics due to Uncorrectable Memory Errors:
WARNING: [AFT1] EDP event on CPU1 Data access at TL=0, errID 0x00000093.6323e6f8
AFSR 0x00000000.80408000<PRIV,EDP> AFAR 0x00000000.06901980
AFSR.PSYND 0x8000(Score 95) AFSR.ETS 0x00 Fault_PC 0x78128a84
UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
panic[cpu1]/thread=30000ae5000: [AFT1] errID 0x00000093.6323e6f8 EDP Error(s)
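Messages of the kinds shown above can be picked out of a saved system log with a few regular expressions. The following is a minimal sketch, assuming the log lines follow the same layout as the samples in this section; the function name `scan_messages` and the keys of the returned dictionary are hypothetical. A match only indicates that a similar error was logged; it does not confirm that a module is affected by this issue.

```python
import re

# Patterns derived from the sample messages in this section (assumed layout).
ERRID_RE = re.compile(r"\[AFT[0-9]\].*?errID (0x[0-9a-fA-F]+\.[0-9a-fA-F]+)")
EDP_RE = re.compile(r"\[AFT1\] EDP event on CPU(\d+)")
BAD_ECACHE_RE = re.compile(r"E\$Data \((0x[0-9a-fA-F]+)\).*\*Bad\*")


def scan_messages(lines):
    """Collect error IDs, CPUs reporting EDP events, and E$Data offsets flagged *Bad*."""
    err_ids = set()
    edp_cpus = set()
    bad_offsets = []
    for line in lines:
        match = ERRID_RE.search(line)
        if match:
            err_ids.add(match.group(1))
        match = EDP_RE.search(line)
        if match:
            edp_cpus.add(int(match.group(1)))
        match = BAD_ECACHE_RE.search(line)
        if match:
            bad_offsets.append(match.group(1))
    return {
        "err_ids": sorted(err_ids),
        "edp_cpus": sorted(edp_cpus),
        "bad_ecache_offsets": bad_offsets,
    }
```

The function accepts any iterable of text lines, so it can be given an open file object for a saved copy of the system log.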
Workaround
Customers experiencing this issue should contact their Sun Services Representative for assistance.
Resolution
Customers experiencing this issue should contact their Sun Services Representative for assistance.
Modification History
Date: 17-MAR-2003
- Updated Impact
- Updated Contributing Factors
Date: 24-JUL-2003
- State: Resolved
- Updated Contributing Factors and Resolution sections