Sun Fire Servers With Solaris 10 May Panic or Reset With an lpost Message, Asynchronous Event, "failed to stop cpu" or send_mondo Timeout
Category: Availability
Release Phase: Resolved
Bug Id: 6684726, 6699498
Date of Resolved Release: 01-Aug-2008
Product: Sun Fire E2900 Server, Sun Fire E4900/E6900 Server, Sun Fire E4800/6800 Server, Sun Fire V1280 Server, Netra 1280 Server, Sun Netra 1290 Server, Solaris 10 Operating System
1. Impact
Loss of application availability may occur due to a system panic or reset. Because this type of fault is typically diagnosed as a hardware failure, it may lead to unnecessary hardware replacement.
2. Contributing Factors
This issue may occur on the following releases and platforms:
SPARC Platform
- Solaris 10 on Sun Fire E6900/E4900/E2900/6800/4800/4810/3800/V1280, Netra 1280 and Netra 1290 systems without patches 114527-11 (firmware) and 137111-04 (kernel)
Notes:
This issue is specific to the midrange servers listed above and has only been seen with UltraSPARC IV+ (USIV+) 1.5 GHz and 1.8 GHz CPUs. It is considered possible with USIV+ 1.95 GHz CPUs.
System Controller firmware (ScApp) versions 5.20.9 (as delivered in patch 114527-10) and earlier are affected.
The issue has been observed when running programs that access OBP (OpenBoot PROM) from the OS, such as prtdiag, prtconf, cfgadm, picl, or other third-party system management software.
The issue is highly timing dependent and is expected to be rare.
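The affected firmware range in the notes above ("5.20.9 and earlier") is a version comparison, which is easy to get wrong if versions are compared as strings. The following is a minimal sketch, assuming the version is a dotted numeric string optionally written as "ScApp:5.20.9"; the helper names are hypothetical, and how the version is read from the System Controller is outside its scope.

```python
# Hypothetical helper: does a System Controller firmware (ScApp) version fall
# in the range this alert lists as affected (5.20.9 and earlier)?
# Assumes a dotted numeric version, optionally written as "ScApp:5.20.9".

LAST_AFFECTED = (5, 20, 9)  # from this alert: 5.20.9 (patch 114527-10) and earlier


def parse_version(text: str) -> tuple:
    """Turn 'ScApp:5.20.9' or '5.20.9' into (5, 20, 9)."""
    text = text.strip()
    if text.startswith("ScApp:"):
        text = text[len("ScApp:"):]
    return tuple(int(part) for part in text.split("."))


def scapp_is_affected(version: str) -> bool:
    """True if the version is 5.20.9 or earlier (lexicographic tuple comparison)."""
    return parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    # Illustrative inputs only; 5.20.10 is a hypothetical later version.
    for v in ("ScApp:5.20.9", "5.20.8", "5.20.10"):
        print(v, "->", "affected" if scapp_is_affected(v) else "outside affected range")
```

Comparing integer tuples rather than strings matters here: as strings, "5.20.10" would sort before "5.20.9", whereas (5, 20, 10) correctly sorts after (5, 20, 9).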
3. Symptoms
Console logs and core files can help identify whether the system is experiencing this issue.
-------------------------------------------
Console Logs Showing send_mondo panic:
domainA console login: {/N0/SB2/P0/C1} @(#) lpost 5.20.8 2007/11/20 10:33
{/N0/SB2/P0/C1} Copyright 2007 Sun Microsystems, Inc. All rights reserved.
{/N0/SB2/P0/C1} Use is subject to license terms.
send mondo timeout [8307178 NACK 0 BUSY] IDSR 0x4000000000000000
cpuids: 0x208
panic: failed to stop cpu520
panic[cpu3]/thread=3005bc3e080: send_mondo_set: timeout
000002a10062e9c0 SUNW,UltraSPARC-IV+:send_mondo_set+454 (2a10062eba0, ... 1, 2a10062eab0, 0)
%l0-3: aaaaaaaaaaaaaaaa 000000000000002f 000000000000002f 0000000000000209
%l4-7: 0000000001221400 00000007274a4ba9 4000000000000000 0000000000000040
000002a10062eaf0 unix:xt_some+194 (2a10062ed78, 2a10062ebf0, fffff7, fffffffffffffff8, 2a10062eba8, 0)
-------------------------------------------
Console logs showing Asynchronous Event and failed to stop:
domainA console login: {/N0/SB5/P2/C1} @(#) lpost 5.20.8 2007/11/20 10:33
{/N0/SB5/P2/C1} Copyright 2007 Sun Microsystems, Inc. All rights reserved.
{/N0/SB5/P2/C1} Use is subject to license terms.
{/N0/SB5/P2/C1} @(#) lpost 5.20.8 2007/11/20 10:33
{/N0/SB5/P2/C1} Copyright 2007 Sun Microsystems, Inc. All rights reserved.
{/N0/SB5/P2/C1} Use is subject to license terms.
{/N0/SB5/P2/C1} WARNING: Asynchronous Event.
{/N0/SB5/P2/C1} Component under test: /N0/SB5/P2 CPU
{/N0/SB5/P2/C1} AFSR1 EXT: 00000000.00000000 AFSR2 EXT:00000000.00000000
{/N0/SB5/P2/C1} tl tt tstate tpc tnpc
{/N0/SB5/P2/C1} 01 63 00000044.80000605 000007ff.f000c370 000007ff.f000c374
Apr 23 17:31:15 e13-sc1 Domain-A.SC: Active - Panicking
panic: failed to stop cpu534
panic[cpu7]/thread=30029331920: bad kernel MMU miss at TL 2
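Since the alert identifies this issue from console output, a first pass over a saved console log can look for the message fragments quoted in the two excerpts above. This is a minimal sketch, assuming the log is available as a plain-text file; the signature list is copied from these excerpts and is illustrative rather than an official or complete diagnostic, the script name is hypothetical, and a match only suggests the log is worth reviewing with Sun Services.

```python
import re
import sys

# Message fragments copied from the two console excerpts in this alert.
# Illustrative, non-exhaustive list: a match is a hint, not a diagnosis.
SIGNATURES = {
    "send_mondo timeout": re.compile(r"send[ _]mondo(?:_set:)? timeout"),
    "failed to stop cpu": re.compile(r"failed to stop cpu(\d+)"),
    "asynchronous event": re.compile(r"WARNING: Asynchronous Event"),
    # lpost banners also appear during a normal POST, so a match on this
    # signature alone is not conclusive.
    "lpost banner": re.compile(r"@\(#\) lpost \S+"),
}


def scan(lines):
    """Return {signature name: [matching lines]} for every signature found."""
    hits = {}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(line.rstrip("\n"))
    return hits


def stopped_cpus(hits):
    """Extract the CPU ids from any 'failed to stop cpuNNN' lines."""
    pattern = SIGNATURES["failed to stop cpu"]
    return [int(pattern.search(line).group(1))
            for line in hits.get("failed to stop cpu", [])]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scan_console.py CONSOLE_LOG")  # hypothetical script name
    else:
        with open(sys.argv[1], errors="replace") as fh:
            found = scan(fh)
        for name, matched in found.items():
            print(f"{name}: {len(matched)} line(s)")
        print("CPUs reported as failed to stop:", stopped_cpus(found))
```

Run over the send_mondo excerpt, this sketch would flag (among others) two send_mondo timeout lines and CPU 520, which agrees with the "cpuids: 0x208" line in the same log, since 0x208 is 520 in decimal.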
4. Workaround
Workarounds for this issue are determined on a case-by-case basis and require consultation with Sun Services, who will develop an individual action plan for your environment.
5. Resolution
This issue is addressed on the following releases and platforms:
SPARC Platform
- Solaris 10 on Sun Fire E6900/E4900/E2900/6800/4800/4810/3800/V1280, Netra 1280 and Netra 1290 systems with patches 114527-11 (firmware) and 137111-04 (kernel)
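Checking whether a domain has the kernel patch named above amounts to comparing patch revisions. The following is a minimal sketch, assuming the output of `showrev -p` has been saved to a file (it can be analyzed on another machine) and that each installed patch appears on a line beginning "Patch: NNNNNN-RR". It also assumes that a higher revision of the same patch ID satisfies the requirement, and it deliberately skips firmware patch 114527-11, on the assumption that firmware is applied to the System Controller and may not appear in the domain's patch list. The script name is hypothetical.

```python
import re
import sys

# Minimum kernel patch revision named in this alert's Resolution section.
# Firmware patch 114527-11 is deliberately not checked here: assumption that it
# is applied to the System Controller and may not appear in the domain's
# showrev -p output, so verify the ScApp firmware version separately.
REQUIRED = {"137111": 4}  # 137111-04 (kernel)

# Assumes showrev -p lines shaped like "Patch: 137111-04 Obsoletes: ..."
PATCH_RE = re.compile(r"^Patch:\s*(\d{6})-(\d{2})")


def installed_revisions(showrev_output: str) -> dict:
    """Map each patch ID to the highest revision listed in showrev -p output."""
    revs = {}
    for line in showrev_output.splitlines():
        m = PATCH_RE.match(line.strip())
        if m:
            pid, rev = m.group(1), int(m.group(2))
            revs[pid] = max(rev, revs.get(pid, 0))
    return revs


def missing_patches(showrev_output: str, required=REQUIRED) -> list:
    """Return 'ID-RR' for each required patch that is absent or too old.

    Assumption: a higher revision of the same patch ID also satisfies the
    requirement.
    """
    have = installed_revisions(showrev_output)
    return [f"{pid}-{rev:02d}" for pid, rev in required.items()
            if have.get(pid, -1) < rev]


if __name__ == "__main__":
    # Usage sketch: showrev -p > patches.txt; python3 check_patches.py patches.txt
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as fh:
            print("Missing:", missing_patches(fh.read()) or "none")
    else:
        print("usage: check_patches.py SHOWREV_OUTPUT_FILE")  # hypothetical name
```

Keeping the highest revision per patch ID means a listing that shows both an older and a newer revision of 137111 is judged by the newer one.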
This Sun Alert notification is being provided to you on an "AS IS"
basis. This Sun Alert notification may contain information provided by
third parties. The issues described in this Sun Alert notification may
or may not impact your system(s). Sun makes no representations,
warranties, or guarantees as to the information contained herein. ANY
AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU
ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT
OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This
Sun Alert notification contains Sun proprietary and confidential
information. It is being provided to you pursuant to the provisions of
your agreement to purchase services from Sun, or, if you do not have
such an agreement, the Sun.com Terms of Use. This Sun Alert
notification may only be used for the purposes contemplated by these
agreements.
Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
Modification History
01-Aug-2008: Updated Contributing Factors and Resolution sections. Now Resolved.