Power Cycling an Enterprise 10000 Server Domain May Cause hpost(1M) to Erroneously Fail Some Resources



Category: Availability
Release Phase: Resolved
Product: Sun Enterprise 10000 Server
Bug Id: 4310528
Date of Resolved Release: 26-AUG-2004


Impact

On a Sun Enterprise 10000 server running System Service Processor (SSP) 3.3, 3.4, or 3.5, hpost(1M) may erroneously fail I/O controllers (IOCs), or report processor ("Proc") time-outs during xcall testing, when a domain is brought up after being powered off.


Contributing Factors

This issue can occur in the following releases:

SPARC Platform

  • SSP 3.3 (for Solaris 2.6, 7, 8) without patch 108885-11
  • SSP 3.4 (for Solaris 2.6, 7, 8) without patch 110304-07
  • SSP 3.5 (for Solaris 7 and 8) without patch 110498-02
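
The patch criteria above can be expressed as a small check. This is a minimal Python sketch, not a Sun tool. The function names `parse_patch` and `is_affected` are hypothetical, and the sketch assumes the patch IDs installed on the SSP have already been collected as strings of the form `NNNNNN-RR` (for example `110304-07`). It treats any higher revision of the same patch base as containing the fix, matching the "or later" wording in the Resolution section.

```python
from __future__ import annotations

# SSP release -> (patch base ID, first revision that fixes bug 4310528),
# taken from the lists in this Sun Alert.
FIXING_PATCH = {
    "3.3": ("108885", 11),
    "3.4": ("110304", 7),
    "3.5": ("110498", 2),
}


def parse_patch(patch_id: str) -> tuple[str, int]:
    """Split a patch ID such as '108885-11' into ('108885', 11)."""
    base, _, rev = patch_id.strip().partition("-")
    if not base or not rev.isdigit():
        raise ValueError(f"malformed patch ID: {patch_id!r}")
    return base, int(rev)


def is_affected(ssp_release: str, installed_patches: list[str]) -> bool:
    """Return True if the SSP lacks the fixing patch revision (or a later one).

    Assumption: installed_patches is a list of patch ID strings gathered
    from the SSP by whatever inventory method the site already uses.
    """
    if ssp_release not in FIXING_PATCH:
        raise ValueError(f"SSP release {ssp_release!r} is not covered by this alert")
    base, fixed_rev = FIXING_PATCH[ssp_release]
    for patch_id in installed_patches:
        p_base, p_rev = parse_patch(patch_id)
        if p_base == base and p_rev >= fixed_rev:
            return False  # fixing revision (or later) is present
    return True
```

For example, an SSP 3.4 installation with only `110304-05` would be reported as affected, while one with `110304-07` or `110304-08` would not.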

Symptoms

If this issue occurs, the hpost(1M) failures encountered depend on the hpost(1M) "level" that is run. Because hpost(1M) phases run in a fixed order (xcall, ... io, ... final_config), the level that is run determines which phase encounters the failure first:

hpost(1M) levels and corresponding symptoms

-------------------------------------------------------------------------
hpost phase          level7 to 15    level16 to 23    level24 or higher
-------------------------------------------------------------------------
phase xcall          Not run         Not run          Proc time-outs
phase io             Not run         FAIL IOCs        *
phase final_config   FAIL IOCs       *                *
-------------------------------------------------------------------------
* = Subsequent failure mode is indeterminate.
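
The table can also be read as a lookup from hpost(1M) level to the first phase expected to show the failure. The Python sketch below uses a hypothetical function name, `first_failing_phase`, and encodes only what the table states; levels below 7 are not described in this alert, so it returns `None` for them.

```python
from __future__ import annotations


def first_failing_phase(level: int) -> tuple[str, str] | None:
    """Return (phase, symptom) for the first phase expected to fail at this level.

    Based solely on the table in this Sun Alert; later phases are
    indeterminate once the first failure has occurred.
    """
    if level >= 24:
        return ("xcall", "Proc time-outs")
    if level >= 16:
        return ("io", "FAIL IOCs")
    if level >= 7:
        return ("final_config", "FAIL IOCs")
    return None  # levels below 7 are not covered by the table
```

At the default level 16, for instance, the lookup points to the io phase, which matches the "phase io" example output.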

Example hpost(1M) console output (or hpost(1M) log information) for each of the above failures is shown below:

Example failure for "phase xcall" (hpost level24 or higher):

    phase xcall: Interprocessor interrupt tests...
    Proc 3.0 timed out on test xcall interrupt vs. proc 3.2 id=0x2C. Test Failed.
    Proc 3.2 timed out on test xcall interrupt vs. proc 3.0 id=0x2C. Test Failed.
    Arbstop/Recordstop/Timeout recovery (1); rerun starting at:
    phase xcall: Interprocessor interrupt tests...

Example failure for "phase io" (hpost level16 [default]):

    phase io: I/O controller tests...
    {0.0} ERROR: SUBTEST=SYSIO <-> IOPC Synchronization ID=31.2
    {0.0} 	Expected interrupt did not occur,
    {0.0} 	Interrupt Register Address 00000108.00003808
    {0.0} 	testing interrupt number(INO) 21
    {0.0} SYSIO Master Sync 2 error:
    FAIL IOC 0.0 in all configs: SYSIO test failed
    {0.0} ERROR: SUBTEST=SYSIO <-> IOPC Synchronization ID=31.2
    {0.0} 	Expected interrupt did not occur,
    {0.0} 	Interrupt Register Address 0000010a.00003808
    {0.0} 	testing interrupt number(INO) 21
    {0.0} SYSIO Interrupt MID Error
    {0.0} Expected: 0x1
    {0.0} Received: 0x0
    {0.0} XOR:      0x1
    {0.0} SYSIO Master Sync 2 error:
    FAIL IOC 0.1 in all configs: SYSIO test failed

Example failure for "phase final_config" (hpost level7):

    phase final_config: Final configuration...
    Configuring in 3F, FOM = 67584.00: 12 procs, 8 Scards, 5632 MBytes.
    {0.0} 	Expected interrupt did not occur,
    {0.0} 	Interrupt Register Address 0000011a.00003808
    {0.0} 	testing interrupt number(INO) 21
    {0.0} SYSIO Interrupt MID Error
    {0.0} Expected: 0x3
    {0.0} Received: 0x0
    {0.0} XOR:      0x3
    {0.0} SYSIO Master Sync 2 error:
    {0.0} SYSIO Interrupt MID Error
    {0.0} Expected: 0x3
    {0.0} Received: 0x0
    {0.0} XOR:      0x3
    {0.0} SYSIO Master Sync 2 error:
    {0.0} *** Error in SYSIO 0xd master sync (2 retries)
    FAIL IOC 1.1 in config 3F: Initialization failure.
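
When reviewing many hpost(1M) logs, the symptom lines shown above can be picked out with simple pattern matching. The Python sketch below is a hypothetical helper, `scan_hpost_log`; its regular expressions are derived only from the example output in this alert and may need adjusting for other hpost(1M) versions. A match does not by itself confirm this bug, since genuine hardware faults can produce the same messages.

```python
import re

# Patterns derived from the example hpost(1M) output in this alert.
# Assumption: other hpost(1M) versions may word these lines differently.
XCALL_TIMEOUT = re.compile(r"Proc (\d+\.\d+) timed out on test xcall interrupt")
IOC_FAIL = re.compile(r"FAIL IOC (\d+\.\d+) in ")
MASTER_SYNC = re.compile(r"SYSIO Master Sync \d+ error")


def scan_hpost_log(text: str) -> dict:
    """Collect lines resembling the symptoms of bug 4310528 from hpost output."""
    result = {"timed_out_procs": [], "failed_iocs": [], "master_sync_errors": 0}
    for line in text.splitlines():
        # "Proc 3.0 timed out on test xcall interrupt vs. proc 3.2 ..."
        if m := XCALL_TIMEOUT.search(line):
            result["timed_out_procs"].append(m.group(1))
        # "FAIL IOC 0.0 in all configs: ..." or "FAIL IOC 1.1 in config 3F: ..."
        if m := IOC_FAIL.search(line):
            result["failed_iocs"].append(m.group(1))
        # "{0.0} SYSIO Master Sync 2 error:"
        if MASTER_SYNC.search(line):
            result["master_sync_errors"] += 1
    return result
```

Run against the "phase io" example, this helper would report IOCs `0.0` and `0.1` and two SYSIO Master Sync errors.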

Note: An hpost(1M) run made immediately after a failed run will most likely not encounter the above failures. However, the problem will continue to recur intermittently on later hpost(1M) runs until the appropriate hpost(1M) patch is applied to the SSP.


Workaround

There is no workaround. Please see the "Resolution" section below.


Resolution

This issue is addressed in the following releases:

SPARC Platform

  • SSP 3.3 (for Solaris 2.6, 7, 8) with patch 108885-11 or later
  • SSP 3.4 (for Solaris 2.6, 7, 8) with patch 110304-07 or later
  • SSP 3.5 (for Solaris 7 and 8) with patch 110498-02 or later



Article Details
Article ID: 201609
Article Type: Sun Alert
Last reviewed: 2004-08-17
Audience: PUBLIC