Hardware/PROM: Sun Fire E6900/E4900/E2900/6800/4800/4810/3800 and V1280 Systems Firmware Update
Status: RELEASED
Patch Id: 114525-07
***********************************************************************
READ THE TERMS OF THE AGREEMENT ("AGREEMENT") IN THE LEGAL_LICENSE.TXT
FILE CAREFULLY BEFORE USING THIS SOFTWARE. BY USING THE SOFTWARE, YOU
AGREE TO THE TERMS OF THIS AGREEMENT. IF YOU DO NOT AGREE TO ALL OF THE
TERMS, PROMPTLY DESTROY THE UNUSED SOFTWARE.
***********************************************************************
Summary: Hardware/PROM: Sun Fire E6900/E4900/E2900/6800/4800/4810/3800 and V1280 Systems Firmware Update
Date: Apr/27/2006
Installation Requirements:
See Special Install Instructions
Solaris Release: 8 9 10
Sun OS Release: 5.8 5.9 5.10
Unbundled Product: Hardware/PROM
Unbundled Release: ScApp:5.18.6, RTOS:45, SC POST:45
Xref:
Topic:
Sun Fire system controller and flashprom update 5.18.6
Relevant Architecture: sparc
BugId's fixed with this patch:
4500490 4640435 4650932 4657904 4667629 4683268 4690339 4738507 4793171 4824109 4828481 4832310 4834392 4853771 4866713 4880599 4882017 4911531 4915870 4924264 4939856 4953801 4953811 4955947 4957835 4964577 4965384 4966931 4968493 4969956 4981483 4982034 4982170 4984203 4984780 4985737 4987176 4987457 4987854 4988128 4992950 4993271 4993985 4994112 4994488 4994905 4996008 5000947 5001728 5003539 5004331 5005360 5005640 5005655 5006810 5006812 5007818 5007831 5009788 5009856 5009864 5010205 5010616 5010772 5011243 5011320 5012130 5012317 5014581 5015109 5015363 5018002 5019052 5020501 5020606 5020704 5020887 5021417 5022423 5022479 5023405 5025518 5027547 5028333 5028357 5028915 5028916 5028917 5029117 5029722 5029847 5029856 5030395 5031658 5031871 5034739 5034767 5034786 5034881 5035234 5035293 5035517 5035667 5036290 5036321 5037074 5039408 5039565 5039905 5040267 5040732 5041545 5041600 5041656 5042076 5042555 5042636 5043373 5044000 5045210 5049265 5050000 5050697 5050725 5050732 5051257 5051422 5053287 5053926 5055997 5057330 5057869 5058001 5058313 5060659 5060748 5061593 5062510 5062717 5062914 5065337 5066585 5067307 5068391 5068436 5068851 5068926 5070035 5070429 5072938 5074972 5076179 5077697 5080862 5081679 5083664 5088868 5089309 5089914 5090178 5090906 5091506 5091556 5093903 5099024 5099206 5101931 5106212 6175704 6183416 6189121 6190958 6193290 6202816 6204544 6217224 6217337 6225904 6232339 6239143 6264209 6266118 6269048 6287893 6314358 6319195 6321138 6369788
Changes incorporated in this version:
6369788
Patches accumulated and obsoleted by this patch:
111346-04 112127-03 112494-08 112883-07 112884-06 113751-05 114523-02 114524-06 800054-01
Patches which conflict with this patch:
Required Patches:
Obsoleted by:
Files Included in this Patch:
Install.info
README.114525-07
Sun_Fire_Entry-Level_Midrange_System_Administration_Guide.pdf
Sun_Fire_Entry-Level_Midrange_System_Controller_Command_Reference_Manual.pdf
Sun_Fire_Entry-Level_Midrange_System_Firmware_5.18.0_Release_Notes.pdf
Sun_Fire_Midrange_System_Controller_Command_Reference_Manual.pdf
Sun_Fire_Midrange_Systems_Firmware_5.18.0_Release_Notes.pdf
Sun_Fire_Midrange_Systems_Platform_Administration_Manual.pdf
copyright
lw8cpu.flash
lw8pci.flash
sgcpu.flash
sgiowci.flash
sgiowci_sp.flash
sgpci.flash
sgrtos.flash
sgsc.flash
Problem Description:
6369788 Change POSIX timezones 2007 transition dates - U.S. Energy Policy Act of 2005
(From 114525-06)
6314358 IdProm.java generates hostids; it shouldn't.
6321138 hostid is 'ffffffff' in 5.20.0 build 3
6287893 - RTOS 44: SC does not respond to ping but tNetTask looks running
6319195 - RTOS 45: lw8.c doesn't follow proper MAC address and hostid
(From 114525-05)
6239143 post misdiagnose with post-tolerate-ce=true when there is a CE condition
5028357 SC is "hard hung" doing a dumpconfig to host with tcp-wrappers
6204544 sc panic during ssh testing.
6264209 unknown (broken) power supplies are treated as A152
6266118 System error occurs during reset -x, which prevents xir from happening.
6232339 Sc panic with out of memory error.
(From 114525-04)
6269048 MICRON DIMM Boot Up Failure
(From 114525-03)
4828481 Console messages "addRecord: Segment TH Insufficient space Need 35 have 25"
4964577 local-mac-address? flag seems to be ignored by qfe adapter on a V1280
5070035 Alarm 3 on Lightweight 8 needs to be user programmable for backward compatibility
5089914 RFE : need new power budget with Uniboard fully loaded 2 GB dimm.
5101931 XMITS3.0/PCIX/3.3V Slot: Data comparison failures with SunVTS iobustest
5106212 PS Failure Causes false FT Failure
6175704 2 E2900's got TO (Time Out) panics CPU513 during or very shortly after DR-ing in SB0
6183416 Certain DIMM failures cannot be isolated
6202816 add warning for incompatible dimm sizes on V1 and V2 uniboards
6217224 Copyright file needs updated for 2005.
6217337 Need to update the COBP banner to reflect the year 2005.
6225904 POST banner is not updated for 2005
(From 114525-02)
4690339 domain error isolation CM_EACK in C accompanied by ConsolePortError in D.
5010772 Jasper320 HBA not working in a Starcat/XMITS 5v slot
5058313 takes a long time to synchronize failover status after "setfailover force"
5091506 6800 System fails to boot with 6 JAG's loaded with 2g memory config.
5093903 SIGBUS error occurred during multiple showenvironment commands.
5099206 Seprom addRecord errors are not actionable
6189121 REGRESSION: inventory not showing the correct "powered on" time
6190958 Change Vcore voltage from 1.225 to 1.25 volts for Jag 3.x
6193290 V1280, 5.18.0/5.17.3 service mode contains engineering mode only commands.
(From 114525-01)
4500490 Flash Prom Binary Image utilities should be under source control
4640435 Portion of SNMP OID address space should be reserved for other projects
4650932 POST memory error messages do not provide enough information for debug.
4657904 DFRUID: updatePowerRecs() needs to write Event & Summary records independently.
4667629 RFE: "setdate" cmd should be forbidden if ntp server is configured
4683268 SC POST should reset M48T59 watchdog at boot
4738507 src/Makefile has obsolete hardcoded hostname
4793171 'setfail force' doesn't always force a failover
4824109 showboards -v command reports wrong DIMM component slot where no DIMM inserted
4832310 Failed to create, change password using eeprom to set security-mode=command
4834392 scapp "install" target is broken
4853771 main and spare SCs need to synchronize time-of-day when not using SNTP
4866713 .properties on cpu node lists incorrect clock-freq
4880599 When calculating ambient temp, an HPU w/ null sensors results in 0 degrees
4882017 To have the ability to tune sntp or change the default error reporting threshold
4911531 SC drifts time of about 1 to 2 seconds
4915870 get ip addr 0.0.0.0 when two serengeti boot diskless
4924264 misalign string in POST
4939856 Change syslog logger "level" to local0.warning for SNTP messages
4953801 sepromupdate should support 8MB Ecache
4953811 sepromupdate should accept D$ & E$ component names
4955947 SC panic after flashupdating the f/w on LW8
4957835 Automatically enable shell keys when in engineering/manufacturing mode
4965384 SCApp shall enforce proper FT and PS configuration rules
4966931 Enabling SNTP on Serengeti SC disrupts domain clock
4968493 "boot net:dhcp" breaks in the tftp phase requesting wrong boot file
4969956 CSTHl functionality should be integrated into SCapp
4981483 java.lang.ArrayIndexOutOfBoundsException: 2 - domain does not reboot
4982034 Ecache tag ecc err test needs to handle THCE
4982170 chmsgs reports unknown and unused message tags in 5.17-gate
4984203 Frame fan tray and RTS status are not logged
4984780 ScApp does not provide sc board revision to SunMC
4985737 error events are being reported after an automatic restoration has initiated.
4987176 panic trying to lock ISM page that isn't there
4987457 DX Safari Port Error Status register dump maps incorrect type of error
4987854 repetitive message "the error buffer is full" can overwrite persistent logs
4988128 CPU Time-out (TO) from system bus during POST is not evaluated
4992950 console input does not resume on a failed cycle keyswitch
4993271 lpost 5.17.0 build 5.0 is mishandling fast_ecc_error_trap for the UCC case
4993985 Interconnect fails but all FRUs are still included in the domain.
4994112 after bootup, "Enable Sun Fire Link?" is not enabled even when it says yes
4994488 problem when using seprom-frutype-substitution
4994905 functioning A184 PS are not detected and acknowledged as powered on
4996008 java.lang.NullPointerException after multiple key off/on, then failover
5000947 illegal option -d shows up on help for showlogs
5001728 CHS implicate processor on uniboard when faulty DIMMs is really the problem
5003539 tunable 'Persistent logging' is not working right
5004331 incorrect data used for amazon fan tray power consumption.
5005360 sgcpu and sgpci is flashed with flashupdate (E2900)
5005640 ScApps for LW8 Amazon will not support 2N power.
5005655 fsr_test not cover enough bits
5006810 Domain level POST need to handle failed CPUs(master/slave) effectively
5006812 Update Artesyn D149 2.5 voltage margin for XMIT board.
5007818 lom[service]> help testinterconnect - not very helpful on V1280
5007831 lom[service]> testinterconnect -d A should not work on a V1280/Netra 1280
5009788 RFE: POST implementation for the new memory refresh rate ( shorten in 1/2 )
5009856 Implement new Vcore setting for Jaguar 3.x
5009864 SC Failover service port needs to be private
5010205 POST support for 1200 MHz USIV processors
5010616 'setchs -s faulty ...' does not disable all related components on V3 CPU
5011243 "The error buffer is full" message is misleading
5011320 showboards displays invalid Cpu Mask for Jaguar 3.0
5012130 SC accepts packets with private IP addresses.
5012317 Error data information not displayed when FPU
5014581 F4800 domain, 5.15.3 firmware stuck in "Active - Panicking" state
5015109 COD and SSH is not enabled for lw8
5015363 SBBC Reset Reason(s): Peer Reset, SC Reset button (rebooting the SC)
5018002 LW8: cannot enter Service when in Manufacturing mode
5019052 4800 cannot poweron domain in dual partition with 2 PS
5020501 TelnetServer.run: sock.accept() failed: java.net.SocketException: S_errno_ECONNR
5020606 cod setup should be moved to setupsc
5020704 disablecomponent command run, message said cannot enable component
5020887 "marked as Failed!" is inconsistent with the AD event message
5021417 downgrade allows connection type to be set to unavailable ssh option
5022423 error message every 30 seconds on the console
5022479 main sc did not come back in 60 seconds when rebooting, main became spare
5023405 usage statement for setescape differs from same on sg
5025518 confused msg shown when 'poweron grid0' issued on lw8
5027547 OBP needs to check for domain keyswitch state before dropping to OK prompt.
5028333 Remove sgiowci_sp.flash from future releases
5028915 lpost clears the i-cache microtags on the master cpu
5028916 the second dtlb t8_1 is not initialized properly at diag-level>init
5028917 2-way e-cache associativity is not always determined correctly for jaguar tests
5029117 ERROR: Missing text resource (flash downgrade e2900)
5029722 false msg:IB6/FAN0 Faulty: replacement required
5029847 SCApp needs to support JG3.1
5029856 PS failure caused I2C problems with other FRUs.
5030395 Serengeti POST does not use new FPROM Access Timing during domain level tests
5031658 Need to handle setBytes failure when write log messages to the persistent log.
5031871 Thread deadlock when clearing bad segment in E$ on poweron
5034739 show-post-results does not recognize Xmits IO boards
5034767 regression test stopper: POST fails SB0 and excludes it
5034786 Line numbers not included in stack traces
5034881 cobp generates incorrect replies to RARP requests
5035234 SC panic during boot after failover initiated from the SpareSC
5035293 Jar's Manifest isn't treated properly by SC JVM
5035517 redundant warning msg shown on console
5035667 outgoing telnet from SC causes ClassCastException
5036290 pci_lpost routine does not clear iommu entries for pci leaf B
5036321 RFE to the SCAPP command "sepromupdate" with a new option
5037074 License key interpreted as decimal instead of hex
5039408 fail IO test but processor gets the blame and marked bad
5039565 Variables VERSION and SUNW_PRODVERS should not be the same SMS release value
5039905 Killing a repeat thread causes error to be displayed
5040267 VXWORKS_BASEDIR needs to reference the new /ws/sg-firmware/vxworks-master-2.0
5040732 shownvram CLI with wrong args causes stack trace
5041545 Setkey off turn off sequence incorrect after a failed setkey on
5041600 Lightweight 8 flashupdate file issue
5041656 POST fails on ERROR: TEST=CPU Functional,SUBTEST=D-Cache Parity Tag Test ID=41.9
5042076 setk on failed to standby mode
5042555 ADM1023A -128C temp sensor warning needs to be reported as a temp sensor failure
5042636 Faulty system board causes NullPointerException and causes an impression that se
5043373 "ERROR: wrong value on timing register:..." should be removed
5044000 shownvram does not show contents of E$ & D$
5045210 Interconnect retest failed signals always passed.
5049265 SC hangs at Boot with virgin U2106
5050000 SC prints warning for replacing PS/FT on 4800 machine
5050697 SC should detect mixed PSs and print warning immediately after the PS inserted
5050725 confused warning message printed on console when CPU V3 inserted into 4800
5050732 PS-FT warning message printed on console does not match the real system config
5051257 mem2 /N0/SB4/P3/Cx timeout, has no heartbeat, not responding
5051422 SCApp needs to support JG3.1 and JG2.4
5053287 E2900 gives "I2c error: slave did not ACK" message after resetsc
5053926 Panic stacktrace should be logged
5055997 State bits in ecache tags not included in ecache tags test
5057330 inconsistency of dx name output for IO and SB/RP
5057869 setupnetwork - disable network should not have network option
5058001 Failed Ecache Functional and SBBC Dev Error Status 200(5.18.0)
5060659 Incorrect scan chain length for Jaguar_3.0
5060748 upgrade from 5.15.0 to 5.18.0 firmware changes connection type on spare SC
5061593 Switching network settings clobbers SNTP server setting
5062510 DomainBufferWriter thread error.
5062717 Timing requirements for D150/D105/D151 converters
5062914 sntp setting is lost during upgrade to 5.18.0
5065337 Reset of Domain causes CHS disable board
5066585 ScApp should not power on processors on unsupported boards.
5067307 Possible regression due to fix for 4985737.
5068391 postTestList may be null during startCpu/stopCpu
5068436 showchs incorrectly parses the component string.
5068851 serengeti platform obp get wrong mac address of router for off subnet tftp server
5068926 Detecting board failure during POST causes inconsistent display of board power
5070429 System controller, PANIC: Out of Memory, 5.17.1 firmware
5072938 Signal Dispatch: signal 10 in thread
5074972 Starcat post should implement rio i/o test functions
5076179 shell keys ^A ^X are not usable
5077697 "6800-SC[service]> testinterconnect" passes despite missing centreplane pin.
5080862 system does not have SC failover automatically enabled after power cycle
5081679 NVRAMRC buffer is too small
5083664 backout changes for bug#5060748 which can cause ssh disabled.
5088868 4900 SC incorrectly indicates FT as non high volume, even though they are
5089309 Add jtag to support Jaguar 3.2
5090178 Request to undo the change for bug fixed 5045210.
5090906 Workaround needed for Schizo 2.2 problem
5091556 SC panics runs out of memory
5099024 Persistent Msg Log Error count corrupted.
Revision History:
112494-08 114525-02 114525-01 112127-03 112884-06 114525-03 112883-07 111346-04 114524-06 113751-05 114525-04 114523-02 114525-06 114525-05 800054-01
Patch Installation Instructions:
--------------------------------
Please refer to the Install.info file for instructions on updating
the firmware using the files included in this patch.
Special Instructions:
---------------------
Watchdog Timer - Sun Fire Entry-Level Midrange Systems 5.18.2 - 4/18/2005
=========================================================================
This text gives information on the application mode of the watchdog
timer on the Netra 1280 server.
The enhancement allows users to:
o Configure the watchdog timer - User applications running on the host can
configure and use the watchdog timer, enabling customers to detect fatal
problems from their applications and to recover automatically.
o Program Alarm 3 - This enables users to generate this alarm in case of
critical problems in their applications.
This README text provides the following sections to help you understand how to
configure and use the watchdog timer and program Alarm3:
o Upgrading the Firmware Using the lom -G Command
o Understanding the Watchdog Timer Application Mode
o Using the ntwdt Driver
o Understanding the User API
o Setting the Time-out Period
o Enabling or Disabling the Watchdog
o Rearming, or Patting, the Watchdog
o Getting the State of the Watchdog Timer
o Finding and Defining Data Structures
o Using the Sample Watchdog Program
o Programming Alarm3
o Understanding Error Messages
o Knowing Unsupported Features and Limitations
Upgrading the Firmware Using the lom -G Command
-----------------------------------------------
Note: Both the RTOS and ScApp need to be updated before
rebooting. Ignore any reboot messages you may receive in
between each update.
1) Upgrade the firmware on the system controller (SC):
#lom -G sgrtos.flash
#lom -G sgsc.flash
2) Escape to lom> and reset the SC:
lom> resetsc -y
To get to the Lights Out Management (lom) prompt, you can telnet directly into
the Ethernet port of the SC (this is different from the Solaris IP address), or
you can attach a console to the serial port on the SC. If you are remote from
the system, configure the SC's Ethernet port, or attach the SC serial port to a
network terminal server.
3) Upgrade the firmware on the system boards:
#lom -G lw8cpu.flash
#lom -G lw8pci.flash
4) Shut down the Solaris(TM) Operating System (OS).
5) Power off the system.
lom> poweroff
6) Power on the system.
lom> poweron
Understanding the Watchdog Timer Application Mode
-------------------------------------------------
The watchdog mechanism detects a system hang, or an application hang or crash,
should they occur. The watchdog is a timer that is continually reset by a user
application as long as the operating system and user application are running.
When the application is rearming the application watchdog, an expiration can be
caused by:
o Crash of the rearming application
o Hang or crash of the rearming thread in the application
o System hang
When the system watchdog is running, a system hang, or more specifically, the
hang of the clock interrupt handler causes an expiration.
The system watchdog mode is the default. If the application watchdog is not
initialized, then the system watchdog mode is used.
The existing "setupsc" command at the SC Lights Out Management prompt can be
used to configure the recovery for the system watchdog ONLY:
lom> setupsc
The system controller configuration should be as follows:
SC POST diag Level [off]:
Host Watchdog [enabled]:
Rocker Switch [enabled]:
Secure Mode [off]:
PROC RTUs installed: 0
PROC Headroom quantity (0 to disable, 4 MAX) [0]:
The recovery configuration for the application watchdog is set using
Input/Output Control codes (IOCTLs) that are issued to the ntwdt driver.
Using the ntwdt Driver
----------------------
To use the new application watchdog feature, you must install the ntwdt
driver. To enable and control the watchdog's application mode, you must
program the watchdog system using the LOMIOCDOGxxx IOCTLs, described in the
section "Understanding the User API".
If the ntwdt driver, as opposed to the system controller, initiates a reset of
the Solaris OS on application watchdog expiration, the value of the following
property in the ntwdt driver's configuration file (ntwdt.conf) is used:
ntwdt-boottimeout="600";
In case of a panic, or an expiration of the application watchdog, the ntwdt
driver reprograms the watchdog time-out to the value specified in the property.
Assign a value representing a duration that is longer than the time it takes to
reboot and perform a crash dump. If the specified value is not large enough, the
SC resets the host if reset is enabled. Note that this reset by the SC occurs
only once.
Understanding the User API
---------------------------
The ntwdt driver provides an application program interface by using IOCTLs. You
must open the /dev/ntwdt device node before issuing the watchdog IOCTLs.
--------------------------------------------------------------------------------
NOTE: Only a single concurrent instance of open() is allowed on /dev/ntwdt. Any
subsequent open() generates the following error message: EAGAIN - (The driver is
busy, try again.)
--------------------------------------------------------------------------------
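The single-open rule in the note above can be handled explicitly in the
application. The following is a minimal sketch, not part of the ntwdt driver
interface: the helper name open_watchdog() and its negated-errno return
convention are assumptions made for illustration. It uses the same O_EXCL flag
as the sample program later in this README.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Open the watchdog device node (hypothetical helper).
 * Returns a file descriptor (>= 0) on success, or a negated errno
 * value on failure, so the caller can tell EAGAIN (device already
 * open) apart from other failures such as ENXIO or ENOENT.
 */
int
open_watchdog(const char *path)
{
	int fd = open(path, O_EXCL);	/* same flag as the README sample */

	if (fd < 0) {
		int err = errno;	/* save before stdio can change it */

		if (err == EAGAIN)
			fprintf(stderr, "%s: already open elsewhere\n", path);
		else
			perror(path);
		return (-err);
	}
	return (fd);
}
```

A caller would pass "/dev/ntwdt" as the path and, on -EAGAIN, either retry
later or exit, because another process already holds the device.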
You can use the following IOCTLs with the watchdog timer:
o LOMIOCDOGTIME - Set time-out period for watchdog timer
o LOMIOCDOGCTL - Enable or disable watchdog timer
o LOMIOCDOGPAT - Rearm ("pat") watchdog timer
o LOMIOCDOGSTATE - Get state of watchdog timer
o LOMIOCALCTL - Set value of Alarm3
o LOMIOCALSTATE - Get state of Alarm3
Setting the Time-out Period
---------------------------
The LOMIOCDOGTIME IOCTL sets the time-out period of the watchdog. This IOCTL
programs the watchdog hardware with the time specified in this IOCTL. You must
set the time-out period (LOMIOCDOGTIME) before attempting to enable the watchdog
timer (LOMIOCDOGCTL).
The argument is a pointer to an unsigned integer. This integer holds the
new time-out period for the watchdog in multiples of 1 second. You can
specify any time-out period in the range of 1 second to 180 minutes.
If the watchdog function is enabled, the time-out period is immediately
reset so that the new value can take effect. An error (EINVAL) is returned if
the time-out period is less than 1 second or longer than 180 minutes.
-----------------------------------------------------------------------------
NOTE: The LOMIOCDOGTIME is not intended for general purpose use. Setting the
watchdog time-out to too low a value might cause the system to receive a
hardware reset if the watchdog and reset functions are enabled. If the
time-out is set too low, the user application must be run with a higher
priority (for example, as a real time thread) and must be rearmed more
often to avoid an unintentional expiration.
-----------------------------------------------------------------------------
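An application can check a proposed time-out against the documented range
before issuing LOMIOCDOGTIME. The sketch below assumes only the limits stated
above (1 second to 180 minutes); the macro and function names are hypothetical
and do not come from lom_io.h.

```c
#include <errno.h>

#define	WDT_MIN_TIMEOUT_SEC	1u		/* 1 second */
#define	WDT_MAX_TIMEOUT_SEC	(180u * 60u)	/* 180 minutes, in seconds */

/*
 * Return 0 if the time-out is inside the documented range, or EINVAL
 * (the error the README says the IOCTL reports) if it is not.
 */
int
wdt_check_timeout(unsigned int seconds)
{
	if (seconds < WDT_MIN_TIMEOUT_SEC || seconds > WDT_MAX_TIMEOUT_SEC)
		return (EINVAL);
	return (0);
}
```

Checking in the application does not replace the driver's own validation; it
only lets the program report a bad configuration value before touching the
hardware.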
Enabling or Disabling the Watchdog
----------------------------------
The LOMIOCDOGCTL IOCTL enables or disables the watchdog, and it enables or
disables the reset capability. (See the "Finding and Defining Data Structures"
section for the correct values for the watchdog timer.)
The argument is a pointer to the lom_dogctl_t structure (described in
greater detail in the "Finding and Defining Data Structures" section).
Use the reset_enable member to enable or disable the system reset function.
Use the dog_enable member to enable or disable the watchdog function. An
error (EINVAL) is returned if the watchdog is disabled and reset is
enabled.
--------------------------------------------------------------------------------
NOTE: If LOMIOCDOGTIME has not been issued to set up the time-out period prior
to this IOCTL, the watchdog is NOT enabled in the hardware.
--------------------------------------------------------------------------------
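The one combination that this section describes as invalid (watchdog disabled,
reset enabled) can be caught before the IOCTL is issued. This is a sketch
under stated assumptions: dogctl_sketch_t is a hypothetical local type that
mirrors the two members of lom_dogctl_t so that the example compiles without
lom_io.h.

```c
#include <errno.h>

/* Hypothetical mirror of lom_dogctl_t; real code would use lom_io.h. */
typedef struct {
	int reset_enable;	/* reset enabled if non-zero */
	int dog_enable;		/* watchdog enabled if non-zero */
} dogctl_sketch_t;

/*
 * Return EINVAL for the combination the README documents as an error
 * (reset enabled while the watchdog is disabled), otherwise 0.
 */
int
wdt_check_dogctl(const dogctl_sketch_t *ctl)
{
	if (!ctl->dog_enable && ctl->reset_enable)
		return (EINVAL);
	return (0);
}
```

The other three combinations are accepted, including enabling the watchdog
without the reset function.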
Rearming, or Patting, the Watchdog
----------------------------------
The LOMIOCDOGPAT IOCTL rearms, or pats, the watchdog so that the watchdog starts
ticking from the beginning; that is, to the value specified by LOMIOCDOGTIME.
This IOCTL requires no arguments. If the watchdog is enabled, this IOCTL must be
used at regular intervals that are less than the watchdog time-out, or the
watchdog expires.
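Because each pat must arrive before the time-out elapses, the sleep between
pats has to be strictly shorter than the value set with LOMIOCDOGTIME. The
helper below is an illustrative sketch only: its name and the idea of choosing
a number of pats per time-out period are assumptions, not part of the driver
interface. With a 30-second time-out and 6 pats per period it yields the
5-second interval used by the sample program in this README.

```c
/*
 * Return the number of whole seconds to sleep between pats, or 0 if
 * no whole-second interval strictly shorter than the time-out exists
 * (for example, a 1-second time-out) or the arguments are unusable.
 */
unsigned int
wdt_pat_interval(unsigned int timeout_sec, unsigned int pats_per_period)
{
	unsigned int interval;

	if (timeout_sec < 2 || pats_per_period == 0)
		return (0);
	interval = timeout_sec / pats_per_period;
	if (interval == 0)
		interval = 1;			/* shortest whole-second sleep */
	if (interval >= timeout_sec)
		interval = timeout_sec - 1;	/* stay below the time-out */
	return (interval);
}
```

A return value of 0 signals that sleep() granularity is too coarse and that
the application needs a sub-second timer or a longer time-out.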
Getting the State of the Watchdog Timer
---------------------------------------
The LOMIOCDOGSTATE IOCTL gets the state of the watchdog and reset
functions and retrieves the current time-out period for the watchdog. If
LOMIOCDOGTIME was never issued to set up the time-out period prior to
this IOCTL, the watchdog is not enabled in the hardware.
The argument is a pointer to the lom_dogstate_t structure (described in
greater detail in the "Finding and Defining Data Structures" section). The
structure members are used to hold the current states of the watchdog and
reset circuitry and the current watchdog time-out period. Note that this is
not the time remaining before the watchdog is triggered.
The LOMIOCDOGSTATE IOCTL requires only that open() be successfully called. This
IOCTL can be run any number of times after open() is called, and it does not
require any other DOG IOCTLs to have been executed.
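Once LOMIOCDOGSTATE has filled in the structure, an application might log the
result. The following sketch assumes a hypothetical local type,
dogstate_sketch_t, that mirrors the three members of lom_dogstate_t so that
the example compiles without lom_io.h; the function name and output format are
also illustrative.

```c
#include <stdio.h>

/* Hypothetical mirror of lom_dogstate_t; real code would use lom_io.h. */
typedef struct {
	int reset_enable;		/* reset enabled if non-zero */
	int dog_enable;			/* watchdog enabled if non-zero */
	unsigned int dog_timeout;	/* configured time-out in seconds */
} dogstate_sketch_t;

/*
 * Format the state into buf. dog_timeout is the configured period,
 * not the time remaining. Returns the snprintf() result.
 */
int
wdt_format_state(const dogstate_sketch_t *st, char *buf, size_t len)
{
	return (snprintf(buf, len, "watchdog=%s reset=%s timeout=%us",
	    st->dog_enable ? "on" : "off",
	    st->reset_enable ? "on" : "off",
	    st->dog_timeout));
}
```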
Finding and Defining Data Structures
------------------------------------
All data structures and IOCTLs are defined in lom_io.h, which is available in
the SUNWlomu package.
The data structures for the watchdog timer are shown here:
1. The watchdog/reset state data structure is as follows:
typedef struct {
int reset_enable; /* reset enabled if non-zero */
int dog_enable; /* watchdog enabled if non-zero */
uint_t dog_timeout; /* Current watchdog time-out in seconds */
} lom_dogstate_t;
2. The watchdog/reset control data structure is as follows:
typedef struct {
int reset_enable; /* reset enabled if non-zero */
int dog_enable; /* watchdog enabled if non-zero */
} lom_dogctl_t;
Using the Sample Watchdog Program
-----------------------------
Following is a sample program for the watchdog timer:
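#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <lom_io.h>
int
main(void)
{
	uint_t timeout = 30;		/* time-out period in seconds */
	lom_dogctl_t dogctl;
	int fd;

	dogctl.reset_enable = 1;	/* reset the host on expiration */
	dogctl.dog_enable = 1;		/* enable the watchdog */
	/* Only one open() of the device is allowed at a time */
	fd = open("/dev/ntwdt", O_EXCL);
	if (fd < 0) {
		perror("open /dev/ntwdt");
		return (1);
	}
	/* Set the time-out before enabling the watchdog */
	if (ioctl(fd, LOMIOCDOGTIME, (void *)&timeout) < 0) {
		perror("LOMIOCDOGTIME");
		return (1);
	}
	/* Enable the watchdog and the reset function */
	if (ioctl(fd, LOMIOCDOGCTL, (void *)&dogctl) < 0) {
		perror("LOMIOCDOGCTL");
		return (1);
	}
	/* Keep patting at an interval shorter than the time-out */
	while (1) {
		(void) ioctl(fd, LOMIOCDOGPAT, NULL);
		sleep(5);
	}
	return (0);
}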
Programming Alarm3
------------------
Alarm3 is available to Solaris Operating System users irrespective of the
watchdog mode. Alarm3 or system alarm ON and OFF have been redefined (see the
table below.)
Set the value of Alarm3 using the LOMIOCALCTL IOCTL. You can program Alarm3 in
the same way that you set and clear Alarm1 and Alarm2.
The following table presents the behavior of Alarm3:
Condition             Alarm3   Relay        System LED (Green)
---------------------------------------------------------------------
Poweroff              ON       COM -> NC    OFF
Poweron/LOM up        ON       COM -> NC    OFF
Solaris running       OFF      COM -> NO    ON
Solaris not running   ON       COM -> NC    OFF
Host WDT expires      ON       COM -> NC    OFF
User sets to ON       ON       COM -> NC    OFF
User sets to OFF      OFF      COM -> NO    ON
Alarm3 ON = Relay(COM->NC), System LED OFF
Alarm3 OFF = Relay(COM->NO), System LED ON
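The two summary lines above reduce to a simple lookup from the Alarm3 state to
the relay position and the System LED. The sketch below encodes only that
mapping; the type and function names are hypothetical and are not defined in
lom_io.h.

```c
/* Relay position and System LED state implied by an Alarm3 state. */
typedef struct {
	const char *relay;	/* "COM -> NC" or "COM -> NO" */
	const char *system_led;	/* "ON" or "OFF" */
} alarm3_effect_t;

/*
 * Map the Alarm3 state (non-zero = ON) to its documented effect:
 * Alarm3 ON  -> relay COM -> NC, System LED OFF
 * Alarm3 OFF -> relay COM -> NO, System LED ON
 */
alarm3_effect_t
alarm3_effect(int alarm_on)
{
	alarm3_effect_t e;

	if (alarm_on) {
		e.relay = "COM -> NC";
		e.system_led = "OFF";
	} else {
		e.relay = "COM -> NO";
		e.system_led = "ON";
	}
	return (e);
}
```

Note the inverted sense: the green System LED is lit when Alarm3 is OFF, which
is the state while the Solaris OS is running.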
After Alarm3 has been programmed, you can check its state (the system alarm)
with the showalarm command and the argument "system".
For example:
sc> showalarm system
system alarm is on
The data structure used with the LOMIOCALCTL and LOMIOCALSTATE IOCTLs is as
follows:
#include <lom_io.h>
#define ALARM_NUM_1 1
#define ALARM_NUM_2 2
#define ALARM_NUM_3 3
#define ALARM_OFF 0
#define ALARM_ON 1
typedef struct {
int alarm_no;
int alarm_state;
} lom_aldata_t;
Understanding Error Messages
----------------------------
Following are the error messages that might be displayed and what they mean:
EAGAIN
This error message is displayed if you attempt to open /dev/ntwdt while
another open() of the device is still in effect.
EFAULT
This error message is displayed if an incorrect user-space address was
specified.
EINVAL
This error message is displayed if a nonexistent control command
was requested or invalid parameters were supplied.
EINTR
This error message is displayed if a thread awaiting a component state change
is interrupted.
ENXIO
This error message is displayed if the driver is not installed in
the system.
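An application can translate these errno values into the meanings listed
above when it reports a failed open() or ioctl(). This is a sketch: the
function name and the message wording are illustrative, and only the five
errors described in this section are covered.

```c
#include <errno.h>

/*
 * Return a short description of an errno value as it applies to the
 * ntwdt driver, based on the list in this README.
 */
const char *
ntwdt_errno_meaning(int err)
{
	switch (err) {
	case EAGAIN:
		return ("/dev/ntwdt is already open");
	case EFAULT:
		return ("incorrect user-space address");
	case EINVAL:
		return ("nonexistent command or invalid parameters");
	case EINTR:
		return ("interrupted while awaiting a state change");
	case ENXIO:
		return ("driver is not installed");
	default:
		return ("not described in this README");
	}
}
```

For any other value, the standard strerror() text is likely to be more useful
than the default string returned here.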
Knowing Unsupported Features and Limitations
--------------------------------------------
1) In the case of the watchdog timer expiration detected by the SC, the recovery
is attempted only once; there are no further attempts of recovery if the first
attempt fails to recover the domain.
2) If the application watchdog is enabled and you break into the OpenBoot(TM)
PROM (OBP) by issuing the "break" command from the system controller's "lom"
prompt, the SC automatically disables the watchdog timer.
--------------------------------------------------------------------------------
NOTE: The SC displays a console message as a reminder that the watchdog, from
the SC's perspective, is disabled.
--------------------------------------------------------------------------------
However, when you reenter the Solaris OS, the watchdog timer is still ENABLED
from the Solaris Operating System's perspective. To have both the SC and the
Solaris OS view the same watchdog state, you must use the watchdog application
to either enable or disable the watchdog.
3) If you perform a dynamic reconfiguration (DR) operation in which a system
board containing kernel (permanent) memory is deleted, then you must
disable the watchdog timer's application mode before the DR operation and
enable it after the DR operation. This is required because Solaris software
quiesces all system IO and disables all interrupts during a memory-delete of
permanent memory. As a result, system controller firmware and Solaris software
cannot communicate during the DR operation. Note that this limitation affects
neither the dynamic addition of memory nor the deletion of a board not
containing permanent memory. In those cases, the watchdog timer's application
mode can run concurrently with the DR implementation.
You can execute the following command to locate the system boards that contain
kernel (permanent) memory:
sh> cfgadm -lav | grep -i permanent
4) If the Solaris Operating System hangs under the following conditions, the
system controller firmware cannot detect the Solaris software hang:
o Watchdog timer's application mode is set
o Watchdog timer is not enabled
o No rearming is done by the user
5) The watchdog timer provides partial boot monitoring. You can use the
application watchdog to monitor a domain reboot.
However, domain booting is not monitored for:
o Bootup after a cold powerup
o Recovery of a hung or failed domain
In these cases, a boot failure is not detected and no recovery attempts are
made.
6) The watchdog timer's application mode provides no monitoring for application
startup. In application mode, if the application fails to start up, the failure
is not detected and no recovery is provided.
--------------------------------------------------------------------------------
Copyright 2006 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
This product or document is protected by copyright and distributed under
licenses restricting its use, copying, distribution, and decompilation.
No part of this product or related documentation may be reproduced in
any form by any means without prior written authorization of Sun and
its licensers, if any. Third party software, including font technology,
if any, is copyrighted and licensed from Sun suppliers.
Sun, Sun Microsystems, Solaris, the Sun Logo, Sun Fire, OpenBoot, and SPARC are
trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S.
and other countries. All SPARC trademarks are used under license and are
trademarks or registered trademarks of SPARC International, Inc. in the
U.S. and other countries. Products bearing SPARC trademarks are based
upon an architecture developed by Sun Microsystems, Inc.
Federal Acquisitions: Commercial Software - Government users subject to
standard license terms and conditions.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO
BE LEGALLY INVALID.
Special Install Instructions:
--------------------------------------
None.
README -- Last modified date: Thursday, April 27, 2006