Sun Fire Midrange Server Time Jumps When SC Accumulates Extended Uptime
| Category : | Availability |
| Release Phase : | Resolved |
| Bug Id : | 6567546 |
| Date of Workaround Release : | 28-JUN-2007 |
| Date of Resolved Release : | 14-SEP-2007 |
| Product : | Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server, Sun Fire E6900 Server, Sun Fire E2900 Server, Sun Fire V1280 Server, Sun Fire E4900 Server, Sun Netra 1290 Server, Netra 1280 Server |
Impact
On certain Sun Fire systems, a System Controller (SC) running ScApp version 5.18.X, 5.19.X, or 5.20.X may experience an abrupt clock change after extended uptime. In observed cases the clock either jumped forward to 828 days or became unstable upon reaching 828 days; one observed jump was 60 days forward, to 828 days. This clock change can affect the domain time, database applications, customer data, or any other clock-related operations.
Contributing Factors
This issue can occur on the following platforms:
SPARC Platform
- Sun Fire 3800/4800/4810/6800/E2900/E4900/E6900/V1280 Servers
- Netra 1280/1290 Servers
with a System Controller (SC) running ScApp versions 5.18.X, 5.19.X or 5.20.X.
Note: Any system properly patched for Daylight Saving Time (DST) required an SC reboot in 2007, so such a system is not at risk for this issue until SC uptime accumulates again. (See also DST SunAlert 102617).
To determine both the ScApp version and SC uptime, the following command can be run on the SC:
sc0:SC> showsc
SC: SSC0
Main System Controller
SC Failover: enabled but not active.
Clock failover enabled.
SC date: Wed Jun 27 13:34:37 GMT-04:00 2007
SC uptime: 82 days 22 hours 56 minutes 57 seconds
ScApp version: 5.20.3 Build_0
RTOS version: 46
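The two fields of interest can be pulled out of saved showsc output with a short script. The following is a minimal sketch, not a Sun-supplied tool: the function names sc_uptime_days and scapp_version are hypothetical, and the line layout is assumed to match the sample output above. An uptime line without a "days" field is assumed to mean less than one day and is reported as 0.

```shell
#!/bin/sh
# Sketch only: parse saved "showsc" output read from standard input.
# Function names are hypothetical; the line layout is assumed to match
# the sample output in this document.

# Print the whole-day part of the "SC uptime:" line (0 if no day field).
sc_uptime_days() {
    awk '/SC uptime:/ { if ($4 ~ /^day/) print $3; else print 0; exit }'
}

# Print the version number from the "ScApp version:" line.
scapp_version() {
    awk '/ScApp version:/ { print $3; exit }'
}
```

With the sample output saved to a file, each function would be run with that file redirected to its standard input.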
Symptoms
This issue may occur after extended SC uptime, at which time the system controller time and the domain time may change abruptly.
Error messages will differ depending on how the system is configured, and on whether the message comes from the SC or from a domain. One example of an error message, which may appear in /var/adm/messages or on the console, is the following:
Feb 2 17:33:52 domain-a xntpd[975]:
[ID 261039 daemon.error] time error
X.X is way too large (set clock manually)
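A domain log can be searched for this symptom with a simple pattern match. The sketch below assumes the message appears on a single line in the log file (it is wrapped in the example above); find_time_errors is a hypothetical name, and the default path is the /var/adm/messages file mentioned above.

```shell
#!/bin/sh
# Sketch only: list xntpd "way too large" errors in a syslog-style file.
# find_time_errors is a hypothetical name. The message is assumed to be
# logged on one line.
find_time_errors() {
    grep 'xntpd.*time error.*is way too large' "${1:-/var/adm/messages}"
}
```

No output means no matching message was found in that file; it does not rule out the issue, since messages differ by configuration.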
Workaround
If uptime is less than 575 days:
1. Add these lines to the /etc/system file on all domains to prevent the domains from getting their time from the SC:
set tod_broken=1
set dosynctodr=0
2. Enable NTP on the domain, and SNTP on the system controller. Refer to the appropriate Solaris Systems Administration Guide for your Solaris Release, and the appropriate Platform Administration Guide for your ScApp Release.
3. Reboot the system controllers.
4. Reboot the domain at the next available maintenance window, before the system controller reaches 575 days of uptime.
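Step 1 above can be applied without creating duplicate entries if it is run more than once. The following is a minimal sketch under stated assumptions: add_tod_settings is a hypothetical name, and the grep options -q, -x and -F assume a POSIX grep (on Solaris this is likely /usr/xpg4/bin/grep rather than /usr/bin/grep). Back up the file before changing it.

```shell
#!/bin/sh
# Sketch only: append the two settings to a system file if they are not
# already present. add_tod_settings is a hypothetical name.
add_tod_settings() {
    file="${1:-/etc/system}"
    for line in 'set tod_broken=1' 'set dosynctodr=0'; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}
```

The settings in /etc/system take effect at the next domain boot.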
If uptime is greater than 575 days:
1. Add these lines to the /etc/system file on all domains to prevent the domains from getting their time from the SC:
set tod_broken=1
set dosynctodr=0
2. Manually stop the running domain from getting its time from the SC. Either reboot the domain with the /etc/system changes listed in step 1, or run the script below. The script can be invoked as "root" on the running domain to change the values of "tod_broken" and "dosynctodr" in the running domain's kernel:
#!/bin/sh
#
# Set tod_broken and dosynctodr
#
echo "tod_broken ?W 1" | adb -w -k /dev/ksyms /dev/mem
echo "dosynctodr ?W 0" | adb -w -k /dev/ksyms /dev/mem
#
# exit 0
3. Enable NTP on the domain, and SNTP on the system controller. Refer to the appropriate Solaris Systems Administration Guide for your Solaris Release, and the appropriate Platform Administration Guide for your ScApp Release.
4. Reboot the system controllers
(see "Contributing Factors" for determination of uptime on the SC).
Note: The domain will still get its initial time from the SC on boot even with the /etc/system setting. This functionality cannot be fully disabled.
If the domain experiences this issue and the domain time has already changed, a reboot of the SC will be necessary. The time will also need to be adjusted manually on both the domain and SC. A reboot of the affected domain may also be necessary.
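The choice between the two workaround procedures depends only on SC uptime, so it can be expressed as a single comparison. The sketch below is illustrative: workaround_branch is a hypothetical name and takes the uptime in whole days. The text does not say which procedure applies at exactly 575 days; this sketch assumes the stricter one, since a whole-day count of 575 usually means 575 days plus some hours.

```shell
#!/bin/sh
# Sketch only: choose the workaround procedure from SC uptime in whole days.
# workaround_branch is a hypothetical name. Treating exactly 575 days as
# the "greater than" case is an assumption, not stated in the text.
workaround_branch() {
    if [ "$1" -lt 575 ]; then
        echo "under-575"
    else
        echo "575-or-over"
    fi
}
```

For the sample showsc output under "Contributing Factors" (82 days), the function prints under-575.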
Resolution
This issue is addressed in the following release:
SPARC Platform
- ScApp version 5.20.7 or later (as delivered in patch 114527-08 or later)
for the following systems:
- Sun Fire 3800/4800/4810/6800/E2900/E4900/E6900/V1280 Servers
- Netra 1280/1290 Servers
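Whether an SC already runs a fixed release can be checked by comparing its ScApp version against 5.20.7. This is a minimal sketch: has_fix is a hypothetical name, and the version is assumed to consist of three dot-separated numbers, as in the "ScApp version" line of the showsc output (without the build suffix).

```shell
#!/bin/sh
# Sketch only: succeed if a ScApp version string is 5.20.7 or later.
# has_fix is a hypothetical name; versions are assumed to be three
# dot-separated numbers such as 5.20.3.
has_fix() {
    echo "$1" | awk -F. '{
        if ($1 != 5)  exit !($1 > 5)
        if ($2 != 20) exit !($2 > 20)
        exit !($3 >= 7)
    }'
}
```

The function returns a zero exit status when the version is at or above 5.20.7, so it can be used directly in an if statement.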
Modification History
Date: 01-AUG-2007
- Updated Impact and Relief/Workaround sections
Date: 14-SEP-2007
- Updated Resolution section
- State: Resolved
Attachments
This solution has no attachment