On Rare Occasions, Sun Fire 3800/4800/4810/6800 and E4900 and E6900 Systems May Experience Data Loss During Clock Jumps
Category: Data Loss
Release Phase: Resolved
Product: Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server, Solaris 9 Operating System, Sun Fire E6900 Server, Solaris 8 Operating System, Sun Fire E4900 Server
Bug Id: 4966931, 4618950, 5089758
Date of Workaround Release: 22-DEC-2003
Date of Resolved Release: 16-AUG-2005
Impact
Sun Fire 3800/4800/4810/6800 and E4900 and E6900 servers do not always maintain their hardware clock correctly, which may cause problems for the Network Time Protocol (NTP) function and the "date -a" command. Data loss can affect anything that relies on a consistent time stream, such as a transaction log. For example, replaying a transaction log that contains a negative time increment could lose data if the application assumes (not unreasonably) that all time increments are positive.
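The negative time increment described above can be illustrated with a minimal sketch. The function name and the sample timestamps are hypothetical and are not taken from any Sun tool; the sketch only shows how an application might scan its own log for entries whose timestamp is earlier than the preceding entry before replaying it.

```python
def find_backward_steps(timestamps):
    """Return (index, delta) for each entry that is earlier than its predecessor."""
    backward = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta < 0:
            # A negative increment: the clock was stepped backward
            # between entry i-1 and entry i.
            backward.append((i, delta))
    return backward


# Hypothetical log timestamps in seconds; the third entry follows a backward step.
log_times = [100.0, 101.0, 99.0, 100.5]
backward_steps = find_backward_steps(log_times)
print(backward_steps)
```

A replay routine that assumes strictly increasing timestamps could check for an empty result here before proceeding.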
Contributing Factors
This issue can occur in the following releases:
SPARC Platform
OR systems on the following platforms:
- Sun Fire 3800/4800/4810/6800 with firmware revision prior to 5.19.0 and without patch 114526-01
- Sun Fire E4900 and E6900 Servers with firmware revision prior to 5.19.0 and without patch 114526-01
This issue will only occur when these conditions are present:
- SNTP is enabled on the System Controller AND
- NTP is enabled in the domain(s)
To determine the firmware revision of the System Controller (SC), use the following command from the platform shell of the main SC:
sc0:SC> showsc
SC: SSC0
Main System Controller
...
ScApp version: 5.15.3
RTOS version: 32
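As a rough sketch, captured "showsc" output could be checked against the 5.19.0 revision named above. The helper names are hypothetical, and the sketch assumes the output contains a line of the form "ScApp version: X.Y.Z" as in the sample shown; it does not check whether patch 114526-01 is installed.

```python
import re

FIXED_VERSION = (5, 19, 0)  # firmware revision named in this document


def scapp_version(showsc_output):
    """Extract the ScApp version as a tuple of ints, or None if absent."""
    match = re.search(r"ScApp version:\s*(\d+(?:\.\d+)*)", showsc_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_prior_to_fix(showsc_output):
    """Return True when the reported ScApp version is older than 5.19.0."""
    version = scapp_version(showsc_output)
    if version is None:
        raise ValueError("no ScApp version line found")
    return version < FIXED_VERSION


# Sample text modeled on the showsc output shown above.
sample = """SC: SSC0
Main System Controller
ScApp version: 5.15.3
RTOS version: 32
"""
print(is_prior_to_fix(sample))
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating "5.9" as newer than "5.19".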
Symptoms
Should the described issue occur, "time step" messages similar to the following may be output to the "/var/adm/messages" log file:
# egrep ntp /var/adm/messages
...
xntpd[253]: [ID 774427 daemon.notice] time reset (step) -2.059495 s
xntpd[253]: [ID 774427 daemon.notice] time reset (step) 2.062533 s
along with any of the following messages in the System Controller platform log:
sc0:SC> showlogs
...
Platform.SC: SNTP bring clock forward by 1 seconds
Platform.SC: SNTP bring clock backward by 1 seconds
Platform.SC: System clock has drifted. Resynced with SNTP server
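The xntpd "time reset (step)" messages can also be extracted programmatically. The following is a minimal sketch; the pattern is inferred only from the two sample lines shown above, so other xntpd message variants may not match, and the function name is hypothetical.

```python
import re

# Pattern inferred from the sample messages in this document (an assumption).
STEP_PATTERN = re.compile(r"xntpd\[\d+\]:.*time reset \(step\) (-?\d+\.\d+) s")


def time_steps(log_lines):
    """Return the step sizes, in seconds, found in the given log lines."""
    steps = []
    for line in log_lines:
        match = STEP_PATTERN.search(line)
        if match:
            steps.append(float(match.group(1)))
    return steps


sample_lines = [
    "xntpd[253]: [ID 774427 daemon.notice] time reset (step) -2.059495 s",
    "xntpd[253]: [ID 774427 daemon.notice] time reset (step) 2.062533 s",
]
steps = time_steps(sample_lines)
negative_steps = [s for s in steps if s < 0]  # backward clock steps
print(steps, negative_steps)
```

Negative values correspond to the clock being stepped backward, which is the case that can break consumers of a consistent time stream.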
Workaround
To work around the described issue, isolate the domain from the virtual hardware clock by adding the following to "/etc/system":
set tod_broken = 1
set dosynctodr = 0
Note: A reboot may be required in order for this change to take effect.
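One way to confirm that both workaround settings are present is to scan the contents of "/etc/system". This is a sketch only: the function name is hypothetical, and it assumes that lines beginning with "*" or "#" are comments and that settings use the "set name = value" form shown above.

```python
REQUIRED = {"tod_broken": "1", "dosynctodr": "0"}  # settings from this document


def missing_settings(system_file_text):
    """Return the required settings that are absent or have a different value."""
    found = {}
    for raw in system_file_text.splitlines():
        line = raw.strip()
        # Assumption: "*" or "#" at the start of a line marks a comment.
        if not line or line.startswith(("*", "#")):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "set" and "=" in parts[1]:
            name, value = parts[1].split("=", 1)
            found[name.strip()] = value.strip()
    return sorted(name for name, value in REQUIRED.items() if found.get(name) != value)


# Hypothetical file contents with only one of the two settings applied.
partial = "* time-of-day workaround\nset tod_broken = 1\n"
print(missing_settings(partial))
```

On a live system the text would come from reading "/etc/system"; an empty result means both lines are present with the expected values.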
Resolution
This issue is resolved in the following releases:
SPARC Platform
AND:
- System Controller Firmware (5.19.0) patch 114526-01 or later
Notes:
- Both patches for either Solaris 8 or 9 must be installed to resolve this issue.
- BugID 4966931 was fixed in the 5.18.0 firmware provided by patch 114525-01. However, complete resolution, including the fix for BugID 5089758, requires the 5.19.0 firmware provided by patch 114526-01.
Modification History
Date: 09-DEC-2004
- Updated Contributing Factors for firmware patch and revision clarifications
- Updated Resolution for firmware patch and revision clarifications
Date: 17-NOV-2004
- Final patch additions/corrections; re-release as resolved
Date: 29-JUL-2005
- Updated Resolution section; reopened, State changed to Workaround
Date: 16-AUG-2005
- Updated Contributing Factors and Resolution sections
Attachments
This solution has no attachments.