New dstat_ntp plugin
Every year I am confronted with another VMware guest time synchronization problem, it seems. I have become pretty good at debugging such problems :-)
There are two different, unrelated problems. Either time goes too slow in a VM guest, which is fixable, or time goes too fast in a VM guest, which is basically impossible to fix by any time synchronization method.
If time is going too fast in a VM, both VMware Tools host-guest time synchronization and ntp will refuse to correct the time, arguing that it is unsafe to set time back, as nobody expects to relive the same moment in time. So the only way to fix it is by influencing the kernel.
Time usually goes too fast in VM guests because the hypervisor is unable to deliver timer interrupts to the VM guest on time. These interrupts do not get lost; they are merely delayed. The kernel, however, notices that interrupts are missing and tries to compensate for them, so once the delayed interrupts arrive as well you end up with more interrupts than expected.
The problem I was asked to help debug involved RHEL4.6 64bit on VMware ESX 3.0.2, and we tried the usual (recommended) fixes by adding kernel parameters. In fact, these systems were already booted with the advised kernel parameters, but the problem persisted nevertheless.
I tried some other suggestions from the VMware Timekeeping document, but none of them did the job, and time kept on running too fast inside the VM guest, albeit only by a small amount.
Usually I measure the number of kernel timer interrupts by doing:
dstat -t -i -I0 --debug
This shows the time (inside the VM guest) together with the number of timer interrupts received, which usually gives you a quick look at what's going on. However, the time shown still comes from inside the VM guest, and the only way to compare it with "real time" was to have a separate window running the same command on another system.
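To make clear what is being counted, here is a minimal Python sketch that reads the timer interrupt counter straight from /proc/interrupts. It is not how dstat does it internally; the function names are made up for this example, and it assumes the timer sits on IRQ 0 (as the -I0 option above implies).

```python
import time


def timer_interrupt_count(text, irq="0"):
    """Sum the per-CPU counters on one IRQ line of /proc/interrupts."""
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if sep and label.strip() == irq:
            total = 0
            for field in rest.split():
                if not field.isdigit():
                    break  # counters end where the controller/device names start
                total += int(field)
            return total
    raise ValueError("IRQ %s not found" % irq)


def read_interrupts(path="/proc/interrupts"):
    """Return the current contents of the kernel's interrupt table."""
    with open(path) as handle:
        return handle.read()


def sample_rate(interval=1.0, irq="0"):
    """Timer interrupts per second over one interval (hypothetical helper)."""
    before = timer_interrupt_count(read_interrupts(), irq)
    time.sleep(interval)  # note: timed by the guest's own clock
    after = timer_interrupt_count(read_interrupts(), irq)
    return (after - before) / interval
```

Calling sample_rate() inside the guest gives a number comparable to the interrupt column of dstat. The interval is still timed by the guest's own clock, so an independent time reference is needed to judge the result.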
Not anymore. I implemented a dstat_ntp plugin which now gives me the time directly from an NTP server next to local time (in the same interval). Use it like this:
DSTAT_NTPSERVER=0.centos.pool.ntp.org dstat -t -i -I0 -M ntp --debug
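For those curious what asking an NTP server for the time involves, below is a rough standalone sketch in Python. It is an illustration only and not necessarily how the plugin is implemented: it sends a bare SNTP client request over UDP port 123 and decodes the transmit timestamp from the reply, ignoring network delay. All function names are assumptions made for this example.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """48-byte client request: leap 0, version 3, mode 3, all else zero."""
    return b"\x1b" + 47 * b"\x00"


def transmit_time(packet):
    """Decode the server transmit timestamp (bytes 40-47) to Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2.0 ** 32


def ntp_time(server, timeout=2.0):
    """Ask one NTP server for its time; network delay is not compensated."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    finally:
        sock.close()
    return transmit_time(packet)


def clock_offset(server):
    """Local time minus NTP time; positive means the local clock is ahead."""
    return time.time() - ntp_time(server)
```

Something like clock_offset("0.centos.pool.ntp.org") run repeatedly inside the guest should show a positive offset that keeps growing when the guest clock runs too fast.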
VMware recently changed its advisory about time synchronization, moving from host-guest synchronization to NTP synchronization. Not that it helps if your clock is too fast though...
BTW: The solution to time going too fast on RHEL4.6 (or older) kernels is to move to a RHEL4.7 kernel and use the divider=10 option to reduce the number of timer interrupts from 1000 to 100 per second. This makes it easier for the hypervisor to deliver those interrupts in a timely manner.
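As a quick sanity check, the arithmetic behind the divider can be sketched in a few lines of Python. This assumes a base rate of 1000 timer interrupts per second as mentioned above; the helper names and the sample command line are hypothetical.

```python
def effective_hz(cmdline, base_hz=1000):
    """Timer interrupt rate after applying a divider=N kernel option."""
    divider = 1
    for option in cmdline.split():
        name, sep, value = option.partition("=")
        if name == "divider" and sep and value.isdigit() and int(value) > 0:
            divider = int(value)
    return base_hz / float(divider)


def tick_ms(hz):
    """Time between two timer interrupts, in milliseconds."""
    return 1000.0 / hz


# Hypothetical kernel command line; on a real guest, read /proc/cmdline instead.
example = "ro root=/dev/VolGroup00/LogVol00 divider=10"
rate = effective_hz(example)
print("%.0f interrupts/s, one every %.0f ms" % (rate, tick_ms(rate)))
```

With divider=10 the hypervisor has 10 ms instead of 1 ms to deliver each timer interrupt, which is the slack that makes the difference.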