Wednesday 9 February 2011

Computer Clocks and VMWare ESXi

I recently noticed that my home Linux server, which now runs on VMWare ESXi, was losing time whenever it was suspended by the ESXi server. Every time the guest was suspended, the clock simply stopped. When the guest started again, possibly several days later, it carried on from the time at which the virtual machine was halted.

I checked the hardware clock (/sbin/hwclock; I needed to be root to do so on my Ubuntu distribution) and it was correct. Like other VMWare products, ESXi mimics the CMOS clock of a real platform and uses the hardware clock to do so (and it can update this regularly using an NTP server). However, the response from the date command was still stuck at the time, a few days earlier, when the machine was suspended. Several of the “clocks” in our house show the time from this server, so it needs to be accurate. I waited half an hour, long enough to cross an hour boundary, to see if the time would sort itself out. It didn’t, and all my clocks were still wrong.

I learned several things in my subsequent investigation which may prove useful or interesting to other people:
  • OSes only read the time from the hardware clock when they are booted up (or, at most, every hour or so). After that, they count ticks several times a second and derive the time and date from this counter. The CMOS clock only reports time to the nearest second and is presumably quite slow to access, so the OS keeps time itself after bootup. The ticker is often a hardware device (and several different ones can be used, depending on OS and available hardware), but will generally cause an interrupt that needs to be serviced on each tick.
  • Windows (NT derivatives only) updates the ticker once an hour from the system CMOS clock (correcting only if it is more than 60 seconds different). Linux generally does not do this.
  • Linux systems can cause resource-hogging problems depending on the kernel used. Earlier Linux kernels ticked at 100Hz, but newer ones (2.6 kernels) use a 1000Hz timer by default. In a virtual environment, this imposes a more significant resource drain, since the whole VM must be switched into context on the host, rather than just whichever CPU mode is required to service the timer interrupts on a physical system. Kernel directives can force the timer to use 100Hz on a virtualised installation, which can often be a good idea, especially if the number of guests running simultaneously is relatively high. Kernels newer than 2.6.28-7.18 are more VM friendly and perform timing operations in a different manner.
  • As you might expect, if VMTools is installed and running correctly, this is all taken care of automatically. I thought it was running, but for some reason that I still haven’t been able to replicate, it wasn’t running when my machine came out of suspend.
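
To make the tick-counting behaviour described above concrete, here is a minimal Python sketch of a clock that is seeded once from the hardware clock and then driven purely by timer ticks. It is a toy model, not how any real kernel is implemented: the class TickClock, its methods and the demo function are hypothetical names invented for illustration, and the 60-second correction threshold is simply the Windows figure quoted in the list.

```python
from dataclasses import dataclass


@dataclass
class TickClock:
    """Toy OS clock: seeded once from the hardware clock, then driven by ticks."""
    boot_time: float   # seconds, read once from the (hypothetical) hardware clock
    hz: int = 1000     # timer interrupts per second
    ticks: int = 0     # interrupts serviced since boot

    def run(self, seconds: float) -> None:
        # While the guest is running, each timer interrupt advances the counter.
        self.ticks += int(seconds * self.hz)

    def now(self) -> float:
        # System time is derived from the counter, not re-read from hardware.
        return self.boot_time + self.ticks / self.hz

    def resync(self, hardware_now: float, threshold: float = 60.0) -> bool:
        # Step the clock only if it differs from hardware by more than threshold.
        if abs(hardware_now - self.now()) > threshold:
            self.boot_time = hardware_now - self.ticks / self.hz
            return True
        return False


def demo() -> tuple[float, float, float]:
    clock = TickClock(boot_time=0.0)
    clock.run(3600)                # guest runs for one hour
    hardware = 3600 + 2 * 86400    # guest suspended for two days: no ticks arrive
    clock.run(1800)                # guest resumes and runs for half an hour
    hardware += 1800
    lag_before = hardware - clock.now()
    clock.resync(hardware)
    lag_after = hardware - clock.now()
    return clock.now(), lag_before, lag_after


if __name__ == "__main__":
    print(demo())
```

In this model the guest clock stays exactly two days behind after the resume, however long it then runs, until something explicitly compares it with the hardware clock and steps it. That matches what I saw after waiting half an hour.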
