linux-kernel - Re: Clock jumps

Message-ID: <4BFEE9F3.8040200@redhat.com>
Date:	Thu, 27 May 2010 11:53:55 -1000
From:	Zachary Amsden <zamsden@...hat.com>
To:	Bernhard Schmidt <berni@...kenwald.de>
CC:	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Clock jumps

On 05/27/2010 08:32 AM, Bernhard Schmidt wrote:
> Alexander Graf<agraf@...e.de>  wrote:
>
> Hi,
>
>
>> Do you have ntpd running inside the guest? I have a bug report lying
>> around about 2.6.33 with kvm-clock jumping in time when ntpd is used:
>> https://bugzilla.novell.com/show_bug.cgi?id=582260
>>
> I want to chime in here, I have a very similar problem, but not with
> ntpd in the guest.
>
> The host was an HP ProLiant DL320 G5p with a dual-core Xeon 3075. The
> system was Debian Lenny with a custom 2.6.33 host kernel and a custom
> qemu-kvm 0.11.0 .deb ported from Ubuntu. The host is synced with ntpd.
>
> The guests are various Debian Lenny/Squeeze VMs, with a custom kernel
> (2.6.33 at the moment) with kvm-clock. Exclusively amd64 guest
> kernels, but one system has i386 userland.
>
> With this setup, once in a while (maybe every other week) one VM would
> have a sudden clock jump of 6-12 hours into the future. There were no
> kernel messages or log entries other than applications complaining
> about the clock jump after the fact. Other VMs were unaffected.
>
> Yesterday I did an upgrade to Debian Squeeze. This involved a new
> qemu-kvm (0.12.4), but not a new host kernel. I also upgraded the guest
> kernels from 2.6.33 to 2.6.33.4.
>
> First of all, after the reboot the host clock was totally unreliable. I
> had a constant skew of up to five seconds per minute in the host clock,
> which of course affected the VMs as well.  This problem went away when I
> switched the host clocksource from tsc to hpet. The host does CPU
> frequency scaling, which is, as far as I know, known to affect TSC
> stability. I think I remember messages about tsc being unstable in
> earlier boots, so maybe the detection just did not work this time.
>
> Worse, the clock jump issues in the guest appeared much more often than
> before. The more heavily loaded VMs did not survive ten minutes without
> jumping several hours ahead.
>
> The situation has stabilized after setting the clocksource to hpet in
> the guest immediately after boot. So it seems kvm-clock has some issues
> here.
>
> I've seen a preliminary patch floating around on the ML by Zachary
> Amsden, but I haven't tried it yet. It talks of backward warps, but so
> far I've only seen forward warps (the VM time suddenly jumps into the
> future), so it might be unrelated.
>

I have an AMD Turion TL-52 machine with an unreliable TSC.  It varies
with CPU frequency, which is okay because we can compensate for that,
but worse, its clocking is broken in C1E idle.  Apparently, it divides
down the clock input to an idle core, so the core runs at only 1/16 (or
whatever) of the full rate, and adds a multiplier to the TSC increment,
so the TSC advances by 16 per cycle instead of by 1.  (I don't know the
actual numbers, but this illustrates the point.)  When the core wakes up
to service a cache probe from another core, it runs at the full clock
rate again ... and still uses the multiplier for the TSC increment.

The effect is that the TSC on an idle CPU may increase faster than on a
running CPU.  Given time, this delta can add up to a very large number
(in theory it's a random walk, but it can drift very far off).  If a VM
is running on this CPU and happens to match the idle pattern without
switching CPUs, time can effectively run accelerated in that VM, and
things are going to get confused very rapidly.
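
A toy model may make the arithmetic above concrete.  The sketch below is
not kernel code; the function name, the multiplier of 16, and the probe
interval are all made up for illustration.  It counts a busy core's TSC
at one count per reference tick, and an idle core's TSC at the same
average rate except on ticks where a cache probe wakes it at full clock
rate with the multiplier still applied.

```python
def simulate_tsc(ref_ticks, multiplier=16, probe_every=1000):
    """Return (busy_tsc, idle_tsc) after ref_ticks reference clock ticks.

    All parameters are hypothetical; the real divider and multiplier
    values on the hardware are unknown.
    """
    busy_tsc = 0
    idle_tsc = 0
    for t in range(ref_ticks):
        # A running core advances its TSC by one per reference tick.
        busy_tsc += 1
        if probe_every and t % probe_every == 0:
            # Cache probe: the idle core runs one tick at full clock
            # rate, but the idle multiplier is still applied.
            idle_tsc += multiplier
        else:
            # Divided clock times multiplier averages out to one count
            # per reference tick, so no drift accumulates here.
            idle_tsc += 1
    return busy_tsc, idle_tsc


def drift(ref_ticks, **kwargs):
    """How far the idle core's TSC has run ahead of the busy core's."""
    busy_tsc, idle_tsc = simulate_tsc(ref_ticks, **kwargs)
    return idle_tsc - busy_tsc
```

With these made-up numbers, every probe adds 15 extra counts to the idle
core, so the gap only ever grows with the number of probes serviced.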

Newer kernels should detect the host clock being unreliable quite
quickly; my Fedora 13 machine detects it right away at boot.
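
To see which clocksource a host or guest ended up with, the kernel
exposes the current and available choices under sysfs.  A minimal
sketch, assuming the usual clocksource0 sysfs directory is present; the
helper name and the base parameter are just for illustration:

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_CLOCKSOURCE):
    """Return (current, available) clocksource names read from sysfs.

    `base` can point at another directory, which is handy for testing
    on a machine without this sysfs tree.
    """
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling read_clocksources() with no arguments reads the live system.
Switching, as described in the quoted mail, is done by writing a name
such as hpet into current_clocksource as root.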

I have server-side fixes for this in kvm-clock which seem to give me a
stable clock on this machine, but for true SMP stability, you will need
Glauber's guest-side changes to kvmclock as well.  It is impossible to
guarantee a strictly monotonic clocksource across multiple CPUs when
the frequency is changing dynamically (and also because of the C1E idle
problems).
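
One common way for a guest to hide small cross-CPU differences is to
remember the last value handed out and never return anything smaller.
The sketch below shows that clamp in Python.  It is an assumption for
illustration, not a description of what Glauber's patches actually do,
and the class and method names are invented.  Note that it yields a
non-decreasing clock, not a strictly increasing one.

```python
import threading


class ClampedClock:
    """Hand out timestamps that never go backwards across readers."""

    def __init__(self):
        self._last = 0
        self._lock = threading.Lock()

    def read(self, raw_ns):
        """Return raw_ns, or the last returned value if raw_ns is older.

        raw_ns stands in for a per-CPU clock reading in nanoseconds.
        """
        with self._lock:
            if raw_ns > self._last:
                self._last = raw_ns
            return self._last
```

For example, feeding it the readings 100, 105, 103, 110 from different
CPUs returns 100, 105, 105, 110: the reading that would have stepped
backwards is replaced by the last value seen.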

There is one remaining problem to fix: resetting the TSC on reboot in
SMP destabilizes the TSCs again.  But now that I've actually got VMs
running again (a different bug), that shouldn't take long.

Zach