linux-kernel - Re: [PATCH] Close small window for vsyscall time inconsistencies

Message-Id: <200804070255.22516.zippel@linux-m68k.org>
Date:	Mon, 7 Apr 2008 01:55:19 +0100
From:	Roman Zippel <zippel@...ux-m68k.org>
To:	john stultz <johnstul@...ibm.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Paul Mackerras <paulus@...ba.org>,
	Tony Luck <tony.luck@...el.com>, Ingo Molnar <mingo@...e.hu>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] Close small window for vsyscall time inconsistencies

Hi,

On Friday 4. April 2008, john stultz wrote:

> So Thomas and Ingo pointed out to me that they were occasionally seeing
> small 1ns inconsistencies from clock_gettime() (and more rarely, 1us
> inconsistencies from gettimeofday() when the 1ns inconsistency occurred
> on a us boundary)

What does inconsistency mean?

> Looking over the code, the only possible reason I could find would be
> from an interaction with the vsyscall code.
>
> In update_wall_time(), if we read the hardware at time A and start
> accumulating time, and adjusting the clocksource frequency, slowing it
> for ntp.
>
> Right before we call update_vsyscall(), another processor makes a
> vsyscall gettimeofday call, reading the hardware at time B, but using
> the clocksource frequency and offsets from pre-time A.
>
> The update_vsyscall then runs, and updates the clocksource frequency
> with a slower frequency.
>
> Another processor immediately calls vsyscall gettimeofday, reading the
> hardware (let's imagine it's very slow hardware) at time B (or very
> shortly thereafter), and then uses the post-time A clocksource
> frequency which has been slowed.
>
> Since we're using basically the same hardware value B, but using
> different frequencies, it's possible for a very small 1ns time
> inconsistency to occur.
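
The quoted scenario can be played through with a toy model. This is only a sketch, not kernel code: it assumes the usual "base + ((cycles * mult) >> shift)" style of conversion, and the shift, mult and cycle values are made up for illustration.

```python
# Toy model of a clocksource-style cycles-to-ns conversion; not kernel code.
# All constants are hypothetical and chosen only to make the effect visible.
SHIFT = 22
MULT_OLD = 1 << SHIFT      # exactly 1 ns per cycle
MULT_NEW = MULT_OLD - 4    # roughly 1 ppm slower


def read_time_ns(base_ns, cycle_last, mult, cycle_now):
    """Return base_ns plus the cycles elapsed since cycle_last, scaled to ns."""
    return base_ns + (((cycle_now - cycle_last) * mult) >> SHIFT)


CYCLE_START = 0
CYCLE_A = 1_000_000   # hardware value read by update_wall_time()
CYCLE_B = 2_000_000   # hardware value read by both vsyscall callers

# Time accumulated up to A with the old frequency becomes the new base.
base_at_a = read_time_ns(0, CYCLE_START, MULT_OLD, CYCLE_A)

# First caller: still sees the pre-A base and frequency.
t_stale = read_time_ns(0, CYCLE_START, MULT_OLD, CYCLE_B)
# Second caller: sees the post-A base and the slowed frequency.
t_fresh = read_time_ns(base_at_a, CYCLE_A, MULT_NEW, CYCLE_B)

print(t_stale, t_fresh, t_stale - t_fresh)
```

With these made-up numbers both callers use the same hardware value B, yet the second reading comes out 1 ns earlier than the first.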

One thing to keep in mind here is that if update_wall_time() adjusts the
frequency at time A, the time at that point is still the same after the
frequency change.
This means that on the same cpu the time keeps increasing. If the update on
another cpu is delayed until update_vsyscall() runs at time B, it's possible
that there is a small time jump at that point, but in the common case it
should be too small to even be noticeable, e.g. if the frequency is changed by
1us/s and it takes 1ms for the update, the jump is 1ns, and IMO that is
already a lot.
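
The 1ns figure above is just the frequency change multiplied by the update delay. A minimal sketch of that arithmetic, with a hypothetical helper name and integer nanosecond units:

```python
NSEC_PER_SEC = 1_000_000_000


def jump_ns(freq_change_ns_per_sec, update_delay_ns):
    """Offset built up while a stale frequency is used for update_delay_ns.

    freq_change_ns_per_sec: size of the frequency change, in ns per second
    update_delay_ns: how long the other cpu keeps using the old frequency
    """
    return freq_change_ns_per_sec * update_delay_ns // NSEC_PER_SEC


# 1us/s frequency change, 1ms until the update is visible
print(jump_ns(1_000, 1_000_000))
```
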
I'm not saying that it's impossible that it results in a visible problem, but
I think it should be rather rare. NTP frequency updates should be quite rare,
at most every 16s and in standard configurations every 64s (over time even
less). In between these updates NTP changes its frequency very slowly. That
leaves the clock frequency when it tries to match the NTP frequency; if you
really see frequency changes that large, it suggests that something else is
quite wrong, e.g. if the clock code has a problem holding a halfway steady
frequency, this should be fixed first.
So instead of shooting in the dark, I'd suggest collecting some numbers first
that support your theory. This starts with the NTP logs; then try adding
some stats to the adjustment code to see by how much the clock frequency is
changed (e.g. the min/max/last mult values and the same for the number of
cycles until update_vsyscall() is called).
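
As a rough illustration of the kind of stats meant here, a min/max/last tracker could look like the sketch below. It is a userspace model only; the class names and the sample values are hypothetical, and real instrumentation would live in the kernel's adjustment code.

```python
class MinMaxLast:
    """Track the minimum, maximum and most recent value of one quantity."""

    def __init__(self):
        self.min = None
        self.max = None
        self.last = None

    def record(self, value):
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.last = value


class AdjustStats:
    """Stats for the mult value and for the cycles until update_vsyscall()."""

    def __init__(self):
        self.mult = MinMaxLast()
        self.cycles_to_update = MinMaxLast()

    def record(self, mult, cycles_to_update):
        self.mult.record(mult)
        self.cycles_to_update.record(cycles_to_update)


stats = AdjustStats()
# Made-up samples: (mult after adjustment, cycles until the vsyscall update)
for mult, cycles in [(4194304, 1200), (4194300, 900), (4194302, 1500)]:
    stats.record(mult, cycles)

print(stats.mult.min, stats.mult.max, stats.mult.last)
print(stats.cycles_to_update.min, stats.cycles_to_update.max,
      stats.cycles_to_update.last)
```

The spread between the min and max mult gives the size of the frequency changes, and the cycle counts show how long a stale frequency can stay visible.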

bye, Roman