Date:	Sun, 01 Jul 2012 15:05:08 -0700
From:	John Stultz <johnstul@...ibm.com>
To:	Linux Kernel <linux-kernel@...r.kernel.org>
CC:	Prarit Bhargava <prarit@...hat.com>, stable@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Jan Engelhardt <jengelh@...i.de>
Subject: Re: [PATCH 0/2][RFC] Potential fix for leapsecond caused futex issue
 (v2)

On 07/01/2012 11:29 AM, John Stultz wrote:
> TODOs:
> * Chase down the futex/hrtimer interaction to see if this could
> be triggered in any other way.

Ok, I've worked out a more detailed diagnosis of what is going on:

* A leap second occurs, and CLOCK_REALTIME is set back one second.

* Because clock_was_set() is not called, the hrtimer base.offset value for 
CLOCK_REALTIME is not updated, so the hrtimer code's sense of wall time is 
one second ahead of the timekeeping core's.

* At interrupt time (T), the hrtimer code expires all CLOCK_REALTIME 
based timers set for T+1s and before, causing early expirations for 
timers between T and T+1s since the hrtimer code's sense of time is one 
second ahead.

* This causes all TIMER_ABSTIME CLOCK_REALTIME timers to expire one 
second early.

* More problematically, all sub-second TIMER_ABSTIME CLOCK_REALTIME 
timers will return immediately.  If any such timer calls are done in a 
loop (as commonly done with futex_wait or other timeouts), this will 
cause load spikes in those applications.

* This state persists until clock_was_set() is called (most easily done 
via settimeofday()).
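To make the timing arithmetic in the list above concrete, here is a minimal C sketch of a simplified, hypothetical model (not the actual kernel code): realtime is monotonic time plus an offset, the timekeeping core owns the authoritative offset, and the hrtimer base keeps a cached copy. The struct and function names are invented for illustration. Stepping only the core's offset back by one second leaves the hrtimer view one second ahead, so any absolute CLOCK_REALTIME deadline up to one second out is already treated as expired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical, simplified model (not kernel code); all times in ns.
 * CLOCK_REALTIME = CLOCK_MONOTONIC + offset.  The timekeeping core owns
 * the real offset; the hrtimer base only refreshes its cached copy when
 * clock_was_set() runs.
 */
struct model {
	int64_t mono_now;       /* current CLOCK_MONOTONIC time */
	int64_t tk_offset;      /* timekeeping core: REALTIME - MONOTONIC */
	int64_t hrtimer_offset; /* hrtimer base's cached copy of that offset */
};

/* CLOCK_REALTIME as the timekeeping core (and clock_gettime) reports it. */
static int64_t model_realtime(const struct model *m)
{
	return m->mono_now + m->tk_offset;
}

/* CLOCK_REALTIME as the hrtimer base believes it to be. */
static int64_t model_hrtimer_realtime(const struct model *m)
{
	return m->mono_now + m->hrtimer_offset;
}

/* An absolute CLOCK_REALTIME timer expires once the hrtimer view reaches it. */
static bool model_timer_expired(const struct model *m, int64_t expiry)
{
	return model_hrtimer_realtime(m) >= expiry;
}

/*
 * Leap second insertion: the core steps REALTIME back by one second.
 * No clock_was_set() call is modelled, so hrtimer_offset stays stale.
 */
static void model_insert_leap_second(struct model *m)
{
	m->tk_offset -= NSEC_PER_SEC;
	assert(m->hrtimer_offset - m->tk_offset == NSEC_PER_SEC ||
	       m->hrtimer_offset != m->tk_offset);
}
```

After model_insert_leap_second(), a timer armed for "now + 0.5s" already satisfies model_timer_expired(), which is the immediate-return case; a timer at "now + 1s + 1ns" is the first one that does not.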


I've used the attached test case to trigger a leap second and demonstrate 
its effect on CLOCK_REALTIME hrtimers.
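As an illustration of how a leap second can be armed from userspace, one plausible approach (an assumption; the attached leaptest-timer.c may do it differently) is to step the clock to a few seconds before the next UTC midnight with settimeofday() and then set STA_INS through adjtimex(). The helper names below are hypothetical. Arming needs CAP_SYS_TIME and really changes the system clock, so it only belongs on a test box; building the adjtimex() request is kept separate so it can be inspected without touching the clock.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/time.h>
#include <sys/timex.h>
#include <time.h>

/* Leap seconds are inserted at the end of a UTC day. */
#define SECS_PER_DAY 86400

/* Hypothetical helper: first second of the next UTC day after `now`. */
static time_t next_utc_midnight(time_t now)
{
	return now + (SECS_PER_DAY - now % SECS_PER_DAY);
}

/* Hypothetical helper: request a leap second insertion at UTC midnight. */
static void build_leap_insert_request(struct timex *tx)
{
	memset(tx, 0, sizeof(*tx));
	tx->modes = ADJ_STATUS; /* only the status word is being changed */
	tx->status = STA_INS;   /* insert a leap second at the end of the day */
}

/*
 * Hypothetical helper: make a leap second fire `secs_before` seconds from
 * now by jumping the clock to just before the next UTC midnight.
 * Requires CAP_SYS_TIME.  Returns 0 on success, -1 with errno set.
 */
static int arm_leap_second(int secs_before)
{
	struct timeval tv;
	struct timex tx;

	assert(secs_before > 0 && secs_before < SECS_PER_DAY);
	if (gettimeofday(&tv, NULL) < 0)
		return -1;
	tv.tv_sec = next_utc_midnight(tv.tv_sec) - secs_before;
	tv.tv_usec = 0;
	if (settimeofday(&tv, NULL) < 0)
		return -1;

	build_leap_insert_request(&tx);
	/* adjtimex() returns the clock state (>= 0) on success, -1 on error. */
	return adjtimex(&tx) < 0 ? -1 : 0;
}
```

arm_leap_second(10) would match the "trigger in 10 seconds" setup described for the test.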

The test arms a leap second to trigger in 10 seconds, then, for 30 
seconds, repeatedly sleeps for half a second via clock_nanosleep and 
prints the current time and the delta from the target wakeup time.
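A minimal sketch of such a loop, assuming the target is recomputed from the current CLOCK_REALTIME value on every pass (the helper names and parameters are hypothetical, and the leap second setup is left out). Each deadline is passed to clock_nanosleep() with TIMER_ABSTIME, and the delta is measured afterwards with clock_gettime(); a negative delta means the timer fired before its deadline.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Difference a - b in nanoseconds. */
static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (long long)(a->tv_sec - b->tv_sec) * NSEC_PER_SEC +
	       (a->tv_nsec - b->tv_nsec);
}

/* Advance *ts by a non-negative ns count, keeping tv_nsec normalized. */
static void ts_add_ns(struct timespec *ts, long long ns)
{
	long long total = ts->tv_nsec + ns;

	ts->tv_sec += total / NSEC_PER_SEC;
	ts->tv_nsec = total % NSEC_PER_SEC;
}

/*
 * Hypothetical helper: `iterations` times, sleep until an absolute
 * CLOCK_REALTIME deadline `interval_ns` after the current time, then
 * print the wakeup time and its distance from the deadline.
 * Returns the number of early (negative diff) wakeups, or -1 on error.
 */
static int sleep_loop(int iterations, long long interval_ns)
{
	struct timespec target, now;
	int early = 0;

	assert(interval_ns > 0);
	for (int i = 0; i < iterations; i++) {
		if (clock_gettime(CLOCK_REALTIME, &target) != 0)
			return -1;
		ts_add_ns(&target, interval_ns);

		/* clock_nanosleep() returns an error number; it does not set errno. */
		int ret;
		do {
			ret = clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
					      &target, NULL);
		} while (ret == EINTR);
		if (ret != 0)
			return -1;

		if (clock_gettime(CLOCK_REALTIME, &now) != 0)
			return -1;
		long long diff = ts_diff_ns(&now, &target);
		if (diff < 0)
			early++;
		printf("%lld.%09ld: diff %lld ns\n",
		       (long long)now.tv_sec, now.tv_nsec, diff);
	}
	return early;
}
```

sleep_loop(60, 500000000LL) gives sixty half-second sleeps, roughly the 30-second run described. On an affected kernel one would expect each pass after the leap second to return almost at once with a delta of about minus half a second; on older glibc, link with -lrt as in the build line.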

When the leap second triggers on an affected machine, the output streams 
by quickly with negative diff values, because clock_nanosleep is 
returning immediately.

To build:
gcc leaptest-timer.c -o leaptest-timer -lrt


I've reproduced this behaviour in kernel versions:
     v3.5-rc4
     v2.6.37
     v2.6.32.59
(And quite likely all in-between).

I haven't been able to build or boot anything earlier with the distro on 
my current test boxes, but I'm working to get an older distro installed so 
I can do further testing.

This has likely been around since 
746976a301ac9c9aa10d7d42454f8d6cdad8ff2b in v2.6.22, as Ben Blum 
and Jan Ceuleers already noted.

With my fix to call clock_was_set() when we apply a leap second, I no 
longer see the issue.
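The effect of the fix can be sketched with a simplified, hypothetical model (names invented for illustration, not taken from the patch): applying the leap second steps the timekeeping core's offset back by one second, and the fixed path also runs a stand-in for clock_was_set() that copies the new offset into the hrtimer base's cache. Comparing the two paths shows the one-second skew, and the immediate expiry of a half-second timer, disappearing once the cache is refreshed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical simplified model, not kernel code; all times in ns. */
struct clock_model {
	int64_t mono_now;       /* CLOCK_MONOTONIC "now" */
	int64_t tk_offset;      /* REALTIME - MONOTONIC per the timekeeping core */
	int64_t hrtimer_offset; /* cached copy used by the hrtimer base */
};

/* Stand-in for clock_was_set(): resynchronize the hrtimer base's offset. */
static void model_clock_was_set(struct clock_model *m)
{
	m->hrtimer_offset = m->tk_offset;
}

/* Apply a leap second; `notify` selects the fixed behaviour. */
static void model_apply_leap_second(struct clock_model *m, bool notify)
{
	m->tk_offset -= NSEC_PER_SEC;
	if (notify)
		model_clock_was_set(m);
}

/* How far ahead of real time the hrtimer base's sense of time is. */
static int64_t model_hrtimer_skew(const struct clock_model *m)
{
	return m->hrtimer_offset - m->tk_offset;
}

/* Is an absolute CLOCK_REALTIME timer `delta` ns in the future already expired? */
static bool model_fires_immediately(const struct clock_model *m, int64_t delta)
{
	int64_t real_now = m->mono_now + m->tk_offset;
	int64_t hr_now = m->mono_now + m->hrtimer_offset;

	return hr_now >= real_now + delta;
}
```

Calling model_clock_was_set() on the unfixed model afterwards also clears the skew, which mirrors the settimeofday() workaround mentioned in the diagnosis.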

thanks
-john


Attachment: leaptest-timer.c (text/x-csrc, 2182 bytes)
