linux-kernel - Re: [PATCH] nohz: fix race allowing use of stale jiffies when waking

Date:	Wed, 21 Mar 2012 18:14:44 -0700
From:	John Stultz <johnstul@...ibm.com>
To:	Milton Miller <miltonm@....com>
CC:	Eric Dumazet <eric.dumazet@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] nohz: fix race allowing use of stale jiffies when waking

On 01/13/2012 09:02 PM, Milton Miller wrote:
> On Thu, 12 Jan 2012 about 10:49:15 +0100 Eric Dumazet wrote:
>> On Thursday, 12 January 2012 at 02:55 -0600, Milton Miller wrote:
>>> When waking up from nohz mode, all cpus call tick_do_update_jiffies64
>>> regardless of tick_do_timer_cpu, since it may be that no cpu was assigned.
>>>
>>> At the start of the function there is a quick lockless check to
>>> determine if jiffies is current.  The check uses last_jiffies_update,
>>> which is used to calculate when to perform the next increment.
>>> Unfortunately, it is updated when the number of jiffies by which to
>>> advance the clock is calculated, before the call to do_timer, which
>>> actually updates jiffies.  A second cpu waking up could use the
>>> (potentially very) stale jiffies value during this window.
>>>
>>> This patch changes the check to be against tick_next_period, which
>>> is updated after the call to do_timer completes.  It compares the
>>> result of the subtraction to zero, but this is safe because ktime_sub
>>> returns a ktime_t, which is s64, a signed type.
>>>
>>> I found this race while trying to track down reports of network adapter
>>> hangs on a large system.  I suspected premature false detection so
>>> I added logging when the locked region determined that a multiple-jiffy
>>> update would be required.  I noticed that it happened frequently when
>>> tick_do_timer_cpu was NONE (-1), and realized the large update was
>>> when all cpus were previously in nohz.  I then thought about what
>>> would happen if multiple cpus woke up close to each other in
>>> time and decided the stale jiffies would be used.  (I later found at
>>> least part of the hung adapter reports were due to faulty detection
>>> logic that has since changed upstream.)
>>>
>>> Signed-off-by: Milton Miller <miltonm@....com>
>>> Cc: stable@...r.kernel.org
>>> ---
>>> Patch was generated and tested against 2.6.36; I verified it applies
>>> with offset -1 line to next-20120111.
>>>
>>> Index: src/kernel/time/tick-sched.c
>>> ===================================================================
>>> --- src.orig/kernel/time/tick-sched.c	2011-10-13 17:42:16.000000000 -0500
>>> +++ src/kernel/time/tick-sched.c	2011-10-13 17:45:31.000000000 -0500
>>> @@ -52,8 +52,8 @@ static void tick_do_update_jiffies64(kti
>>>   	/*
>>>   	 * Do a quick check without holding xtime_lock:
>>>   	 */
>>> -	delta = ktime_sub(now, last_jiffies_update);
>>> -	if (delta.tv64 < tick_period.tv64)
>>> +	delta = ktime_sub(now, tick_next_period);
>>> +	if (delta.tv64 < 0)
>>>   		return;
>>>
>> Given that ktime_t on 32-bit arches is not an atomic type, I wonder how
>> safe this is anyway...
>>
> OK, I admit I hadn't thought about it.  Initially I was going to come
> up with something that compared the two timestamps and waited if
> next_period <= next_jiffies_update (with appropriate subtract and
> compare).
>
> But then I thought some more, and comparing against the timestamp that
> is written after the update is safe:
[snipped]


> There are a couple of additional points to consider in this scenario.
> One is that the cpu still holds xtime_lock, so any attempt to read a
> high-precision time will stall.  The second is that if the cpu updating
> jiffies is stalled by the hypervisor, the problem is not unique to
> waking from nohz; it is likely also happening when that cpu owns
> timer duty, so time will be subject to bunching and jumping jiffies
> on a regular basis.  About the most we could do is detect it, either
> by having other cpus take periodic health checks of jiffies or by
> noticing that our tick update is constantly behind.
>
> So I think the updated racy check is fine, but I will expand the
> racy-check comment to explain why it is safe, if that is desired.
>
So, what happened with this patch?  Is there an updated version with
the improved documentation discussed in this mail?

thanks
-john
