Message-ID: <alpine.GSO.2.00.1306070717400.20297@git.silcnet.org>
Date:	Fri, 7 Jun 2013 07:23:21 +0200 (CEST)
From:	Pekka Riikonen <priikone@....fi>
To:	Tejun Heo <tj@...nel.org>
cc:	greearb@...delatech.com, linux-kernel@...r.kernel.org,
	eric.dumazet@...il.com, stable@...r.kernel.org,
	torvalds@...ux-foundation.org
Subject: Re: [PATCH v3] Fix lockup related to stop_machine being stuck in
 __do_softirq.

On Thu, 6 Jun 2013, Tejun Heo wrote:

> On Thu, Jun 06, 2013 at 02:29:49PM -0700, greearb@...delatech.com wrote:
>> From: Ben Greear <greearb@...delatech.com>
>>
>> The stop machine logic can lock up if all but one of
>> the migration threads make it through the disable-irq
>> step and the one remaining thread gets stuck in
>> __do_softirq.  The reason __do_softirq can hang is
>> that it has a bail-out based on jiffies timeout, but
>> in the lockup case, jiffies itself is not incremented.
>>
>> To work around this, re-add the max_restart counter in __do_softirq
>> and stop processing softirqs after 10 restarts.
>>
>> Thanks to Tejun Heo and Rusty Russell and others for
>> helping me track this down.
>>
>> This was introduced in 3.9 by commit:  c10d73671ad30f5469
>> (softirq:  reduce latencies).
>>
>> It may be worth looking into ath9k to see if it has issues with
>> its irq handler at a later date.
>>
>> The hang stack traces look something like this:
> ...
>> Signed-off-by: Ben Greear <greearb@...delatech.com>
>
> Acked-by: Tejun Heo <tj@...nel.org>
>
> Linus, while this doesn't fix the root cause of the problem - softirq
> runaway - I still think this is a worthwhile protection to have.  Ben
> is in the process of finding out why the softirq runaway happens in
> the first place.  We probably want to add Cc: stable@...r.kernel.org
> tag.
>
The counter also helps to keep the interrupted task interrupted for a
shorter period of time.  10 iterations may take a lot less than the 2 ms
time limit, or 10 ms with HZ=100, so it also helps interactivity.  This
is a good change to bring back in any case.

 	Pekka
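
For readers following the thread, here is a minimal user-space sketch of
the bail-out logic being discussed.  It is not the kernel code or the
patch itself: fake_jiffies, softirq_pending(), run_softirqs() and the
safety_cap parameter are hypothetical names invented for illustration,
and the time limit is assumed to round up to a single jiffy.  The tick
counter is never advanced, which models the lockup case where the
time-based check alone can never fire.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SOFTIRQ_TIME_JIFFIES 1  /* assumption: time limit rounds up to one jiffy */
#define MAX_SOFTIRQ_RESTART 10      /* restart limit named in the patch description */

/* Simulated tick counter; deliberately never incremented (lockup case). */
static unsigned long fake_jiffies;

/* Hypothetical stand-in: new softirqs are always pending (runaway). */
static bool softirq_pending(void)
{
    return true;
}

/*
 * Returns how many passes over the pending handlers were made before
 * bailing out.  safety_cap only exists so that the simulation ends
 * even when the restart limit is disabled.
 */
static int run_softirqs(bool use_restart_limit, int safety_cap)
{
    unsigned long end = fake_jiffies + MAX_SOFTIRQ_TIME_JIFFIES;
    int max_restart = MAX_SOFTIRQ_RESTART;
    int passes = 0;

    assert(safety_cap > 0);

    while (passes < safety_cap) {
        passes++;                       /* one pass over pending handlers */

        if (!softirq_pending())
            break;                      /* nothing left to do */

        /* Time-based bail-out: never triggers while the tick is frozen. */
        bool in_time = (long)(fake_jiffies - end) < 0;

        /* Counter-based bail-out: independent of the tick. */
        bool may_restart = !use_restart_limit || --max_restart > 0;

        if (!(in_time && may_restart))
            break;                      /* would defer remaining work */
    }
    return passes;
}
```

With the restart limit enabled, run_softirqs(true, 1000) stops after 10
passes even though the simulated tick never moves.  With it disabled,
run_softirqs(false, 1000) only stops because of the artificial cap of
1000, which is the situation the patch description calls a hang.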