Date: Thu, 6 Jun 2013 14:40:14 -0700
From: Tejun Heo <tj@...nel.org>
To: greearb@...delatech.com
Cc: linux-kernel@...r.kernel.org, eric.dumazet@...il.com,
stable@...r.kernel.org, torvalds@...ux-foundation.org
Subject: Re: [PATCH v3] Fix lockup related to stop_machine being stuck in
__do_softirq.
On Thu, Jun 06, 2013 at 02:29:49PM -0700, greearb@...delatech.com wrote:
> From: Ben Greear <greearb@...delatech.com>
>
> The stop machine logic can lock up if all but one of
> the migration threads make it through the disable-irq
> step and the one remaining thread gets stuck in
> __do_softirq. The reason __do_softirq can hang is
> that it has a bail-out based on jiffies timeout, but
> in the lockup case, jiffies itself is not incremented.
>
> To work around this, re-add the max_restart counter in __do_softirq
> and stop processing softirqs after 10 restarts.
>
> Thanks to Tejun Heo and Rusty Russell and others for
> helping me track this down.
>
> This was introduced in 3.9 by commit: c10d73671ad30f5469
> (softirq: reduce latencies).
>
> It may be worth looking into ath9k to see if it has issues with
> its irq handler at a later date.
>
> The hang stack traces look something like this:
...
> Signed-off-by: Ben Greear <greearb@...delatech.com>
Acked-by: Tejun Heo <tj@...nel.org>
Linus, while this doesn't fix the root cause of the problem - softirq
runaway - I still think this is a worthwhile protection to have. Ben
is in the process of finding out why the softirq runaway happens in
the first place. We probably want to add a Cc: stable@...r.kernel.org
tag.
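For anyone following along, here is a rough userspace sketch of why a
count-based limit helps when the time-based one cannot fire. It is not
the kernel code or the patch itself: fake_jiffies, work_pending() and
process_with_limits() are hypothetical names made up for illustration,
and the clock is deliberately frozen to mimic jiffies not advancing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for jiffies. Nothing in this sketch ever
 * advances it, which mimics the lockup case from the description. */
static unsigned long fake_jiffies;

/* Hypothetical work source that always reports more work, i.e. a
 * runaway producer of pending softirqs. */
static bool work_pending(void)
{
	return true;
}

/* Run passes until there is no work left, the time budget expires, or
 * the restart budget is used up. Returns the number of passes made. */
static int process_with_limits(unsigned long timeout_ticks, int max_restart)
{
	unsigned long end = fake_jiffies + timeout_ticks;
	int passes = 0;

	for (;;) {
		passes++;		/* one pass over pending handlers */

		if (!work_pending())
			break;		/* nothing left to do */

		if (fake_jiffies >= end)
			break;		/* time bail-out: never taken while the clock is frozen */

		if (--max_restart <= 0)
			break;		/* count bail-out: does not depend on the clock */
	}

	return passes;
}
```

With the clock frozen, process_with_limits(2, 10) stops after 10 passes
purely because of the counter. Without the counter check, the same loop
would never exit.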
Thanks.
--
tejun