Message-ID: <4DEBD95B.6030901@die-jansens.de>
Date: Sun, 05 Jun 2011 21:30:35 +0200
From: Arne Jansen <lists@...-jansens.de>
To: Ingo Molnar <mingo@...e.hu>
CC: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
mingo@...hat.com, hpa@...or.com, linux-kernel@...r.kernel.org,
efault@....de, npiggin@...nel.dk, akpm@...ux-foundation.org,
frank.rowand@...sony.com, tglx@...utronix.de,
linux-tip-commits@...r.kernel.org
Subject: Re: [debug patch] printk: Add a printk killswitch to robustify NMI
watchdog messages
On 05.06.2011 20:59, Ingo Molnar wrote:
>
> * Arne Jansen<lists@...-jansens.de> wrote:
>
>>> hm, it's hard to interpret that without the spin_lock()/unlock()
>>> logic keeping the dumps apart.
>>
>> The locking was in place from the beginning. [...]
>
> Ok, i was surprised it looked relatively ordered :-)
>
>> [...] As the output is still scrambled, there are other sources for
>> BUG/WARN outside the watchdog that trigger in parallel. Maybe we
>> should protect the whole BUG/WARN mechanism with a lock and send it
>> to early_printk from the beginning, so we don't have to wait for
>> the watchdog to kill printk off and the first BUG can come through.
>> Or just let WARN/BUG kill off printk instead of the watchdog
>> (though I have to get rid of that syslog-WARN on startup).
>
> I had yet another look at your lockup.txt and i think the main cause
> is the WARN_ON() caused by the not-held pi_lock. The lockup there
> causes other CPUs to wedge in printk, which triggers spinlock-lockup
> messages there.
>
> So i think the primary trigger is the pi_lock WARN_ON() (as your
> bisection has confirmed that too), everything else comes from this.
>
> Unfortunately i don't think we can really 'fix' the problem by
> removing the assert. By all means the assert is correct: pi_lock
> should be held there. If we are not holding it then we likely won't
> crash in an easily visible way - it's a lot easier to trigger asserts
> than to trigger obscure side-effects of locking bugs.
>
> It is also a mystery why only printk() triggers this bug. The wakeup
> done there is not particularly special, so by all means we should
> have seen similar lockups elsewhere as well - not just with
> printk()s. Yet we are not seeing them.

Judging from the timing I see, I'd guess it has something to do with the
scheduler kicking in during printk. I'm familiar with neither the printk
code nor the scheduler.

If you have any ideas about what I should test or add, please let me know.

-Arne

>
> So some essential piece of the puzzle is still missing.
>
> Thanks,
>
> Ingo