Message-ID: <alpine.LFD.2.01.0908242113540.3218@localhost.localdomain>
Date: Mon, 24 Aug 2009 21:30:16 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Frederic Weisbecker <fweisbec@...il.com>
cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
linux-kernel@...r.kernel.org, x86@...nel.org,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Greg Kroah-Hartman <gregkh@...e.de>
Subject: Re: v2.6.31-rc6: BUG: unable to handle kernel NULL pointer dereference
at 0000000000000008
On Mon, 24 Aug 2009, Linus Torvalds wrote:
>
> Anyway, I'll happily be shown wrong. I think the (second) patch I sent out
> is an acceptable hack in the presence of the current locking, but as I
> said, I'm not exactly happy about it, because I do think the locking is
> broken.
Btw, another solution to all this would be to just not have that
ldisc_mutex deadlock due to do_tty_hangup -> tty_ldisc_hangup at all.
The actual _flushing_ doesn't need the mutex - it's just that both
flushing and hangup are done with workqueues.
If we can avoid the deadlock by not having the (artificial) workqueue
dependency, it would allow everybody to just hold on to the mutex over the
whole sequence - and would obviate the need for that hacky
TTY_LDISC_CHANGING bit thing in tty_set_ldisc.
In other words, the whole problem really comes in from the fact that
do_tty_hangup() is called from "hangup_work", and the workqueues can get
hung to the point where you can't then do the (totally _unrelated_) queue
flushing.
Because flush_to_ldisc() itself - which is what we want to do - doesn't
need that mutex or the workqueue at all. It could run from any context,
afaik.
So if we were to turn it into just a timer (rather than a "delayed work"),
then we'd not need to do that "flush_scheduled_work()" thing at all, and
we wouldn't have that interaction with do_tty_hangup(). At which point we
could again hold on to locks, because we wouldn't need to worry about the
workqueues getting stuck on the mutex (that isn't even needed for the
actual flushing part that we want to do!).
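To make the dependency concrete, here is a toy single-threaded model in user-space C. It is only a sketch, not kernel code: `struct toy_tty`, the `toy_*` helpers and the boolean `mutex_held` are hypothetical stand-ins for the tty, the shared workqueue and ldisc_mutex. The first scenario queues the flush behind the hangup work on one shared queue and then tries to drain that queue while holding the mutex. The second runs the flush directly, roughly as a timer callback would, so the queue never has to be drained under the mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are illustrative, not kernel APIs. */
enum toy_work { TOY_HANGUP, TOY_FLUSH };

struct toy_tty {
    enum toy_work queue[4];   /* stands in for the shared workqueue */
    int head, tail;
    bool mutex_held;          /* stands in for ldisc_mutex */
    int flushes_done;
    int hangups_done;
};

static void toy_queue_work(struct toy_tty *t, enum toy_work w)
{
    assert(t->tail < 4);
    t->queue[t->tail++] = w;
}

/*
 * Run queued items in FIFO order. Hangup work needs the mutex, so if the
 * mutex is held the queue is stuck and everything behind it waits too.
 * Returns true if the queue drained, false if it got stuck.
 */
static bool toy_flush_scheduled_work(struct toy_tty *t)
{
    while (t->head < t->tail) {
        if (t->queue[t->head] == TOY_HANGUP) {
            if (t->mutex_held)
                return false;   /* real code would sleep here forever */
            t->hangups_done++;
        } else {
            t->flushes_done++;  /* the flush itself needs no mutex */
        }
        t->head++;
    }
    return true;
}

/* Scenario 1: flush sits behind hangup on the shared queue. */
static bool toy_set_ldisc_shared_queue(struct toy_tty *t)
{
    toy_queue_work(t, TOY_HANGUP);
    toy_queue_work(t, TOY_FLUSH);
    t->mutex_held = true;
    bool drained = toy_flush_scheduled_work(t);
    t->mutex_held = false;
    return drained;
}

/* Scenario 2: flush runs directly, with no queue dependency. */
static bool toy_set_ldisc_direct_flush(struct toy_tty *t)
{
    toy_queue_work(t, TOY_HANGUP);
    t->mutex_held = true;
    t->flushes_done++;          /* direct flush while holding the mutex */
    t->mutex_held = false;
    /* The hangup work can run once the mutex has been released. */
    return toy_flush_scheduled_work(t);
}
```

In this model the first scenario reports a stuck queue and never performs the flush, while the second completes the flush with the mutex held and lets the hangup work run afterwards. The model ignores real concurrency, sleeping and timers entirely; it only illustrates why removing the queue dependency lets the caller keep the lock.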
So don't get me wrong - there are _multiple_ ways to solve this. But they
are all pretty major surgery, changing "big" semantics. We could fix the
locking, we could change how we flush, we could do all of those things.
And I'd love to. But I think the almost-oneliner is the safest approach
right now. It's certainly not perfect, but it's fairly minimal impact.
Linus