[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090825150506.GG6114@nowhere>
Date: Tue, 25 Aug 2009 17:05:09 +0200
From: Frederic Weisbecker <fweisbec@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
linux-kernel@...r.kernel.org, x86@...nel.org,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Greg Kroah-Hartman <gregkh@...e.de>
Subject: Re: v2.6.31-rc6: BUG: unable to handle kernel NULL pointer
dereference at 0000000000000008
On Mon, Aug 24, 2009 at 09:30:16PM -0700, Linus Torvalds wrote:
>
>
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
> >
> > Anyway, I'll happily be shown wrong. I think the (second) patch I sent out
> > is an acceptable hack in the presense of the current locking, but as I
> > said, I'm not exactly happy about it, because I do think the locking is
> > broken.
>
> Btw, another solution to all this would be to just not have that
> ldisc_mutex deadlock due to do_tty_hangup -> tty_ldisc_hangup at all.
>
> The actual _flushing_ doesn't need the mutex - it's just that both
> flushing and hangup is done with workqueues.
Yeah, it would be sad, but having the flushing done in a dedicated workqueue
would solve the need of relaxing the lock, because we would only wait
for the pending flush works, not the hangup works.
But it's sad to create a thread only for that.
> If we can avoid the deadlock by not having the (artificial) workqueue
> dependency, it would allow everybody to just hold on to the mutex over the
> whole sequence - and would obviate the need for that hacky
> TTY_LDISC_CHANGING bit thing in tty_set_ldisc.
>
> In other words, the whole problem really comes in from the fact that
> do_tty_hangup() is called from "hangup_work", and the workqueues can get
> hung to the point where you can't then do the (totally _unrelated_) queue
> flushing.
>
> Because flush_to_ldisc() itself - which is what we want to do - doesn't
> need that mutex or the workqueue at all. It could run from any context,
> afaik.
>
> So if we were to turn it into just a timer (rather than a "delayed work"),
> then we'd not need to do that "flush_scheduled_work()" thing at all, and
> we wouldn't have that interaction with do_tty_hangup(). At which point we
> could again hold on to locks, because we wouldn't need to worry about the
> workqueues getting stuck on the mutex (that isn't even needed for the
> actual flushing part that we want to do!).
Yeah, a simple timer would be better than a dedicated workqueue in that
we don't need a whole thread for such small job.
>
> So don't get me wrong - there are _multiple_ ways to solve this. But they
> are all pretty major surgery, changing "big" semantics. We could fix the
> locking, we could change how we flush, we could do all of those things.
> And I'd love to. But I think the almost-oneliner is the safest approach
> right now. It's certainly not perfect, but it's fairly minimal impact.
>
> Linus
Yep.
I hope the progressive work Jens Axboe is doing on workqueues will drop
their serialized nature which leads to such perpetual deadlocks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists