linux-kernel - Re: v2.6.31-rc6: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

Message-ID: <alpine.LFD.2.01.0908241632060.3824@localhost.localdomain>
Date:	Mon, 24 Aug 2009 16:51:03 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
cc:	linux-kernel@...r.kernel.org, x86@...nel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Greg Kroah-Hartman <gregkh@...e.de>
Subject: Re: v2.6.31-rc6: BUG: unable to handle kernel NULL pointer dereference
 at 0000000000000008

On Mon, 24 Aug 2009, Linus Torvalds wrote:
> 
> Untested. VERY untested. Just going by "that looks odd".

Btw, one issue here is that we at least sometimes do tty_ldisc_halt() 
under the tty->ldisc_mutex.  That's fine only as long as we never take 
that lock inside any delayed work. If we do, the delayed work itself 
may need the lock we hold in order to complete, and the 
'cancel_delayed_work_sync()' call might deadlock.

And sadly, we do end up having 'do_tty_hangup()' as a workqueue entry, and 
that one does tty_ldisc_hangup(), and that one in turn does take 
tty->ldisc_mutex.

So it looks like either we can't use the 'sync()' version, or we should 
never hold the ldisc_mutex while doing that tty_ldisc_halt(). Because 
waiting for the workqueue while holding the mutex looks like it could 
deadlock. It's probably very rare, but whatever.
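
To make the lock ordering concrete, here is a minimal userspace sketch of 
the pattern, using pthreads rather than kernel workqueues. It is an analogy 
under stated assumptions, not the tty code: 'hangup_work', 'wait_done', 
'wait_under_mutex' and 'wait_outside_mutex' are hypothetical names, the 
worker thread stands in for the queued hangup work, and a bounded poll 
stands in for the unbounded wait that a sync cancel would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for tty->ldisc_mutex (assumption: one global lock is enough here). */
static pthread_mutex_t ldisc_mutex = PTHREAD_MUTEX_INITIALIZER;

struct work {
	atomic_bool done;
};

/* Stand-in for the queued hangup work: it needs the mutex to complete. */
static void *hangup_work(void *arg)
{
	struct work *w = arg;

	pthread_mutex_lock(&ldisc_mutex);
	atomic_store(&w->done, true);
	pthread_mutex_unlock(&ldisc_mutex);
	return NULL;
}

/* Poll for completion for roughly timeout_ms; true if the work finished. */
static bool wait_done(struct work *w, int timeout_ms)
{
	struct timespec one_ms = { 0, 1000000 };

	for (int i = 0; i < timeout_ms; i++) {
		if (atomic_load(&w->done))
			return true;
		nanosleep(&one_ms, NULL);
	}
	return atomic_load(&w->done);
}

/*
 * Hazardous ordering: wait for the work while holding the mutex.
 * The work cannot finish until we unlock, so the bounded wait times out.
 * An unbounded wait here would never return.
 */
static bool wait_under_mutex(int timeout_ms)
{
	struct work w;
	pthread_t t;
	bool finished;

	atomic_init(&w.done, false);
	pthread_mutex_lock(&ldisc_mutex);
	if (pthread_create(&t, NULL, hangup_work, &w) != 0) {
		pthread_mutex_unlock(&ldisc_mutex);
		return false;
	}
	finished = wait_done(&w, timeout_ms);
	pthread_mutex_unlock(&ldisc_mutex);
	pthread_join(t, NULL);	/* only now can the work complete */
	return finished;
}

/* Safe ordering: wait for the work first, take the mutex afterwards. */
static bool wait_outside_mutex(void)
{
	struct work w;
	pthread_t t;
	bool finished;

	atomic_init(&w.done, false);
	if (pthread_create(&t, NULL, hangup_work, &w) != 0)
		return false;
	pthread_join(t, NULL);	/* unbounded wait, but no lock is held */
	pthread_mutex_lock(&ldisc_mutex);
	finished = atomic_load(&w.done);
	pthread_mutex_unlock(&ldisc_mutex);
	return finished;
}
```

Calling wait_under_mutex(50) from a main() reports that the work did not 
finish while the lock was held, and wait_outside_mutex() reports that it 
did. Build with -pthread.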

Still, it would be good for people to test whether that patch makes the 
problem go away. Just to see if the issue really is a race between 
"tty_ldisc_halt()" and an ldisc being active on another CPU right then. 

But I wanted to let people know that the patch is clearly not the "last 
word" on this. It's a useful thing to try, but we need something better.

And it looks like we've hit that problem before, which is probably why the 
code didn't use the sync version. Several of the callers of 
'tty_ldisc_halt()' do a flush_scheduled_work() afterwards, outside the 
ldisc_mutex. Of course, the sane one (tty_ldisc_release()) does a 
tty_ldisc_halt() even before taking the mutex lock.

			Linus