linux-kernel - Re: Linux 2.6.31-rc9

Message-ID: <20090921083609.GA16048@elte.hu>
Date:	Mon, 21 Sep 2009 10:36:09 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: Linux 2.6.31-rc9

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> [ Added some people to the cc - this is very directly related to the 
>   previous thread on "v2.6.31-rc6: BUG: unable to handle kernel NULL 
>   pointer dereference at 0000000000000008", and the deadlock discussion 
>   there ]
> 
> On Tue, 8 Sep 2009, Ingo Molnar wrote:
> > 
> > FYI, i'm getting very (very) rare warnings from the TTY code in this 
> > place:
> > 
> > [   28.187364] rc.sysinit used greatest stack depth: 5224 bytes left
> > [   31.422457] Adding 3911816k swap on /dev/sda2.  Priority:-1 extents:1 across:3911816k 
> > [   32.974830] ssh used greatest stack depth: 5200 bytes left
> > [   33.115028] ------------[ cut here ]------------
> > [   33.119518] WARNING: at drivers/char/tty_io.c:1267 __tty_open+0x3ef/0x4c0()
> 
> Hmm. I think I see why, and I _suspect_ this is harmless, although 
> it's obviously very annoying, and it really is indicative of a real 
> locking problem.
> 
> What's going on is that same horrible deadlock-avoidance where we have to 
> drop the ldisc_mutex after clearing TTY_LDISC, in order to then wait for 
> any pending work. See commit 5c58ceff103d8a654f24769bb1baaf84a841b0cc, 
> which is probably also the one that introduced the timing that gets your 
> particular warning.
> 
> So when __tty_open() does this:
> 
> 	mutex_lock(&tty->ldisc_mutex);
> 	WARN_ON(!test_bit(TTY_LDISC, &tty->flags));
> 	mutex_unlock(&tty->ldisc_mutex);
> 
> it's really warning about something that really can happen: the things 
> that clear TTY_LDISC will all release the ldisc_mutex with that bit still 
> clear, because they all end up having to release the lock that they 
> _should_ hold in order to avoid a deadlock.
> 
> So the warning is "real" in the sense that it does show a real locking 
> problem. It's probably not _relevant_ in that it probably will never cause 
> any other issues in practice.
> 
> > I got it on two systems so far. Config attached (but is probably 
> > irrelevant). The warnings started in the .31 cycle. They occur every 
> > 1000-2000 random kernels - i.e. every few days.
> 
> Yeah, the configuration won't matter. 
> 
> > These warnings were never fatal and my guess is that they are 
> > ancient, pre-existing races in the TTY code - but wanted to mention 
> > them here in case they matter.
> 
> The issue is pre-existing, yes - we've always done that 
> 
> 	tty_ldisc_halt(tty);
> 	flush_scheduled_work();
> 
> outside the ldisc_mutex, but the commit mentioned above (5c58ceff) added a 
> new case where we do it (it used to be in just tty_set_ldisc() and in 
> tty_ldisc_release()). So it's a pre-existing issue that probably just got 
> _way_ easier to hit fairly recently.
> 
> Quite frankly, the ldisc_mutex problem is not fixable at this stage in 
> 2.6.31, and it's probably not worth worrying about. I'm planning on 
> revisiting this after releasing 2.6.31 (probably just deciding that 
> the sane way to fix it is to turn that flush_to_ldisc thing into just 
> a timer, not a delayed work - which allows us to hold the mutex), but 
> there's no way I'm doing that before..
> 
> If the fix turns out straightforward, we can back-port it through 
> stable.
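The window described above can be illustrated with a minimal userspace sketch. This is not kernel code. The names `struct model_tty`, `halter()`, `opener()` and `run_scenario()` are hypothetical. A pthread mutex stands in for `ldisc_mutex`, and a `bool` stands in for the `TTY_LDISC` bit. The `seq_*` helpers exist only to force the unlucky interleaving deterministically. In the real kernel the window is only as wide as the flush, which fits the warning being so rare. With `hold_during_flush` false, the halter clears the bit and drops the mutex before waiting, as the current code does to avoid the deadlock. With it true, the halter keeps the mutex across the wait. That is what the timer-based approach is meant to allow; here it is simply assumed to be deadlock-free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the tty race; not kernel code. */
struct model_tty {
	pthread_mutex_t ldisc_mutex; /* stands in for tty->ldisc_mutex */
	bool ldisc_ok;               /* stands in for the TTY_LDISC bit */
	bool hold_during_flush;      /* false: drop mutex (current), true: keep it */

	/* Sequencing used only to force one interleaving deterministically. */
	pthread_mutex_t seq_lock;
	pthread_cond_t seq_cond;
	bool window_open;            /* halter has cleared the bit */
	bool opener_done;            /* opener has performed its check */
	bool opener_saw_clear;       /* opener would have hit the WARN_ON */
};

static void seq_set(struct model_tty *t, bool *flag)
{
	pthread_mutex_lock(&t->seq_lock);
	*flag = true;
	pthread_cond_broadcast(&t->seq_cond);
	pthread_mutex_unlock(&t->seq_lock);
}

static void seq_wait(struct model_tty *t, const bool *flag)
{
	pthread_mutex_lock(&t->seq_lock);
	while (!*flag)
		pthread_cond_wait(&t->seq_cond, &t->seq_lock);
	pthread_mutex_unlock(&t->seq_lock);
}

/* Models the halt + flush sequence (tty_ldisc_halt(); flush_scheduled_work();). */
static void *halter(void *arg)
{
	struct model_tty *t = arg;

	pthread_mutex_lock(&t->ldisc_mutex);
	t->ldisc_ok = false;                        /* clear the TTY_LDISC bit */
	if (!t->hold_during_flush)
		pthread_mutex_unlock(&t->ldisc_mutex); /* dropped to avoid deadlock */
	seq_set(t, &t->window_open);

	if (!t->hold_during_flush) {
		/* Waiting for pending work opens the window. Here we simply
		   wait until the opener has run, to make the race deterministic. */
		seq_wait(t, &t->opener_done);
		pthread_mutex_lock(&t->ldisc_mutex);
	}
	/* With hold_during_flush the wait would happen here, mutex held. */
	t->ldisc_ok = true;                         /* ldisc back in place */
	pthread_mutex_unlock(&t->ldisc_mutex);
	return NULL;
}

/* Models the check in __tty_open(). */
static void *opener(void *arg)
{
	struct model_tty *t = arg;

	seq_wait(t, &t->window_open);
	pthread_mutex_lock(&t->ldisc_mutex);
	if (!t->ldisc_ok)
		t->opener_saw_clear = true;             /* WARN_ON(!test_bit(...)) */
	pthread_mutex_unlock(&t->ldisc_mutex);
	seq_set(t, &t->opener_done);
	return NULL;
}

/* Returns true if the opener observed the bit clear under the mutex. */
static bool run_scenario(bool hold_during_flush)
{
	struct model_tty t = { .ldisc_ok = true,
			       .hold_during_flush = hold_during_flush };
	pthread_t h, o;

	pthread_mutex_init(&t.ldisc_mutex, NULL);
	pthread_mutex_init(&t.seq_lock, NULL);
	pthread_cond_init(&t.seq_cond, NULL);

	pthread_create(&o, NULL, opener, &t);
	pthread_create(&h, NULL, halter, &t);
	pthread_join(h, NULL);
	pthread_join(o, NULL);
	assert(t.ldisc_ok);                         /* halter always restores it */

	pthread_cond_destroy(&t.seq_cond);
	pthread_mutex_destroy(&t.seq_lock);
	pthread_mutex_destroy(&t.ldisc_mutex);
	return t.opener_saw_clear;
}
```

`run_scenario(false)` reports that the opener saw the bit clear while holding the mutex, which is the `WARN_ON` case. `run_scenario(true)` does not, because the opener cannot take the mutex until the bit has been set again. Build with `-pthread`.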

Just to refresh this older thread - is this warning supposed to be gone 
in latest -git? It still triggers occasionally in -tip tests:

[    9.243982] quotaon used greatest stack depth: 5396 bytes left
[   13.758784] Adding 3911816k swap on /dev/sda2.  Priority:-1 extents:1 across:3911816k 
[   15.373560] ------------[ cut here ]------------
[   15.374283] WARNING: at drivers/char/tty_io.c:1268 tty_open+0x20a/0x3b1()
[   15.375257] Hardware name: System Product Name
[   15.376216] Modules linked in:
[   15.378215] Pid: 1706, comm: modprobe Not tainted 2.6.31-tip #16184
[   15.379530] Call Trace:
[   15.380217]  [<793430e5>] ? tty_open+0x20a/0x3b1
[   15.381217]  [<7904c329>] warn_slowpath_common+0x6f/0xb0
[   15.382215]  [<7904c386>] warn_slowpath_null+0x1c/0x30
[   15.383215]  [<793430e5>] tty_open+0x20a/0x3b1
[   15.384244]  [<790c8f2b>] chrdev_open+0x111/0x139
[   15.385215]  [<790c4001>] __dentry_open+0x16c/0x270
[   15.386215]  [<790c8e1a>] ? chrdev_open+0x0/0x139
[   15.387215]  [<790c420f>] nameidata_to_filp+0x39/0x61
[   15.388215]  [<790d1ab1>] do_filp_open+0x455/0x7d3
[   15.389246]  [<796572fe>] ? _spin_unlock+0x35/0x5c
[   15.390216]  [<790daebd>] ? alloc_fd+0xd7/0xf2
[   15.391215]  [<790c3d42>] do_sys_open+0x53/0xe6
[   15.392215]  [<790c3e48>] sys_open+0x2c/0x45
[   15.393221]  [<79023507>] sysenter_do_call+0x12/0x3c
[   15.395245] ---[ end trace 8e8143959784383e ]---

(config attached) It's still never fatal, just a warning.

	Ingo

Attachment: "config" (text/plain, 61719 bytes)