[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090921083609.GA16048@elte.hu>
Date: Mon, 21 Sep 2009 10:36:09 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Greg Kroah-Hartman <gregkh@...e.de>,
Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: Linux 2.6.31-rc9
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> [ Added some people to the cc - this is very directly related to the
> previous thread on "v2.6.31-rc6: BUG: unable to handle kernel NULL
> pointer dereference at 0000000000000008", and the deadlock discussion
> there ]
>
> On Tue, 8 Sep 2009, Ingo Molnar wrote:
> >
> > FYI, i'm getting very (very) rare warnings from the TTY code in this
> > place:
> >
> > [ 28.187364] rc.sysinit used greatest stack depth: 5224 bytes left
> > [ 31.422457] Adding 3911816k swap on /dev/sda2. Priority:-1 extents:1 across:3911816k
> > [ 32.974830] ssh used greatest stack depth: 5200 bytes left
> > [ 33.115028] ------------[ cut here ]------------
> > [ 33.119518] WARNING: at drivers/char/tty_io.c:1267 __tty_open+0x3ef/0x4c0()
>
> Hmm. I think I see why, and I _suspect_ this is harmless, although
> it's obviously very annoying, and it really is indicative of a real
> locking problem.
>
> What's going on is that same horrible deadlocak-avoidance where we have to
> drop the ldisc_mutex after clearing TTY_LDISC, in order to then wait for
> any pending work. See commit 5c58ceff103d8a654f24769bb1baaf84a841b0cc,
> which is probably also the one that introduced the timing that gets your
> particular warning.
>
> So when __tty_open() does this:
>
> mutex_lock(&tty->ldisc_mutex);
> WARN_ON(!test_bit(TTY_LDISC, &tty->flags));
> mutex_unlock(&tty->ldisc_mutex);
>
> it's really warning about something that really can happen: the things
> that clear TTY_LDISC will all release the ldisc_mutex with that bit still
> clear, because they all end up having to release the lock that they
> _should_ hold in order to avoid a deadlock.
>
> So the warning is "real" in the sense that it does show a real locking
> problem. It's probably not _relevant_ in that it probably will never cause
> any other issues in practice.
>
> > I got it on two systems so far. Config attached (but is probably
> > irrelevant). The warnings started in the .31 cycle. They occur every
> > 1000-2000 random kernels - i.e. every few days.
>
> Yeah, the configuration won't matter.
>
> > These warnings were never fatal and my guess is that they are
> > ancient, pre-existing races in the TTY code - but wanted to mention
> > them here in case they matter.
>
> The issue is pre-existing, yes - we've always done that
>
> tty_ldisc_halt(tty);
> flush_scheduled_work();
>
> outside the ldisc_mutex, but the commit mentioned above (5c58ceff) added a
> new case where we do it (it used to be in just tty_set_ldisc() and in
> tty_ldisc_release()). So it's a pre-existing issue that probably just got
> _way_ easier to hit fairly recently.
>
> Quite frankly, the ldisc_mutex problem is not fixable at this stage in
> 2.6.31, and it's probably not worth worrying about. I'm planning on
> revisiting this after releasing 2.6.31 (probably just deciding that
> the sane way to fix it is to turn that flush_to_ldisc thing into just
> a timer, not a delayed work - which allows us to hold the mutex), but
> there's no way I'm doing that before..
>
> If the fix turns out straightforward, we can back-port it through
> stable.
Just to refresh this older thread - is this warning supposed to be gone
in latest -git? It still triggers occasionally in -tip tests:
[ 9.243982] quotaon used greatest stack depth: 5396 bytes left
[ 13.758784] Adding 3911816k swap on /dev/sda2. Priority:-1 extents:1 across:3911816k
[ 15.373560] ------------[ cut here ]------------
[ 15.374283] WARNING: at drivers/char/tty_io.c:1268 tty_open+0x20a/0x3b1()
[ 15.375257] Hardware name: System Product Name
[ 15.376216] Modules linked in:
[ 15.378215] Pid: 1706, comm: modprobe Not tainted 2.6.31-tip #16184
[ 15.379530] Call Trace:
[ 15.380217] [<793430e5>] ? tty_open+0x20a/0x3b1
[ 15.381217] [<7904c329>] warn_slowpath_common+0x6f/0xb0
[ 15.382215] [<7904c386>] warn_slowpath_null+0x1c/0x30
[ 15.383215] [<793430e5>] tty_open+0x20a/0x3b1
[ 15.384244] [<790c8f2b>] chrdev_open+0x111/0x139
[ 15.385215] [<790c4001>] __dentry_open+0x16c/0x270
[ 15.386215] [<790c8e1a>] ? chrdev_open+0x0/0x139
[ 15.387215] [<790c420f>] nameidata_to_filp+0x39/0x61
[ 15.388215] [<790d1ab1>] do_filp_open+0x455/0x7d3
[ 15.389246] [<796572fe>] ? _spin_unlock+0x35/0x5c
[ 15.390216] [<790daebd>] ? alloc_fd+0xd7/0xf2
[ 15.391215] [<790c3d42>] do_sys_open+0x53/0xe6
[ 15.392215] [<790c3e48>] sys_open+0x2c/0x45
[ 15.393221] [<79023507>] sysenter_do_call+0x12/0x3c
[ 15.395245] ---[ end trace 8e8143959784383e ]---
(config attached) It's still never fatal, just a warning.
Ingo
View attachment "config" of type "text/plain" (61719 bytes)
Powered by blists - more mailing lists