[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <s5hzlb38t6i.wl%tiwai@suse.de>
Date: Fri, 17 Jul 2009 13:06:45 +0200
From: Takashi Iwai <tiwai@...e.de>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: tty related hangs with 2.6.31-rc3
At Wed, 15 Jul 2009 16:11:42 +0100,
Alan Cox wrote:
>
> On Wed, 15 Jul 2009 18:59:56 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com> wrote:
>
> > Hi,
> >
> > I am finding tty related hangs with 2.6.31-rc3. This didn't happen
> > before. This happen when i close the emacs session. The /proc/<pid>/stack
> > content is below
>
> Thanks - nice clear trace. Looks like a bug in the n_tty locking changes
> from a few releases back that the pty changes are triggering.
>
> Basically process_echoes calls tty_put_char which if it thinks a device
> queue was full and now has a bit of space will call tty_wakeup which can
> call process_echoes and thus deadlock. With a physical serial device we
> will even sometimes call tty_wakeup() from the serial transmit path
> which is an irq path (which makes this doubly wrong as it then takes
> mutexes)
>
> Emacs presumably uses fasync which is the trigger for this. You need the
> right timing combined with the new pty behaviour combined with FASYNC to
> trigger it.
>
>
> > [<c0362c5b>] process_echoes+0x2b/0x2e0
> [which tries to take the lock we already hold (end A)]
> > [<c03638cb>] n_tty_write_wakeup+0xb/0x40
> [which processes our ldisc wakeup (end A)]
> > [<c0360a88>] tty_wakeup+0x58/0x70
> [which wakes up our tty (end A)]
> > [<c0368347>] pty_write+0x67/0x70
> [our write method is for tty/pty pairs end A output, queued to end B]
> > [<c035f1cb>] tty_put_char+0x2b/0x40
> [calls tty_put_char to write the echoed byte to end A output]
> > [<c036291f>] do_output_char+0xef/0x200
> [the fake typed character is echoed back towards end B]
> > [<c0362d4e>] process_echoes+0x11e/0x2e0
> [tries to process echo characters on end A]
> > [<c0364292>] n_tty_receive_char+0x102/0x710
> [ receives a byte that we've faked typing to end A input]
> > [<c0364ac0>] n_tty_receive_buf+0x220/0x410
> [ioctl method calls the ld->ops->receive_buf for n_tty (unsafely but that
> bug is old]
> > [<c036059c>] tiocsti+0x8c/0xa0
> > [<c0361aca>] tty_ioctl+0x25a/0x310
> > [<c01e52c8>] vfs_ioctl+0x28/0x80
> > [<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
> > [<c01e56a3>] sys_ioctl+0x53/0x70
> > [<c0102e3c>] sysenter_do_call+0x12/0x28
> > [<ffffffff>] 0xffffffff
>
> Do you have the lock validator enabled and if so did it have anything
> useful to report ?
I hit the same bug. The log is attached below.
> Please try the following. I suspect this is the real fix:
>
> n_tty: Fix echo race
>
> From: Alan Cox <alan@...ux.intel.com>
>
> If a tty in N_TTY mode with echo enabled manages to get itself into a state
> where
> - echo characters are pending
> - FASYNC is enabled
> - tty_write_wakeup is called from either
> - a device write path (pty)
> - an IRQ (serial)
>
> then it either deadlocks or explodes taking a mutex in the IRQ path.
>
> On the serial side it is almost impossible to reproduce because you have to
> go from a full serial port to a near empty one with echo characters
> pending. The pty case happens to have become possible to trigger using
> emacs and ptys, the pty changes having created a scenario which shows up
> this bug.
>
> The code path is
>
> n_tty:process_echoes() (takes mutex)
> tty_io:tty_put_char()
> pty:pty_write (or serial paths)
> tty_wakeup (from pty_write or serial IRQ)
> n_tty_write_wakeup()
> process_echoes()
> *KABOOM*
>
> Signed-off-by: Alan Cox <alan@...ux.intel.com>
The patch seems to fix the problem indeed.
Thanks!
Tested-by: Takashi Iwai <tiwai@...e.de>
Takashi
=============================================
[ INFO: possible recursive locking detected ]
2.6.31-rc3-test #43
---------------------------------------------
events/3/18 is trying to acquire lock:
(&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf
but task is already holding lock:
(&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf
other info that might help us debug this:
4 locks held by events/3/18:
#0: (events){+.+.+.}, at: [<ffffffff8107000f>] worker_thread+0x1c5/0x330
#1: (&(&tty->buf.work)->work){+.+...}, at: [<ffffffff8107000f>] worker_thread+0x1c5/0x330
#2: (&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf
#3: (&tty->echo_lock){+.+...}, at: [<ffffffff81245b0c>] process_echoes+0x5a/0x2bf
stack backtrace:
Pid: 18, comm: events/3 Not tainted 2.6.31-rc3-test #43
Call Trace:
[<ffffffff8108b1fb>] __lock_acquire+0x14d6/0x157d
[<ffffffff810879f7>] ? save_trace+0x4e/0xc0
[<ffffffff81087af9>] ? add_lock_to_list+0x90/0xeb
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff8108b390>] lock_acquire+0xee/0x12e
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff81249a4f>] ? tty_ldisc_try+0x2b/0x6f
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff81334b6f>] mutex_lock_nested+0x66/0x2f2
[<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
[<ffffffff81088b6c>] ? mark_held_locks+0x65/0x9b
[<ffffffff81336926>] ? _spin_unlock_irqrestore+0x55/0x7a
[<ffffffff81245af7>] process_echoes+0x45/0x2bf
[<ffffffff81245d91>] n_tty_write_wakeup+0x20/0x6a
[<ffffffff8124402e>] tty_wakeup+0x44/0x84
[<ffffffff8124bfcb>] pty_write+0x62/0x82
[<ffffffff81245b0c>] ? process_echoes+0x5a/0x2bf
[<ffffffff81241ba8>] tty_put_char+0x3c/0x52
[<ffffffff81245a84>] do_output_char+0x1ce/0x1fc
[<ffffffff81245cbd>] process_echoes+0x20b/0x2bf
[<ffffffff813345ec>] ? mutex_unlock+0x1c/0x32
[<ffffffff8124783d>] n_tty_receive_buf+0x33e/0xf1e
[<ffffffff81088b6c>] ? mark_held_locks+0x65/0x9b
[<ffffffff81336926>] ? _spin_unlock_irqrestore+0x55/0x7a
[<ffffffff81088e78>] ? trace_hardirqs_on_caller+0x124/0x15e
[<ffffffff81088ed2>] ? trace_hardirqs_on+0x20/0x36
[<ffffffff8124ade7>] flush_to_ldisc+0x119/0x1c0
[<ffffffff8124acce>] ? flush_to_ldisc+0x0/0x1c0
[<ffffffff81070061>] worker_thread+0x217/0x330
[<ffffffff8107000f>] ? worker_thread+0x1c5/0x330
[<ffffffff81088e78>] ? trace_hardirqs_on_caller+0x124/0x15e
[<ffffffff81075d40>] ? autoremove_wake_function+0x0/0x5a
[<ffffffff8106fe4a>] ? worker_thread+0x0/0x330
[<ffffffff81075900>] kthread+0x94/0x9c
[<ffffffff8101402a>] child_rip+0xa/0x20
[<ffffffff81013990>] ? restore_args+0x0/0x30
[<ffffffff8107586c>] ? kthread+0x0/0x9c
[<ffffffff81014020>] ? child_rip+0x0/0x20
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists