linux-kernel - Re: tty related hangs with 2.6.31-rc3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <s5hzlb38t6i.wl%tiwai@suse.de>
Date:	Fri, 17 Jul 2009 13:06:45 +0200
From:	Takashi Iwai <tiwai@...e.de>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: tty related hangs with 2.6.31-rc3

At Wed, 15 Jul 2009 16:11:42 +0100,
Alan Cox wrote:
> 
> On Wed, 15 Jul 2009 18:59:56 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com> wrote:
> 
> > Hi,
> > 
> > I am finding tty related hangs with 2.6.31-rc3. This didn't happen 
> > before. This happen when i close the emacs session. The /proc/<pid>/stack
> > content is below      
> 
> Thanks - nice clear trace. Looks like a bug in the n_tty locking changes
> from a few releases back that the pty changes are triggering.
> 
> Basically process_echoes calls tty_put_char which if it thinks a device
> queue was full and now has a bit of space will call tty_wakeup which can
> call process_echoes and thus deadlock. With a physical serial device we
> will even sometimes call tty_wakeup() from the serial transmit path
> which is an irq path (which makes this doubly wrong as it then takes
> mutexes)
> 
> Emacs presumably uses fasync which is the trigger for this. You need the
> right timing combined with the new pty behaviour combined with FASYNC to
> trigger it.
> 
> 
> > [<c0362c5b>] process_echoes+0x2b/0x2e0
> [which tries to take the lock we already hold (end A)]
> > [<c03638cb>] n_tty_write_wakeup+0xb/0x40
> [which processes our ldisc wakeup (end A)]
> > [<c0360a88>] tty_wakeup+0x58/0x70
> [which wakes up our tty (end A)]
> > [<c0368347>] pty_write+0x67/0x70
> [our write method is for tty/pty pairs end A output, queued to end B]
> > [<c035f1cb>] tty_put_char+0x2b/0x40
> [calls tty_put_char to write the echoed byte to end A output]
> > [<c036291f>] do_output_char+0xef/0x200
> [the fake typed character is echoed back towards end B]
> > [<c0362d4e>] process_echoes+0x11e/0x2e0
> [tries to process echo characters on end A]
> > [<c0364292>] n_tty_receive_char+0x102/0x710
> [ receives a byte that we've faked typing to end A input]
> > [<c0364ac0>] n_tty_receive_buf+0x220/0x410
> [ioctl method calls the ld->ops->receive_buf for n_tty (unsafely but that
> bug is old]
> > [<c036059c>] tiocsti+0x8c/0xa0
> > [<c0361aca>] tty_ioctl+0x25a/0x310
> > [<c01e52c8>] vfs_ioctl+0x28/0x80
> > [<c01e54f4>] do_vfs_ioctl+0x64/0x1c0
> > [<c01e56a3>] sys_ioctl+0x53/0x70
> > [<c0102e3c>] sysenter_do_call+0x12/0x28
> > [<ffffffff>] 0xffffffff
> 
> Do you have the lock validator enabled and if so did it have anything
> useful to report ?

I hit the same bug.  The log is attached below.


> Please try the following. I suspect this is the real fix:
> 
> n_tty: Fix echo race
> 
> From: Alan Cox <alan@...ux.intel.com>
> 
> If a tty in N_TTY mode with echo enabled manages to get itself into a state
> where
> 	- echo characters are pending
> 	- FASYNC is enabled
> 	- tty_write_wakeup is called from either
> 		- a device write path (pty)
> 		- an IRQ (serial)
> 
> then it either deadlocks or explodes taking a mutex in the IRQ path.
> 
> On the serial side it is almost impossible to reproduce because you have to
> go from a full serial port to a near empty one with echo characters
> pending. The pty case happens to have become possible to trigger using
> emacs and ptys, the pty changes having created a scenario which shows up
> this bug.
> 
> The code path is
> 
> 	n_tty:process_echoes() (takes mutex)
> 	tty_io:tty_put_char()
> 	pty:pty_write  (or serial paths)
> 	tty_wakeup     (from pty_write or serial IRQ)
> 	n_tty_write_wakeup()
> 	process_echoes()
> 	*KABOOM*
> 
> Signed-off-by: Alan Cox <alan@...ux.intel.com>

The patch seems to fix the problem indeed.
Thanks!

Tested-by: Takashi Iwai <tiwai@...e.de>


Takashi

=============================================
[ INFO: possible recursive locking detected ]
2.6.31-rc3-test #43
---------------------------------------------
events/3/18 is trying to acquire lock:
 (&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf

but task is already holding lock:
 (&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf

other info that might help us debug this:
4 locks held by events/3/18:
 #0:  (events){+.+.+.}, at: [<ffffffff8107000f>] worker_thread+0x1c5/0x330
 #1:  (&(&tty->buf.work)->work){+.+...}, at: [<ffffffff8107000f>] worker_thread+0x1c5/0x330
 #2:  (&tty->output_lock){+.+...}, at: [<ffffffff81245af7>] process_echoes+0x45/0x2bf
 #3:  (&tty->echo_lock){+.+...}, at: [<ffffffff81245b0c>] process_echoes+0x5a/0x2bf

stack backtrace:
Pid: 18, comm: events/3 Not tainted 2.6.31-rc3-test #43
Call Trace:
 [<ffffffff8108b1fb>] __lock_acquire+0x14d6/0x157d
 [<ffffffff810879f7>] ? save_trace+0x4e/0xc0
 [<ffffffff81087af9>] ? add_lock_to_list+0x90/0xeb
 [<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
 [<ffffffff8108b390>] lock_acquire+0xee/0x12e
 [<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
 [<ffffffff81249a4f>] ? tty_ldisc_try+0x2b/0x6f
 [<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
 [<ffffffff81334b6f>] mutex_lock_nested+0x66/0x2f2
 [<ffffffff81245af7>] ? process_echoes+0x45/0x2bf
 [<ffffffff81088b6c>] ? mark_held_locks+0x65/0x9b
 [<ffffffff81336926>] ? _spin_unlock_irqrestore+0x55/0x7a
 [<ffffffff81245af7>] process_echoes+0x45/0x2bf
 [<ffffffff81245d91>] n_tty_write_wakeup+0x20/0x6a
 [<ffffffff8124402e>] tty_wakeup+0x44/0x84
 [<ffffffff8124bfcb>] pty_write+0x62/0x82
 [<ffffffff81245b0c>] ? process_echoes+0x5a/0x2bf
 [<ffffffff81241ba8>] tty_put_char+0x3c/0x52
 [<ffffffff81245a84>] do_output_char+0x1ce/0x1fc
 [<ffffffff81245cbd>] process_echoes+0x20b/0x2bf
 [<ffffffff813345ec>] ? mutex_unlock+0x1c/0x32
 [<ffffffff8124783d>] n_tty_receive_buf+0x33e/0xf1e
 [<ffffffff81088b6c>] ? mark_held_locks+0x65/0x9b
 [<ffffffff81336926>] ? _spin_unlock_irqrestore+0x55/0x7a
 [<ffffffff81088e78>] ? trace_hardirqs_on_caller+0x124/0x15e
 [<ffffffff81088ed2>] ? trace_hardirqs_on+0x20/0x36
 [<ffffffff8124ade7>] flush_to_ldisc+0x119/0x1c0
 [<ffffffff8124acce>] ? flush_to_ldisc+0x0/0x1c0
 [<ffffffff81070061>] worker_thread+0x217/0x330
 [<ffffffff8107000f>] ? worker_thread+0x1c5/0x330
 [<ffffffff81088e78>] ? trace_hardirqs_on_caller+0x124/0x15e
 [<ffffffff81075d40>] ? autoremove_wake_function+0x0/0x5a
 [<ffffffff8106fe4a>] ? worker_thread+0x0/0x330
 [<ffffffff81075900>] kthread+0x94/0x9c
 [<ffffffff8101402a>] child_rip+0xa/0x20
 [<ffffffff81013990>] ? restore_args+0x0/0x30
 [<ffffffff8107586c>] ? kthread+0x0/0x9c
 [<ffffffff81014020>] ? child_rip+0x0/0x20
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/