linux-kernel - Re: WARNING at: drivers/char/tty


Message-ID: <alpine.LFD.2.01.0908021254580.3352@localhost.localdomain>
Date:	Sun, 2 Aug 2009 13:20:18 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Sergey Senozhatsky <sergey.senozhatsky@...l.by>
cc:	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Greg KH <greg@...ah.com>
Subject: Re: WARNING at: drivers/char/tty_ldisc.c

On Sun, 2 Aug 2009, Sergey Senozhatsky wrote:
> 
> non-SMP system 'fails' as well.

Ahh, can you trigger this reliably? Is it 100% of the time when you shut 
down from single user mode? Or just occasionally?

> > The ldisc refcounts are simply done wrong. They are more debugging aids 
> > (for the case where no races occur), than actual memory management 
> > refcounts.
> 
> tty_ldisc.c:798  tty_ldisc_hangup
> 	WARN_ON(tty_ldisc_wait_idle(tty) != 0);
> 
> gave WARN_ON traces.

Yes, good catch. It means that somebody seems to have held on to the 
refcount for more than five seconds.

Which shouldn't happen under any normal situation.

> So, it seems refcount is wrong before
> 	tty_ldisc_halt(tty);
> 	tty_ldisc_wait_idle(tty);

Agreed. Or something is just holding the refcount for too long, possibly 
due to some deadlockish scenario (ie we might be in "tty_ldisc_flush()", 
and blocked forever on ld->ops->flush_buffer() while holding the ldisc 
refcount). And we hold that whole &tty->ldisc_mutex _while_ waiting, so I 
can easily see things being blocked on each other.
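
The suspected pattern can be modelled in user space. What follows is a 
minimal sketch, not kernel code: "struct fake_tty", "wait_idle()", 
"ref_holder()" and "hangup_sim()" are all hypothetical names, and the 
assumption that the reference holder needs the mutex before it can drop 
its reference is only one possible shape of the deadlock. With the mutex 
held across the wait, the waiter can only time out; with the mutex dropped 
around the wait, the holder gets through and the count goes idle.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical user-space stand-in for the tty; not a kernel structure. */
struct fake_tty {
	pthread_mutex_t ldisc_mutex;	/* models tty->ldisc_mutex */
	atomic_int users;		/* models the ldisc reference count */
};

/* Wall-clock milliseconds; good enough for a sketch. */
static long long now_ms(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll until the count reaches zero. Returns 0 when idle, -1 on timeout. */
static int wait_idle(struct fake_tty *tty, int timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;

	while (atomic_load(&tty->users) != 0) {
		if (now_ms() >= deadline)
			return -1;
		sched_yield();
	}
	return 0;
}

/* Assumed holder: it must take the mutex before dropping its reference. */
static void *ref_holder(void *arg)
{
	struct fake_tty *tty = arg;

	pthread_mutex_lock(&tty->ldisc_mutex);
	atomic_fetch_sub(&tty->users, 1);
	pthread_mutex_unlock(&tty->ldisc_mutex);
	return NULL;
}

/* Hangup-like path; drop_mutex selects the unlock/relock variant. */
static int hangup_sim(bool drop_mutex, int timeout_ms)
{
	struct fake_tty tty;
	pthread_t holder;
	int rc, ret;

	pthread_mutex_init(&tty.ldisc_mutex, NULL);
	atomic_init(&tty.users, 1);	/* the holder's reference */

	pthread_mutex_lock(&tty.ldisc_mutex);
	rc = pthread_create(&holder, NULL, ref_holder, &tty);
	assert(rc == 0);
	(void)rc;

	if (drop_mutex)
		pthread_mutex_unlock(&tty.ldisc_mutex);
	ret = wait_idle(&tty, timeout_ms);	/* times out if the mutex is held */
	if (drop_mutex)
		pthread_mutex_lock(&tty.ldisc_mutex);

	pthread_mutex_unlock(&tty.ldisc_mutex);
	pthread_join(holder, NULL);
	pthread_mutex_destroy(&tty.ldisc_mutex);
	return ret;
}
```

Calling hangup_sim(false, 50) should return -1 (the timeout breaks the 
wait), while hangup_sim(true, 2000) should return 0, assuming the holder 
thread gets scheduled within the timeout.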

I'd like to drop the ldisc_mutex while sleeping, but we can't. Not every 
caller even holds it. But just for a broken test, can you try the appended 
patch (NOT meant for serious consumption!) to see if it might be a 
deadlock (broken by the timeout) on that semaphore?

I take it that you can't get a trace with sysrq-T because nothing gets 
logged, and you don't have a serial port console? That would likely 
pinpoint it pretty quickly (you could make the WARN_ON() do a 
"show_state()" instead - no need to actually physically press 'sysrq-t').

			Linus
---
 drivers/char/tty_ldisc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
index acd76b7..eb44c45 100644
--- a/drivers/char/tty_ldisc.c
+++ b/drivers/char/tty_ldisc.c
@@ -795,7 +795,9 @@ void tty_ldisc_hangup(struct tty_struct *tty)
 		if (tty->ldisc) {	/* Not yet closed */
 			/* Switch back to N_TTY */
 			tty_ldisc_halt(tty);
+			mutex_unlock(&tty->ldisc_mutex);	// HACK
 			tty_ldisc_wait_idle(tty);
+			mutex_lock(&tty->ldisc_mutex);		// HACK
 			tty_ldisc_reinit(tty);
 			/* At this point we have a closed ldisc and we want to
 			   reopen it. We could defer this to the next open but