linux-kernel - Re: [Bug #14388] keyboard under X with 2.6.31

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.2.01.0910131744590.3404@localhost.localdomain>
Date:	Tue, 13 Oct 2009 18:03:40 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Paul Fulghum <paulkf@...rogate.com>
cc:	Boyan <btanastasov@...oo.co.uk>, "Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Dmitry Torokhov <dmitry.torokhov@...il.com>,
	Ed Tomlinson <edt@....ca>,
	"OGAWA Hirofumi \"" <hirofumi@...l.parknet.co.jp>
Subject: Re: [Bug #14388] keyboard under X with 2.6.31

On Tue, 13 Oct 2009, Paul Fulghum wrote:
> 
> This is correct, the last buffer is not passed to tty_buffer_free()
> if it is the last in the list so tail is maintained.
> There is no free space in it so no new data can be added.
> There is no place where tail is null while the spinlock
> is released in preparation for calling receive_buf.
> I still can't spot any flaw in the current locking.

Do you even bother reading my emails?

Let me walk through an example of where the locking F*CKS UP, exactly 
because it's broken.

	thread1		thread2		thread3

	flush_to_ldisc
	set_bit(TTY_FLUSHING)
	buf.head = NULL
	...
	..release lock..
	.. sleep in ->receive_buf ..

			flush_to_ldisc
			set_bit(TTY_FLUSHING)
			.. head==NULL ..
			clear_bit(TTY_FLUSHING)
			.. release lock ..

					tty_ldisc_flush()
					-> tty_buffer_flush()
					TTY_FLUSHING not set!
					-> __tty_buffer_flush()
					-> tty->buf.tail = NULL

and now you're screwed. See? You have both 'buf.tail' and 'buf.head' both 
being NULL, and look what happens in that case 'tty_buffer_request_room()' 
if some new data comes in? Right: it will add the buffer to both tail and 
head.

And notice how 'thread1' is still inside flush_to_ldisc()! The buffer that 
got added will be overwritten by the old one, and now tail and head no 
longer match. Or another flush_to_ldisc() comes in, and now it won't be a 
no-op any more, and it will find the new data, and run ->receive_buf 
concurrently with the old receive_buf from thread1.

And the whole reason was that there were some very odd locking rules: 
buf.head=NULL meant "don't flush", and "TTY_FLUSHING is set" meant "don't 
clear 'buf.head'", and but the "don't flush" case still cleared 
TTY_FLUSHING (after not flushing), and it all messed up.

I could just have fixed it (move the "clear_bit(TTY_FLUSHING)" but up, but 
the fact is, once you fix that, it then becomes obvious that 
"buf.head=NULL" really is the wrong thing to test in the first place, and 
we should just use TTY_FLUSHING instead, and simply _remove_ the odd 
"buf.head=NULL is special" case. Which is what my patch did

> Your statement that the locking is too clever/subtle is
> clearly true since I am struggling to work this out again.

I have to say that the only case I could make up that is _clearly_ a bug 
is the above very contrieved example. I don't really think something like 
the above happens in reality. But it's an example of bad locking, and what 
happens when the locking logic isn't obvious.

There may be other cases where the locking fails, and I just didn't find 
them. 

Or the patch may simply not fix anything in practice, and nobody has ever 
actually triggered the bad locking in real life. I dunno. I just do know 
that the locking was too damn subtle.

Any time people do ad-hoc locking with "clever" schemes, it's almost 
invariably buggy. So the rule is: just don't do that. Make the locking 
rules "obvious".  Don't have subtle rules about "if head is NULL, then 
we're not going to add any new buffers to it, except if tail is also 
NULL". Because look above what happens, and see how complicated it was to 
even see the bug.

			Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/