linux-kernel - Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <19104.9909.672999.36280@pilspetsen.it.uu.se>
Date:	Thu, 3 Sep 2009 22:27:33 +0200
From:	Mikael Pettersson <mikpe@...uu.se>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Mikael Pettersson <mikpe@...uu.se>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Alan Cox <alan@...ux.intel.com>, Greg KH <gregkh@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
Subject: Re: [Bug #14015] pty regressed again, breaking expect and gcc's
 testsuite

Linus Torvalds writes:
 > 
 > 
 > On Tue, 1 Sep 2009, Rafael J. Wysocki wrote:
 > > On Tuesday 01 September 2009, Mikael Pettersson wrote:
 > > > 
 > > > Starting with 2.6.31-rc8 and reverting
 > > > 
 > > > 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
 > > > d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
 > > > 
 > > > in that order gives me a kernel that works on both x86 and powerpc64.
 > > > 
 > > > So the bug is definitely limited to the pty buffering logic change.
 > > 
 > > Thanks a lot for this information, adding somme CCs to the list.
 > 
 > Mikael, is there any way to get the gcc testsuite to show the "expected" 
 > vs "result" cases when the failures occur, so that we can see what the 
 > pattern is ("it drops one character every 8kB" or something like that).

I don't know. It dumps a summary on stdout and saves logs in various
subdirs. Those logs do show the gcc commands executed and the output
from those commands, but they don't show why some test is considered
failed, or the exact boundaries of each fragment of text returned by
read(2) [which I guess may be significant].

What I can say for certain is that the test cases I've seen fail most
frequently deliberately generate lots of diagnostic output from the
compiler, like 100s or 1000s of lines of warning/error messages.
So the comments in the pty changes that talk about using some 8KB
buffer instead of a 64KB tty flip buffer definitely made me nervous.

 > However, I get the feeling that it's really the same bug that 
 > OGAWA-san already fixed - and that his fix just doesn't always do a 100% 
 > of the job. 
 > 
 > So what Ogawa did was to make sure that we flush any pending data whenever 
 > we;re checking "do we have any data left". He did that by calling out to 
 > tty_flush_to_ldisc(), which should flush the data through to the ldisc. 
 > 
 > The keyword here being "should". In flush_to_ldisc(), we have at least one 
 > case where we say "we'll delay it a bit more":
 > 
 > 		if (!tty->receive_room) {
 > 			schedule_delayed_work(&tty->buf.work, 1);
 > 			break;
 > 		}
 > 
 > and while I think this _should_ be ok (because if there is no 
 > receive-room, then we'll hopefully always return non-zero from 
 > "input_available_p()". However, we do have this really odd case that the 
 > reader side will do "n_tty_set_room()" onlyl _after_ having checked for 
 > input_available_p(), and so maybe we do sometimes trigger the case that
 > 
 >  - input_available_p() tries to flush to the input buffer before checking 
 >    how much data is available, by calling 'tty_flush_to_ldisc()'
 > 
 >  - but 'tty_flush_to_ldisc()' won't do anything, because tty->receive_room 
 >    is zero.
 > 
 >  - so now input_available_p will say "I don't have any data", even though 
 >    there was data in the write buffers.
 > 
 >  - we'll notice that the other end has hung up, and return EOF/EIO.
 > 
 >  - which is very WRONG, because the other end may have hung up, but before 
 >    it did that, it wrote data that is still in the write queues, and we 
 >    should have returned that data.
 > 
 > Anyway, I'm not at all sure that the "receive_room == 0" case can happen 
 > at all, but maybe it can. Ogawa-san?
 > 
 > Here's a totally untested trial patch. I only have this dead-slow netbook  
 > for reading email with me, and I don't have a failing test-case anyway, 
 > but if my analysis is right, then the patch might fix it. It just forces 
 > the re-calculation of the receive buffer before flushing the ldisc.
 > 
 > (And btw, from a performance standpoint, it might make more sense to only 
 > do this whole read-room / ldisc-flush thing if we are about to return 
 > zero. If we already have data available, we probably shouldn't waste time 
 > trying to see if we need to do anything fancy like this.)
 > 
 > CAVEAT EMPTOR. Not tested. It compiled for me, but maybe that was due to 
 > me compiling the wrong file or something.
 > 
 > 		Linus

Thanks. I'll give this a try tomorrow.

/Mikael

 >  drivers/char/n_tty.c |    1 +
 >  1 files changed, 1 insertions(+), 0 deletions(-)
 > 
 > diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
 > index 973be2f..7fa3452 100644
 > --- a/drivers/char/n_tty.c
 > +++ b/drivers/char/n_tty.c
 > @@ -1583,6 +1583,7 @@ static int n_tty_open(struct tty_struct *tty)
 >  
 >  static inline int input_available_p(struct tty_struct *tty, int amt)
 >  {
 > +	n_tty_set_room(tty);
 >  	tty_flush_to_ldisc(tty);
 >  	if (tty->icanon) {
 >  		if (tty->canon_data)
 > 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/