Message-ID: <87pra89sgp.fsf@devron.myhome.or.jp>
Date: Thu, 03 Sep 2009 20:29:42 +0900
From: OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@...k.pl>,
Mikael Pettersson <mikpe@...uu.se>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Kernel Testers List <kernel-testers@...r.kernel.org>,
Alan Cox <alan@...ux.intel.com>, Greg KH <gregkh@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
Linus Torvalds <torvalds@...ux-foundation.org> writes:
> On Tue, 1 Sep 2009, Rafael J. Wysocki wrote:
>> On Tuesday 01 September 2009, Mikael Pettersson wrote:
>> >
>> > Starting with 2.6.31-rc8 and reverting
>> >
>> > 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
>> > d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
>> >
>> > in that order gives me a kernel that works on both x86 and powerpc64.
>> >
>> > So the bug is definitely limited to the pty buffering logic change.
>>
>> Thanks a lot for this information, adding some CCs to the list.
>
> Mikael, is there any way to get the gcc testsuite to show the "expected"
> vs "result" cases when the failures occur, so that we can see what the
> pattern is ("it drops one character every 8kB" or something like that).
>
> However, I get the feeling that it's really the same bug that
> OGAWA-san already fixed - and that his fix just doesn't always do a 100%
> of the job.
>
> So what Ogawa did was to make sure that we flush any pending data whenever
> we're checking "do we have any data left". He did that by calling out to
> tty_flush_to_ldisc(), which should flush the data through to the ldisc.
>
> The keyword here being "should". In flush_to_ldisc(), we have at least one
> case where we say "we'll delay it a bit more":
>
> 	if (!tty->receive_room) {
> 		schedule_delayed_work(&tty->buf.work, 1);
> 		break;
> 	}
>
> and while I think this _should_ be ok (because if there is no
> receive-room, then we'll hopefully always return non-zero from
> "input_available_p()"). However, we do have this really odd case that the
> reader side will do "n_tty_set_room()" only _after_ having checked for
> input_available_p(), and so maybe we do sometimes trigger the case that
>
> - input_available_p() tries to flush to the input buffer before checking
> how much data is available, by calling 'tty_flush_to_ldisc()'
>
> - but 'tty_flush_to_ldisc()' won't do anything, because tty->receive_room
> is zero.
>
> - so now input_available_p will say "I don't have any data", even though
> there was data in the write buffers.
>
> - we'll notice that the other end has hung up, and return EOF/EIO.
>
> - which is very WRONG, because the other end may have hung up, but before
> it did that, it wrote data that is still in the write queues, and we
> should have returned that data.
>
> Anyway, I'm not at all sure that the "receive_room == 0" case can happen
> at all, but maybe it can. Ogawa-san?
Unless I'm missing something, I don't think this differs much from the
old code, but I need to check more deeply.

Hmm, if "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
why reverting the buffer handling change fixes the problem.
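
To make that condition concrete, here is a toy userspace model of the
sequence described in the quoted text. It is only a sketch: struct
toy_tty and every toy_* function are hypothetical stand-ins invented for
illustration, not the kernel's data structures or code, and the model
assumes that "receive_room == 0" can coincide with an empty read buffer,
which is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few fields the discussion mentions.
 * This is NOT the kernel's struct tty_struct. */
struct toy_tty {
	int  receive_room;  /* bytes the ldisc is willing to accept now */
	int  read_cnt;      /* bytes already in the ldisc read buffer */
	int  pending;       /* bytes still queued on the writer side */
	bool other_closed;  /* the other end of the pty has hung up */
};

enum toy_result { TOY_DATA, TOY_EIO, TOY_WOULD_BLOCK };

/* Models the flush: when receive_room is zero nothing is moved
 * (the quoted kernel snippet reschedules the work instead). */
static void toy_flush_to_ldisc(struct toy_tty *t)
{
	assert(t->pending >= 0 && t->receive_room >= 0);
	if (!t->receive_room)
		return;

	int n = t->pending < t->receive_room ? t->pending : t->receive_room;
	t->pending -= n;
	t->read_cnt += n;
	t->receive_room -= n;
}

/* Models the availability check: flush first, then look at read_cnt. */
static bool toy_input_available_p(struct toy_tty *t)
{
	toy_flush_to_ldisc(t);
	return t->read_cnt > 0;
}

/* Models the reader's decision: data first, then the hangup check. */
static enum toy_result toy_read(struct toy_tty *t)
{
	if (toy_input_available_p(t))
		return TOY_DATA;
	if (t->other_closed)
		return TOY_EIO;   /* reported even if t->pending > 0 */
	return TOY_WOULD_BLOCK;
}
```

In this model, a toy_tty with receive_room = 0, read_cnt = 0, pending > 0
and other_closed set makes toy_read() report TOY_EIO while the pending
bytes are still queued. With a non-zero receive_room the same state
yields TOY_DATA. Whether the real code can ever reach the first state is
what needs checking.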

Anyway, I'd like to reproduce this on my machine. Could you tell me the
versions of the tools you are using? I assume that means the gcc
testsuite from the gcc source tree (which svn revision?), plus expect,
dejagnu, and tcl. (BTW, I'm using Debian testing. If this can be
reproduced under kvm, I can install the same distro version that you
are using.)
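
Independent of the full testsuite, a small standalone check might help
narrow this down. The following is a minimal sketch, assuming the
failure shows up as data lost when the slave side writes and then
closes. pty_roundtrip() is a hypothetical helper name, and whether this
triggers the reported failure at all is not known. It uses only
standard calls (posix_openpt, grantpt, unlockpt, ptsname) plus
cfmakeraw, which glibc provides as an extension.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: a child writes nbytes to the pty slave and exits;
 * the parent counts what it can read from the master before EOF/EIO.
 * Returns the byte count, or -1 if the pty could not be set up. */
static long pty_roundtrip(size_t nbytes)
{
	assert(nbytes > 0);

	int master = posix_openpt(O_RDWR | O_NOCTTY);
	if (master < 0)
		return -1;
	if (grantpt(master) < 0 || unlockpt(master) < 0) {
		close(master);
		return -1;
	}
	const char *name = ptsname(master);
	int slave = name ? open(name, O_RDWR | O_NOCTTY) : -1;
	if (slave < 0) {
		close(master);
		return -1;
	}

	/* Raw mode, so the line discipline does not rewrite or echo bytes. */
	struct termios tio;
	if (tcgetattr(slave, &tio) == 0) {
		cfmakeraw(&tio);
		tcsetattr(slave, TCSANOW, &tio);
	}

	pid_t pid = fork();
	if (pid < 0) {
		close(slave);
		close(master);
		return -1;
	}
	if (pid == 0) {
		/* Child: write everything to the slave, then hang up. */
		close(master);
		char chunk[512];
		memset(chunk, 'x', sizeof chunk);
		size_t left = nbytes;
		while (left > 0) {
			size_t want = left < sizeof chunk ? left : sizeof chunk;
			ssize_t w = write(slave, chunk, want);
			if (w < 0) {
				if (errno == EINTR)
					continue;
				_exit(1);
			}
			left -= (size_t)w;
		}
		close(slave);
		_exit(0);
	}

	/* Parent: drop our slave fd so the child's exit is the hangup. */
	close(slave);
	long total = 0;
	char buf[4096];
	for (;;) {
		ssize_t r = read(master, buf, sizeof buf);
		if (r > 0) {
			total += r;
			continue;
		}
		if (r < 0 && errno == EINTR)
			continue;
		break;  /* 0 or an error such as EIO: the other end is gone */
	}
	close(master);
	waitpid(pid, NULL, 0);
	return total;
}
```

Calling pty_roundtrip(n) in a loop with various sizes and comparing the
return value against n would show whether any bytes written before the
hangup go missing on a given kernel.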
Thanks.
--
OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>