Message-ID: <CAGWcZkL_Uc3Stug7nspjBnrAJ-zMLm7AtYnrxPnqas23L+PN_A@mail.gmail.com>
Date: Mon, 20 Feb 2012 22:11:14 +0100
From: Egmont Koblinger <egmont@...il.com>
To: Pavel Machek <pavel@....cz>
Cc: Bruno Prémont <bonbons@...ux-vserver.org>,
Greg KH <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: Data corruption when pasting large data to terminal
Hi,
I attach a simple self-contained test case that triggers the bug most
of the time. Moreover, it turns out that we're facing both a data
corruption and a deadlock issue: each run of the test tends to trigger
one of the two at random.
The test is a slight modification of Bruno's example (thanks!). The
most important change is that it emulates a readline app: after every
newline it sets the terminal to cooked mode, does some "work" (1
millisecond of sleep), then reverts the terminal to raw mode.
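The slave side of such a readline emulation can be sketched roughly as follows. This is a hypothetical reconstruction, not the attached ptmx2.c: the names make_raw, make_cooked, set_tty_mode and readline_like_reader are made up for illustration, and "raw" is assumed to mean only ICANON off with VMIN=1/VTIME=0 (whatever other flags the attachment changes, such as INLCR, are not reproduced).

```c
#define _XOPEN_SOURCE 700   /* for nanosleep() and ssize_t */
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <termios.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical "raw" mode: canonical line editing off, read() returns as
   soon as at least one byte is available. Other flags are left untouched. */
void make_raw(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)ICANON;
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}

/* Hypothetical "cooked" mode: canonical (line-by-line) input processing on. */
void make_cooked(struct termios *t)
{
    t->c_lflag |= ICANON;
}

/* Switch fd to cooked or raw mode. `when` is TCSANOW or TCSADRAIN; glibc on
   Linux implements these with the TCSETS and TCSETSW ioctls respectively. */
int set_tty_mode(int fd, bool cooked, int when)
{
    struct termios t;
    if (tcgetattr(fd, &t) < 0)
        return -1;
    if (cooked)
        make_cooked(&t);
    else
        make_raw(&t);
    return tcsetattr(fd, when, &t);
}

/* Readline-like reader (hypothetical): reads up to rsz bytes at a time in
   raw mode; after each chunk containing a newline it switches to cooked
   mode, "works" for 1 ms, then switches back to raw mode. Stops at EOF or
   error, or once `want` bytes have arrived. Returns the bytes stored in
   out, or -1 if the terminal mode could not be changed. */
ssize_t readline_like_reader(int fd, char *out, size_t cap, size_t rsz,
                             size_t want)
{
    const struct timespec work = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
    size_t got = 0;

    if (rsz == 0 || set_tty_mode(fd, false, TCSANOW) < 0)
        return -1;
    while (got < want && rsz <= cap - got) {
        ssize_t n = read(fd, out + got, rsz);
        if (n <= 0)
            break;
        bool saw_newline = memchr(out + got, '\n', (size_t)n) != NULL;
        got += (size_t)n;
        if (saw_newline) {
            if (set_tty_mode(fd, true, TCSANOW) < 0)
                return -1;
            nanosleep(&work, NULL);          /* the simulated "work" */
            if (set_tty_mode(fd, false, TCSANOW) < 0)
                return -1;
        }
    }
    return (ssize_t)got;
}
```

The mode switch happens while the master may still be writing, so unread bytes sitting in the input queue cross a raw/cooked transition; that is the situation the test is built to provoke.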
Minor changes include ignoring the last 100 bytes (potentially an
incomplete line that stays in the kernel's buffer, which the slave
doesn't expect to arrive), plus a long sleep on the master after
writing its output (an ugly hack, but definitely long enough to give
the slave time to read everything).
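A minimal sketch of how the slave's check can skip that tail. The function name first_bad_offset and the exact reporting convention are assumptions for illustration; only the 100-byte allowance comes from the test description.

```c
#include <stddef.h>

#define IGNORED_TAIL 100  /* trailing bytes that may legitimately never arrive */

/* Compares what the slave received with what the master sent, ignoring the
   last IGNORED_TAIL bytes. Returns -1 if everything checked arrived intact,
   otherwise the offset of the first byte that is wrong (e.g. '\n' turned
   into '\r') or missing (the slave got fewer bytes than required). */
long first_bad_offset(const char *sent, size_t sent_len,
                      const char *received, size_t received_len)
{
    size_t must_arrive = sent_len > IGNORED_TAIL ? sent_len - IGNORED_TAIL : 0;

    for (size_t i = 0; i < must_arrive; i++) {
        if (i >= received_len || received[i] != sent[i])
            return (long)i;
    }
    return -1;
}
```

Reporting the first bad offset, rather than just pass/fail, makes it easy to see whether a failure is a substituted line terminator or data that never arrived.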
The behavior is:
- Often: corrupted data is read (\r and \n substituted for each other,
as well as actual loss of data), as reported by the slave.
- Often: deadlock; the slave hangs in read() on the terminal while the
master simultaneously hangs in write().
You can play with parameters like the buffer size, the write size
(wsz), blocking vs. nonblocking writes, and TCSETS versus TCSETSW;
they don't make much of a difference.
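For reference, the master's write loop with these knobs might look like the sketch below. The helper write_in_chunks is hypothetical (not taken from the attachment); it writes at most wsz bytes per write() call and works with both blocking and O_NONBLOCK descriptors by waiting in poll() for POLLOUT when write() reports EAGAIN.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes in chunks of at most wsz bytes. Partial writes are
   resumed; EINTR is retried; on EAGAIN/EWOULDBLOCK (nonblocking fd) it
   waits until the fd is writable. Returns 0 on success, -1 on error. */
int write_in_chunks(int fd, const char *buf, size_t len, size_t wsz)
{
    size_t off = 0;

    if (wsz == 0) {
        errno = EINVAL;
        return -1;
    }
    while (off < len) {
        size_t chunk = len - off < wsz ? len - off : wsz;
        ssize_t n = write(fd, buf + off, chunk);
        if (n > 0) {
            off += (size_t)n;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
        } else {
            return -1;   /* real error, or an unexpected zero-length write */
        }
    }
    return 0;
}
```

With a blocking descriptor the EAGAIN branch is simply never taken, so the same loop covers both modes; in the deadlock case the master would sit in write() (blocking) or poll() (nonblocking).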
What does make a difference, though, is the read size (rsz). The bug
is reproducible if and only if the read size is a divisor of the
length of the line excluding the terminating newline (i.e. the length
of the full line minus one); that is, a divisor of 62 in this example.
So a read size of 1 (which is used by readline) triggers the bug with
all kinds of data; larger read sizes only with certain well-crafted
buffers.
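The observed rule can be written down directly. This only encodes the empirical observation above (the function names are made up for illustration); it says nothing about the kernel mechanism behind it. Here line_len counts the terminating newline.

```c
#include <stdbool.h>
#include <stddef.h>

/* Empirical rule: the bug reproduces iff rsz divides the line length
   excluding its terminating newline. line_len includes the '\n'. */
bool read_size_triggers(size_t rsz, size_t line_len)
{
    return rsz > 0 && line_len > 1 && (line_len - 1) % rsz == 0;
}

/* How many read sizes in [1, max_rsz] satisfy the rule. */
size_t count_triggering_sizes(size_t line_len, size_t max_rsz)
{
    size_t count = 0;
    for (size_t rsz = 1; rsz <= max_rsz; rsz++)
        if (read_size_triggers(rsz, line_len))
            count++;
    return count;
}
```

For the 63-byte lines of the test (62 characters plus the newline), the triggering read sizes are 1, 2, 31 and 62; rsz = 1 divides every length, which is why readline is affected with any data.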
Also, the bug is only reproducible once at least 4 kB has been
written. This gives me a gut feeling (without having actually studied
the kernel's source) that it might be a circular buffer overrun:
whenever only 1 byte is left in the buffer (the final newline of a
line), the writer might incorrectly wrap around in a 4 kB buffer and
overwrite it. Does this make any sense?
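To make the suspected failure mode concrete, here is a toy ring buffer. It is a hypothetical illustration of the guess above, not the kernel's tty code: RB_SIZE = 4096 is an assumption matching the ~4 kB threshold, and rb_put_unchecked deliberately omits the fullness check to show what such a bug would do.

```c
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 4096  /* assumed size, matching the ~4 kB threshold */

/* head and tail are free-running counters; slot = counter % RB_SIZE. */
struct ring {
    char buf[RB_SIZE];
    size_t head;  /* total bytes written */
    size_t tail;  /* total bytes read */
};

static size_t rb_count(const struct ring *r)
{
    return r->head - r->tail;
}

/* Correct producer: refuses to write into a full buffer. */
bool rb_put(struct ring *r, char c)
{
    if (rb_count(r) == RB_SIZE)
        return false;
    r->buf[r->head++ % RB_SIZE] = c;
    return true;
}

/* Hypothetical buggy producer: no fullness check, so after RB_SIZE bytes
   it wraps around and overwrites the oldest unread byte. */
void rb_put_unchecked(struct ring *r, char c)
{
    r->buf[r->head++ % RB_SIZE] = c;
}

/* Consumer: takes the oldest unread byte, if any. */
bool rb_get(struct ring *r, char *out)
{
    if (rb_count(r) == 0)
        return false;
    *out = r->buf[r->tail++ % RB_SIZE];
    return true;
}
```

If the only unread byte is a newline, the checked writer can add just 4095 more bytes and the newline is still read next; the unchecked writer's 4096th byte lands on the newline's slot, so the reader sees a changed or missing line terminator, which resembles the reported corruption.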
Interestingly, the test uses \n and \r reversed compared to real life
(the buffer should contain \r instead of \n, and ICRNL should be used
instead of INLCR); for some reason I can't explain, the test didn't
trigger the bug for me after swapping the two.
Anyway, I hope that this test case and my findings about the read size
help catch and fix the bug.
Thanks a lot,
egmont
View attachment "ptmx2.c" of type "text/x-csrc" (6151 bytes)