Message-ID: <60483.14588.qm@web1112.biz.mail.sk1.yahoo.com>
Date:	Sat, 23 May 2009 09:27:00 -0700 (PDT)
From:	Richard Schmitt <rschmitt@...epeach.com>
To:	linux-kernel@...r.kernel.org
Subject: Re: TTY/Serial Driver Hangs in tty_wait_until_sent, analysis, and recommendations



There is a catch-22 in the Linux serial driver architecture that can hang applications and shells.  I can reproduce it at will on our PowerPC 8313 system with an 8250 UART driver.

The problem arises because we can output a lot of text on the console at high speed.  This can overrun the receiver, which may respond by sending XOFFs.  Because of the high speed and the slow response of the serial port, the Linux system does not process the XOFF right away.  Instead, the receiver keeps sending XOFFs and BELs to complain that it is dropping data.  So much data comes back from the receiver that the console’s input buffer eventually fills up.  Once the input buffer is full, the Linux system cannot receive a subsequent XON from the receiver.
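
The way a full input buffer can swallow the XON can be illustrated with a toy model.  This is only a sketch in Python, not kernel code: the class ToyTty, its fields, and the rule that a byte arriving with no room is discarded before flow control is examined are all assumptions made for illustration.

```python
XON, XOFF, BEL = 0x11, 0x13, 0x07  # ASCII DC1, DC3, BEL


class ToyTty:
    """Hypothetical model of a tty with in-band XON/XOFF handling."""

    def __init__(self, input_capacity):
        self.input_capacity = input_capacity
        self.input_buf = []    # bytes waiting for a reader
        self.stopped = False   # True while output is held by an XOFF
        self.dropped = 0       # bytes discarded for lack of room

    def receive(self, byte):
        """Handle one byte arriving from the remote end."""
        if len(self.input_buf) >= self.input_capacity:
            # No room: the byte is lost before flow control is examined,
            # so an XON arriving now is never seen.
            self.dropped += 1
            return
        if byte == XOFF:
            self.stopped = True
        elif byte == XON:
            self.stopped = False
        else:
            self.input_buf.append(byte)


tty = ToyTty(input_capacity=4)
tty.receive(XOFF)            # receiver asks us to stop
for _ in range(6):
    tty.receive(BEL)         # receiver complains; nobody reads the buffer
tty.receive(XON)             # receiver is ready again, but there is no room
print(tty.stopped, len(tty.input_buf), tty.dropped)  # True 4 3
```

In this model the output side stays stopped until some reader removes bytes from input_buf and another XON arrives.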

Normally this would clear up, because the input buffer would get read.  But when bash is the program that reads the input buffer, it may be busy modifying the terminal modes.  Bash manipulates the terminal settings very often, and when it does, it makes a system call that waits for the terminal output buffer to empty.  Since the terminal server has the console in flow control, the buffer is not going to drain right away.  And since the shell is waiting for the terminal-settings call to complete, it never gets around to reading from the serial port, which it must do before the serial driver can free up input room for new characters.  Only upon receiving new characters will the driver detect an XON and allow the output buffer to drain.
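
The text above does not name the system call.  The behavior described matches tcsetattr() with the TCSADRAIN action, which applies new settings only after pending output has been transmitted; that bash uses this action is an assumption here.  A minimal user-space sketch with Python's termios module (the helper names drain_action and apply_termios are made up for illustration):

```python
import termios


def drain_action(wait_for_drain):
    """Pick the tcsetattr action: wait for output to drain, or apply at once."""
    return termios.TCSADRAIN if wait_for_drain else termios.TCSANOW


def apply_termios(fd, attrs, wait_for_drain=True):
    """Apply terminal attributes to fd.

    attrs is a list as returned by termios.tcgetattr(fd).  With
    wait_for_drain=True the call does not return until all queued
    output has been sent, which is where a flow-controlled console
    would leave the caller blocked.
    """
    termios.tcsetattr(fd, drain_action(wait_for_drain), attrs)
```

A caller that passes wait_for_drain=False gets TCSANOW and does not wait on the output queue, at the cost of possibly changing modes while earlier output is still in flight.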

So the console is hung.  Ctrl-C and Ctrl-Q characters are not processed, and there is nothing you can do short of logging in through a different port and killing the shell on the console.

We do things that compound the situation for ourselves.  First, we send lots of output to the console port, especially when we are in debug mode.  Second, we run our serial ports at very high baud rates.  Third, our networks are heavily congested, which causes the terminal servers to back up.  Fourth, we use our console as a shell.

I think there are two long-term solutions to this: 1) have the low-level serial port drivers (e.g. 8250) treat XON/XOFF out of band, with some back door into the line discipline driver to restart the serial line without needing input buffer space, and 2) change the tty_ioctl handling of a WAIT call to flush the output buffer if the port is stopped.  I figure either of these approaches alters the serial behavior enough that it is more than just a bug fix.
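
Both ideas can be sketched in a toy Python model.  This is not a patch and not the kernel's code: the class ToyTtyOutOfBand, the function wait_until_sent, and the flush_if_stopped flag are hypothetical names, and the model ignores everything except the flow-control state and buffer occupancy.

```python
XON, XOFF, BEL = 0x11, 0x13, 0x07  # ASCII DC1, DC3, BEL


class ToyTtyOutOfBand:
    """Option 1: act on XON/XOFF before checking for input room."""

    def __init__(self, input_capacity):
        self.input_capacity = input_capacity
        self.input_buf = []
        self.stopped = False
        self.dropped = 0

    def receive(self, byte):
        # Flow-control bytes never need buffer space in this model.
        if byte == XOFF:
            self.stopped = True
            return
        if byte == XON:
            self.stopped = False
            return
        if len(self.input_buf) >= self.input_capacity:
            self.dropped += 1
            return
        self.input_buf.append(byte)


def wait_until_sent(pending_output, stopped, flush_if_stopped):
    """Option 2: return True if the caller may proceed, False if it would wait."""
    if not pending_output:
        return True
    if stopped:
        if flush_if_stopped:
            pending_output.clear()  # discard instead of waiting for an XON
            return True
        return False                # the behavior described in this report
    pending_output.clear()          # not stopped: model the bytes being sent
    return True


tty = ToyTtyOutOfBand(input_capacity=2)
tty.receive(XOFF)
for _ in range(5):
    tty.receive(BEL)
tty.receive(XON)                    # seen even though input_buf is full
print(tty.stopped, tty.dropped)     # False 3
```

Option 2 trades lost output for progress: in the model, the pending bytes are thrown away so that the caller can go back to reading input.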

Our workaround is to not allow bash to run on the console.  That keeps us from seeing the problem, but others may run into it and should know about it.

Some data points we saw with gdb:

The console was in a ‘stopped’ state waiting for an XON.
The console had data to send.
The console had no room to receive additional characters from the terminal server.
Bash was waiting for the output buffer to drain and was blocked inside tty_wait_until_sent.
No one was doing a read on the console port.
The input buffer was filled with BELLs (^Gs)

I’ve searched the various discussion lists.  Although others have reported port hangs with a shell stuck in tty_wait_until_sent, I haven’t seen this analysis described, nor any resolution to the problem.  I’m posting it here in case a) someone knows of a real fix, b) the analysis is new and piques someone’s interest in resolving it, or c) the analysis helps others work around the problem if they see it.

I’m not on the kernel mailing list.  If you wish to respond, please include my email explicitly.

Rich
