lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 1 Nov 2010 20:39:16 +0100
From:	Arnd Bergmann <arnd@...db.de>
To:	Eli Billauer <eli@...lauer.co.il>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Greg KH <greg@...ah.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: open() on /dev/tty takes 30 seconds on 2.6.36

On Monday 01 November 2010, Eli Billauer wrote:
> Let me start with the headline: It's close() of a ttySx that takes 30 
> seconds, during which the big TTY lock is held. Now to details.

Ok, good to hear you tracked it down this far!

> Arnd Bergmann wrote:
> 
> > So modem-manager is the process you need to look at, as it holds the
> > BTM, probably for a long time. It should not do that. First you should
> > verify that when you don't have modem-manager running, the problem goes
> > away. This may indeed be related to the misdetection of a serial port.
> >   
> That test is built-in: When modem-manager stops poking around, 
> everything becomes fine again.

Ok.

> > The other thing to check is what modem-manager is actually doing. You can
> > find out by looking at /proc/<pid>/stack, where <pid> is the pid of the
> > modem-manager process, 1835 in your example.
> >   
> OK, so this is what I did eventually: I ran strace on sshd and 
> modem-manager, and I also added printk's to tell me when tty_lock() is 
> invoked, when it actually got the lock, and when tty_unlock() is 
> invoked. Pretty messy, and still.
> 
> Let's start with sshd's strace. We have a
> 01:06:20.152787 open("/dev/tty", O_RDWR|O_NOCTTY) = -1 ENXIO (No such 
> device or address) <30.008153>

That looks unrelated, as far as I can tell.

> And this is what we have in /var/log/messages:
> 
> Nov  1 01:06:20 ocho modem-manager: (ttyS2) closing serial device...
> Nov  1 01:06:20 ocho kernel: tty_lock() called
> Nov  1 01:06:20 ocho kernel: tty_lock() got lock
> Nov  1 01:06:20 ocho kernel: tty_lock() called
> Nov  1 01:06:50 ocho kernel: tty_unlock() called
> Nov  1 01:06:50 ocho kernel: tty_lock() got lock
> Nov  1 01:06:50 ocho kernel: tty_unlock() called
> Nov  1 01:06:50 ocho kernel: tty_lock() called
> 
> So we can see that someone got the lock, and then didn't release it for 
> 30 seconds, and that someone else tried to get the lock and waited for 
> it 30 seconds.

Ok, that's certainly modem-manager in the tty_release function, calling
uart_close, which has this code:

        if (port->closing_wait != ASYNC_CLOSING_WAIT_NONE) {
                /*
                 * hack: open-coded tty_wait_until_sent to avoid
                 * recursive tty_lock
                 */
                long timeout = msecs_to_jiffies(port->closing_wait);
                if (wait_event_interruptible_timeout(tty->write_wait,
                                !tty_chars_in_buffer(tty), timeout) >= 0)
                        __uart_wait_until_sent(uport, timeout);
        }

port->closing_wait gets initialized to 30 seconds, which fits your
symptom.

I'm not sure whether it blocks in wait_event or in __uart_wait_until_sent
though, both get the same timeout.

> An strace of modem-manager running under vanilla kernel 2.6.35-4 shows 
> that close() takes 30 seconds there as well. It just so happens that it 
> went unnoticed (maybe because of the changes in the locking schema?).

Exactly. The code used to hold the BKL, which gets implicitly released
during a sleeping operation like wait_event_interruptible_timeout or 
__uart_wait_until_sent. With the change to a mutex, this does not
happen any more.

When I did the change, I did not anticipate the call actually timing
out the 30 seconds.

> I should only note that all TTY close() calls take 30 seconds, and that 
> all ports get a ioctl() and extra close() after the file descriptor is 
> closed, with an EBADF as response.

This must be an unrelated bug in modem-manager.

Can you confirm that this only happens on tty1 through tty4, but not on
tty0, which is your real uart?

I think we need to find out why it's hitting the timeout most of all,
and that means understanding what the hardware on your system is actually
doing or supposed to be doing. What does "lspci -vvx" show about the
port? Can you disable it in some way?

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ