Message-ID: <5447EA9F.1070401@hurleysoftware.com>
Date: Wed, 22 Oct 2014 13:34:23 -0400
From: Peter Hurley <peter@...leysoftware.com>
To: One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
CC: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org, Jiri Slaby <jslaby@...e.cz>,
linux-serial@...r.kernel.org
Subject: Re: [PATCH -next 11/27] tty: Don't release tty locks for wait queue
sanity check
On 10/22/2014 11:29 AM, One Thousand Gnomes wrote:
>> However, without needing the global tty_mutex held, the tty locks for
>> the releasing tty can now be held through the sleep. The sanity check
>> is for abnormal conditions caused by kernel bugs, not for recoverable
>> errors caused by misbehaving userspace; dropping the tty locks only
>> allows the tty state to get more sideways.
>
> An open with O_NDELAY on the closing port now appears to be able to jam
> for 2 minutes? Previously it would at least be released by a signal.
>
> That seems like a regression (and given the timeout is long) a bug.
This patch should only affect _really abnormal_ situations.
The only way that a tty is spinning in this loop and not getting released
is if the tty count is going to zero but some other thread is still on one
of the wait queues, but that's only possible if either:
1. the other thread never removed itself from the wait queue because it
crashed while on the wait queue, or
2. if somehow a thread is sleeping on one of the wait queues without having
passed through vfs.
IOW, since the tty count is going to zero, the release in progress must be
for the last file descriptor for this tty, so how can some other thread
be on one of the wait queues without an in-use descriptor?
Both are serious errors, and the failed sanity test shows that the tty state
is corrupted; an open should not succeed as long as this is true.
It'll take some experimentation to see if the first situation is identifiable
and remediable; I'll put it on my todo list.
> Given that some code handles multiple tty devices using select and
> nonblocking opens on physical ports this one bothers me a little. The old
> behaviour wasn't right either (and actually stops Linux running some
> modem manager type tools), but the new behaviour looks worse.
>
> Probably, though, the right way to fix it is in the open path?
Yes, the tty lock in tty_open() should be interruptible. I've built a matrix
of how open() races with the previous release behavior at different locking
points so that the existing outcome can be replicated (or more easily analyzed
to decide if that's the behavior we want and how/whether to change that
behavior). The sticking point right now is dealing with how ASYNC_HUP_NOTIFY
modifies the outcome of the open. This also entails significant code archaeology.
I'm also exploring making the tty count atomic so that a racing open
can prevent a concurrent release from going to final close, which will
help to minimize the time window that an open will fail with EIO.
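To make the atomic-count idea concrete, here is a rough userspace sketch using
C11 atomics. It is only an illustration under stated assumptions, not kernel
code and not a proposed patch: struct fake_tty, fake_tty_get_unless_zero() and
fake_tty_put_and_test() are hypothetical names invented for this example. The
point it tries to show is that an open which wins the race bumps the count
before the release drops it, so the release never sees the count reach zero
and does not proceed to final close.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tty's open count; not the kernel's struct. */
struct fake_tty {
    atomic_int count;
};

/* Open path: take a reference only if the count has not already hit zero. */
static bool fake_tty_get_unless_zero(struct fake_tty *tty)
{
    int old = atomic_load(&tty->count);

    while (old > 0) {
        /* On failure, 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&tty->count, &old, old + 1))
            return true;
    }
    return false; /* release already won; the caller must fail the open */
}

/* Release path: drop a reference; true means the caller owns final close. */
static bool fake_tty_put_and_test(struct fake_tty *tty)
{
    int old = atomic_fetch_sub(&tty->count, 1);

    assert(old > 0); /* sanity check: never drop below zero */
    return old == 1;
}
```

With this shape, the only window in which an open must fail is after the count
has actually reached zero, rather than for the whole duration of the release.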
But first, I need to push out some more patches that have been unit-tested
(and -- don't laugh -- explore why printk disables interrupts and prevents
cpu migration while calling the console drivers. Seems ok to me...)
Regards,
Peter Hurley