netdev - Re: Bug report: epoll can fail to report EPOLLOUT when unix datagram socket peer is closed

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <2bd73c15-7725-2b9a-78bb-d364b7faf6bd@akamai.com>
Date:   Wed, 11 Jul 2018 22:46:46 -0400
From:   Jason Baron <jbaron@...mai.com>
To:     Ian Lance Taylor <iant@...ang.org>, netdev@...r.kernel.org
Subject: Re: Bug report: epoll can fail to report EPOLLOUT when unix datagram
 socket peer is closed



On 06/26/2018 10:18 AM, Ian Lance Taylor wrote:
> I'm reporting what appears to be a bug in the Linux kernel's epoll
> support.  It seems that epoll appears to sometimes fail to report an
> EPOLLOUT event when the other side of an AF_UNIX/SOCK_DGRAM socket is
> closed.  This bug report started as a Go program reported at
> https://golang.org/issue/23604.  I've written a C program that
> demonstrates the same symptoms, at
> https://github.com/golang/go/issues/23604#issuecomment-398945027 .
> 
> The C program sets up an AF_UNIX/SOCK_DGRAM server and serveral
> identical clients, all running in non-blocking mode.  All the
> non-blocking sockets are added to epoll, using EPOLLET.  The server
> periodically closes and reopens its socket.  The clients look for
> ECONNREFUSED errors on their write calls, and close and reopen their
> sockets when they see one.
> 
> The clients will sometimes fill up their buffer and block with EAGAIN.
> At that point they expect the poller to return an EPOLLOUT event to
> tell them when they are ready to write again.  The expectation is that
> either the server will read data, freeing up buffer space, or will
> close the socket, which should cause the sending packets to be
> discarded, freeing up buffer space.  Generally the EPOLLOUT event
> happens.  But sometimes, the poller never returns such an event, and
> the client stalls.  In the test program this is reported as a client
> that waits more than 20 seconds to be told to continue.
> 
> A similar bug report was made, with few details, at
> https://stackoverflow.com/questions/38441059/edge-triggered-epoll-for-unix-domain-socket
> .
> 
> I've tested the program and seen the failure on kernel 4.9.0-6-amd64.
> A colleague has tested the program and seen the failure on
> 4.18.0-smp-DEV #3 SMP @1529531011 x86_64 GNU/Linux.
> 
> If there is a better way for me to report this, please let me know.
> 
> Thanks for your attention.
> 
> Ian
> 

Hi,

Thanks for the report and the test program. The patch below seems to
have cured the reproducer for me. But perhaps you can confirm?

Thanks,

-Jason


[PATCH] af_unix: ensure POLLOUT on remote close() for connected dgram socket

Applictions use ECONNREFUSED as returned from write() in order to
determine that a socket should be closed. When using connected dgram
unix sockets in a poll/write loop, this relies on POLLOUT being
signaled when the remote end closes. However, due to a race POLLOUT
can be missed when the remote closes:

          thread 1 (client)                   thread 2 (server)

connect() to server
write() returns -EAGAIN
unix_dgram_poll()
 -> unix_recvq_full() is true
                                       close()
                                        ->unix_release_sock()
                                         ->wake_up_interruptible_all()
unix_dgram_poll() (due to the
     wake_up_interruptible_all)
 -> unix_recvq_full() still is true
                                         ->free all skbs


Now thread 1 is stuck and will not receive anymore wakeups. In this
case, when thread 1 gets the -EAGAIN, it has not queued any skbs
otherwise the 'free all skbs' step would in fact cause a wakeup and
a POLLOUT return. So the race here is probably fairly rare because
it means there are no skbs that thread 1 queued and that thread 1
schedules before the 'free all skbs' step. Nevertheless, this has
been observed in the wild via syslog.

The proposed fix is to move the wake_up_interruptible_all() call
after the 'free all skbs' step.

Signed-off-by: Jason Baron <jbaron@...mai.com>
---
 net/unix/af_unix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e5473c0..de242cf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -529,8 +529,6 @@ static void unix_release_sock(struct sock *sk, int
embrion)
 	sk->sk_state = TCP_CLOSE;
 	unix_state_unlock(sk);

-	wake_up_interruptible_all(&u->peer_wait);
-
 	skpair = unix_peer(sk);

 	if (skpair != NULL) {
@@ -560,6 +558,9 @@ static void unix_release_sock(struct sock *sk, int
embrion)
 		kfree_skb(skb);
 	}

+	/* after freeing skbs to make sure POLLOUT triggers */
+	wake_up_interruptible_all(&u->peer_wait);
+
 	if (path.dentry)
 		path_put(&path);

-- 
2.7.4