lists.openwall.net - Open Source and information security mailing list archives
Date:	Fri, 18 Sep 2015 12:51:35 -0400
From:	Trond Myklebust <trond.myklebust@...marydata.com>
To:	"Suzuki K. Poulose" <Suzuki.Poulose@....com>,
	Jeff Layton <jlayton@...chiereds.net>
Cc:	Anna Schumaker <anna.schumaker@...app.com>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	"David S. Miller" <davem@...emloft.net>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Marc Zyngier <Marc.Zyngier@....com>
Subject: Re: [PATCHv2] SUNRPC: Fix a race in xs_reset_transport

On Fri, 2015-09-18 at 12:19 +0100, Suzuki K. Poulose wrote:
> On 16/09/15 12:17, Jeff Layton wrote:
> > On Wed, 16 Sep 2015 10:35:49 +0100
> > "Suzuki K. Poulose" <suzuki.poulose@....com> wrote:
> > 
> > > From: "Suzuki K. Poulose" <suzuki.poulose@....com>
> > > 
> 
> ...
> 
> > > +		write_unlock_bh(&sk->sk_callback_lock);
> > > +		return;
> > > +	}
> > > +	sock = transport->sock;
> > > +
> > >   	transport->inet = NULL;
> > >   	transport->sock = NULL;
> > > 
> > > @@ -833,6 +838,10 @@ static void xs_reset_transport(struct
> > > sock_xprt *transport)
> > >   	xs_restore_old_callbacks(transport, sk);
> > >   	xprt_clear_connected(xprt);
> > >   	write_unlock_bh(&sk->sk_callback_lock);
> > > +
> > > +	if (sock)
> > > +		kernel_sock_shutdown(sock, SHUT_RDWR);
> > > +
> > >   	xs_sock_reset_connection_flags(xprt);
> > > 
> > >   	trace_rpc_socket_close(xprt, sock);
> > 
> > Better, but now I'm wondering...is it problematic to restore the
> > old
> > callbacks before calling kernel_sock_shutdown? I can't quite tell
> > whether it matters in all cases.
> > 
> > It might be best to just go ahead and take the spinlock twice here.
> > Do
> > it once to clear the transport->sock pointer, call
> > kernel_sock_shutdown, and then take it again to restore the old
> > callbacks, etc.
> > 
> > I don't know though...I get the feeling there are races all over
> > the
> > place in this code. It seems like there's a similar one wrt the
> > transport->inet pointer. It seems a little silly that we clear it
> > under
> > the sk->sk_callback_lock. You have to dereference that pointer
> > in order to get to the lock.
> > 
> > Maybe the right solution is to use an xchg to swap the inet pointer
> > with NULL so it can act as a gatekeeper. Whoever gets there first
> > does
> > the rest of the shutdown.
> > 
> > Something like this maybe? Would this also fix the original
> > problem?
> > Note that this patch is untested...
> > 
> > [PATCH] sunrpc: use xchg to fetch and clear the transport->inet
> > pointer in xs_reset_transport
> > 
> > Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@....com>
> > Signed-off-by: Jeff Layton <jeff.layton@...marydata.com>
> > ---
> >   net/sunrpc/xprtsock.c | 4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 7be90bc1a7c2..57f79dcab493 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -813,9 +813,10 @@ static void xs_error_report(struct sock *sk)
> >   static void xs_reset_transport(struct sock_xprt *transport)
> >   {
> >   	struct socket *sock = transport->sock;
> > -	struct sock *sk = transport->inet;
> > +	struct sock *sk;
> >   	struct rpc_xprt *xprt = &transport->xprt;
> > 
> > +	sk = xchg(&transport->inet, NULL);
> >   	if (sk == NULL)
> >   		return;
> > 
> > @@ -825,7 +826,6 @@ static void xs_reset_transport(struct sock_xprt
> > *transport)
> >   	kernel_sock_shutdown(sock, SHUT_RDWR);
> > 
> >   	write_lock_bh(&sk->sk_callback_lock);
> > -	transport->inet = NULL;
> >   	transport->sock = NULL;
> > 
> >   	sk->sk_user_data = NULL;
> > 
> 
> 
> This one seemed to fix it, so if it matters:
> 
> Tested-by: Suzuki K. Poulose <suzuki.poulose@....com>


I don't think it does. It addresses a symptom, but the actual problem
is that we're running two parallel close() calls on the same socket
during a shutdown. That must not happen because it means we have
something else trying to use the socket while it is being freed.

I think what is happening is that we're triggering the socket autoclose
mechanism from the state change callback. You're seeing the problem
more frequently because we added the call to kernel_sock_shutdown() as
part of the socket shutdown, but AFAICS, it could still be triggered
from some external event such as a server-initiated shutdown that
happened at the same time.
In fact, looking at the code, it could even be triggered from the data
receive side of things.
Both of these things are bad, because autoclose puts the transport
struct that is being freed onto a workqueue. That again can lead to a
really bad use-after-free situation if the timing is just a little
different.

So how about the following patch? It should apply cleanly on top of the
first one (which is still needed, btw).

8<--------------------------------------------------------------------
From 6261682c433cf57e6bff4ab57d615460dc15e75c Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@...marydata.com>
Date: Fri, 18 Sep 2015 09:52:07 -0400
Subject: [PATCH 1/2] SUNRPC: xs_sock_mark_closed() should not trigger socket
 autoclose

Triggering a socket autoclose from inside xs_tcp_state_change() is
potentially racy when it happens during the transport destroy phase,
because it restarts the xprt->task_cleanup work.

The same is true of autoclose triggered from inside xs_tcp_data_ready().

Under all conditions, it should be quite sufficient just to mark
the socket as disconnected. It will then be closed by the
transport shutdown or reconnect code.

Signed-off-by: Trond Myklebust <trond.myklebust@...marydata.com>
---
 net/sunrpc/xprtsock.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index c35038511686..bb4fff4d4e5e 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -777,7 +777,6 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
 	xs_sock_reset_connection_flags(xprt);
 	/* Mark transport as closed and wake up all pending tasks */
 	xprt_disconnect_done(xprt);
-	xprt_force_disconnect(xprt);
 }
 
 /**
@@ -1068,7 +1067,7 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
 	/* Sanity check of the record length */
 	if (unlikely(transport->tcp_reclen < 8)) {
 		dprintk("RPC:       invalid TCP record fragment length\n");
-		xs_tcp_force_close(xprt);
+		xprt_disconnect_done(xprt);
 		return;
 	}
 	dprintk("RPC:       reading TCP record fragment of length %d\n",
@@ -1149,7 +1148,7 @@ static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
 		break;
 	default:
 		dprintk("RPC:       invalid request message type\n");
-		xs_tcp_force_close(&transport->xprt);
+		xprt_disconnect_done(&transport->xprt);
 	}
 	xs_tcp_check_fraghdr(transport);
 }
@@ -1283,7 +1282,7 @@ static int xs_tcp_read_callback(struct rpc_xprt *xprt,
 	if (req == NULL) {
 		spin_unlock(&xprt->transport_lock);
 		printk(KERN_WARNING "Callback slot table overflowed\n");
-		xprt_force_disconnect(xprt);
+		xprt_disconnect_done(xprt);
 		return -1;
 	}
 
@@ -1484,8 +1483,7 @@ static void xs_tcp_state_change(struct sock *sk)
 	case TCP_CLOSE_WAIT:
 		/* The server initiated a shutdown of the socket */
 		xprt->connect_cookie++;
-		clear_bit(XPRT_CONNECTED, &xprt->state);
-		xs_tcp_force_close(xprt);
+		xprt_disconnect_done(xprt);
 	case TCP_CLOSING:
 		/*
 		 * If the server closed down the connection, make sure that
-- 
2.4.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@...marydata.com


