linux-kernel - [PATCH 4.16 106/196] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20180422135109.726959401@linuxfoundation.org>
Date:   Sun, 22 Apr 2018 15:52:06 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Chuck Lever <chuck.lever@...cle.com>,
        Anna Schumaker <Anna.Schumaker@...app.com>
Subject: [PATCH 4.16 106/196] xprtrdma: Fix latency regression on NUMA NFS/RDMA clients

4.16-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chuck Lever <chuck.lever@...cle.com>

commit 6720a89933739cb8dec748cd253f7c8df2c0ae4d upstream.

With v4.15, on one of my NFS/RDMA clients I measured a nearly
doubling in the latency of small read and write system calls. There
was no change in server round trip time. The extra latency appears
in the whole RPC execution path.

"git bisect" settled on commit ccede7598588 ("xprtrdma: Spread reply
processing over more CPUs") .

After some experimentation, I found that leaving the WQ bound and
allowing the scheduler to pick the dispatch CPU seems to eliminate
the long latencies, and it does not introduce any new regressions.

The fix is implemented by reverting only the part of
commit ccede7598588 ("xprtrdma: Spread reply processing over more
CPUs") that dispatches RPC replies specifically on the CPU where the
matching RPC call was made.

Interestingly, saving the CPU number and later queuing reply
processing there was effective _only_ for a NFS READ and WRITE
request. On my NUMA client, in-kernel RPC reply processing for
asynchronous RPCs was dispatched on the same CPU where the RPC call
was made, as expected. However synchronous RPCs seem to get their
reply dispatched on some other CPU than where the call was placed,
every time.

Fixes: ccede7598588 ("xprtrdma: Spread reply processing over ... ")
Signed-off-by: Chuck Lever <chuck.lever@...cle.com>
Cc: stable@...r.kernel.org # v4.15+
Signed-off-by: Anna Schumaker <Anna.Schumaker@...app.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>

---
 net/sunrpc/xprtrdma/rpc_rdma.c  |    2 +-
 net/sunrpc/xprtrdma/transport.c |    2 --
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -1366,7 +1366,7 @@ void rpcrdma_reply_handler(struct rpcrdm

 	trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);

-	queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
+	queue_work(rpcrdma_receive_wq, &rep->rr_work);
 	return;

 out_badstatus:
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -52,7 +52,6 @@
 #include <linux/slab.h>
 #include <linux/seq_file.h>
 #include <linux/sunrpc/addr.h>
-#include <linux/smp.h>

 #include "xprt_rdma.h"

@@ -651,7 +650,6 @@ xprt_rdma_allocate(struct rpc_task *task
 	if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
 		goto out_fail;

-	req->rl_cpu = smp_processor_id();
 	req->rl_connect_cookie = 0;	/* our reserved value */
 	rpcrdma_set_xprtdata(rqst, req);
 	rqst->rq_buffer = req->rl_sendbuf->rg_base;
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -334,7 +334,6 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_list;
-	int			rl_cpu;
 	unsigned int		rl_connect_cookie;
 	struct rpcrdma_buffer	*rl_buffer;
 	struct rpcrdma_rep	*rl_reply;