[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1636548651-44649-2-git-send-email-guwen@linux.alibaba.com>
Date: Wed, 10 Nov 2021 20:50:50 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: kgraul@...ux.ibm.com, tonylu@...ux.alibaba.com
Cc: davem@...emloft.net, kuba@...nel.org, linux-s390@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
dust.li@...ux.alibaba.com, xuanzhuo@...ux.alibaba.com,
guwen@...ux.alibaba.com
Subject: [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback
There may be a mismatch of socket wait queue caused by SMC fallback.
When user applications used SMC to replace TCP, they might add
an eppoll_entry into smc socket->wq and expect to be notified if
epoll events like EPOLL_IN/EPOLL_OUT occurred on the smc socket.
However, once a fallback occurred, user applications start to use
clcsocket. So it is clcsocket->wq instead of smc socket->wq which
is woken up when there is data to be processed or sending space
available. Thus the eppoll_entry which was added into smc socket->wq
before fallback does not work anymore.
[if not fallback]
application
^
(poll)
|
smc socket->wq clcsocket->wq
(has eppoll_entry) .
. .
. .
smc socket->sk->sk_wq clcsocket->sk->sk_wq
^ ^
| (data flow) |(tcp handshake in rendezvous)
| |
|-------------------------------------------|
| sk_data_ready / sk_write_space .. |
|-------------------------------------------|
[if fallback]
application <------------------|
(can't poll anything)
|
smc socket->wq clcsocket->wq
(has eppoll_entry) .
. .
. .
smc socket->sk->sk_wq clcsocket->sk->sk_wq
^
|(data flow)
|
|-------------------------------------------|
| sk_data_ready / sk_write_space .. |
|-------------------------------------------|
For example, in nginx/wrk benchmark, this issue will cause an
all-zeros test result:
server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html
Running 5s test @ http://11.200.15.93/index.html
1 threads and 1 connections
Thread Stats Avg Stdev Max ± Stdev
Latency 0.00us 0.00us 0.00us -nan%
Req/Sec 0.00 0.00 0.00 -nan%
0 requests in 5.00s, 0.00B read
Requests/sec: 0.00
Transfer/sec: 0.00B
The main idea of this patch is that since wait queue is switched
from smc socket->wq to clcsocket->wq after fallback, disabling the
eppoll_entry in smc socket->wq, maybe we can try to use clcsocket->wq
from the beginning.
So this patch set smc socket->sk->sk_wq to clcsocket->wq when smc
socket is created and reset clcsocket->sk->sk_wq to clcsocket->wq
once fallback occurrs, making clcsocket->wq the constant wait queue
that user applications use. Thus the user application can be
notified by the eppoll_entry in clcsocket->wq no matter whether
a fallback occurrs.
After this patch:
[if not fallback]
application <------------------|
(poll)
|
smc socket->wq clcsocket->wq
. (has eppoll_entry)
` . .
` . . `
. ` ` .
` `
smc socket->sk->sk_wq clcsocket->sk->sk_wq
^ ^
|(data flow) |(tcp handshake in rendezvous)
| |
|-------------------------------------------|
| sk_data_ready / sk_write_space .. |
|-------------------------------------------|
[if fallback]
application <------------------|
(poll)
|
smc socket->wq clcsocket->wq
. (has eppoll_entry)
` . .
` . .
` . .
`
smc socket->sk->sk_wq clcsocket->sk->sk_wq
^
|(data flow)
|
|-------------------------------------------|
| sk_data_ready / sk_write_space .. |
|-------------------------------------------|
Link: https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
Signed-off-by: Wen Gu <guwen@...ux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@...ux.alibaba.com>
---
v1 -> v2
- Add the complete description about the intention of initial patch.
---
net/smc/af_smc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 54 insertions(+), 2 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0cf7ed2..fe5a2cd 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -566,6 +566,20 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
smc->fallback_rsn = reason_code;
smc_stat_fallback(smc);
trace_smc_switch_to_fallback(smc, reason_code);
+
+ /* We swapped sk->sk_wq of clcsocket and smc socket in smc_create()
+ * to prevent mismatch of wait queue caused by fallback.
+ *
+ * If fallback really occurred, user application starts to use clcsocket.
+ * Accordingly, clcsocket->sk->sk_wq should be reset to clcsocket->wq
+ * in order that user application still uses the same wait queue as what
+ * was used before fallback.
+ *
+ * After fallback, the relation between socket->wq and socket->sk->sk_wq is:
+ * - clcsocket->sk->sk_wq --> clcsocket->wq
+ * - smcsocket->sk->sk_wq --> clcsocket->wq
+ */
+ rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->clcsock->wq);
if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
smc->clcsock->file = smc->sk.sk_socket->file;
smc->clcsock->file->private_data = smc->clcsock;
@@ -2174,6 +2188,12 @@ static int smc_accept(struct socket *sock, struct socket *new_sock,
if (rc)
goto out;
+ /* The new accepted smc sock sets sk->sk_wq to clcsocket->wq like what
+ * smc_create() did when the fallback does not occur.
+ */
+ if (!smc_sk(nsk)->use_fallback)
+ rcu_assign_pointer(nsk->sk_wq, &smc_sk(nsk)->clcsock->wq);
+
if (lsmc->sockopt_defer_accept && !(flags & O_NONBLOCK)) {
/* wait till data arrives on the socket */
timeo = msecs_to_jiffies(lsmc->sockopt_defer_accept *
@@ -2310,8 +2330,15 @@ static __poll_t smc_poll(struct file *file, struct socket *sock,
mask = smc->clcsock->ops->poll(file, smc->clcsock, wait);
sk->sk_err = smc->clcsock->sk->sk_err;
} else {
- if (sk->sk_state != SMC_CLOSED)
- sock_poll_wait(file, sock, wait);
+ if (sk->sk_state != SMC_CLOSED) {
+ /* SMC always uses clcsocket->wq for the call to
+ * sock_poll_wait(), which is the same wait queue
+ * as what smc socket->sk->sk_wq points to before
+ * fallback or what clcsocket->sk->sk_wq points to
+ * after fallback.
+ */
+ sock_poll_wait(file, smc->clcsock, wait);
+ }
if (sk->sk_err)
mask |= EPOLLERR;
if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
@@ -2707,6 +2734,31 @@ static int smc_create(struct net *net, struct socket *sock, int protocol,
smc->sk.sk_sndbuf = max(smc->clcsock->sk->sk_sndbuf, SMC_BUF_MIN_SIZE);
smc->sk.sk_rcvbuf = max(smc->clcsock->sk->sk_rcvbuf, SMC_BUF_MIN_SIZE);
+ /* To ensure that user applications consistently use the same wait queue
+ * before and after fallback, we set smc socket->sk->sk_wq to clcsocket->wq
+ * here and reset clcsocket->sk->sk_wq to clcsocket->wq in
+ * smc_switch_to_fallback() if fallback occurrs, making clcsocket->wq the
+ * constant wait queue which user applications use.
+ *
+ * here:
+ * - Set smc socket->sk->sk_wq to clcsocket->wq
+ * So that the sk_data_ready()/sk_write_space.. will wake up clcsocket->wq
+ * and user applications will be notified of epoll events if they added
+ * eppoll_entry into clcsocket->wq through smc_poll().
+ *
+ * - Set clcsocket->sk->sk_wq to smc socket->wq
+ * Since clcsocket->wq is occupied by smcsocket->sk->sk_wq, clcsocket->
+ * sk->sk_wq have to use another wait queue (smc socket->wq) during TCP
+ * handshake or CLC messages transmission in rendezvous.
+ *
+ * So the relation between socket->wq and socket->sk->sk_wq before fallback is:
+ *
+ * - smc socket->sk->sk_wq --> clcsocket->wq
+ * - clcsocket->sk->sk_wq --> smc socket->wq
+ */
+ rcu_assign_pointer(sk->sk_wq, &smc->clcsock->wq);
+ rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->sk.sk_socket->wq);
+
out:
return rc;
}
--
1.8.3.1
Powered by blists - more mailing lists