lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1636548651-44649-2-git-send-email-guwen@linux.alibaba.com>
Date:   Wed, 10 Nov 2021 20:50:50 +0800
From:   Wen Gu <guwen@...ux.alibaba.com>
To:     kgraul@...ux.ibm.com, tonylu@...ux.alibaba.com
Cc:     davem@...emloft.net, kuba@...nel.org, linux-s390@...r.kernel.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        dust.li@...ux.alibaba.com, xuanzhuo@...ux.alibaba.com,
        guwen@...ux.alibaba.com
Subject: [RFC PATCH net v2 1/2] net/smc: Fix socket wait queue mismatch issue caused by fallback

There may be a mismatch of socket wait queue caused by SMC fallback.

When user applications used SMC to replace TCP, they might add
an eppoll_entry into smc socket->wq and expect to be notified if
epoll events like EPOLL_IN/EPOLL_OUT occurred on the smc socket.

However, once a fallback occurred, user applications start to use
clcsocket. So it is clcsocket->wq instead of smc socket->wq which
is woken up when there is data to be processed or sending space
available. Thus the eppoll_entry which was added into smc socket->wq
before fallback does not work anymore.

[if not fallback]
   application
        ^
      (poll)
        |
  smc socket->wq            clcsocket->wq
(has eppoll_entry)                .
        .                         .
        .                         .
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
        ^                         ^
        | (data flow)             |(tcp handshake in rendezvous)
        |                         |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

[if fallback]
   application <------------------|
                        (can't poll anything)
                                  |
  smc socket->wq            clcsocket->wq
(has eppoll_entry)                .
        .                         .
        .                         .
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
                                  ^
                                  |(data flow)
                                  |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

For example, in nginx/wrk benchmark, this issue will cause an
all-zeros test result:

server: nginx -g 'daemon off;'
client: smc_run wrk -c 1 -t 1 -d 5 http://11.200.15.93/index.html

  Running 5s test @ http://11.200.15.93/index.html
     1 threads and 1 connections
     Thread Stats   Avg      Stdev     Max   ± Stdev
     	Latency     0.00us    0.00us   0.00us    -nan%
	Req/Sec     0.00      0.00     0.00      -nan%
	0 requests in 5.00s, 0.00B read
     Requests/sec:      0.00
     Transfer/sec:       0.00B

The main idea of this patch is that since wait queue is switched
from smc socket->wq to clcsocket->wq after fallback, disabling the
eppoll_entry in smc socket->wq, maybe we can try to use clcsocket->wq
from the beginning.

So this patch set smc socket->sk->sk_wq to clcsocket->wq when smc
socket is created and reset clcsocket->sk->sk_wq to clcsocket->wq
once fallback occurrs, making clcsocket->wq the constant wait queue
that user applications use. Thus the user application can be
notified by the eppoll_entry in clcsocket->wq no matter whether
a fallback occurrs.

After this patch:

[if not fallback]
   application <------------------|
                                (poll)
                                  |
  smc socket->wq            clcsocket->wq
        .                 (has eppoll_entry)
            `   .                 .
                    `   . .   `
                  .   `    `   .
               `                  `
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
        ^                         ^
        |(data flow)              |(tcp handshake in rendezvous)
        |                         |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

[if fallback]
   application <------------------|
                                (poll)
                                  |
  smc socket->wq            clcsocket->wq
        .                 (has eppoll_entry)
            `   .                 .
                    `   .         .
                           `   .  .
                                  `
smc socket->sk->sk_wq    clcsocket->sk->sk_wq
                                  ^
                                  |(data flow)
                                  |
|-------------------------------------------|
|   sk_data_ready / sk_write_space ..       |
|-------------------------------------------|

Link: https://lore.kernel.org/all/db9acf73-abef-209e-6ec2-8ada92e2cfbc@linux.ibm.com/
Signed-off-by: Wen Gu <guwen@...ux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@...ux.alibaba.com>
---
v1 -> v2
- Add the complete description about the intention of initial patch.
---
 net/smc/af_smc.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 0cf7ed2..fe5a2cd 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -566,6 +566,20 @@ static void smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
 	smc->fallback_rsn = reason_code;
 	smc_stat_fallback(smc);
 	trace_smc_switch_to_fallback(smc, reason_code);
+
+	/* We swapped sk->sk_wq of clcsocket and smc socket in smc_create()
+	 * to prevent mismatch of wait queue caused by fallback.
+	 *
+	 * If fallback really occurred, user application starts to use clcsocket.
+	 * Accordingly, clcsocket->sk->sk_wq should be reset to clcsocket->wq
+	 * in order that user application still uses the same wait queue as what
+	 * was used before fallback.
+	 *
+	 * After fallback, the relation between socket->wq and socket->sk->sk_wq is:
+	 * - clcsocket->sk->sk_wq  -->  clcsocket->wq
+	 * - smcsocket->sk->sk_wq  -->  clcsocket->wq
+	 */
+	rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->clcsock->wq);
 	if (smc->sk.sk_socket && smc->sk.sk_socket->file) {
 		smc->clcsock->file = smc->sk.sk_socket->file;
 		smc->clcsock->file->private_data = smc->clcsock;
@@ -2174,6 +2188,12 @@ static int smc_accept(struct socket *sock, struct socket *new_sock,
 	if (rc)
 		goto out;
 
+	/* The new accepted smc sock sets sk->sk_wq to clcsocket->wq like what
+	 * smc_create() did when the fallback does not occur.
+	 */
+	if (!smc_sk(nsk)->use_fallback)
+		rcu_assign_pointer(nsk->sk_wq, &smc_sk(nsk)->clcsock->wq);
+
 	if (lsmc->sockopt_defer_accept && !(flags & O_NONBLOCK)) {
 		/* wait till data arrives on the socket */
 		timeo = msecs_to_jiffies(lsmc->sockopt_defer_accept *
@@ -2310,8 +2330,15 @@ static __poll_t smc_poll(struct file *file, struct socket *sock,
 		mask = smc->clcsock->ops->poll(file, smc->clcsock, wait);
 		sk->sk_err = smc->clcsock->sk->sk_err;
 	} else {
-		if (sk->sk_state != SMC_CLOSED)
-			sock_poll_wait(file, sock, wait);
+		if (sk->sk_state != SMC_CLOSED) {
+			/* SMC always uses clcsocket->wq for the call to
+			 * sock_poll_wait(), which is the same wait queue
+			 * as what smc socket->sk->sk_wq points to before
+			 * fallback or what clcsocket->sk->sk_wq points to
+			 * after fallback.
+			 */
+			sock_poll_wait(file, smc->clcsock, wait);
+		}
 		if (sk->sk_err)
 			mask |= EPOLLERR;
 		if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
@@ -2707,6 +2734,31 @@ static int smc_create(struct net *net, struct socket *sock, int protocol,
 	smc->sk.sk_sndbuf = max(smc->clcsock->sk->sk_sndbuf, SMC_BUF_MIN_SIZE);
 	smc->sk.sk_rcvbuf = max(smc->clcsock->sk->sk_rcvbuf, SMC_BUF_MIN_SIZE);
 
+	/* To ensure that user applications consistently use the same wait queue
+	 * before and after fallback, we set smc socket->sk->sk_wq to clcsocket->wq
+	 * here and reset clcsocket->sk->sk_wq to clcsocket->wq in
+	 * smc_switch_to_fallback() if fallback occurrs, making clcsocket->wq the
+	 * constant wait queue which user applications use.
+	 *
+	 * here:
+	 * - Set smc socket->sk->sk_wq to clcsocket->wq
+	 *   So that the sk_data_ready()/sk_write_space.. will wake up clcsocket->wq
+	 *   and user applications will be notified of epoll events if they added
+	 *   eppoll_entry into clcsocket->wq through smc_poll().
+	 *
+	 * - Set clcsocket->sk->sk_wq to smc socket->wq
+	 *   Since clcsocket->wq is occupied by smcsocket->sk->sk_wq, clcsocket->
+	 *   sk->sk_wq have to use another wait queue (smc socket->wq) during TCP
+	 *   handshake or CLC messages transmission in rendezvous.
+	 *
+	 * So the relation between socket->wq and socket->sk->sk_wq before fallback is:
+	 *
+	 * - smc socket->sk->sk_wq --> clcsocket->wq
+	 * - clcsocket->sk->sk_wq  --> smc socket->wq
+	 */
+	rcu_assign_pointer(sk->sk_wq, &smc->clcsock->wq);
+	rcu_assign_pointer(smc->clcsock->sk->sk_wq, &smc->sk.sk_socket->wq);
+
 out:
 	return rc;
 }
-- 
1.8.3.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ