[<prev] [next>] [day] [month] [year] [list]
Message-ID: <5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com>
Date: Thu, 30 May 2024 17:20:12 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Gerd Bayer <gbayer@...ux.ibm.com>, Wenjia Zhang <wenjia@...ux.ibm.com>,
Jan Karcher <jaka@...ux.ibm.com>
Cc: "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3 2/2] net/smc: Use correct buffer sizes when
switching between TCP and SMC
> Tuning of the effective buffer size through setsockopts was working for
> SMC traffic only but not for TCP fall-back connections even before
> commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and
> make them tunable"). That change made it apparent that TCP fall-back
> connections would use net.smc.[rw]mem as buffer size instead of
> net.ipv4_tcp_[rw]mem.
>
> Amend the code that copies attributes between the (TCP) clcsock and the
> SMC socket and adjust buffer sizes appropriately:
> - Copy over sk_userlocks so that both sockets agree on whether tuning
> via setsockopt is active.
> - When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with
> setsockopt. Otherwise, use the sysctl value for TCP/IPv4.
> - Likewise, use either values from setsockopt or from sysctl for SMC
> (duplicated) on successful SMC connect.
>
> In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that
> is taken care of by the attribute copy.
[...]
> +/* if set, use value set by setsockopt() - else use IPv4 or SMC sysctl value */
> +static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk,
> + unsigned long mask)
> +{
> + struct net *nnet = sock_net(nsk);
> +
> + nsk->sk_userlocks = osk->sk_userlocks;
> + if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) {
> + nsk->sk_sndbuf = osk->sk_sndbuf;
> + } else {
> + if (mask == SK_FLAGS_SMC_TO_CLC)
> + WRITE_ONCE(nsk->sk_sndbuf,
> + READ_ONCE(nnet->ipv4.sysctl_tcp_wmem[1]));
Hi Gerd,
I noticed that during TCP connection establishment, tcp_sndbuf_expand()
will tune sk->sk_sndbuf, that causes clcsock's sk_sndbuf to no longer
be sysctl_tcp_wmem[1]. But here we set it back to sysctl_tcp_wmem[1].
So I did some tests to see if the values of sk_sndbuf and sk_rcvbuf are
as expected in SMC and fallback cases (see the attached server.c and
client.c for the reproducer and here are the sysctl values in my environment)
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_rmem = 4096 4096 16777216
net.smc.wmem = 65536
net.smc.rmem = 65536
1. No additional sk_{snd|rcv}buf settings
1.1 TCP
./server
./client -i <serv_ip>
results:
- server: sndbuf_size 87040, rcvbuf_size 4096
- client: sndbuf_size 87040, rcvbuf_size 4096
1.2 SMC
smc_run ./server
smc_run ./client -i <serv_ip>
results:
- server: sndbuf_size 131072, rcvbuf_size 131072
- client: sndbuf_size 131072, rcvbuf_size 131072
1.3 SMC, but server fallback
smc_run ./server
./client -i <serv_ip>
results:
- server: sndbuf_size 87040, rcvbuf_size 4096
- client: sndbuf_size 87040, rcvbuf_size 4096
1.4 SMC, but client fallback
./server
smc_run ./client -i <serv_ip>
results:
- server: sndbuf_size 87040, rcvbuf_size 4096
- client: sndbuf_size 4096, rcvbuf_size 4096 <--- I think clcsock's sk_sndbuf should
be the same as 1.1 after fallback?
2. Set server listen sock's and client sock's sk_{snd|rcv}buf
as 16KB by setsockopt() before connection establishment.
2.1 TCP
./server -s 16384
./client -i <serv_ip> -s 16384
results:
- server: sndbuf_size 32768, rcvbuf_size 32768
- client: sndbuf_size 32768, rcvbuf_size 32768
2.2 SMC
smc_run ./server -s 16384
smc_run ./client -i <serv_ip> -s 16384
results:
- server: sndbuf_size 32768, rcvbuf_size 32768
- client: sndbuf_size 32768, rcvbuf_size 32768
2.3 SMC, but server fallback
smc_run ./server -s 16384
./client -i <serv_ip> -s 16384
results:
- server: sndbuf_size 32768, rcvbuf_size 32768
- client: sndbuf_size 32768, rcvbuf_size 32768
2.4 SMC, but client fallback
./server -s 16384
smc_run ./client -i <serv_ip> -s 16384
results:
- server: sndbuf_size 32768, rcvbuf_size 32768
- client: sndbuf_size 32768, rcvbuf_size 32768
In the above 8 sets of tests, 1.4 does not seem to meet expectations.
It is because we reset clcsock's sk_sndbuf to sysctl_tcp_wmem[1] in
smc_copy_sock_settings_to_clc(). I think it should be like 1.1 TCP values
after fallback. What do you think?
If so, we may need to avoid setting sysctl value to clcsock's sk_sndbuf
in smc_adjust_sock_bufsizes(). Furthermore, maybe all the setting-sysctl-value
can be omitted, since smc sock's and clcsock's sk_{snd|rcv}buf have been
set to sysctl value during their sock initialization (smc_sock_alloc() and
tcp_init_sock()).
And another question is why 1.3 is as expected? The direct cause is that
server does not call smc_copy_sock_settings_to_clc() when fallback, like the
client smc_connect_fallback() does. But I didn't figure out what is the
reason for the different behavior? Do you have any information? Thanks a lot!
Best regards,
Wen Gu
> + else
> + WRITE_ONCE(nsk->sk_sndbuf,
> + 2 * READ_ONCE(nnet->smc.sysctl_wmem));
> + }
> + if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) {
> + nsk->sk_rcvbuf = osk->sk_rcvbuf;
> + } else {
> + if (mask == SK_FLAGS_SMC_TO_CLC)
> + WRITE_ONCE(nsk->sk_rcvbuf,
> + READ_ONCE(nnet->ipv4.sysctl_tcp_rmem[1]));
> + else
> + WRITE_ONCE(nsk->sk_rcvbuf,
> + 2 * READ_ONCE(nnet->smc.sysctl_rmem));
> + }
> +}
> +
[...]
View attachment "client.c" of type "text/plain" (3094 bytes)
View attachment "server.c" of type "text/plain" (3551 bytes)
Powered by blists - more mailing lists