lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com>
Date: Thu, 30 May 2024 17:20:12 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Gerd Bayer <gbayer@...ux.ibm.com>, Wenjia Zhang <wenjia@...ux.ibm.com>,
 Jan Karcher <jaka@...ux.ibm.com>
Cc: "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3 2/2] net/smc: Use correct buffer sizes when
 switching between TCP and SMC

> Tuning of the effective buffer size through setsockopts was working for
> SMC traffic only but not for TCP fall-back connections even before
> commit 0227f058aa29 ("net/smc: Unbind r/w buffer size from clcsock and
> make them tunable"). That change made it apparent that TCP fall-back
> connections would use net.smc.[rw]mem as buffer size instead of
> net.ipv4_tcp_[rw]mem.
> 
> Amend the code that copies attributes between the (TCP) clcsock and the
> SMC socket and adjust buffer sizes appropriately:
> - Copy over sk_userlocks so that both sockets agree on whether tuning
>   via setsockopt is active.
> - When falling back to TCP use sk_sndbuf or sk_rcvbuf as specified with
>   setsockopt. Otherwise, use the sysctl value for TCP/IPv4.
> - Likewise, use either values from setsockopt or from sysctl for SMC
>   (duplicated) on successful SMC connect.
> 
> In smc_tcp_listen_work() drop the explicit copy of buffer sizes as that
> is taken care of by the attribute copy.

[...]
> +/* if set, use value set by setsockopt() - else use IPv4 or SMC sysctl value */
> +static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk,
> +				     unsigned long mask)
> +{
> +	struct net *nnet = sock_net(nsk);
> +
> +	nsk->sk_userlocks = osk->sk_userlocks;
> +	if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) {
> +		nsk->sk_sndbuf = osk->sk_sndbuf;
> +	} else {
> +		if (mask == SK_FLAGS_SMC_TO_CLC)
> +			WRITE_ONCE(nsk->sk_sndbuf,
> +				   READ_ONCE(nnet->ipv4.sysctl_tcp_wmem[1]));

Hi Gerd,

I noticed that during TCP connection establishment, tcp_sndbuf_expand()
will tune sk->sk_sndbuf, that causes clcsock's sk_sndbuf to no longer
be sysctl_tcp_wmem[1]. But here we set it back to sysctl_tcp_wmem[1].

So I did some tests to see if the values of sk_sndbuf and sk_rcvbuf are
as expected in SMC and fallback cases (see the attached server.c and
client.c for the reproducer and here are the sysctl values in my environment)

net.ipv4.tcp_wmem = 4096        4096    16777216
net.ipv4.tcp_rmem = 4096        4096    16777216
net.smc.wmem = 65536
net.smc.rmem = 65536


1. No additional sk_{snd|rcv}buf settings

1.1 TCP

     ./server
     ./client -i <serv_ip>

     results:
     - server: sndbuf_size 87040, rcvbuf_size 4096
     - client: sndbuf_size 87040, rcvbuf_size 4096

1.2 SMC

     smc_run ./server
     smc_run ./client -i <serv_ip>

     results:
     - server: sndbuf_size 131072, rcvbuf_size 131072
     - client: sndbuf_size 131072, rcvbuf_size 131072

1.3 SMC, but server fallback

     smc_run ./server
     ./client -i <serv_ip>

     results:
     - server: sndbuf_size 87040, rcvbuf_size 4096
     - client: sndbuf_size 87040, rcvbuf_size 4096

1.4 SMC, but client fallback

     ./server
     smc_run ./client -i <serv_ip>

     results:
     - server: sndbuf_size 87040, rcvbuf_size 4096
     - client: sndbuf_size 4096, rcvbuf_size 4096    <--- I think clcsock's sk_sndbuf should
                                                          be the same as 1.1 after fallback?


2. Set server listen sock's and client sock's sk_{snd|rcv}buf
    as 16KB by setsockopt() before connection establishment.

2.1 TCP

     ./server -s 16384
     ./client -i <serv_ip> -s 16384

     results:
     - server: sndbuf_size 32768, rcvbuf_size 32768
     - client: sndbuf_size 32768, rcvbuf_size 32768

2.2 SMC

     smc_run ./server -s 16384
     smc_run ./client -i <serv_ip> -s 16384

     results:
     - server: sndbuf_size 32768, rcvbuf_size 32768
     - client: sndbuf_size 32768, rcvbuf_size 32768

2.3 SMC, but server fallback

     smc_run ./server -s 16384
     ./client -i <serv_ip> -s 16384

     results:
     - server: sndbuf_size 32768, rcvbuf_size 32768
     - client: sndbuf_size 32768, rcvbuf_size 32768

2.4 SMC, but client fallback

     ./server -s 16384
     smc_run ./client -i <serv_ip> -s 16384

     results:
     - server: sndbuf_size 32768, rcvbuf_size 32768
     - client: sndbuf_size 32768, rcvbuf_size 32768


In the above 8 sets of tests, 1.4 does not seem to meet expectations.
It is because we reset clcsock's sk_sndbuf to sysctl_tcp_wmem[1] in
smc_copy_sock_settings_to_clc(). I think it should be like 1.1 TCP values
after fallback. What do you think?

If so, we may need to avoid setting sysctl value to clcsock's sk_sndbuf
in smc_adjust_sock_bufsizes(). Furthermore, maybe all the setting-sysctl-value
can be omitted, since smc sock's and clcsock's sk_{snd|rcv}buf have been
set to sysctl value during their sock initialization (smc_sock_alloc() and
tcp_init_sock()).


And another question is why 1.3 is as expected? The direct cause is that
server does not call smc_copy_sock_settings_to_clc() when fallback, like the
client smc_connect_fallback() does. But I didn't figure out what is the
reason for the different behavior? Do you have any information? Thanks a lot!


Best regards,
Wen Gu

> +		else
> +			WRITE_ONCE(nsk->sk_sndbuf,
> +				   2 * READ_ONCE(nnet->smc.sysctl_wmem));
> +	}
> +	if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) {
> +		nsk->sk_rcvbuf = osk->sk_rcvbuf;
> +	} else {
> +		if (mask == SK_FLAGS_SMC_TO_CLC)
> +			WRITE_ONCE(nsk->sk_rcvbuf,
> +				   READ_ONCE(nnet->ipv4.sysctl_tcp_rmem[1]));
> +		else
> +			WRITE_ONCE(nsk->sk_rcvbuf,
> +				   2 * READ_ONCE(nnet->smc.sysctl_rmem));
> +	}
> +}
> +

[...]

View attachment "client.c" of type "text/plain" (3094 bytes)

View attachment "server.c" of type "text/plain" (3551 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ