lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 23 Apr 2024 12:31:23 -0700
From: Zijian Zhang <zijianzhang@...edance.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>, netdev@...r.kernel.org
Cc: edumazet@...gle.com, davem@...emloft.net, kuba@...nel.org,
 cong.wang@...edance.com, xiaochun.lu@...edance.com
Subject: Re: [External] Re: [PATCH net-next v2 1/3] selftests: fix OOM problem
 in msg_zerocopy selftest

Thanks for the suggestions, all make sense to me. I will update in the
next version.

On 4/21/24 8:16 AM, Willem de Bruijn wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@...edance.com>
>>
>> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
>> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
>> until the socket is not writable. Typically, it will start the receiving
>> process after around 30+ sendmsgs. However, because of the
>> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
>> the sender is always writable and does not get any chance to run recv
>> notifications. The selftest always exits with OUT_OF_MEMORY because the
>> memory used by opt_skb exceeds the core.sysctl_optmem_max.
>>
>> According to our experiments, this problem can be solved by open the
>> DEBUG_LOCKDEP configuration for the kernel.
> 
> Not so much solved, as mitigated.
> 
>> But it will makes the
>> notificatoins disordered even in good commits before
> 
> typo: notifications
> 
> We still have no explanation for this behavior, right. OOO
> notifications for TCP should be extremely rare.
> 
> A completion is queued when both the skb with the send() data was sent
> and freed, and the ACK arrived, freeing the clone on the retransmit
> queue. This is almost certainly some effect of running over loopback.
> 
> OOO being rare is also what makes the notification batch mechanism so
> effective for TCP.
> 
>> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").
>>
>> We introduce "cfg_notification_limit" to force sender to receive
>> notifications after some number of sendmsgs. And, notifications may not
>> come in order, because of the reason we present above. We have order
>> checking code managed by cfg_verbose.
>>
>> Signed-off-by: Zijian Zhang <zijianzhang@...edance.com>
>> Signed-off-by: Xiaochun Lu <xiaochun.lu@...edance.com>
>> ---
>>   tools/testing/selftests/net/msg_zerocopy.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
>> index bdc03a2097e8..6c18b07cab30 100644
>> --- a/tools/testing/selftests/net/msg_zerocopy.c
>> +++ b/tools/testing/selftests/net/msg_zerocopy.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0
>>   /* Evaluate MSG_ZEROCOPY
>>    *
>>    * Send traffic between two processes over one of the supported
>> @@ -85,6 +86,7 @@ static bool cfg_rx;
>>   static int  cfg_runtime_ms	= 4200;
>>   static int  cfg_verbose;
>>   static int  cfg_waittime_ms	= 500;
>> +static int  cfg_notification_limit = 32;
>>   static bool cfg_zerocopy;
>>   
>>   static socklen_t cfg_alen;
>> @@ -95,6 +97,8 @@ static char payload[IP_MAXPACKET];
>>   static long packets, bytes, completions, expected_completions;
>>   static int  zerocopied = -1;
>>   static uint32_t next_completion;
>> +/* The number of sendmsgs which have not received notified yet */
>> +static uint32_t sendmsg_counter;
>>   
>>   static unsigned long gettimeofday_ms(void)
>>   {
>> @@ -208,6 +212,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
>>   		error(1, errno, "send");
>>   	if (cfg_verbose && ret != len)
>>   		fprintf(stderr, "send: ret=%u != %u\n", ret, len);
>> +	sendmsg_counter++;
>>   
>>   	if (len) {
>>   		packets++;
>> @@ -435,7 +440,7 @@ static bool do_recv_completion(int fd, int domain)
>>   	/* Detect notification gaps. These should not happen often, if at all.
>>   	 * Gaps can occur due to drops, reordering and retransmissions.
>>   	 */
>> -	if (lo != next_completion)
>> +	if (cfg_verbose && lo != next_completion)
>>   		fprintf(stderr, "gap: %u..%u does not append to %u\n",
>>   			lo, hi, next_completion);
>>   	next_completion = hi + 1;
>> @@ -460,6 +465,7 @@ static bool do_recv_completion(int fd, int domain)
>>   static void do_recv_completions(int fd, int domain)
>>   {
>>   	while (do_recv_completion(fd, domain)) {}
>> +	sendmsg_counter = 0;
>>   }
>>   
>>   /* Wait for all remaining completions on the errqueue */
>> @@ -549,6 +555,9 @@ static void do_tx(int domain, int type, int protocol)
>>   		else
>>   			do_sendmsg(fd, &msg, cfg_zerocopy, domain);
>>   
>> +		if (cfg_zerocopy && sendmsg_counter >= cfg_notification_limit)
>> +			do_recv_completions(fd, domain);
>> +
>>   		while (!do_poll(fd, POLLOUT)) {
>>   			if (cfg_zerocopy)
>>   				do_recv_completions(fd, domain);
>> @@ -708,7 +717,7 @@ static void parse_opts(int argc, char **argv)
>>   
>>   	cfg_payload_len = max_payload_len;
>>   
>> -	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) {
>> +	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vzl:n")) != -1) {
> 
> no n defined
> 
> please keep lexicographic order
>>   		switch (c) {
>>   		case '4':
>>   			if (cfg_family != PF_UNSPEC)
>> @@ -760,6 +769,9 @@ static void parse_opts(int argc, char **argv)
>>   		case 'z':
>>   			cfg_zerocopy = true;
>>   			break;
>> +		case 'l':
>> +			cfg_notification_limit = strtoul(optarg, NULL, 0);
>> +			break;
>>   		}
>>   	}
>>   
>> -- 
>> 2.20.1
>>
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ