Message-ID: <a21e5d42-5718-4633-b812-be47ec6acf65@redhat.com>
Date: Thu, 26 Jun 2025 10:31:09 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Feng Yang <yangfeng59949@....com>, stfomichev@...il.com
Cc: aleksander.lobakin@...el.com, almasrymina@...gle.com,
 asml.silence@...il.com, davem@...emloft.net, ebiggers@...gle.com,
 edumazet@...gle.com, horms@...nel.org, kerneljasonxing@...il.com,
 kuba@...nel.org, linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
 willemb@...gle.com, yangfeng@...inos.cn
Subject: Re: [PATCH] skbuff: Improve the sending efficiency of __skb_send_sock

On 6/26/25 9:50 AM, Feng Yang wrote:
> On Wed, 25 Jun 2025 11:35:55 -0700, Stanislav Fomichev <stfomichev@...il.com> wrote:
>> On 06/23, Feng Yang wrote:
>>> From: Feng Yang <yangfeng@...inos.cn>
>>>
>>> By aggregating skb data into a bvec array for transmission, when using sockmap to forward large packets,
>>> what previously required multiple transmissions now only needs a single transmission, which significantly enhances performance.
>>> For small packets, the performance remains comparable to the original level.
>>>
>>> When using sockmap for forwarding, the average latency for different packet sizes
>>> after sending 10,000 packets is as follows:
>>> size	old(us)		new(us)
>>> 512	56		55
>>> 1472	58		58
>>> 1600	106		79
>>> 3000	145		108
>>> 5000	182		123
>>>
>>> Signed-off-by: Feng Yang <yangfeng@...inos.cn>
>>> ---
>>>  net/core/skbuff.c | 112 +++++++++++++++++++++-------------------------
>>>  1 file changed, 52 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 85fc82f72d26..664443fc9baf 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3235,84 +3235,75 @@ typedef int (*sendmsg_func)(struct sock *sk, struct msghdr *msg);
>>>  static int __skb_send_sock(struct sock *sk, struct sk_buff *skb, int offset,
>>>  			   int len, sendmsg_func sendmsg, int flags)
>>>  {
>>> -	unsigned int orig_len = len;
>>>  	struct sk_buff *head = skb;
>>>  	unsigned short fragidx;
>>> -	int slen, ret;
>>> +	struct msghdr msg;
>>> +	struct bio_vec *bvec;
>>> +	int max_vecs, ret, slen;
>>> +	int bvec_count = 0;
>>> +	unsigned int copied = 0;
>>>  
>>> -do_frag_list:
>>> -
>>> -	/* Deal with head data */
>>> -	while (offset < skb_headlen(skb) && len) {
>>> -		struct kvec kv;
>>> -		struct msghdr msg;
>>> -
>>> -		slen = min_t(int, len, skb_headlen(skb) - offset);
>>> -		kv.iov_base = skb->data + offset;
>>> -		kv.iov_len = slen;
>>> -		memset(&msg, 0, sizeof(msg));
>>> -		msg.msg_flags = MSG_DONTWAIT | flags;
>>> -
>>> -		iov_iter_kvec(&msg.msg_iter, ITER_SOURCE, &kv, 1, slen);
>>> -		ret = INDIRECT_CALL_2(sendmsg, sendmsg_locked,
>>> -				      sendmsg_unlocked, sk, &msg);
>>> -		if (ret <= 0)
>>> -			goto error;
>>> +	max_vecs = skb_shinfo(skb)->nr_frags + 1; // +1 for linear data
>>> +	if (skb_has_frag_list(skb)) {
>>> +		struct sk_buff *frag_skb = skb_shinfo(skb)->frag_list;
>>>  
>>> -		offset += ret;
>>> -		len -= ret;
>>> +		while (frag_skb) {
>>> +			max_vecs += skb_shinfo(frag_skb)->nr_frags + 1; // +1 for linear data
>>> +			frag_skb = frag_skb->next;
>>> +		}
>>>  	}
>>>  
>>> -	/* All the data was skb head? */
>>> -	if (!len)
>>> -		goto out;
>>> +	bvec = kcalloc(max_vecs, sizeof(struct bio_vec), GFP_KERNEL);
>>> +	if (!bvec)
>>> +		return -ENOMEM;
>>
>> Not sure allocating memory here is a good idea. From what I can tell
>> this function is used by non-sockmap callers as well..

Adding a per-packet allocation and a free is IMHO a no-go for a patch
intended to improve performance.

> Alternatively, we can use struct bio_vec bvec[size] to avoid memory allocation.

If you mean using a fixed-size bio_vec array allocated on the stack,
that could work...

> Even if the "size" is insufficient, the unsent portion will be transmitted in the next call to `__skb_send_sock`.

... but I think this part is not acceptable: callers may/should
already assume that a partial transmission is due to an error.

Instead I think you should loop, transmitting up to bio_vec_size
entries on each iteration.

Side note: the patch has a few style issues:
- it should not use // for comments
- variable declaration should respect the reverse christmas tree order

and possibly you could use this refactoring to avoid the use of the
backward goto statement.

Thanks,

Paolo

