Date: Mon, 3 Jul 2023 17:09:38 +0800
From: Liang Chen <liangchen.linux@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: ilias.apalodimas@...aro.org, hawk@...nel.org, kuba@...nel.org, 
	davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com, 
	netdev@...r.kernel.org
Subject: Re: [PATCH net-next] skbuff: Optimize SKB coalescing for page pool case

On Fri, Jun 30, 2023 at 7:52 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2023/6/29 20:19, Liang Chen wrote:
> > On Thu, Jun 29, 2023 at 8:17 PM Liang Chen <liangchen.linux@...il.com> wrote:
> >>
> >> On Thu, Jun 29, 2023 at 2:53 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >>>
> >>> On 2023/6/28 20:11, Liang Chen wrote:
> >>>> In order to address the issues encountered with commit 1effe8ca4e34
> >>>> ("skbuff: fix coalescing for page_pool fragment recycling"), the
> >>>> combination of the following condition was excluded from skb coalescing:
> >>>>
> >>>> from->pp_recycle = 1
> >>>> from->cloned = 1
> >>>> to->pp_recycle = 1
> >>>>
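(A paraphrased illustration of the excluded combination, not the exact
upstream code: after commit 1effe8ca4e34, skb_try_coalesce() treats a
cloned page-pool 'from' skb as non-recyclable, so the case listed above is
rejected.)

	/* Roughly the guard added by commit 1effe8ca4e34: a cloned 'from'
	 * skb does not count as page-pool recyclable, so it cannot be
	 * coalesced into a page-pool 'to' skb.
	 */
	if (to->pp_recycle != (from->pp_recycle && !skb_cloned(from)))
		return false;	/* do not coalesce */
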
> >>>> However, with page pool environments, the aforementioned combination can
> >>>> be quite common. In scenarios with a higher number of small packets, it
> >>>> can significantly affect the success rate of coalescing. For example,
> >>>> when considering packets of 256 bytes size, our comparison of coalescing
> >>>> success rate is as follows:
> >>>
> >>> As skb_try_coalesce() only allows coalescing when the 'to' skb is not cloned.
> >>>
> >>> Could you give more details about the testing when we have a non-cloned
> >>> 'to' skb and a cloned 'from' skb? Both of them should belong to the same
> >>> flow.
> >>>
> >>> I had the below patchset trying to do something similar to what this patch does:
> >>> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/
> >>>
> >>> It seems this patch is only trying to optimize a specific case of skb
> >>> coalescing, so if coalescing between a non-cloned and a cloned skb is a
> >>> common case, then it might be worth optimizing.
> >>>
> >>
> >> Sure, thanks for the information! The testing is just a common iperf
> >> test, as below.
> >>
> >> iperf3 -c <server IP> -i 5 -f g -t 0 -l 128
> >>
> >> We observed the frequency of each combination of the pp (page pool) and
> >> clone conditions when entering skb_try_coalesce. The results motivated us
> >> to propose such an optimization, as we noticed that case 11 (from
> >> pp/clone=1/1 and to pp/clone=1/0) occurs quite often.
> >>
> >> +-------------+--------------+--------------+--------------+--------------+
> >> |   from/to   | pp/clone=0/0 | pp/clone=0/1 | pp/clone=1/0 | pp/clone=1/1 |
> >> +-------------+--------------+--------------+--------------+--------------+
> >> |pp/clone=0/0 | 0            | 1            | 2            | 3            |
> >> |pp/clone=0/1 | 4            | 5            | 6            | 7            |
> >> |pp/clone=1/0 | 8            | 9            | 10           | 11           |
> >> |pp/clone=1/1 | 12           | 13           | 14           | 15           |
> >> +-------------+--------------+--------------+--------------+--------------+
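(A hypothetical instrumentation sketch, shown only to illustrate how such a
breakdown can be collected; count_coalesce_case() and coalesce_cases[] are
made-up names, and the 4-bit index layout is arbitrary and does not
necessarily match the case numbering in the table above.)

	static atomic_long_t coalesce_cases[16];

	/* Bump a per-combination counter on entry to skb_try_coalesce(). */
	static void count_coalesce_case(const struct sk_buff *from,
					const struct sk_buff *to)
	{
		unsigned int idx = (from->pp_recycle << 3) |
				   (skb_cloned(from) << 2) |
				   (to->pp_recycle << 1) |
				    skb_cloned(to);

		atomic_long_inc(&coalesce_cases[idx]);
	}
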
>
>
> I ran the iperf test; it seems there is only one skb_clone() call for each
> round, and I was using 'iperf', not 'iperf3'.
> Is there any app like tcpdump running? It seems odd that the skb from the rx
> path needs to be cloned for a common iperf test; which app or configuration
> is causing the cloning?
>
> Maybe use ftrace to see the skb_clone() calls?
> echo skb_clone > set_ftrace_filter
> echo function > current_tracer

Thanks for raising the concerns. We did some investigation into the
cause of the skb cloning. The result is that in our environment (Fedora 37
default network setup) NetworkManager creates a SOCK_DGRAM socket,
which eventually leads to an additional packet_type (struct
packet_sock.prot_hook) being registered, and thus to the cloning:
__netif_receive_skb_core iterates through orig_dev->ptype_specific for
all possible skb delivery targets and increases skb->users accordingly.
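
For reference, the mechanism is roughly the following (a simplified sketch
paraphrased from net/core/dev.c and net/packet/af_packet.c; the function
names exist upstream, but the bodies here are abbreviated):

	/* Each additional matching packet_type handler receives the skb
	 * through deliver_skb(), which takes an extra reference, so the
	 * skb becomes shared.
	 */
	static inline int deliver_skb(struct sk_buff *skb,
				      struct packet_type *pt_prev,
				      struct net_device *orig_dev)
	{
		refcount_inc(&skb->users);
		return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
	}

	/* packet_rcv() (the packet socket's prot_hook) must not modify a
	 * shared skb, so it clones it first -- roughly:
	 *
	 *	if (skb_shared(skb)) {
	 *		nskb = skb_clone(skb, GFP_ATOMIC);
	 *		...
	 *	}
	 *
	 * which is where the extra clone observed above comes from.
	 */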

We will update the commit message to include this information and point
out that the figures are specific to our environment. That said, there are
many possible reasons skbs can be cloned, and improvements in this code
path can still bring benefits.

Thanks,
Liang
