linux-kernel - Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool

Message-ID: <CAKgT0UfFVFa4zT2DnPZEGaHp0uh5V1u1aGymgdL4Vu8Q1VV8hQ@mail.gmail.com>
Date:   Tue, 20 Jun 2023 09:19:03 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Lorenzo Bianconi <lorenzo@...nel.org>
Cc:     Jesper Dangaard Brouer <jbrouer@...hat.com>, brouer@...hat.com,
        Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
        kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Maryam Tahhan <mtahhan@...hat.com>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc() API

On Sun, Jun 18, 2023 at 8:05 AM Lorenzo Bianconi <lorenzo@...nel.org> wrote:
>
> [...]
> > >
> > > Yes, precisely.
> > > I distinctly remember that I tried to poke you and Eric on this approach
> > > earlier, but I cannot find a link to that email.
> > >
> > > I would really appreciate it if you, Alex, could give the approach in
> > > veth_convert_skb_to_xdp_buff() some review, as I believe there is huge
> > > potential for improvement there that would lead to large performance
> > > gains. (I'm sure Maryam will be eager to help re-test performance
> > > for her use-cases).
> >
> > Well just looking at it the quick and dirty answer would be to look at
> > making use of something like page_frag_cache. I won't go into details
> > since it isn't too different from the frag allocator, but it is much
> > simpler since it is just doing reference count hacks instead of having
> > to do the extra overhead to keep the DMA mapping in place. The veth
> > would then just be sitting on at most an order 3 page while it is
> > waiting to fully consume it rather than waiting on a full pool of
> > pages.
>
> Hi,
>
> I did some experiments using page_frag_cache/page_frag_alloc() instead of
> page_pools in a simple environment I used to test XDP for the veth driver.
> In particular, I allocate a new buffer in veth_convert_skb_to_xdp_buff() from
> the page_frag_cache and copy the full skb into it, effectively
> "linearizing" the packet (since we know the original skb length).
> I ran an iperf TCP connection over a veth pair where the
> remote device runs the xdp_rxq_info sample (available in the kernel source
> tree, with action XDP_PASS):
>
> TCP client -- v0 === v1 (xdp_rxq_info) -- TCP server
>
> net-next (page_pool):
> - MTU 1500B: ~  7.5 Gbps
> - MTU 8000B: ~ 15.3 Gbps
>
> net-next + page_frag_alloc:
> - MTU 1500B: ~  8.4 Gbps
> - MTU 8000B: ~ 14.7 Gbps
>
> It seems there is no clear "win" here (at least in this environment
> and with this simple approach). Moreover:

For the 1500B packets it is a win, but for 8000B it looks like there
is a regression. Any idea what is causing it?
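
To put rough numbers on that, the relative change can be worked out from
the figures quoted above. This is only a small user-space sketch; the
helper name pct_change() is made up for illustration. With the quoted
throughputs it comes out to roughly +12% at MTU 1500B and roughly -3.9%
at MTU 8000B.

```c
/* Relative change of "val" against "base", in percent.
 * Hypothetical helper, used only to compare the quoted iperf numbers. */
double pct_change(double base, double val)
{
	return (val - base) / base * 100.0;
}

/* Throughputs in Gbps, copied from the measurements quoted above. */
double mtu1500_delta(void)
{
	return pct_change(7.5, 8.4);	/* page_pool -> page_frag_alloc */
}

double mtu8000_delta(void)
{
	return pct_change(15.3, 14.7);	/* page_pool -> page_frag_alloc */
}
```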

> - can the linearization introduce any issue whenever we perform XDP_REDIRECT
>   into a destination device?

It shouldn't. If it does it would probably point to an issue w/ the
destination driver rather than an issue with the code doing this.

> - can the page_frag_cache introduce more memory fragmentation? (IIRC we were
>   experiencing this issue in mt76 before switching to page_pools.)

I think it largely depends on where the packets are ending up. I know
this is the approach we are using for sockets, see
skb_page_frag_refill(). If nothing else, if you took a similar
approach to it you might be able to bypass the need for the
page_frag_cache itself, although you would likely still end up
allocating similar structures.
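
For anyone following along, here is a rough user-space model of the
fragment-carving idea discussed in this thread: small buffers are cut out
of one large block, and the block is released only once every fragment and
the cache itself have dropped their references. This is a sketch under
stated assumptions and not the kernel code. All names (frag_cache,
frag_alloc, frag_free, frag_cache_drain, live_blocks) are hypothetical,
BLOCK_SIZE stands in for an order 3 page with 4 KiB pages, and a plain
counter replaces the page reference count and the bias trick the kernel
uses to avoid an atomic operation per allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE	32768u	/* stand-in for an order 3 page (8 x 4 KiB) */
#define FRAG_ALIGN	64u	/* assumed cache-line alignment */

struct block_hdr {
	unsigned int refs;	/* one per live fragment, plus one for the cache */
};

struct frag_cache {
	struct block_hdr *blk;	/* block currently being carved, or NULL */
	size_t offset;		/* next free byte inside that block */
};

int live_blocks;		/* demo-only counter of blocks not yet freed */

static void block_put(struct block_hdr *blk)
{
	if (--blk->refs == 0) {
		free(blk);
		live_blocks--;
	}
}

/* Carve "len" bytes out of the current block, refilling when it runs out. */
void *frag_alloc(struct frag_cache *fc, size_t len)
{
	void *p;

	if (len == 0 || len > BLOCK_SIZE - FRAG_ALIGN)
		return NULL;
	len = (len + FRAG_ALIGN - 1) & ~(size_t)(FRAG_ALIGN - 1);

	if (!fc->blk || fc->offset + len > BLOCK_SIZE) {
		if (fc->blk)
			block_put(fc->blk);	/* cache lets go of the old block */
		fc->blk = aligned_alloc(BLOCK_SIZE, BLOCK_SIZE);
		if (!fc->blk)
			return NULL;
		live_blocks++;
		fc->blk->refs = 1;		/* reference held by the cache */
		fc->offset = FRAG_ALIGN;	/* skip the header */
	}

	p = (char *)fc->blk + fc->offset;
	fc->offset += len;
	fc->blk->refs++;			/* reference held by the fragment */
	return p;
}

/* Blocks are BLOCK_SIZE aligned, so masking the pointer finds the header. */
void frag_free(void *p)
{
	uintptr_t base = (uintptr_t)p & ~(uintptr_t)(BLOCK_SIZE - 1);

	block_put((struct block_hdr *)base);
}

/* Drop the cache's own reference, e.g. when the device goes away. */
void frag_cache_drain(struct frag_cache *fc)
{
	if (fc->blk) {
		block_put(fc->blk);
		fc->blk = NULL;
	}
}
```

In this model the cache never pins more than one partially used block,
which is the point made earlier about sitting on at most one large page
rather than a full pool. A block that has been handed out stays allocated
until its last fragment is freed, so long-lived fragments can keep a
mostly empty block alive, which is where the fragmentation question
comes from.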