linux-kernel - Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UfzP30OiBQu+YKefLD+=32t+oA6KGzkvsW6k7CMTXU8KA@mail.gmail.com>
Date:   Fri, 16 Jun 2023 10:34:02 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc:     Yunsheng Lin <linyunsheng@...wei.com>, brouer@...hat.com,
        davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Maryam Tahhan <mtahhan@...hat.com>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc() API

On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
>
> On 16/06/2023 13.57, Yunsheng Lin wrote:
> > On 2023/6/16 0:19, Jesper Dangaard Brouer wrote:
> >
> > ...
> >
> >> You have mentioned veth as the use-case. I know I acked adding page_pool
> >> use-case to veth, for when we need to convert an SKB into an
> >> xdp_buff/xdp-frame, but maybe it was the wrong hammer(?).
> >> In this case in veth, the size is known at the page allocation time.
> >> Thus, using the page_pool API is wasting memory.  We did this for
> >> performance reasons, but we are not using PP for what is was intended
> >> for.  We mostly use page_pool, because it an existing recycle return
> >> path, and we were too lazy to add another alloc-type (see enum
> >> xdp_mem_type).
> >>
> >> Maybe you/we can extend veth to use this dynamic size API, to show us
> >> that this is API is a better approach.  I will signup for benchmarking
> >> this (and coordinating with CC Maryam as she came with use-case we
> >> improved on).
> >
> > Thanks, let's find out if page pool is the right hammer for the
> > veth XDP case.
> >
> > Below is the change for veth using the new api in this patch.
> > Only compile test as I am not familiar enough with veth XDP and
> > testing environment for it.
> > Please try it if it is helpful.
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 614f3e3efab0..8850394f1d29 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >          if (skb_shared(skb) || skb_head_is_locked(skb) ||
> >              skb_shinfo(skb)->nr_frags ||
> >              skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> > -               u32 size, len, max_head_size, off;
> > +               u32 size, len, max_head_size, off, truesize, page_offset;
> >                  struct sk_buff *nskb;
> >                  struct page *page;
> >                  int i, head_off;
> > @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> >                          goto drop;
> >
> > +               size = min_t(u32, skb->len, max_head_size);
> > +               truesize = size;
> > +
> >                  /* Allocate skb head */
> > -               page = page_pool_dev_alloc_pages(rq->page_pool);
> > +               page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>
> Maybe rename API to:
>
>   addr = netmem_alloc(rq->page_pool, &truesize);
>
> >                  if (!page)
> >                          goto drop;
> >
> > -               nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> > +               nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>
> IMHO this illustrates that API is strange/funky.
> (I think this is what Alex Duyck is also pointing out).
>
> This is the memory (virtual) address "pointer":
>   addr = page_address(page) + page_offset
>
> This is what napi_build_skb() takes as input. (I looked at other users
> of napi_build_skb() whom all give a mem ptr "va" as arg.)
> So, why does your new API provide the "page" and not just the address?
>
> As proposed above:
>    addr = netmem_alloc(rq->page_pool, &truesize);
>
> Maybe the API should be renamed, to indicate this isn't returning a "page"?
> We have talked about the name "netmem" before.

Yeah, this is more-or-less what I was getting at. Keep in mind this is
likely the most common case since most frames passed and forth aren't
ever usually much larger than 1500B.

> >                  if (!nskb) {
> >                          page_pool_put_full_page(rq->page_pool, page, true);
> >                          goto drop;
> > @@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  skb_copy_header(nskb, skb);
> >                  skb_mark_for_recycle(nskb);
> >
> > -               size = min_t(u32, skb->len, max_head_size);
> >                  if (skb_copy_bits(skb, 0, nskb->data, size)) {
> >                          consume_skb(nskb);
> >                          goto drop;
> > @@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >                  len = skb->len - off;
> >
> >                  for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> > -                       page = page_pool_dev_alloc_pages(rq->page_pool);
> > +                       size = min_t(u32, len, PAGE_SIZE);
> > +                       truesize = size;
> > +
> > +                       page = page_pool_dev_alloc(rq->page_pool, &page_offset,
> > +                                                  &truesize);
> >                          if (!page) {
> >                                  consume_skb(nskb);
> >                                  goto drop;
> >                          }
> >
> > -                       size = min_t(u32, len, PAGE_SIZE);
> > -                       skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> > +                       skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
>
> Guess, this shows the opposite; that the "page" _is_ used by the
> existing API.

This is a sort-of. One thing that has come up as of late is that all
this stuff is being moved over to folios anyway and getting away from
pages. In addition I am not sure how often we are having to take this
path as I am not sure how many non-Tx frames end up having to have
fragments added to them. For something like veth it might be more
common though since Tx becomes Rx in this case.

One thought I had on this is that we could look at adding a new
function that abstracts this away and makes use of netmem instead.
Then the whole page/folio thing would be that much further removed.

One other question I have now that I look at this code as well. Why is
it using page_pool and not just a frag cache allocator, or pages
themselves? It doesn't seem like it has a DMA mapping to deal with
since this is essentially copy-break code. Seems problematic that
there is no DMA involved here at all. This could be more easily
handled with just a single page_frag_cache style allocator.