[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UfzP30OiBQu+YKefLD+=32t+oA6KGzkvsW6k7CMTXU8KA@mail.gmail.com>
Date: Fri, 16 Jun 2023 10:34:02 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: Yunsheng Lin <linyunsheng@...wei.com>, brouer@...hat.com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Lorenzo Bianconi <lorenzo@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Eric Dumazet <edumazet@...gle.com>,
Maryam Tahhan <mtahhan@...hat.com>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc() API
On Fri, Jun 16, 2023 at 9:31 AM Jesper Dangaard Brouer
<jbrouer@...hat.com> wrote:
>
>
>
> On 16/06/2023 13.57, Yunsheng Lin wrote:
> > On 2023/6/16 0:19, Jesper Dangaard Brouer wrote:
> >
> > ...
> >
> >> You have mentioned veth as the use-case. I know I acked adding page_pool
> >> use-case to veth, for when we need to convert an SKB into an
> >> xdp_buff/xdp-frame, but maybe it was the wrong hammer(?).
> >> In this case in veth, the size is known at the page allocation time.
> >> Thus, using the page_pool API is wasting memory. We did this for
> >> performance reasons, but we are not using PP for what is was intended
> >> for. We mostly use page_pool, because it an existing recycle return
> >> path, and we were too lazy to add another alloc-type (see enum
> >> xdp_mem_type).
> >>
> >> Maybe you/we can extend veth to use this dynamic size API, to show us
> >> that this is API is a better approach. I will signup for benchmarking
> >> this (and coordinating with CC Maryam as she came with use-case we
> >> improved on).
> >
> > Thanks, let's find out if page pool is the right hammer for the
> > veth XDP case.
> >
> > Below is the change for veth using the new api in this patch.
> > Only compile test as I am not familiar enough with veth XDP and
> > testing environment for it.
> > Please try it if it is helpful.
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index 614f3e3efab0..8850394f1d29 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> > if (skb_shared(skb) || skb_head_is_locked(skb) ||
> > skb_shinfo(skb)->nr_frags ||
> > skb_headroom(skb) < XDP_PACKET_HEADROOM) {
> > - u32 size, len, max_head_size, off;
> > + u32 size, len, max_head_size, off, truesize, page_offset;
> > struct sk_buff *nskb;
> > struct page *page;
> > int i, head_off;
> > @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> > if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
> > goto drop;
> >
> > + size = min_t(u32, skb->len, max_head_size);
> > + truesize = size;
> > +
> > /* Allocate skb head */
> > - page = page_pool_dev_alloc_pages(rq->page_pool);
> > + page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>
> Maybe rename API to:
>
> addr = netmem_alloc(rq->page_pool, &truesize);
>
> > if (!page)
> > goto drop;
> >
> > - nskb = napi_build_skb(page_address(page), PAGE_SIZE);
> > + nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>
> IMHO this illustrates that API is strange/funky.
> (I think this is what Alex Duyck is also pointing out).
>
> This is the memory (virtual) address "pointer":
> addr = page_address(page) + page_offset
>
> This is what napi_build_skb() takes as input. (I looked at other users
> of napi_build_skb() whom all give a mem ptr "va" as arg.)
> So, why does your new API provide the "page" and not just the address?
>
> As proposed above:
> addr = netmem_alloc(rq->page_pool, &truesize);
>
> Maybe the API should be renamed, to indicate this isn't returning a "page"?
> We have talked about the name "netmem" before.
Yeah, this is more-or-less what I was getting at. Keep in mind this is
likely the most common case since most frames passed and forth aren't
ever usually much larger than 1500B.
> > if (!nskb) {
> > page_pool_put_full_page(rq->page_pool, page, true);
> > goto drop;
> > @@ -767,7 +770,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> > skb_copy_header(nskb, skb);
> > skb_mark_for_recycle(nskb);
> >
> > - size = min_t(u32, skb->len, max_head_size);
> > if (skb_copy_bits(skb, 0, nskb->data, size)) {
> > consume_skb(nskb);
> > goto drop;
> > @@ -782,14 +784,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> > len = skb->len - off;
> >
> > for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
> > - page = page_pool_dev_alloc_pages(rq->page_pool);
> > + size = min_t(u32, len, PAGE_SIZE);
> > + truesize = size;
> > +
> > + page = page_pool_dev_alloc(rq->page_pool, &page_offset,
> > + &truesize);
> > if (!page) {
> > consume_skb(nskb);
> > goto drop;
> > }
> >
> > - size = min_t(u32, len, PAGE_SIZE);
> > - skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
> > + skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
>
> Guess, this shows the opposite; that the "page" _is_ used by the
> existing API.
This is a sort-of. One thing that has come up as of late is that all
this stuff is being moved over to folios anyway and getting away from
pages. In addition I am not sure how often we are having to take this
path as I am not sure how many non-Tx frames end up having to have
fragments added to them. For something like veth it might be more
common though since Tx becomes Rx in this case.
One thought I had on this is that we could look at adding a new
function that abstracts this away and makes use of netmem instead.
Then the whole page/folio thing would be that much further removed.
One other question I have now that I look at this code as well. Why is
it using page_pool and not just a frag cache allocator, or pages
themselves? It doesn't seem like it has a DMA mapping to deal with
since this is essentially copy-break code. Seems problematic that
there is no DMA involved here at all. This could be more easily
handled with just a single page_frag_cache style allocator.
Powered by blists - more mailing lists