[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a80a095d-1f02-a8bf-f658-66ae114a6e4b@gmail.com>
Date: Sat, 17 Jun 2023 20:47:15 +0800
From: Yunsheng Lin <yunshenglin0825@...il.com>
To: Alexander Duyck <alexander.duyck@...il.com>,
Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: Yunsheng Lin <linyunsheng@...wei.com>, brouer@...hat.com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Lorenzo Bianconi <lorenzo@...nel.org>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Eric Dumazet <edumazet@...gle.com>,
Maryam Tahhan <mtahhan@...hat.com>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH net-next v3 3/4] page_pool: introduce page_pool_alloc()
API
On 2023/6/17 1:34, Alexander Duyck wrote:
...
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index 614f3e3efab0..8850394f1d29 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -736,7 +736,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>> if (skb_shared(skb) || skb_head_is_locked(skb) ||
>>> skb_shinfo(skb)->nr_frags ||
>>> skb_headroom(skb) < XDP_PACKET_HEADROOM) {
>>> - u32 size, len, max_head_size, off;
>>> + u32 size, len, max_head_size, off, truesize, page_offset;
>>> struct sk_buff *nskb;
>>> struct page *page;
>>> int i, head_off;
>>> @@ -752,12 +752,15 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
>>> if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
>>> goto drop;
>>>
>>> + size = min_t(u32, skb->len, max_head_size);
>>> + truesize = size;
>>> +
>>> /* Allocate skb head */
>>> - page = page_pool_dev_alloc_pages(rq->page_pool);
>>> + page = page_pool_dev_alloc(rq->page_pool, &page_offset, &truesize);
>>
>> Maybe rename API to:
>>
>> addr = netmem_alloc(rq->page_pool, &truesize);
Unless we create a subsystem called netmem, I am not sure about
the 'netmem', it seems more confusing to use it here.
>>
>>> if (!page)
>>> goto drop;
>>>
>>> - nskb = napi_build_skb(page_address(page), PAGE_SIZE);
>>> + nskb = napi_build_skb(page_address(page) + page_offset, truesize);
>>
>> IMHO this illustrates that API is strange/funky.
>> (I think this is what Alex Duyck is also pointing out).
>>
>> This is the memory (virtual) address "pointer":
>> addr = page_address(page) + page_offset
>>
>> This is what napi_build_skb() takes as input. (I looked at other users
>> of napi_build_skb() whom all give a mem ptr "va" as arg.)
>> So, why does your new API provide the "page" and not just the address?
>>
>> As proposed above:
>> addr = netmem_alloc(rq->page_pool, &truesize);
>>
>> Maybe the API should be renamed, to indicate this isn't returning a "page"?
>> We have talked about the name "netmem" before.
>
> Yeah, this is more-or-less what I was getting at. Keep in mind this is
> likely the most common case since most frames passed and forth aren't
> ever usually much larger than 1500B.
I do feel the pain here, there is why I use a per cpu 'struct
page_pool_frag' to report the result back to user so that we
can report both 'va' and 'page' to the user in the RFC of this
patchset.
IHMO, compared to the above point, it is more importance that
we have a unified implementation for both of them instead
of page frag based on the page allocator.
Currently there are three implementations for page frag:
1. mm/page_alloc.c: net stack seems to be using it in the
rx part with 'struct page_frag_cache' and the main API
being page_frag_alloc_align().
2. net/core/sock.c: net stack seems to be using it in the
tx part with 'struct page_frag' and the main API being
skb_page_frag_refill().
3. drivers/vhost/net.c: vhost seems to be using it to build
xdp frame, and it's implementation seems to be a mix of
the above two.
Acctually I have a patchset to remove the third one waiting
to send out after this one.
And I wonder if the first and second one can be unified as
one, as it seems the only user facing difference is one
returning va, and the other returning page. other difference
seems to be implementation specific, for example, one is
doing offset incrementing, and the other doing offset
decrementing.
Powered by blists - more mailing lists