[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2d31020f-2b94-e4ad-1100-378778424b12@huawei.com>
Date: Mon, 12 Jul 2021 15:57:45 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Russell King - ARM Linux <linux@...linux.org.uk>,
Marcin Wojtas <mw@...ihalf.com>, <linuxarm@...neuler.org>,
<yisen.zhuang@...wei.com>, "Salil Mehta" <salil.mehta@...wei.com>,
<thomas.petazzoni@...tlin.com>, <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
"Alexei Starovoitov" <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"John Fastabend" <john.fastabend@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
"Will Deacon" <will@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
"Vlastimil Babka" <vbabka@...e.cz>, <fenghua.yu@...el.com>,
<guro@...com>, Peter Xu <peterx@...hat.com>,
Feng Tang <feng.tang@...el.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Matteo Croce <mcroce@...rosoft.com>,
Hugh Dickins <hughd@...gle.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"Alexander Lobakin" <alobakin@...me>,
Willem de Bruijn <willemb@...gle.com>, <wenxu@...oud.cn>,
Cong Wang <cong.wang@...edance.com>,
Kevin Hao <haokexin@...il.com>, <nogikh@...gle.com>,
Marco Elver <elver@...gle.com>,
Netdev <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>
Subject: Re: [PATCH rfc v2 4/5] page_pool: support page frag API for page pool
On 2021/7/11 1:43, Alexander Duyck wrote:
> On Sat, Jul 10, 2021 at 12:44 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>
>> Currently each desc use a whole page to do ping-pong page
>> reusing in most driver. As the page pool has support page
>> recycling based on elevated refcnt, it makes sense to add
>> a page frag API in page pool to split a page to different
>> frag to serve multi descriptions.
>>
>> This means a huge memory saving for kernel with page size of
>> 64K, as a page can be used by 32 descriptions with 2k buffer
>> size, comparing to each desc using one page currently.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
>> ---
>> include/net/page_pool.h | 14 ++++++++++++++
>> net/core/page_pool.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 63 insertions(+)
>>
>> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
>> index f0e708d..06a5e43 100644
>> --- a/include/net/page_pool.h
>> +++ b/include/net/page_pool.h
>> @@ -80,6 +80,7 @@ struct page_pool_params {
>> enum dma_data_direction dma_dir; /* DMA mapping direction */
>> unsigned int max_len; /* max DMA sync memory size */
>> unsigned int offset; /* DMA addr offset */
>> + unsigned int frag_size;
>> };
>>
>> struct page_pool {
>> @@ -91,6 +92,8 @@ struct page_pool {
>> unsigned long defer_warn;
>>
>> u32 pages_state_hold_cnt;
>> + unsigned int frag_offset;
>> + struct page *frag_page;
>>
>> /*
>> * Data structure for allocation side
>> @@ -140,6 +143,17 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
>> return page_pool_alloc_pages(pool, gfp);
>> }
>>
>> +struct page *page_pool_alloc_frag(struct page_pool *pool,
>> + unsigned int *offset, gfp_t gfp);
>> +
>> +static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
>> + unsigned int *offset)
>> +{
>> + gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
>> +
>> + return page_pool_alloc_frag(pool, offset, gfp);
>> +}
>> +
>> /* get the stored dma direction. A driver might decide to treat this locally and
>> * avoid the extra cache line from page_pool to determine the direction
>> */
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index a87cbe1..b787033 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -350,6 +350,53 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
>> }
>> EXPORT_SYMBOL(page_pool_alloc_pages);
>>
>> +struct page *page_pool_alloc_frag(struct page_pool *pool,
>> + unsigned int *offset, gfp_t gfp)
>> +{
>> + unsigned int frag_offset = pool->frag_offset;
>> + unsigned int frag_size = pool->p.frag_size;
>> + struct page *frag_page = pool->frag_page;
>> + unsigned int max_len = pool->p.max_len;
>> +
>> + if (!frag_page || frag_offset + frag_size > max_len) {
>> + frag_page = page_pool_alloc_pages(pool, gfp);
>> + if (unlikely(!frag_page)) {
>> + pool->frag_page = NULL;
>> + return NULL;
>> + }
>> +
>> + pool->frag_page = frag_page;
>> + frag_offset = 0;
>> +
>> + page_pool_sub_bias(pool, frag_page,
>> + max_len / frag_size - 1);
>> + }
>> +
>> + *offset = frag_offset;
>> + pool->frag_offset = frag_offset + frag_size;
>> +
>> + return frag_page;
>> +}
>> +EXPORT_SYMBOL(page_pool_alloc_frag);
>
> I'm still not a fan of the fixed implementation. For the cost of the
> division as I said before you could make this flexible like
> page_frag_alloc_align and just decrement the bias by one per
> allocation instead of trying to batch it.
>
> I'm sure there would likely be implementations that might need to
> operate at two different sizes, for example a header and payload size.
Will try to implement the frag allocation of different sizes
in new version.
>
>> +static void page_pool_empty_frag(struct page_pool *pool)
>> +{
>> + unsigned int frag_offset = pool->frag_offset;
>> + unsigned int frag_size = pool->p.frag_size;
>> + struct page *frag_page = pool->frag_page;
>> + unsigned int max_len = pool->p.max_len;
>> +
>> + if (!frag_page)
>> + return;
>> +
>> + while (frag_offset + frag_size <= max_len) {
>> + page_pool_put_full_page(pool, frag_page, false);
>> + frag_offset += frag_size;
>> + }
>> +
>
> Having to call this to free the page seems confusing. Rather than
> reserving multiple and having to free the page multiple times I really
> think you would be better off just holding one bias reservation on the
> page at a time.
will remove the above freeing the page multiple times.
>
>> + pool->frag_page = NULL;
>> +}
>> +
>> /* Calculate distance between two u32 values, valid if distance is below 2^(31)
>> * https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution
>> */
>> @@ -670,6 +717,8 @@ void page_pool_destroy(struct page_pool *pool)
>> if (!page_pool_put(pool))
>> return;
>>
>> + page_pool_empty_frag(pool);
>> +
>> if (!page_pool_release(pool))
>> return;
>>
>> --
>> 2.7.4
>>
> .
>
Powered by blists - more mailing lists