[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f71d9448-70c8-8793-dc9a-0eb48a570300@huawei.com>
Date: Fri, 18 Aug 2023 16:59:14 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Jakub Kicinski <kuba@...nel.org>
CC: Mina Almasry <almasrymina@...gle.com>, <davem@...emloft.net>,
<pabeni@...hat.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Alexander Duyck <alexander.duyck@...il.com>,
Liang Chen <liangchen.linux@...il.com>,
Alexander Lobakin <aleksander.lobakin@...el.com>,
Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Jesper Dangaard Brouer <hawk@...nel.org>
Subject: Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit
arch with 64-bit DMA
On 2023/8/18 14:12, Ilias Apalodimas wrote:
> On Fri, 18 Aug 2023 at 02:57, Jakub Kicinski <kuba@...nel.org> wrote:
>>
>> On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
>>>> Can we assume the DMA mapping of page pool is page aligned? We should
>>>> be, right?
>>>
>>> Yes
>>>
>>>> That means we're storing 12 bits of 0 at the lower end.
>>>> So even with 32b of space we can easily store addresses for 32b+12b =>
>>>> 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
>>>> problem is only in our heads?
>>>
>>> Do you mean moving the pp_frag_count there?
>>
>> Right, IIUC we don't have enough space to fit dma_addr_t and the
>> refcount, but if we store the dma addr on a shifted u32 instead
>> of using dma_addr_t explicitly - the refcount should fit?
>
> struct page looks like this:
>
> unsigned long dma_addr;
> union {
> unsigned long dma_addr_upper;
> atomic_long_t pp_frag_count;
> };
>
> So, on 32bit platforms with 64bit dma we can't support a frag count at all.
> We could either use the lower 12 bits (and have support for 4096 frags
> 'only') or do what you suggest.
Maybe we can rethink about defining a new memdesc for page used by
the network stack. As I took a glance at the Device Memory TCP v2
patchset from Mina, we may use the new memdesc to support more new
memory types, to decupple page_pool from the spcae limitation of
'stuct page', and to support frag page for 32bit platforms with
64bit dma too.
> TBH I don't love any of these and since those platforms are rare (or
> at least that's what I think), I prefer not supporting them at all.
>
>>
>>> I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
>>> overall. With Yunshengs patches such a platform would allocate a
>>> page, so why should we prevent it from splitting it internally?
>>
>> Splitting it is fine, the problem is that the refcount AKA
>> page->pp_frag_count** counts outstanding PP-aware references
>> and page->refcount counts PP-unaware references.
>>
>> If we want to use page->refcount directly we'd need to unmap
>> the page whenever drivers calls page_pool_defrag_page().
>> But the driver may assume the page is still mapped afterwards.
>
> What I am suggesting here is to not add the new
> PP_FLAG_PAGE_SPLIT_IN_DRIVER flag. If a driver wants to split pages
> internally it should create a pool without
> PP_FLAG_DMA_SYNC_DEV to begin with. The only responsibility the
I am not sure why using PP_FLAG_DMA_SYNC_DEV is conflicted with page
split if the frag page is freed with dma_sync_size being -1 and the
pool->p.max_len is setup correctly.
I was thinking about defining page_pool_put_frag() which corresponds
to page_pool_dev_alloc_frag() and page_pool_free() which corresponds
to page_pool_dev_alloc(), so that driver author does not misuse the
dma_sync_size parameter for page_pool_put_page() related API.
And PP_FLAG_PAGE_SPLIT_IN_DRIVER is used to fail the page_pool creation
when driver is using page->pp_frag_count to split page itself in 32-bit
arch with 64-bit DMA.
> driver would have is to elevate the page refcnt so page pool would not
> try to free/recycle it. Since it won't be able to allocate fragments
> we don't have to worry about the rest.
>
>> We can change the API to make this behavior explicit. Although
>> IMHO that's putting the burden of rare platforms on non-rare
>> platforms which we should avoid.
>
> Yep, agree here.
>
>>
>> ** I said it before and I will keep saying this until someone gets
>> angry at me - I really think we should rename this field because
>> the association with frags is a coincidence.
As my understanding, currently the naming is fine as it is because
page->pp_frag_count is used to indicate the number of frags a page
is split, there is only one user holding onto one frag for now.
But if there are more than one user holding onto one frag, then we
should probably rename it to something like page->pp_ref_count as
Jakub suggested before, for example below patch may enable more than
one user holding onto one frag.
https://patchwork.kernel.org/project/netdevbpf/patch/20230628121150.47778-1-liangchen.linux@gmail.com/
Powered by blists - more mailing lists