Message-ID: <c6ef4594-2d87-4fff-bee2-a09556d33274@huawei.com>
Date: Tue, 11 Mar 2025 20:25:25 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Matthew Wilcox <willy@...radead.org>,
Toke Høiland-Jørgensen <toke@...hat.com>
CC: Yunsheng Lin <yunshenglin0825@...il.com>, Andrew Morton
<akpm@...ux-foundation.org>, Jesper Dangaard Brouer <hawk@...nel.org>, Ilias
Apalodimas <ilias.apalodimas@...aro.org>, "David S. Miller"
<davem@...emloft.net>, Yonglong Liu <liuyonglong@...wei.com>, Mina Almasry
<almasrymina@...gle.com>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski
<kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
<horms@...nel.org>, <linux-mm@...ck.org>, <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH net-next] page_pool: Track DMA-mapped pages and unmap
them when destroying the pool
On 2025/3/10 23:42, Matthew Wilcox wrote:
> On Mon, Mar 10, 2025 at 10:13:32AM +0100, Toke Høiland-Jørgensen wrote:
>> Yunsheng Lin <yunshenglin0825@...il.com> writes:
>>> Also, using more of the space in 'struct page' for the page_pool
>>> seems to couple the page_pool more tightly to the mm subsystem, which
>>> does not seem to align with the folios work that is trying to decouple
>>> non-mm subsystems from the mm subsystem by avoiding other subsystems
>>> using more of 'struct page' as metadata, from a long-term point of
>>> view.
>>
>> This seems a bit theoretical; any future changes of struct page would
>> have to shuffle things around so we still have the ID available,
>> obviously :)
>
> See https://kernelnewbies.org/MatthewWilcox/Memdescs
> and more immediately
> https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
>
> pagepool is going to be renamed "bump" because it's a bump allocator and
> "pagepool" is a nonsense name. I haven't looked into it in a lot of
> detail yet, but in the not-too-distant future, struct page will look
> like this (from your point of view):
>
> struct page {
> 	unsigned long flags;
> 	unsigned long memdesc;
It seems there may be memory behind the above 'memdesc' pointer, with a
different size and layout for each subsystem?

I am not sure I understand the case where the same page might be handled
by two subsystems concurrently, or where a page is allocated in one
subsystem and then passed to another subsystem to be handled there. For
example, a page_pool-owned page can be mmap'ed into user space through
tcp zero copy, see tcp_zerocopy_vm_insert_batch(); it seems the same
page is then handled by both the networking/page_pool and mm subsystems?
See the sketch below.

And page->mapping seems to have been moved into the 'memdesc', as there
is no 'mapping' field in the 'struct page' you list here? Do we need a
'mapping'-like field in the page_pool 'memdesc' to support tcp zero
copy?
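
The path I have in mind, as a simplified and hypothetical sketch (the
helper name below is mine; the real batching is done in
tcp_zerocopy_vm_insert_batch() via vm_insert_pages()):

/* Hypothetical, simplified illustration of the concern above: frag
 * pages of a received skb -- which may be page_pool/bump pages -- are
 * inserted straight into a user VMA, so the mm subsystem then operates
 * on pages whose 'memdesc' belongs to the page_pool.
 */
static int zerocopy_map_frags(struct vm_area_struct *vma,
			      unsigned long address,
			      struct page **pages,
			      unsigned long nr_pages)
{
	/* vm_insert_pages() takes a reference on each page and maps it
	 * into the user address space, so the same page is then live in
	 * both the networking and mm subsystems at the same time.
	 */
	return vm_insert_pages(vma, address, pages, &nr_pages);
}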
> 	int _refcount;	// 0 for bump
> 	union {
> 		unsigned long private;
> 		atomic_t _mapcount;	// maybe used by bump? not sure
> 	};
> };
>
> 'memdesc' will be a pointer to struct bump with the bottom four bits of
> that pointer indicating that it's a struct bump pointer (and not, say, a
> folio or a slab).
The above seems similar to what I was doing; the difference seems to be
that the memory behind the above pointer would be managed by the
page_pool itself, instead of the mm subsystem allocating the 'memdesc'
memory from a slab cache?
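
Just to check that I understand the encoding, something like the below?
(The tag value and helper names are my guesses, not anything you have
specified.)

/* Hypothetical sketch of the tagged 'memdesc' pointer; the exact tag
 * value and helper names are my assumption.
 */
struct bump;

#define MEMDESC_TYPE_MASK	0xfUL
#define MEMDESC_TYPE_BUMP	0x3UL	/* made-up tag value */

static inline bool page_is_bump(const struct page *page)
{
	return (page->memdesc & MEMDESC_TYPE_MASK) == MEMDESC_TYPE_BUMP;
}

static inline struct bump *page_bump(const struct page *page)
{
	VM_BUG_ON(!page_is_bump(page));
	return (struct bump *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}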
>
> So if you allocate a multi-page bump, you'll get N of these pages,
> and they'll all point to the same struct bump where you'll maintain
> your actual refcount. And you'll be able to grow struct bump to your
> heart's content. I don't know exactly what struct bump looks like,
> but the core mm will have no requirements on you.
>
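If so, a rough sketch of the multi-page case as I understand it, reusing
the made-up MEMDESC_TYPE_BUMP tag from the sketch above (all the field
names here are mine, not a real layout):

/* N pages all share one 'struct bump', which holds the actual refcount
 * and whatever else page_pool needs; layout is my assumption only.
 */
struct bump {
	atomic_t refcount;	/* the "actual refcount" lives here */
	unsigned long dma_addr;	/* e.g. for the DMA tracking in $subject */
	struct page_pool *pp;	/* owning pool */
};

static void bump_attach_pages(struct bump *bump, struct page **pages,
			      unsigned int nr)
{
	unsigned int i;

	atomic_set(&bump->refcount, 1);
	for (i = 0; i < nr; i++) {
		/* page->_refcount stays 0 for bump pages, as above */
		pages[i]->memdesc = (unsigned long)bump | MEMDESC_TYPE_BUMP;
	}
}

That would indeed allow growing 'struct bump' without the core mm
caring about its size.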