Message-ID: <CAHS8izM1P6d8jgyWE9wFJUJah2YFsjHP2uikDwA0vR=3QA+BXQ@mail.gmail.com>
Date: Wed, 8 Nov 2023 19:20:21 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linaro-mm-sig@...ts.linaro.org,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Arnd Bergmann <arnd@...db.de>,
David Ahern <dsahern@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Shuah Khan <shuah@...nel.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Shakeel Butt <shakeelb@...gle.com>,
Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>
Subject: Re: [RFC PATCH v3 07/12] page-pool: device memory support
On Wed, Nov 8, 2023 at 2:56 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2023/11/8 5:56, Mina Almasry wrote:
> > On Tue, Nov 7, 2023 at 12:00 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >>
> >> On 2023/11/6 10:44, Mina Almasry wrote:
> >>> Overload the LSB of struct page* to indicate that it's a page_pool_iov.
> >>>
> >>> Refactor mm calls on struct page* into helpers, and add page_pool_iov
> >>> handling on those helpers. Modify callers of these mm APIs with calls to
> >>> these helpers instead.
> >>>
> >>> In areas where struct page* is dereferenced, add a check for special
> >>> handling of page_pool_iov.
> >>>
> >>> Signed-off-by: Mina Almasry <almasrymina@...gle.com>
> >>>
> >>> ---
> >>>  include/net/page_pool/helpers.h | 74 ++++++++++++++++++++++++++++++++-
> >>>  net/core/page_pool.c            | 63 ++++++++++++++++++++--------
> >>>  2 files changed, 118 insertions(+), 19 deletions(-)
> >>>
> >>> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> >>> index b93243c2a640..08f1a2cc70d2 100644
> >>> --- a/include/net/page_pool/helpers.h
> >>> +++ b/include/net/page_pool/helpers.h
> >>> @@ -151,6 +151,64 @@ static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page)
> >>>  	return NULL;
> >>>  }
> >>>
> >>> +static inline int page_pool_page_ref_count(struct page *page)
> >>> +{
> >>> +	if (page_is_page_pool_iov(page))
> >>> +		return page_pool_iov_refcount(page_to_page_pool_iov(page));
> >>
> >> We have added a lot of 'if's for the devmem case; it would be better to
> >> make this more generic so that we can have more unified metadata handling
> >> for normal pages and devmem. If we add another memory type here, do we
> >> need yet another 'if' here?
> >
> > Maybe, not sure. I'm guessing new memory types will either be pages or
> > iovs, so maybe no new if statements needed.
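
To expand on that a bit: the type check this series leans on is just the
low bit of the pointer, roughly like the simplified sketch below. The real
helpers live in include/net/page_pool/helpers.h in this patch and may
differ in detail; the tag name here is made up.

	/* Simplified sketch of the LSB tagging: a page_pool_iov pointer is
	 * stored in struct page * fields with its low bit set; real struct
	 * page pointers are at least word-aligned, so that bit is free to
	 * use as a type tag.
	 */
	#define PP_IOV_TAG	1UL

	static inline bool page_is_page_pool_iov(const struct page *page)
	{
		return (unsigned long)page & PP_IOV_TAG;
	}

	static inline struct page_pool_iov *page_to_page_pool_iov(struct page *page)
	{
		if (page_is_page_pool_iov(page))
			return (struct page_pool_iov *)((unsigned long)page & ~PP_IOV_TAG);
		return NULL;
	}

A new memory type would either be backed by real pages (bit clear) or come
with its own iov-style descriptor behind the tag, so hopefully the existing
checks cover it.
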
> >
> >> That is part of the reason I suggested using a more unified metadata for
> >> all the types of memory chunks used by page_pool.
> >
> > I think your suggestion was to use struct pages for devmem. That was
> > thoroughly considered and intensely argued about in the initial
> > conversations regarding devmem and the initial RFC, and from the
> > conclusions there it's extremely clear to me that devmem struct pages
> > are categorically a no-go.
>
> Not exactly. I was wondering if adding a more abstract structure specifically
> for the page pool makes any sense, where each mem type can add its own specific
> fields. The net stack only sees and handles the common fields, so that it does
> not care about the specific mem type, and each provider only sees and handles
> the specific fields belonging to it most of the time.
>
> Ideally something like below:
>
> struct netmem {
>         /* common fields */
>         refcount_t refcount;
>         struct page_pool *pp;
>         ......
>
>         union {
>                 struct devmem {
>                         struct dmabuf_genpool_chunk_owner *owner;
>                 };
>
>                 struct other_mem {
>                         ...
>                         ...
>                 };
>         };
> };
>
> But until we completely decouple 'struct page' from the net stack,
> the above seems undoable in the near term.

Agreed, everything above is undoable.

> But we might be able to do something like what folio is doing now: the mm
> subsystem still sees 'struct folio/page', but other subsystems like slab are
> using 'struct slab', and there are still some common fields shared between
> 'struct folio' and 'struct slab'.
>

In my eyes this is almost exactly what I suggested in RFC v1, and it got
immediately nacked with no room to negotiate. What we did for v1 was to
allocate struct pages for dma-buf, to make dma-bufs look like struct
page to the mm subsystem - almost exactly what you're describing above.
It's a no-go. I don't think renaming struct page to netmem is going to
move the needle (it also re-introduces code churn). What I feel like I
learnt is that dma-bufs are not struct pages and can't be made to look
like them.
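
For context, the folio/slab split referenced above works roughly like the
sketch below: mm/slab.h declares a separate 'struct slab' that overlays
'struct page', and the handful of shared fields are kept in sync with
offset asserts. This is a simplified sketch from memory, not the exact
upstream definitions:

	/* Simplified sketch of the slab overlay; field list trimmed and
	 * possibly inexact, see mm/slab.h for the real thing.
	 */
	struct slab {
		unsigned long __page_flags;	/* lines up with page->flags */

		struct kmem_cache *slab_cache;
		/* ... slab-specific fields ... */

		atomic_t __page_refcount;	/* lines up with page->_refcount */
	};

	#define SLAB_MATCH(pg, sl) \
		static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
	SLAB_MATCH(flags, __page_flags);
	SLAB_MATCH(_refcount, __page_refcount);
	#undef SLAB_MATCH

The catch is that a slab page really is a page from the buddy allocator
underneath, so the overlay is honest; a dma-buf chunk is not, which is why
the equivalent trick was rejected for devmem in RFC v1.
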
> As in the netmem patchset, is devmem able to reuse the below 'struct netmem',
> renamed to 'struct page_pool_iov'?

I don't think so, for the reasons above, but also because practically it
immediately falls apart. Consider this field in netmem:

+ * @flags: The same as the page flags. Do not use directly.

dma-bufs don't have or support page flags, and making dma-bufs look
like they support page flags or any page-like features (other than
dma_addr) seems extremely unacceptable to the mm folks.

> So that 'struct page' for normal
> memory and 'struct page_pool_iov' for devmem share the common fields used
> by page pool and net stack?

Are you suggesting that we'd cast a netmem* to a page* and call core
mm APIs on it? That's basically what was happening with RFC v1, where
things that are not struct pages were made to look like struct pages.

Also, there isn't much upside to what you're suggesting, I think. For
example, I could align the refcount variable in struct page_pool_iov with
the refcount in struct page so that this works:

	put_page((struct page *)ppiov);

but it would be a disaster, because put_page() will call __put_page()
when the last reference is dropped, and __put_page() will try to return
the page to the buddy allocator!
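
To make that failure mode concrete, the aliasing would have to look
something like this (a hypothetical layout and made-up function name,
only to illustrate; not the page_pool_iov layout in this series):

	/* Hypothetical overlay: lay out page_pool_iov so that the fields
	 * the core mm touches alias the corresponding struct page fields.
	 */
	struct page_pool_iov {
		unsigned long flags;		/* would alias page->flags */
		unsigned long pp_magic;
		struct page_pool *pp;
		unsigned long _pp_mapping_pad;
		unsigned long dma_addr;
		atomic_long_t pp_ref_count;
		atomic_t _mapcount;
		atomic_t _refcount;		/* would alias page->_refcount */
	};

	/* With that layout the cast compiles and even decrements the right
	 * word... until the refcount hits zero.  At that point put_page()
	 * goes down the normal page-free path, which assumes the memory
	 * came from the buddy allocator and tries to hand it back there.
	 * A dma-buf chunk never came from the buddy allocator, so this
	 * corrupts mm state.
	 */
	static void broken_release(struct page_pool_iov *ppiov)
	{
		put_page((struct page *)ppiov);
	}
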
> And we might be able to reuse the 'flags',
> '_pp_mapping_pad' and '_mapcount' for a specific mem provider, which is enough
> for devmem, which only requires a single pointer to point to its
> owner?
>

All the above seems quite similar to RFC v1 again, using netmem
instead of struct page. In RFC v1 we re-used zone_device_data() for
the dma-buf owner equivalent.

> https://lkml.kernel.org/netdev/20230105214631.3939268-2-willy@infradead.org/
>
> +/**
> + * struct netmem - A memory allocation from a &struct page_pool.
> + * @flags: The same as the page flags. Do not use directly.
> + * @pp_magic: Magic value to avoid recycling non page_pool allocated pages.
> + * @pp: The page pool this netmem was allocated from.
> + * @dma_addr: Call netmem_get_dma_addr() to read this value.
> + * @dma_addr_upper: Might need to be 64-bit on 32-bit architectures.
> + * @pp_frag_count: For frag page support, not supported in 32-bit
> + * architectures with 64-bit DMA.
> + * @_mapcount: Do not access this member directly.
> + * @_refcount: Do not access this member directly. Read it using
> + * netmem_ref_count() and manipulate it with netmem_get() and netmem_put().
> + *
> + * This struct overlays struct page for now. Do not modify without a
> + * good understanding of the issues.
> + */
> +struct netmem {
> +	unsigned long flags;
> +	unsigned long pp_magic;
> +	struct page_pool *pp;
> +	/* private: no need to document this padding */
> +	unsigned long _pp_mapping_pad;	/* aliases with folio->mapping */
> +	/* public: */
> +	unsigned long dma_addr;
> +	union {
> +		unsigned long dma_addr_upper;
> +		atomic_long_t pp_frag_count;
> +	};
> +	atomic_t _mapcount;
> +	atomic_t _refcount;
> +};
>
> If we do that, it seems we might be able to allow the net stack and page pool to see
> the metadata for a devmem chunk as 'struct page', and may be able to avoid most of
> the 'if' checking in the net stack and page pool?
>
--
Thanks,
Mina