Message-ID: <20250717063225.GA28772@system.software.com>
Date: Thu, 17 Jul 2025 15:32:25 +0900
From: Byungchul Park <byungchul@...com>
To: Mina Almasry <almasrymina@...gle.com>
Cc: "Lobakin, Aleksander" <aleksander.lobakin@...el.com>,
willy@...radead.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kernel_team@...ynix.com, ilias.apalodimas@...aro.org,
harry.yoo@...cle.com, akpm@...ux-foundation.org,
andrew+netdev@...n.ch, asml.silence@...il.com, toke@...hat.com,
david@...hat.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
linux-rdma@...r.kernel.org, bpf@...r.kernel.org,
vishal.moola@...il.com, hannes@...xchg.org, ziy@...dia.com,
jackmanb@...gle.com, wei.fang@....com, shenwei.wang@....com,
xiaoning.wang@....com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, anthony.l.nguyen@...el.com,
przemyslaw.kitszel@...el.com, sgoutham@...vell.com,
gakula@...vell.com, sbhatta@...vell.com, hkelam@...vell.com,
bbhushan2@...vell.com, tariqt@...dia.com, ast@...nel.org,
daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com,
sdf@...ichev.me, saeedm@...dia.com, leon@...nel.org,
mbloch@...dia.com, danishanwar@...com, rogerq@...nel.org,
nbd@....name, lorenzo@...nel.org, ryder.lee@...iatek.com,
shayne.chen@...iatek.com, sean.wang@...iatek.com,
matthias.bgg@...il.com, angelogioacchino.delregno@...labora.com,
horms@...nel.org, m-malladi@...com, krzysztof.kozlowski@...aro.org,
matthias.schiffer@...tq-group.com, robh@...nel.org,
imx@...ts.linux.dev, intel-wired-lan@...ts.osuosl.org,
linux-arm-kernel@...ts.infradead.org,
linux-wireless@...r.kernel.org, linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH net-next v10 02/12] netmem: use netmem_desc instead of
page to access ->pp in __netmem_get_pp()

On Wed, Jul 16, 2025 at 12:41:04PM -0700, Mina Almasry wrote:
> On Tue, Jul 15, 2025 at 9:51 PM Byungchul Park <byungchul@...com> wrote:
> >
> > On Tue, Jul 15, 2025 at 12:09:34PM -0700, Mina Almasry wrote:
> > > On Mon, Jul 14, 2025 at 6:36 PM Byungchul Park <byungchul@...com> wrote:
> > > >
> > > > On Mon, Jul 14, 2025 at 12:58:15PM -0700, Mina Almasry wrote:
> > > > > On Mon, Jul 14, 2025 at 12:37 PM Mina Almasry <almasrymina@...gle.com> wrote:
> > > > > >
> > > > > > On Mon, Jul 14, 2025 at 5:01 AM Byungchul Park <byungchul@...com> wrote:
> > > > > > >
> > > > > > > To eliminate the use of the page pool fields in struct page, the page
> > > > > > > pool code should use netmem descriptor and APIs instead.
> > > > > > >
> > > > > > > However, __netmem_get_pp() still accesses ->pp via struct page. So
> > > > > > > change it to use struct netmem_desc instead, since ->pp no longer will
> > > > > > > be available in struct page.
> > > > > > >
> > > > > > > While at it, add a helper, pp_page_to_nmdesc(), that can be used to
> > > > > > > extract netmem_desc from page only if it's pp page. For now that
> > > > > > > netmem_desc overlays on page, it can be achieved by just casting.
> > > > > > >
> > > > > > > Signed-off-by: Byungchul Park <byungchul@...com>
> > > > > > > ---
> > > > > > > include/net/netmem.h | 13 ++++++++++++-
> > > > > > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > > > > index 535cf17b9134..2b8a7b51ac99 100644
> > > > > > > --- a/include/net/netmem.h
> > > > > > > +++ b/include/net/netmem.h
> > > > > > > @@ -267,6 +267,17 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > > > > > > return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> > > > > > > }
> > > > > > >
> > > > > > > +static inline struct netmem_desc *pp_page_to_nmdesc(struct page *page)
> > > > > > > +{
> > > > > > > + DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(page));
> > > > > > > +
> > > > > > > + /* XXX: How to extract netmem_desc from page must be changed,
> > > > > > > + * once netmem_desc no longer overlays on page and will be
> > > > > > > + * allocated through slab.
> > > > > > > + */
> > > > > > > + return (struct netmem_desc *)page;
> > > > > > > +}
> > > > > > > +
> > > > > >
> > > > > > Same thing. Do not create a generic looking pp_page_to_nmdesc helper
> > > > > > which does not check that the page is the correct type. The
> > > > > > DEBUG_NET... is not good enough.
> > > > > >
> > > > > > You don't need to add a generic helper here. There is only one call
> > > > > > site. Open code this in the callsite. The one callsite is marked as
> > > > > > unsafe, only called by code that knows that the netmem is specifically
> > > > > > a pp page. Open code this in the unsafe callsite, instead of creating
> > > > > > a generic looking unsafe helper and not even documenting it's unsafe.
> > > > > >
> > > > >
> > > > > On second read through the series, I actually now think this is a
> > > > > great idea :-) Adding this helper has simplified the series greatly. I
> > > > > did not realize you were converting entire drivers to netmem just to
> > > > > get rid of page->pp accesses. Adding a pp_page_to_nmdesc helper makes
> > > > > the entire series simpler.
> > > > >
> > > > > You're also calling it only from code paths like drivers that already
> > > > > > assumed that the page is a pp page and did page->pp dereference without
> > > > > a check, so this should be safe.
> > > > >
> > > > > Only thing I would change is add a comment explaining that the calling
> > > > > code needs to check the page is pp page or know it's a pp page (like a
> > > > > driver that supports pp).
> > > > >
> > > > >
> > > > > > > /**
> > > > > > > * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
> > > > > > > * @netmem: netmem reference to get the pointer from
> > > > > > > @@ -280,7 +291,7 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > > > > > > */
> > > > > > > static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
> > > > > > > {
> > > > > > > - return __netmem_to_page(netmem)->pp;
> > > > > > > + return pp_page_to_nmdesc(__netmem_to_page(netmem))->pp;
> > > > > > > }
> > > > > >
> > > > > > This makes me very sad. Casting from netmem -> page -> nmdesc...
> > > > > >
> > > > > > Instead, we should be able to go from netmem directly to nmdesc. I
> > > > > > would suggest rename __netmem_clear_lsb to netmem_to_nmdesc and have
> > > > > > it return netmem_desc instead of net_iov. Then use it here.
> > > > > >
> > > > > > We could have an unsafe version of netmem_to_nmdesc which converts the
> > > > > > netmem to netmem_desc without clearing the lsb and mark it unsafe.
> > > > > >
> > > > >
> > > > > This, I think, we should address to keep some sanity in the code and
> > > > > reduce the casts and make it a bit more maintainable.
> > > >
> > > > I will reflect your suggestions. To summarize:
> > > >
> > > > 1) The current implementation of pp_page_to_nmdesc() is good enough
> > > > to keep, but add a comment on it like "Check if the page is a pp
> > > > page before calling this function or know it's a pp page.".
> > > >
> > >
> > > Yes please.
> > >
> > > > 2) Introduce the unsafe version, __netmem_to_nmdesc(), and use it in
> > > > __netmem_get_pp().
> > > >
> > >
> > > No need following Pavel's feedback. We can just delete
> > > __netmem_get_pp. If we do find a need in the future to extract the
> > > netmem_desc from a netmem_ref, I would rather we do a straight cast
> > > from netmem_ref to netmem_desc rather than netmem_ref -> pages/net_iov
> > > -> netmem_desc.
> > >
> > > But that seems unnecessary for this series.
> >
> > No. The series should remove accessing ->pp through page.
> >
> > I will kill __netmem_get_pp() as you and I prefer. However,
> > __netmem_get_pp() users, i.e. libeth_xdp_return_va() and
> > libeth_xdp_tx_fill_buf() should be altered. I will modify the code like:
> >
> > as is: __netmem_get_pp(netmem)
> > to be: __netmem_nmdesc(netmem)->pp
> >
> > Is it okay with you?
> >
>
> When Pavel and I were saying 'remove __netmem_get_pp', I think we
> meant to remove the entire concept of unsafe netmem -> page
> conversions. I think we both don't like them. From this perspective,
> __netmem_nmdesc(netmem)->pp is just as bad as __netmem_get_pp(netmem).
>
> I think since the unsafe netmem-to-page casts are already in mainline,
> let's assume they should stay there until someone feels strongly enough
> to remove them. The logic in Olek's patch is sound:
>
> https://lore.kernel.org/all/20241203173733.3181246-8-aleksander.lobakin@intel.com/
>
> Header buffer page pools do always use pages and will likely remain so
> for a long time, so I guess let's continue to support them rather than
> try to remove them in this series. A followup series could try to
> remove them.
>
> > > > 3) Rename __netmem_clear_lsb() to netmem_to_nmdesc(), and return
> > > > netmem_desc, and use it in all users of __netmem_clear_lsb().
> > > >
> > >
> > > Following Pavel's comment, this I think also is not necessary for this
> > > series. Cleaning up the return value of __netmem_clear_lsb is good
> > > work I think, but we're already on v10 of this and I think it would
> > > be unnecessary to ask for added cleanups. We can do the cleanup on top.
> >
> > However, I still need to include 'introduce __netmem_nmdesc() helper'
>
> Yes.
>
> > in this series since it should be used to remove __netmem_get_pp() as I
>
> let's keep __netmem_get_pp, which does a `return
> __netmem_nmdesc(netmem)->pp;` In general we avoid allowing the driver
> to do any netmem casts in the driver code, and we do any casting in
> core.
>
> > described above. I think I'd better add netmem_nmdesc() too while at it.
> >
>
> Yes. netmem_nmdesc should replace __netmem_clear_lsb.

Even though the unsafe version is required in this series, on second
thought, the safe version, netmem_nmdesc(), doesn't have to be a part of
this series. Let's add the safe version on top afterwards.

Byungchul

> > I assume __netmem_nmdesc() is an unsafe version not clearing lsb. The
>
> Yes.
>
> > safe version, netmem_nmdesc() needs an additional operation clearing lsb.
>
> Yes.
>
>
> --
> Thanks,
> Mina
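
For readers following the thread, the difference between the unsafe and
safe helpers discussed above can be sketched roughly as below. This is a
user-space model only, not the patch itself: the struct layouts, the
uintptr_t-based netmem_ref, and the netmem_is_net_iov() body are
simplified stand-ins assumed for illustration, and the final kernel
helpers may well look different. Only the helper names and the "clears
the lsb or not" distinction come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins (assumptions): the real kernel types are richer. */
struct page_pool {
	int id;
};

struct netmem_desc {
	struct page_pool *pp;
};

/* Model of a tagged reference: the low bit marks a net_iov-backed ref. */
typedef uintptr_t netmem_ref;

#define NET_IOV ((uintptr_t)1)

static inline int netmem_is_net_iov(netmem_ref netmem)
{
	return (netmem & NET_IOV) != 0;
}

/*
 * Unsafe version: a straight cast that does not clear the lsb.
 * The caller must already know the reference is an untagged pp page.
 */
static inline struct netmem_desc *__netmem_nmdesc(netmem_ref netmem)
{
	return (struct netmem_desc *)netmem;
}

/* Safe version: clears the lsb first, so it works for both kinds of ref. */
static inline struct netmem_desc *netmem_nmdesc(netmem_ref netmem)
{
	return (struct netmem_desc *)(netmem & ~NET_IOV);
}

/* Kept as a core helper so drivers do not open-code netmem casts. */
static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
{
	return __netmem_nmdesc(netmem)->pp;
}
```

In this model, calling __netmem_nmdesc() on a reference with the low bit
set would yield a misaligned pointer, which is why the unsafe variant is
only meant for callers that already know they hold a pp page.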