Message-ID: <CAHS8izOOZ5w_+5W=uYGrr3V8VjYXfo6fzs3cjJF1eoZsqcm71Q@mail.gmail.com>
Date: Mon, 14 Jul 2025 12:02:02 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Byungchul Park <byungchul@...com>
Cc: willy@...radead.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, kernel_team@...ynix.com, kuba@...nel.org,
ilias.apalodimas@...aro.org, harry.yoo@...cle.com, hawk@...nel.org,
akpm@...ux-foundation.org, davem@...emloft.net, john.fastabend@...il.com,
andrew+netdev@...n.ch, asml.silence@...il.com, toke@...hat.com,
tariqt@...dia.com, edumazet@...gle.com, pabeni@...hat.com, saeedm@...dia.com,
leon@...nel.org, ast@...nel.org, daniel@...earbox.net, david@...hat.com,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, horms@...nel.org,
linux-rdma@...r.kernel.org, bpf@...r.kernel.org, vishal.moola@...il.com,
hannes@...xchg.org, ziy@...dia.com, jackmanb@...gle.com
Subject: Re: [PATCH net-next v9 6/8] mlx4: use netmem descriptor and APIs for
page pool
On Thu, Jul 10, 2025 at 6:33 PM Byungchul Park <byungchul@...com> wrote:
>
> On Thu, Jul 10, 2025 at 11:29:35AM -0700, Mina Almasry wrote:
> > On Thu, Jul 10, 2025 at 1:28 AM Byungchul Park <byungchul@...com> wrote:
> > >
> > > To simplify struct page, its own descriptor needs to be separated from
> > > struct page; the work for page pool is ongoing.
> > >
> > > Use netmem descriptor and APIs for page pool in mlx4 code.
> > >
> > > Signed-off-by: Byungchul Park <byungchul@...com>
> > > ---
> > > drivers/net/ethernet/mellanox/mlx4/en_rx.c | 48 +++++++++++---------
> > > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 8 ++--
> > > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +-
> > > 3 files changed, 32 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > index b33285d755b9..7cf0d2dc5011 100644
> > > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> > > @@ -62,18 +62,18 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
> > > int i;
> > >
> > > for (i = 0; i < priv->num_frags; i++, frags++) {
> > > - if (!frags->page) {
> > > - frags->page = page_pool_alloc_pages(ring->pp, gfp);
> > > - if (!frags->page) {
> > > + if (!frags->netmem) {
> > > + frags->netmem = page_pool_alloc_netmems(ring->pp, gfp);
> > > + if (!frags->netmem) {
> > > ring->alloc_fail++;
> > > return -ENOMEM;
> > > }
> > > - page_pool_fragment_page(frags->page, 1);
> > > + page_pool_fragment_netmem(frags->netmem, 1);
> > > frags->page_offset = priv->rx_headroom;
> > >
> > > ring->rx_alloc_pages++;
> > > }
> > > - dma = page_pool_get_dma_addr(frags->page);
> > > + dma = page_pool_get_dma_addr_netmem(frags->netmem);
> > > rx_desc->data[i].addr = cpu_to_be64(dma + frags->page_offset);
> > > }
> > > return 0;
> > > @@ -83,10 +83,10 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
> > > struct mlx4_en_rx_ring *ring,
> > > struct mlx4_en_rx_alloc *frag)
> > > {
> > > - if (frag->page)
> > > - page_pool_put_full_page(ring->pp, frag->page, false);
> > > + if (frag->netmem)
> > > + page_pool_put_full_netmem(ring->pp, frag->netmem, false);
> > > /* We need to clear all fields, otherwise a change of priv->log_rx_info
> > > - * could lead to see garbage later in frag->page.
> > > + * could lead to see garbage later in frag->netmem.
> > > */
> > > memset(frag, 0, sizeof(*frag));
> > > }
> > > @@ -440,29 +440,33 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> > > unsigned int truesize = 0;
> > > bool release = true;
> > > int nr, frag_size;
> > > - struct page *page;
> > > + netmem_ref netmem;
> > > dma_addr_t dma;
> > >
> > > /* Collect used fragments while replacing them in the HW descriptors */
> > > for (nr = 0;; frags++) {
> > > frag_size = min_t(int, length, frag_info->frag_size);
> > >
> > > - page = frags->page;
> > > - if (unlikely(!page))
> > > + netmem = frags->netmem;
> > > + if (unlikely(!netmem))
> > > goto fail;
> > >
> > > - dma = page_pool_get_dma_addr(page);
> > > + dma = page_pool_get_dma_addr_netmem(netmem);
> > > dma_sync_single_range_for_cpu(priv->ddev, dma, frags->page_offset,
> > > frag_size, priv->dma_dir);
> > >
> > > - __skb_fill_page_desc(skb, nr, page, frags->page_offset,
> > > - frag_size);
> > > + __skb_fill_netmem_desc(skb, nr, netmem, frags->page_offset,
> > > + frag_size);
> > >
> > > truesize += frag_info->frag_stride;
> > > if (frag_info->frag_stride == PAGE_SIZE / 2) {
> > > + struct page *page = netmem_to_page(netmem);
> >
> > This cast is not safe, try to use the netmem type directly.
>
> Can it be net_iov? It's already ensured to be a page-backed netmem. Why
> is that unsafe?
>
Precisely because it can be net_iov. The whole point of netmem_ref is
that it's an abstract type that can be something else underneath.
Converting the driver to netmem only to then say "well, the netmem is
just pages, cast it back" just adds a lot of useless typecasting and
wasted CPU cycles with no benefit. In general, netmem_to_page() casts
in drivers are heavily discouraged.
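To illustrate what's underneath (roughly paraphrasing include/net/netmem.h,
so treat this as a sketch rather than the exact upstream code): the handle
is an opaque word whose low bit says whether a net_iov or a struct page is
behind it.

  /* Sketch, roughly following include/net/netmem.h. */
  typedef unsigned long __bitwise netmem_ref;

  static inline bool netmem_is_net_iov(const netmem_ref netmem)
  {
          /* low bit set => net_iov underneath, not a struct page */
          return (__force unsigned long)netmem & NET_IOV;
  }

Nothing in the type promises a struct page, so the driver shouldn't assume
one either.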
> With netmem, page_count() and page_to_nid() cannot be used, but they are
> still needed. Or would checking 'page == NULL' after the cast work for you?
>
I'm really hoping you can read the code, get a bit familiar with the
driver you're modifying, and suggest a solution that works for us.
netmem_to_page() followed by page_count() without a NULL check crashes
the kernel if the netmem is not a page underneath... so it shouldn't
work for anyone.
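That's because netmem_to_page() bails out to NULL when a net_iov is
underneath, roughly like this (paraphrased, not verbatim):

  static inline struct page *netmem_to_page(netmem_ref netmem)
  {
          if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
                  return NULL;

          return __netmem_to_page(netmem);
  }

so a page_count() on that NULL return is a plain NULL dereference.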
What may be acceptable here is a netmem_count() helper and the like,
similar to netmem_is_pfmemalloc(), which handles net_iov correctly.
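Something along these lines, purely hypothetical (netmem_count() does not
exist today and the return value for net_iov would need to be agreed on):

  /* Hypothetical helper, modeled on netmem_is_pfmemalloc(). */
  static inline long netmem_count(netmem_ref netmem)
  {
          if (netmem_is_net_iov(netmem))
                  return -1;      /* semantics for net_iov TBD */

          return page_count(netmem_to_page(netmem));
  }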
However, it looks like this driver is trying to check whether the netmem
is recyclable; we already have a helper for that,
__page_pool_page_can_be_recycled(). Maybe that can be reused somehow.
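That helper looks roughly like this (from net/core/page_pool.c, paraphrased,
so check your tree):

  static bool __page_pool_page_can_be_recycled(netmem_ref netmem)
  {
          /* A net_iov is always recyclable; a page only if we hold the
           * last reference and it isn't pfmemalloc-backed.
           */
          return netmem_is_net_iov(netmem) ||
                 (page_ref_count(netmem_to_page(netmem)) == 1 &&
                  !page_is_pfmemalloc(netmem_to_page(netmem)));
  }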
--
Thanks,
Mina