lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izP18q7s8=fGCjknrEu3uJE5xnQCKceB8u1VvTV5GxTTTg@mail.gmail.com>
Date: Wed, 9 Jul 2025 15:56:16 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Dragos Tatulea <dtatulea@...dia.com>
Cc: asml.silence@...il.com, Andrew Lunn <andrew+netdev@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Jens Axboe <axboe@...nel.dk>, Simona Vetter <simona.vetter@...ll.ch>, 
	Willem de Bruijn <willemb@...gle.com>, Kaiyuan Zhang <kaiyuanz@...gle.com>, cratiu@...dia.com, 
	parav@...dia.com, Tariq Toukan <tariqt@...dia.com>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, io-uring@...r.kernel.org
Subject: Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA

On Wed, Jul 9, 2025 at 12:54 PM Dragos Tatulea <dtatulea@...dia.com> wrote:
>
> On Wed, Jul 09, 2025 at 12:29:22PM -0700, Mina Almasry wrote:
> > On Wed, Jul 9, 2025 at 5:46 AM Dragos Tatulea <dtatulea@...dia.com> wrote:
> > >
> > > For zerocopy (io_uring, devmem), there is an assumption that the
> > > parent device can do DMA. However that is not always the case:
> > > ScalableFunction devices have the DMA device in the grandparent.
> > >
> > > This patch adds a helper for getting the DMA device for a netdev from
> > > its parent or grandparent if necessary. The NULL case is handled in the
> > > callers.
> > >
> > > devmem and io_uring are updated accordingly to use this helper instead
> > > of directly using the parent.
> > >
> > > Signed-off-by: Dragos Tatulea <dtatulea@...dia.com>
> > > Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
> >
> > nit: This doesn't seem like a fix? The current code supports all
> > devices that are not SF well enough, right? And in the case of SF
> > devices, I expect net_devmem_bind_dmabuf() to fail gracefully as the
> > dma mapping of a device that doesn't support it, I think, would fail
> > gracefully. So to me this seems like an improvement rather than a bug
> > fix.
> >
> dma_buf_map_attachment_unlocked() will return a sg_table with 0 nents.
> That is graceful. However this will result in page_pools that will
> always be returning errors further down the line which is very confusing
> regarding the motives that caused it.
>
> I am also fine to not make it a fix btw. Especially since the mlx5
> devmem code was just accepted.
>

If you submit another version I'd rather it be a non-fix, especially
since applying the io_uring hunk will be challenging when backporting
this patch, but I assume hunk can be dropped while backporting, so I'm
fine either way.

> > > Reviewed-by: Tariq Toukan <tariqt@...dia.com>
> > > ---
> > > Changes in v1:
> > > - Upgraded from RFC status.
> > > - Dropped driver specific bits for generic solution.
> > > - Implemented single patch as a fix as requested in RFC.
> > > - Handling of multi-PF netdevs will be handled in a subsequent patch
> > >   series.
> > >
> > > RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
> > > ---
> > >  include/linux/netdevice.h | 14 ++++++++++++++
> > >  io_uring/zcrx.c           |  2 +-
> > >  net/core/devmem.c         | 10 +++++++++-
> > >  3 files changed, 24 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 5847c20994d3..1cbde7193c4d 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -5560,4 +5560,18 @@ extern struct net_device *blackhole_netdev;
> > >                 atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
> > >  #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)
> > >
> > > +static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
> > > +{
> > > +       struct device *dma_dev = dev->dev.parent;
> > > +
> > > +       if (!dma_dev)
> > > +               return NULL;
> > > +
> > > +       /* Some devices (e.g. SFs) have the dma device as a grandparent. */
> > > +       if (!dma_dev->dma_mask)
> >
> > I was able to confirm that !dev->dma_mask means "this device doesn't
> > support dma". Multiple existing places in the code seem to use this
> > check.
> >
> Ack. That was my understanding as well.
>
> > > +               dma_dev = dma_dev->parent;
> > > +
> > > +       return (dma_dev && dma_dev->dma_mask) ? dma_dev : NULL;
> >
> > This may be a noob question, but are we sure that !dma_dev->dma_mask
> > && dma_dev->parent->dma_mask != NULL means that the parent is the
> > dma-device that we should use? I understand SF devices work that way
> > but it's not immediately obvious to me that this is generically true.
> >
> This is what I gathered from Parav's answer.
>
> > For example pavel came up with the case where for veth,
> > netdev->dev.parent == NULL , I wonder if there are weird devices in
> > the wild where netdev->dev.parent->dma_mask == NULL but that doesn't
> > necessarily mean that the grandparent is the dma-device that we should
> > use.
> >
> Yep.
>
> > I guess to keep my long question short: what makes you think this is
> > generically safe to do? Or is it not, but we think most devices behave
> > this way and we're going to handle more edge cases in follow up
> > patches?
> >
> It is just what we know so far about SFs. See end of mail.
>

I see. OK, even though this is 'just what we know so far', I'm still
in favor of this simple approach, but I would say it would be good to
communicate in the comments that this is a best-effort dma-device
finding and doesn't handle every case under the sun. Something like
(untested):

static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
{
       struct device *parent = dev->dev.parent;

       if (!parent)
               return NULL;

       /* For most netdevs, the parent supports dma and is the correct
        * dma-device
        */
       if (parent->dma_mask)
               return parent;

       /* For SF devices, the parent doesn't support dma, but the grandparent
        * does, and is the correct dma-device to use (link to docs that explain
        * this if any).
        */
       if (parent->parent && parent->parent->dma_mask)
               return parent->parent;

       /* If neither the parent nor grandparent support dma, then we're not
        * sure what dma-device to use. Error out. Special handling for new
        * netdevs may need to be added in the future.
        */
       return NULL;
}

With some comments explaining the logic a bit, you can add:

Reviewed-by: Mina Almasry <almasrymina@...gle.com>

And let's see if Jakub likes this. If not, we can always do the future
proof approach with the queue API giving the driver the ability to
tell us what exactly is the dma-device to use (or whatever approach he
prefers).

-- 
Thanks,
Mina

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ