[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izNHXvtXF+Xftocvi+1E2hZ0v9FiTWBxaY7NWhemhPy-hQ@mail.gmail.com>
Date: Wed, 9 Jul 2025 12:29:22 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Dragos Tatulea <dtatulea@...dia.com>
Cc: asml.silence@...il.com, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Jens Axboe <axboe@...nel.dk>, Simona Vetter <simona.vetter@...ll.ch>,
Willem de Bruijn <willemb@...gle.com>, Kaiyuan Zhang <kaiyuanz@...gle.com>, cratiu@...dia.com,
parav@...dia.com, Tariq Toukan <tariqt@...dia.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, io-uring@...r.kernel.org
Subject: Re: [PATCH net] net: Allow non parent devices to be used for ZC DMA
On Wed, Jul 9, 2025 at 5:46 AM Dragos Tatulea <dtatulea@...dia.com> wrote:
>
> For zerocopy (io_uring, devmem), there is an assumption that the
> parent device can do DMA. However that is not always the case:
> ScalableFunction devices have the DMA device in the grandparent.
>
> This patch adds a helper for getting the DMA device for a netdev from
> its parent or grandparent if necessary. The NULL case is handled in the
> callers.
>
> devmem and io_uring are updated accordingly to use this helper instead
> of directly using the parent.
>
> Signed-off-by: Dragos Tatulea <dtatulea@...dia.com>
> Fixes: 170aafe35cb9 ("netdev: support binding dma-buf to netdevice")
nit: This doesn't seem like a fix? The current code supports all
devices that are not SF well enough, right? And in the case of SF
devices, I expect net_devmem_bind_dmabuf() to fail gracefully as the
dma mapping of a device that doesn't support it, I think, would fail
gracefully. So to me this seems like an improvement rather than a bug
fix.
> Reviewed-by: Tariq Toukan <tariqt@...dia.com>
> ---
> Changes in v1:
> - Upgraded from RFC status.
> - Dropped driver specific bits for generic solution.
> - Implemented single patch as a fix as requested in RFC.
> - Handling of multi-PF netdevs will be handled in a subsequent patch
> series.
>
> RFC: https://lore.kernel.org/all/20250702172433.1738947-2-dtatulea@nvidia.com/
> ---
> include/linux/netdevice.h | 14 ++++++++++++++
> io_uring/zcrx.c | 2 +-
> net/core/devmem.c | 10 +++++++++-
> 3 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5847c20994d3..1cbde7193c4d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5560,4 +5560,18 @@ extern struct net_device *blackhole_netdev;
> atomic_long_add((VAL), &(DEV)->stats.__##FIELD)
> #define DEV_STATS_READ(DEV, FIELD) atomic_long_read(&(DEV)->stats.__##FIELD)
>
> +static inline struct device *netdev_get_dma_dev(const struct net_device *dev)
> +{
> + struct device *dma_dev = dev->dev.parent;
> +
> + if (!dma_dev)
> + return NULL;
> +
> + /* Some devices (e.g. SFs) have the dma device as a grandparent. */
> + if (!dma_dev->dma_mask)
I was able to confirm that !dev->dma_mask means "this device doesn't
support dma". Multiple existing places in the code seem to use this
check.
> + dma_dev = dma_dev->parent;
> +
> + return (dma_dev && dma_dev->dma_mask) ? dma_dev : NULL;
This may be a noob question, but are we sure that !dma_dev->dma_mask
&& dma_dev->parent->dma_mask != NULL means that the parent is the
dma-device that we should use? I understand SF devices work that way
but it's not immediately obvious to me that this is generically true.
For example pavel came up with the case where for veth,
netdev->dev.parent == NULL , I wonder if there are weird devices in
the wild where netdev->dev.parent->dma_mask == NULL but that doesn't
necessarily mean that the grandparent is the dma-device that we should
use.
I guess to keep my long question short: what makes you think this is
generically safe to do? Or is it not, but we think most devices behave
this way and we're going to handle more edge cases in follow up
patches?
> +}
> +
> #endif /* _LINUX_NETDEVICE_H */
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 797247a34cb7..93462e5b2207 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -584,7 +584,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
> goto err;
> }
>
> - ifq->dev = ifq->netdev->dev.parent;
> + ifq->dev = netdev_get_dma_dev(ifq->netdev);
nit: this hunk will not apply when backporting this to trees that only
have the Fixes commit... which makes it more weird that this is
considered a fix for that, but I'm fine either way.
--
Thanks,
Mina
Powered by blists - more mailing lists