Message-ID: <22kf5wtxym5x3zllar7ek3onkav6nfzclf7w2lzifhebjme4jb@h4qycdqmwern>
Date: Fri, 4 Jul 2025 13:11:01 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Parav Pandit <parav@...dia.com>, Jakub Kicinski <kuba@...nel.org>
Cc: "almasrymina@...gle.com" <almasrymina@...gle.com>,
"asml.silence@...il.com" <asml.silence@...il.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
Cosmin Ratiu <cratiu@...dia.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC net-next 1/4] net: Allow non parent devices to be used for
ZC DMA
On Thu, Jul 03, 2025 at 01:58:50PM +0200, Parav Pandit wrote:
>
> > From: Jakub Kicinski <kuba@...nel.org>
> > Sent: 03 July 2025 02:23 AM
> >
[...]
> > Maybe someone with closer understanding can chime in. If the kind of
> > subfunctions you describe are expected, and there's a generic way of
> > recognizing them -- automatically going to parent of parent would indeed be
> > cleaner and less error prone, as you suggest.
>
> I am not sure when the parent-of-parent assumption would fail, but it can
> be a good start.
>
> If the 8-byte netdev extension to store dma_dev is a concern, perhaps a
> netdev flag such as IFF_DMA_DEV_PARENT would be an elegant way to refer
> to parent->parent, so that there is no guesswork in the devmem layer.
>
> That said, my understanding of devmem is limited, so I could be mistaken here.
>
> In the long term, the devmem infrastructure likely needs to be
> modernized to support queue-level DMA mapping.
> This is useful because drivers like mlx5 already support a
> socket-direct netdev that spans two PCI devices.
>
> Currently, devmem is limited to a single PCI device per netdev.
> While the buffer pool could be per device, the actual DMA
> mapping might need to be deferred until buffer posting
> time to support such multi-device scenarios.
>
> In an offline discussion, Dragos mentioned that io_uring already
> operates at the queue level; maybe some ideas can be picked up
> from io_uring?

The problem for devmem is that the device-based API is already set in
stone, so I am not sure how we can change this. Maybe Mina can chime in.

To sum up the conversation, there are two imperfect and overlapping
solutions:
1) For the common case of having a single PCI device per netdev, going one
parent up if the parent device is not DMA capable would be a good
starting point.
2) For multi-PF netdev [0], a per-queue get_dma_dev() op would be ideal,
   as it provides the right PF device for the given queue. io_uring
   could use this, but devmem can't. Devmem could use 1), but the
   driver would have to detect and block the multi-PF case.
I think we need both. Either that or a netdev op with an optional queue
parameter. Any thoughts?
[0] https://docs.kernel.org/networking/multi-pf-netdev.html
Thanks,
Dragos