Message-ID:
 <CY8PR12MB719584B0D85424AC2495CCC6DC4EA@CY8PR12MB7195.namprd12.prod.outlook.com>
Date: Tue, 8 Jul 2025 08:52:54 +0000
From: Parav Pandit <parav@...dia.com>
To: Mina Almasry <almasrymina@...gle.com>, Dragos Tatulea
	<dtatulea@...dia.com>
CC: Jakub Kicinski <kuba@...nel.org>, "asml.silence@...il.com"
	<asml.silence@...il.com>, Andrew Lunn <andrew+netdev@...n.ch>, "David S.
 Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Paolo
 Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, Saeed Mahameed
	<saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>, Cosmin Ratiu
	<cratiu@...dia.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC net-next 1/4] net: Allow non parent devices to be used for
 ZC DMA


> From: Mina Almasry <almasrymina@...gle.com>
> Sent: 08 July 2025 03:25 AM
> 
> On Mon, Jul 7, 2025 at 2:35 PM Dragos Tatulea <dtatulea@...dia.com> wrote:
> >
> > On Mon, Jul 07, 2025 at 11:44:19AM -0700, Mina Almasry wrote:
> > > > On Fri, Jul 4, 2025 at 6:11 AM Dragos Tatulea <dtatulea@...dia.com>
> > > > wrote:
> > > >
> > > > On Thu, Jul 03, 2025 at 01:58:50PM +0200, Parav Pandit wrote:
> > > > >
> > > > > > From: Jakub Kicinski <kuba@...nel.org>
> > > > > > Sent: 03 July 2025 02:23 AM
> > > > > >
> > > > [...]
> > > > > > Maybe someone with closer understanding can chime in. If the
> > > > > > kind of subfunctions you describe are expected, and there's a
> > > > > > generic way of recognizing them -- automatically going to
> > > > > > parent of parent would indeed be cleaner and less error prone, as
> > > > > > you suggest.
> > > > >
> > > > > I am not sure when the parent of parent assumption would fail,
> > > > > but it can be a good start.
> > > > >
> > > > > If extending netdev by 8 bytes to store a dma_dev pointer is a
> > > > > concern, perhaps a netdev flag IFF_DMA_DEV_PARENT could be an elegant
> > > > > way to refer to parent->parent, so that there is no guesswork in the
> > > > > devmem layer.
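[ To make the flag idea concrete: a rough kernel-side sketch only. The
  IFF_DMA_DEV_PARENT flag and the helper below are hypothetical; the idea
  is that a driver registering an SF netdev sets the flag, and the core
  then resolves the DMA device as parent->parent instead of parent. ]

#include <linux/netdevice.h>

static struct device *netdev_dma_dev(struct net_device *dev)
{
        /* IFF_DMA_DEV_PARENT is only the proposal above, not a real flag. */
        if (dev->priv_flags & IFF_DMA_DEV_PARENT)
                return dev->dev.parent->parent;

        /* Default: today's behaviour, the immediate parent does the DMA. */
        return dev->dev.parent;
}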
> > > > >
> > > > > That said, my understanding of devmem is limited, so I could be
> > > > > mistaken here.
> > > > >
> > > > > In the long term, the devmem infrastructure likely needs to be
> > > > > modernized to support queue-level DMA mapping.
> > > > > This is useful because drivers like mlx5 already support
> > > > > socket-direct netdevs that span two PCI devices.
> > > > >
> > > > > Currently, devmem is limited to a single PCI device per netdev.
> > > > > While the buffer pool could be per device, the actual DMA
> > > > > mapping might need to be deferred until buffer posting time to
> > > > > support such multi-device scenarios.
> > > > >
> > > > > In an offline discussion, Dragos mentioned that io_uring already
> > > > > operates at the queue level; maybe some ideas can be picked up
> > > > > from io_uring?
> > > > The problem for devmem is that the device-based API is already set
> > > > in stone, so I am not sure how we can change this. Maybe Mina can chime in.
> > > >
> > >
> > > I think what's being discussed here is pretty straightforward and
> > > doesn't need UAPI changes, right? Or were you referring to another
> > > API?
> > >
> > I was referring to the fact that devmem takes one big buffer, maps it
> > for a single device (in net_devmem_bind_dmabuf()) and then assigns it
> > to queues in net_devmem_bind_dmabuf_to_queue(). As the single buffer
> > is part of the API, I don't see how the mapping could be done in a per
> > queue way.
> >
> 
> Oh, I see. devmem does support mapping a single buffer to multiple queues in a
> single netlink API call, but there is nothing stopping the user from mapping N
> buffers to N queues in N netlink API calls.
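[ To illustrate the N-buffers-to-N-queues point: a user-space sketch only.
  bind_rx_dmabuf() is a hypothetical wrapper around the netdev netlink
  "bind-rx" operation (ifindex + dma-buf fd + one rx queue id); the real
  interface is the ynl-generated API used by the ncdevmem selftest. ]

#include <err.h>

int bind_rx_dmabuf(int ifindex, int dmabuf_fd, int queue_id); /* hypothetical */

static void bind_one_buffer_per_queue(int ifindex, const int *dmabuf_fds,
                                      const int *queue_ids, int n_queues)
{
        for (int i = 0; i < n_queues; i++) {
                /* N buffers to N queues: N separate bind-rx calls. */
                if (bind_rx_dmabuf(ifindex, dmabuf_fds[i], queue_ids[i]) < 0)
                        err(1, "bind-rx failed for queue %d", queue_ids[i]);
        }
}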
> 
> > > > To sum the conversation up, there are 2 imperfect and overlapping
> > > > solutions:
> > > >
> > > > 1) For the common case of having a single PCI device per netdev, going
> > > >    one parent up if the parent device is not DMA capable would be a
> > > >    good starting point.
> > > >
> > > > 2) For multi-PF netdev [0], a per-queue get_dma_dev() op would be ideal
> > > >    as it provides the right PF device for the given queue.
> > >
> > > Agreed these are the 2 options.
> > >
> > > > io_uring
> > > >    could use this but devmem can't. Devmem could use 1., but the
> > > >    driver has to detect and block the multi-PF case.
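[ A minimal sketch of option 1, roughly the direction the current patch
  takes; the helper name and the !dma_mask check are illustrative, not the
  actual patch code. ]

#include <linux/netdevice.h>

static struct device *netdev_guess_dma_dev(struct net_device *netdev)
{
        struct device *dev = netdev->dev.parent;

        /*
         * Subfunctions (e.g. mlx5 SFs) register an auxiliary device as the
         * netdev parent; that device does not DMA itself, so walk one level
         * up to the PCI PF that actually performs the DMA.
         */
        if (dev && !dev->dma_mask && dev->parent)
                dev = dev->parent;

        return dev;
}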
> > > >
> > >
> > > Why? AFAICT both io_uring and devmem are in the exact same boat
> > > right now, and your patchset seems to show that? Both use
> > > dev->dev.parent as the mapping device, and AFAIU you want to use
> > > dev->dev.parent.parent or something like that?
> > >
> > Right. My patches show that. But the issue raised by Parav is different:
> > different queues can belong to different DMA devices from different
> > PFs in the case of Multi PF netdev.
> >
> > io_uring can do it because it maps individual buffers to individual
> > queues. So it would be trivial to get the DMA device of each queue
> > through a new queue op.
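[ Sketch of what such a queue op might look like; the ndo_queue_get_dma_dev
  member is made up here and is not part of today's
  struct netdev_queue_mgmt_ops. The fallback reuses the parent-based helper
  sketched earlier. ]

/* Proposed (not existing) member of struct netdev_queue_mgmt_ops:
 *      struct device * (*ndo_queue_get_dma_dev)(struct net_device *dev,
 *                                               int queue_index);
 */
static struct device *netdev_queue_dma_dev(struct net_device *dev, int idx)
{
        const struct netdev_queue_mgmt_ops *ops = dev->queue_mgmt_ops;

        /* A multi-PF driver would return the PF that owns this queue... */
        if (ops && ops->ndo_queue_get_dma_dev)
                return ops->ndo_queue_get_dma_dev(dev, idx);

        /* ...everyone else falls back to the parent-based guess. */
        return netdev_guess_dma_dev(dev);
}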
> >
> 
> Right, devmem doesn't stop you from mapping individual buffers to individual
> queues. It just also supports mapping the same buffer to multiple queues.
> AFAIR, io_uring also supports mapping a single buffer to multiple queues, but I
> could easily be very wrong about that. It's just a vague recollection from
> reviewing the iozcrx.c implementation a while back.
> 
> In your case, I think, if the user is trying to map a single buffer to multiple
> queues, and those queues have different dma-devices, then you have to error
> out. I don't see how to sanely handle that without adding a lot of code. The user
> would have to fall back onto mapping a single buffer to a single queue (or
> multiple queues that share the same dma-device).
> 
> > > Also AFAIU the driver won't need to block the multi PF case, it's
> > > actually core that would need to handle that. For example, if devmem
> > > wants to bind a dmabuf to 4 queues, but queues 0 & 1 use 1 dma
> > > device, but queues 2 & 3 use another dma-device, then core doesn't
> > > know what to do, because it can't map the dmabuf to both devices at
> > > once. The restriction would be at bind time that all the queues
> > > being bound to have the same dma device. Core would need to check
> > > that and return an error if the devices diverge. I imagine all of
> > > this is the same for io_uring, unless I'm missing something.
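[ Rough sketch of that bind-time check, reusing the hypothetical
  netdev_queue_dma_dev() helper from above; the function name and error
  code are illustrative, not the actual devmem code. ]

static int devmem_check_bind_dma_dev(struct net_device *dev,
                                     const int *queues, int n_queues)
{
        struct device *dma_dev = NULL;
        int i;

        for (i = 0; i < n_queues; i++) {
                struct device *qdev = netdev_queue_dma_dev(dev, queues[i]);

                if (!dma_dev)
                        dma_dev = qdev;
                else if (qdev != dma_dev)
                        return -EOPNOTSUPP; /* queues span different DMA devices */
        }

        return 0;
}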
> > >
> > Agreed. So far I haven't seen an API for multi-PF netdev to expose
> > this information, so my thinking defaulted to "let's block it from the
> > driver side".
> >
> 
> Agreed.
> 
> > > > I think we need both. Either that or a netdev op with an optional
> > > > queue parameter. Any thoughts?
> > > >
> > >
> > > At the moment, from your description of the problem, I would lean
> > > toward going with Jakub's approach and handling the common case via #1.
> > > If more use cases come up that require a very custom dma device to be
> > > passed, we can always move to #2 later, but FWIW I don't see a reason to
> > > build a super future-proof, complicated solution right now. I'm happy to
> > > hear disagreements.
> > But we also don't want to start off on the wrong foot when we know of
> > both issues right now. And I think we can wrap it up nicely in a
> > single function, similarly to how the current patch does it.
> >
> 
> FWIW I don't have a strong preference. I'm fine with the simple solution for now
> and I'm fine with the slightly more complicated future-proof solution.
> 
Looks good to me as well.
