Message-ID:
 <CY8PR12MB7195361C14592016B8D2217DDC43A@CY8PR12MB7195.namprd12.prod.outlook.com>
Date: Thu, 3 Jul 2025 11:58:50 +0000
From: Parav Pandit <parav@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>, Dragos Tatulea <dtatulea@...dia.com>
CC: "almasrymina@...gle.com" <almasrymina@...gle.com>,
	"asml.silence@...il.com" <asml.silence@...il.com>, Andrew Lunn
	<andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
 Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Simon Horman
	<horms@...nel.org>, Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan
	<tariqt@...dia.com>, Cosmin Ratiu <cratiu@...dia.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFC net-next 1/4] net: Allow non parent devices to be used for
 ZC DMA


> From: Jakub Kicinski <kuba@...nel.org>
> Sent: 03 July 2025 02:23 AM
> 
> On Wed, 2 Jul 2025 20:01:48 +0000 Dragos Tatulea wrote:
> > On Wed, Jul 02, 2025 at 11:32:08AM -0700, Jakub Kicinski wrote:
> > > On Wed, 2 Jul 2025 20:24:23 +0300 Dragos Tatulea wrote:
> > > > For zerocopy (io_uring, devmem), there is an assumption that the
> > > > parent device can do DMA. However that is not always the case:
> > > > for example mlx5 SF devices have an auxiliary device as a parent.
> > >
> > > Noob question -- I thought that the point of SFs was that you can
> > > pass them thru to a VM. How do they not have DMA support? Is it
> > > added on demand by the mediated driver or some such?
> > They do have DMA support. Maybe didn't state it properly in the commit
> > message. It is just that the parent device
> > (sf_netdev->dev.parent.device) is not a DMA device. The grandparent
> > device is a DMA device though (PCI dev of parent PFs). But I wanted to
> > keep it generic. Maybe it doesn't need to be so generic?
> >
> > Regarding SFs and VM passthrough: my understanding is that SFs are
> > more for passing them to a container.
> 
> Mm. We had macvlan offload for over a decade, there's no need for a fake
> struct device, auxbus and all them layers to delegate a "subdevice" to a
> container in netdev world.

SFs are full PCI devices except that they do not have a unique PCI BDF;
they utilize the parent PCI device's BDF (RID).
Presently, SFs are used both with and without containers when users
need hw based netdevs.
Some CSPs use them as hot-plug devices from the DPU side too.

Unlike macvlan, SF netdevs have dedicated hw queues, switchdev
representors, MTU, qdiscs, and QoS rate limiters.
vdpa on SFs is a prominent use case too, to offload virtio queues,
and some users are using SF rdma devices as well.

SFs are the pre-SIOV_R2 devices, hence relying on the auxiliary bus
and utilizing the core driver infrastructure aligns with the kernel core.
If I recollect correctly, the Intel ice SFs are very similar.

> In my head subfunctions are a way of configuring a PCIe PASID ergo they
> _only_ make sense in context of DMA.
SF DMA is on the parent PCI device.

SIOV_R2 devices will have their own PCI RID, which is ratified or in
the process of being ratified. Once that is done, SF (as a SIOV_R2
device) instantiation can be extended with its own PCI RID. At that
point they can be mapped to a VM.

> Maybe someone with closer understanding can chime in. If the kind of
> subfunctions you describe are expected, and there's a generic way of
> recognizing them -- automatically going to parent of parent would indeed be
> cleaner and less error prone, as you suggest.

I am not sure when the parent-of-parent assumption would fail, but it
can be a good start.

If an 8-byte extension of netdev to store dma_dev is a concern, perhaps
a netdev flag such as IFF_DMA_DEV_PARENT could be an elegant way to
refer to parent->parent, so that there is no guesswork in the devmem
layer.

That said, my understanding of devmem is limited, so I could be mistaken here.
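
To make the above concrete, a rough sketch of the flag idea (purely
illustrative; IFF_DMA_DEV_PARENT and netdev_zc_dma_dev() are made-up
names for this mail, not existing kernel code):

static struct device *netdev_zc_dma_dev(const struct net_device *netdev)
{
	struct device *parent = netdev->dev.parent;

	/* IFF_DMA_DEV_PARENT is hypothetical: a driver such as mlx5 SF,
	 * whose netdev parent is an auxiliary device, would set it so
	 * that the core uses the grandparent (the PCI device that
	 * actually performs DMA) for zerocopy mappings.
	 */
	if (parent && (netdev->priv_flags & IFF_DMA_DEV_PARENT))
		return parent->parent;

	return parent;
}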

In the long term, the devmem infrastructure likely needs to be
modernized to support queue-level DMA mapping.
This is useful because drivers like mlx5 already support
socket-direct netdevs that span two PCI devices.

Currently, devmem is limited to a single PCI device per netdev.
While the buffer pool could be per device, the actual DMA
mapping might need to be deferred until buffer posting
time to support such multi-device scenarios.
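
As an illustration of that direction (ndo_queue_get_dma_dev() is only
a placeholder name, it does not exist today), the core could resolve
the DMA device per rx queue and fall back to today's parent assumption:

static struct device *netdev_queue_dma_dev(struct net_device *netdev,
					   unsigned int rxq_idx)
{
	const struct net_device_ops *ops = netdev->netdev_ops;

	/* Hypothetical per-queue op: a socket-direct netdev spanning
	 * two PCI devices could report the right DMA device for each
	 * queue, and devmem could defer the actual mapping to buffer
	 * posting time.
	 */
	if (ops->ndo_queue_get_dma_dev)
		return ops->ndo_queue_get_dma_dev(netdev, rxq_idx);

	/* Fallback: today's single-device-per-netdev assumption. */
	return netdev->dev.parent;
}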

In an offline discussion, Dragos mentioned that io_uring already
operates at the queue level; maybe some ideas can be picked up
from io_uring?
