lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 6 Mar 2024 15:44:16 +0100
From: Christoph Hellwig <hch@....de>
To: Leon Romanovsky <leon@...nel.org>
Cc: Robin Murphy <robin.murphy@....com>, Christoph Hellwig <hch@....de>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
	Jason Gunthorpe <jgg@...pe.ca>,
	Chaitanya Kulkarni <chaitanyak@...dia.com>,
	Jonathan Corbet <corbet@....net>, Jens Axboe <axboe@...nel.dk>,
	Keith Busch <kbusch@...nel.org>, Sagi Grimberg <sagi@...mberg.me>,
	Yishai Hadas <yishaih@...dia.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
	Kevin Tian <kevin.tian@...el.com>,
	Alex Williamson <alex.williamson@...hat.com>,
	Jérôme Glisse <jglisse@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-block@...r.kernel.org, linux-rdma@...r.kernel.org,
	iommu@...ts.linux.dev, linux-nvme@...ts.infradead.org,
	kvm@...r.kernel.org, linux-mm@...ck.org,
	Bart Van Assche <bvanassche@....org>,
	Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
	Amir Goldstein <amir73il@...il.com>,
	"josef@...icpanda.com" <josef@...icpanda.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	"daniel@...earbox.net" <daniel@...earbox.net>,
	Dan Williams <dan.j.williams@...el.com>,
	"jack@...e.com" <jack@...e.com>, Zhu Yanjun <zyjzyj2000@...il.com>
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two
 steps

On Tue, Mar 05, 2024 at 02:29:35PM +0200, Leon Romanovsky wrote:
> > > These advanced DMA mapping APIs are needed to calculate IOVA size to
> > > allocate it as one chunk and some sort of offset calculations to know
> > > which part of IOVA to map.
> > 
> > I don't follow this part at all - at *some* point, something must know a
> > range of memory addresses involved in a DMA transfer, so that's where it
> > should map that range for DMA. 
> 
> In all presented cases in this series, the overall DMA size is known in
> advance. In RDMA case, it is known when user registers the memory, in
> VFIO, when live migration is happening and in NVMe, when BIO is created.
> 
> So once we allocated IOVA, we will need to link ranges, which si the
> same as map but without IOVA allocation.

But above you say:

"These advanced DMA mapping APIs are needed to calculate IOVA size to
allocate it as one chunk and some sort of offset calculations to know
which part of IOVA to map."

this suggests you need helpers to calculate the len and offset.  I
can't see where that would ever make sense.  The total transfer
size should just be passed in by the callers and be known, and
there should be no offset.

> > > Instead of teaching DMA to know these specific datatypes, let's separate
> > > existing DMA mapping routine to two steps and give an option to advanced
> > > callers (subsystems) perform all calculations internally in advance and
> > > map pages later when it is needed.
> > 
> > From a brief look, this is clearly an awkward reinvention of the IOMMU API.
> > If IOMMU-aware drivers/subsystems want to explicitly manage IOMMU address
> > spaces then they can and should use the IOMMU API. Perhaps there's room for
> > some quality-of-life additions to the IOMMU API to help with common usage
> > patterns, but the generic DMA mapping API is absolutely not the place for
> > it.
> 
> DMA mapping gives nice abstraction from IOMMU, and allows us to have
> same flow for IOMMU and non-IOMMU flows without duplicating code, while
> you suggest to teach almost every part in the kernel to know about IOMMU.

Except that the flows are fundamentally different for the "can coalesce"
vs "can't coalesce" case.  In the former we have one dma_addr_t range,
and in the latter as many as there are input vectors (this is ignoring
the weird iommu merging case where we we coalesce some but not all
segments, but I'd rather not have that in a new API).

So if we want to efficiently be able to handle these cases we need
two APIs in the driver and a good framework to switch between them.
Robins makes a point here that the iommu API handles the can coalesce
case and he has a point as that's exactly how the IOMMU API works.
I'd still prefer to wrap it with dma callers to handle things like
swiotlb and maybe Xen grant tables and to avoid the type confusion
between dma_addr_t and then untyped iova in the iommu layer, but
having this layer or not is probably worth a discussion.

> 
> In this series, we changed RDMA, VFIO and NVMe, and in all cases we
> removed more code than added. From what I saw, VDPA and virito-blk will
> benefit from proposed API too.
> 
> Even in this RFC, where Chaitanya did partial job and didn't convert
> whole driver, the gain is pretty obvious:
> https://lore.kernel.org/linux-rdma/016fc02cbfa9be3c156a6f74df38def1e09c08f1.1709635535.git.leon@kernel.org/T/#u
> 

I have no idea how that nvme patch is even supposed to work.  It removes
the PRP path in nvme-pci, which not only is the most common I/O path
but actually required for the admin queue as NVMe doesn't support
SGLs for the admin queue.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ