linux-kernel - Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps

Message-ID: <20240327171424.GI8419@ziepe.ca>
Date: Wed, 27 Mar 2024 14:14:24 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Christoph Hellwig <hch@....de>
Cc: Leon Romanovsky <leon@...nel.org>, Robin Murphy <robin.murphy@....com>,
	Marek Szyprowski <m.szyprowski@...sung.com>,
	Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
	Chaitanya Kulkarni <chaitanyak@...dia.com>,
	Jonathan Corbet <corbet@....net>, Jens Axboe <axboe@...nel.dk>,
	Keith Busch <kbusch@...nel.org>, Sagi Grimberg <sagi@...mberg.me>,
	Yishai Hadas <yishaih@...dia.com>,
	Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
	Kevin Tian <kevin.tian@...el.com>,
	Alex Williamson <alex.williamson@...hat.com>,
	Jérôme Glisse <jglisse@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-block@...r.kernel.org, linux-rdma@...r.kernel.org,
	iommu@...ts.linux.dev, linux-nvme@...ts.infradead.org,
	kvm@...r.kernel.org, linux-mm@...ck.org,
	Bart Van Assche <bvanassche@....org>,
	Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
	Amir Goldstein <amir73il@...il.com>,
	"josef@...icpanda.com" <josef@...icpanda.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	"daniel@...earbox.net" <daniel@...earbox.net>,
	Dan Williams <dan.j.williams@...el.com>,
	"jack@...e.com" <jack@...e.com>, Zhu Yanjun <zyjzyj2000@...il.com>
Subject: Re: [RFC RESEND 00/16] Split IOMMU DMA mapping operation to two steps

On Mon, Mar 25, 2024 at 12:22:15AM +0100, Christoph Hellwig wrote:
> On Fri, Mar 22, 2024 at 03:43:30PM -0300, Jason Gunthorpe wrote:
> > If we are going to make caller-provided uniformity a requirement, let's
> > imagine a formal memory type idea to help keep this a little
> > abstracted?
> > 
> >  DMA_MEMORY_TYPE_NORMAL
> >  DMA_MEMORY_TYPE_P2P_NOT_ACS
> >  DMA_MEMORY_TYPE_ENCRYPTED
> >  DMA_MEMORY_TYPE_BOUNCE_BUFFER  // ??
> > 
> > Then maybe the driver flow looks like:
> > 
> > 	if (transaction.memory_type == DMA_MEMORY_TYPE_NORMAL && dma_api_has_iommu(dev)) {
> 
> Add a nice helper to make this somewhat readable, but yes.
> 
> > 	} else if (transaction.memory_type == DMA_MEMORY_TYPE_P2P_NOT_ACS) {
> > 		num_hwsgls = transaction.num_sgls;
> > 		for_each_range(transaction, range) {
> > 			hwsgl[i].addr = dma_api_p2p_not_acs_map(range.start_physical, range.length, p2p_memory_provider);
> > 			hwsgl[i].len = range.size;
> > 		}
> > 	} else {
> > 		/* Must be DMA_MEMORY_TYPE_NORMAL, DMA_MEMORY_TYPE_ENCRYPTED, DMA_MEMORY_TYPE_BOUNCE_BUFFER? */
> > 		num_hwsgls = transaction.num_sgls;
> > 		for_each_range(transaction, range) {
> > 			hwsgl[i].addr = dma_api_map_cpu_page(range.start_page, range.length);
> > 			hwsgl[i].len = range.size;
> > 		}
> >
> 
> And these two are really the same except that we call a different map
> helper underneath.  So I think as far as the driver is concerned
> they should be the same, the DMA API just needs to key off the
> memory type.

Yeah. If the caller is going to have to compute the memory type of the
range, then let's pass it to the helper:

dma_api_map_memory_type(transaction.memory_type, range.start_page, range.length);

Then we can just hide all the differences under the API without doing
duplicated work.

Function names need some work ...
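
As a rough illustration of that shape, here is a minimal user-space sketch
in C. Everything in it is an assumption made for illustration:
dma_api_map_memory_type() is only the placeholder name used above, and the
mock_* types, the tag bit and the two per-type helpers are hypothetical
stand-ins, not existing kernel APIs. The point is only that the driver loop
becomes identical for every memory type and the dispatch lives under the
API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory types, mirroring the list quoted earlier. */
enum dma_memory_type {
	DMA_MEMORY_TYPE_NORMAL,
	DMA_MEMORY_TYPE_P2P_NOT_ACS,
	DMA_MEMORY_TYPE_ENCRYPTED,
	DMA_MEMORY_TYPE_BOUNCE_BUFFER,
};

/* Mock types standing in for the driver's ranges and HW SGL entries. */
struct mock_range { uint64_t start; uint64_t length; };
struct mock_hwsgl { uint64_t addr; uint64_t len; };

/* Arbitrary tag so the P2P path is distinguishable in this mock. */
#define MOCK_P2P_TAG (1ULL << 63)

/* Mock per-type mappers; real ones would do the actual DMA mapping. */
static uint64_t mock_map_cpu_page(uint64_t start, uint64_t length)
{
	(void)length;
	return start;
}

static uint64_t mock_map_p2p_not_acs(uint64_t start, uint64_t length)
{
	(void)length;
	return start | MOCK_P2P_TAG;
}

/* Single entry point: the caller passes the type it already computed. */
static uint64_t dma_api_map_memory_type(enum dma_memory_type type,
					uint64_t start, uint64_t length)
{
	assert(length != 0);
	switch (type) {
	case DMA_MEMORY_TYPE_P2P_NOT_ACS:
		return mock_map_p2p_not_acs(start, length);
	case DMA_MEMORY_TYPE_NORMAL:
	case DMA_MEMORY_TYPE_ENCRYPTED:
	case DMA_MEMORY_TYPE_BOUNCE_BUFFER:
	default:
		return mock_map_cpu_page(start, length);
	}
}

/* Driver side: one loop, no per-type branches. Returns entries written. */
static size_t fill_hwsgl(enum dma_memory_type type,
			 const struct mock_range *ranges, size_t n,
			 struct mock_hwsgl *hwsgl)
{
	for (size_t i = 0; i < n; i++) {
		hwsgl[i].addr = dma_api_map_memory_type(type, ranges[i].start,
							ranges[i].length);
		hwsgl[i].len = ranges[i].length;
	}
	return n;
}
```

A caller would compute the memory type once per transaction and then call
fill_hwsgl(transaction_type, ranges, nranges, hwsgl), without caring which
mapper runs underneath.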

> > > > So I take it as a requirement that RDMA MUST make single MR's out of a
> > > > hodgepodge of page types. RDMA MRs cannot be split. Multiple MR's are
> > > > not a functional replacement for a single MR.
> > > 
> > > But MRs consolidate multiple dma addresses anyway.
> > 
> > I'm not sure I understand this?
> 
> The RDMA MRs take a list of PFNish addresses (or SGLs with the
> enhanced MRs from Mellanox) and give you back a single rkey/lkey.

Yes, that is the desire.
 
> > To go back to my main thesis - I would like a high performance low
> > level DMA API that is capable enough that it could implement
> > scatterlist dma_map_sg() and thus also implement any future
> > scatterlist_v2, bio, hmm_range_fault or any other thing we come up
> > with on top of it. This is broadly what I thought we agreed to at LSF
> > last year.
> 
> I think the biggest underlying problem of the scatterlist based
> DMA implementation for IOMMUs is that it's trying to handle too much,
> that is magic coalescing even if the segment boundaries don't align
> with the IOMMU page size.  If we can get rid of that misfeature I
> think we'd greatly simplify the API and implementation.

Yeah, that stuff is not easy at all and takes extra computation to
figure out. I always assumed it was there for block...
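
To make the alignment condition concrete, here is a small sketch in C of
the check a caller could do up front, assuming the rule is that segments
can share one contiguous IOVA range without special handling only when
every interior boundary falls on an IOMMU page boundary. The struct and
function names are hypothetical, and the granule is assumed to be a power
of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical segment: start address and length in bytes. */
struct mock_seg { uint64_t phys; uint64_t len; };

/*
 * Return true if the segments can be laid out back to back in IOVA
 * space using whole IOMMU pages: every segment except the first must
 * start on a granule boundary, and every segment except the last must
 * end on one. Only the outer edges may be unaligned.
 */
static bool segs_merge_cleanly(const struct mock_seg *segs, size_t n,
			       uint64_t granule)
{
	uint64_t mask = granule - 1;	/* granule assumed power of two */

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && (segs[i].phys & mask))
			return false;
		if (i + 1 < n && ((segs[i].phys + segs[i].len) & mask))
			return false;
	}
	return true;
}
```

If the caller guarantees this, the mapping code never has to work out
whether and where neighbouring segments can be merged.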

Leon & Chaitanya will make an RFC v2 along these lines, let's see how it
goes.

Thanks,
Jason