Message-ID: <4fbcdd49-cd93-4af8-83e2-f1cef0ea2eeb@arm.com>
Date: Fri, 1 Mar 2024 16:38:33 +0000
From: Robin Murphy <robin.murphy@....com>
To: Petr Tesařík <petr@...arici.cz>,
Christoph Hellwig <hch@....de>
Cc: Michael Kelley <mhklinux@...look.com>, Will Deacon <will@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Petr Tesarik <petr.tesarik1@...wei-partners.com>,
"kernel-team@...roid.com" <kernel-team@...roid.com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
Marek Szyprowski <m.szyprowski@...sung.com>, Dexuan Cui
<decui@...rosoft.com>, Nicolin Chen <nicolinc@...dia.com>
Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for
allocations >= PAGE_SIZE
On 2024-03-01 3:39 pm, Petr Tesařík wrote:
> On Thu, 29 Feb 2024 16:47:56 +0100
> Christoph Hellwig <hch@....de> wrote:
>
>> On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote:
>>> Any thoughts on how that historical behavior should apply if
>>> the DMA min_align_mask is non-zero, or the alloc_align_mask
>>> parameter to swiotlb_tbl_map_single() is non-zero? As currently
>>> used, alloc_align_mask is page aligned if the IOMMU granule is
>>> >= PAGE_SIZE. But a non-zero min_align_mask could mandate
>>> returning a buffer that is not page aligned. Perhaps do the
>>> historical behavior only if alloc_align_mask and min_align_mask
>>> are both zero?
>>
>> I think the driver setting min_align_mask is a clear indicator
>> that the driver requested a specific alignment and the defaults
>> don't apply. For swiotlb_tbl_map_single as used by dma-iommu
>> I'd have to take a closer look at how it is used.
>
> I'm not sure it helps in this discussion, but let me dive into a bit
> of ancient history to understand how we ended up here.
>
> IIRC this behaviour was originally motivated by limitations of PC AT
> hardware. The Intel 8237 is a 16-bit DMA controller. To make it somewhat
> usable with addresses up to 16MB (yeah, the infamous DMA zone), IBM
> added a page register, but it was on a separate chip and it did not
> increment when the 8237 address rolled over back to zero. Effectively,
> the page register selected a 64K-aligned window of 64K bytes.
> Consequently, DMA buffers could not cross a 64K physical boundary.
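To make the constraint concrete, here is an illustrative check (my own
sketch, not kernel code) of whether a buffer would trip over that
hardware limitation:

```c
#include <stdint.h>

/* Illustrative only: a classic ISA DMA buffer must not cross a 64K
 * physical boundary, because the page register did not increment when
 * the 8237's 16-bit address rolled over back to zero. */
static int crosses_64k_boundary(uint32_t phys, uint32_t len)
{
    return (phys >> 16) != ((phys + len - 1) >> 16);
}
```

So a 8K buffer at physical 0xF000 crosses (it spills into the next 64K
window), while a full 64K buffer at 0x10000 does not.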
>
> Thanks to how the buddy allocator works, the 64K-boundary constraint
> was satisfied by allocation size, and drivers took advantage of it when
> allocating device buffers. IMO software bounce buffers simply followed
> the same logic that worked for buffers allocated by the buddy allocator.
>
> OTOH min_align_mask was motivated by NVMe which prescribes the value of
> a certain number of low bits in the DMA address (for simplicity assumed
> to be identical with the same bits in the physical address).
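FWIW, the contract as described above boils down to something like this
(names are mine, not the swiotlb internals): given a bounce slot whose
base is aligned to (mask + 1), the buffer is placed so that the low bits
of the bounced DMA address replicate the low bits of the original
physical address:

```c
#include <stdint.h>

/* Sketch of the min_align_mask placement described above; slot_base is
 * assumed aligned to (min_align_mask + 1), so the low bits of the
 * returned address equal the low bits of orig_phys, as NVMe requires. */
static uint64_t bounce_addr(uint64_t slot_base, uint64_t orig_phys,
                            uint64_t min_align_mask)
{
    return slot_base + (orig_phys & min_align_mask);
}
```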
>
> The only pre-existing user of alloc_align_mask is x86 IOMMU code, and
> IIUC it is used to guarantee that unaligned transactions do not share
> the IOMMU granule with another device. This whole thing is weird,
> because swiotlb_tbl_map_single() is called like this:
>
> aligned_size = iova_align(iovad, size);
> phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
> iova_mask(iovad), dir, attrs);
>
> Here:
>
> * alloc_size = iova_align(iovad, size)
> * alloc_align_mask = iova_mask(iovad)
>
> Now, iova_align() rounds up its argument to a multiple of iova granule
> and iova_mask() is simply "granule - 1". This works, because granule
> size must be a power of 2, and I assume it must also be >= PAGE_SIZE.
Not quite, the granule must *not* be larger than PAGE_SIZE (but can be
smaller).
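For reference, simplified stand-ins for those two helpers (the real ones
take a struct iova_domain; granule must be a power of two):

```c
#include <stddef.h>

/* Simplified equivalents of iova_mask() and iova_align(), assuming a
 * power-of-two granule. */
static size_t granule_mask(size_t granule)
{
    return granule - 1;
}

static size_t granule_align(size_t granule, size_t size)
{
    /* Round size up to the next multiple of granule. */
    return (size + granule - 1) & ~(granule - 1);
}
```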
> In that case, the alloc_align_mask argument is not even needed if you
> adjust the code to match documentation---the resulting buffer will be
> aligned to a granule boundary by virtue of having a size that is a
> multiple of the granule size.
I think we've conflated two different notions of "allocation" here. The
page-alignment which Christoph quoted applies for dma_alloc_coherent(),
which nowadays *should* only be relevant to SWIOTLB code in the
swiotlb_alloc() path for stealing coherent pages out of restricted DMA
pools. Historically there was never any official alignment requirement
for the DMA addresses returned by dma_map_{page,sg}(), until
min_align_mask was introduced.
> To sum it up:
>
> 1. min_align_mask is by far the most important constraint. Devices will
> simply stop working if it is not met.
> 2. Alignment to the smallest PAGE_SIZE order which is greater than or
> equal to the requested size has been documented, and some drivers
> may rely on it.
Strictly it stopped being necessary since fafadcd16595 when the
historical swiotlb_alloc() was removed, but 602d9858f07c implies that
indeed the assumption of it for streaming mappings had already set in.
Of course NVMe is now using min_align_mask, but I'm not sure if it's
worth taking the risk to find out who else should also be.
> 3. alloc_align_mask is a misguided fix for a bug in the above.
No, alloc_align_mask is about describing the explicit requirement rather
than relying on any implicit behaviour, and thus being able to do the
optimal thing for, say, a 9KB mapping given a 4KB IOVA granule and 64KB
PAGE_SIZE.
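Spelling out the arithmetic for that example (my numbers, illustrative
only): with a 4KB granule, a 9KB mapping needs just a granule-aligned
12KB slot, whereas rounding up to a 64KB PAGE_SIZE boundary would waste
most of the allocation.

```c
#include <stddef.h>

/* Round a mapping size up to the allocation granule; with the 9KB /
 * 4KB / 64KB numbers above, the granule-based result is 12KB rather
 * than a full 64KB page. */
static size_t slot_size(size_t size, size_t granule)
{
    return (size + granule - 1) & ~(granule - 1);
}
```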
Thanks,
Robin.
>
> Correct me if anything of the above is wrong.
>
> HTH
> Petr T