Date: Fri, 1 Mar 2024 16:38:33 +0000
From: Robin Murphy <robin.murphy@....com>
To: Petr Tesařík <petr@...arici.cz>,
 Christoph Hellwig <hch@....de>
Cc: Michael Kelley <mhklinux@...look.com>, Will Deacon <will@...nel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 Petr Tesarik <petr.tesarik1@...wei-partners.com>,
 "kernel-team@...roid.com" <kernel-team@...roid.com>,
 "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
 Marek Szyprowski <m.szyprowski@...sung.com>, Dexuan Cui
 <decui@...rosoft.com>, Nicolin Chen <nicolinc@...dia.com>
Subject: Re: [PATCH v5 6/6] swiotlb: Remove pointless stride adjustment for
 allocations >= PAGE_SIZE

On 2024-03-01 3:39 pm, Petr Tesařík wrote:
> On Thu, 29 Feb 2024 16:47:56 +0100
> Christoph Hellwig <hch@....de> wrote:
> 
>> On Thu, Feb 29, 2024 at 03:44:11PM +0000, Michael Kelley wrote:
>>> Any thoughts on how that historical behavior should apply if
>>> the DMA min_align_mask is non-zero, or the alloc_align_mask
>>> parameter to swiotlb_tbl_map_single() is non-zero? As currently
>>> used, alloc_align_mask is page aligned if the IOMMU granule is
>>> >= PAGE_SIZE. But a non-zero min_align_mask could mandate
>>> returning a buffer that is not page aligned. Perhaps do the
>>> historical behavior only if alloc_align_mask and min_align_mask
>>> are both zero?
>>
>> I think the driver setting min_align_mask is a clear indicator
>> that the driver requested a specific alignment and the defaults
>> don't apply.  For swiotlb_tbl_map_single as used by dma-iommu
>> I'd have to take a closer look at how it is used.
> 
> I'm not sure it helps in this discussion, but let me dive into a bit
> of ancient history to understand how we ended up here.
> 
> IIRC this behaviour was originally motivated by limitations of PC AT
> hardware. The Intel 8237 is a DMA controller with 16-bit address
> registers. To make it somewhat usable with addresses up to 16MB (yeah,
> the infamous DMA zone), IBM added a page register, but it was on a
> separate chip and did not increment when the 8237 address rolled over
> back to zero. Effectively, the page register selected a 64K-aligned
> 64K window of memory. Consequently, DMA buffers could not cross a 64K
> physical boundary.
> 
> Thanks to how the buddy allocator works (power-of-2 sized, naturally
> aligned blocks), the 64K-boundary constraint was automatically
> satisfied for any allocation of up to 64K, and drivers took advantage
> of this when allocating device buffers. IMO software bounce buffers
> simply followed the same logic that worked for buffers allocated by
> the buddy allocator.
> 
> OTOH min_align_mask was motivated by NVMe, which prescribes the value
> of a certain number of low bits in the DMA address (for simplicity
> assumed to be identical to the same bits in the physical address).
> 
> The only pre-existing user of alloc_align_mask is x86 IOMMU code, and
> IIUC it is used to guarantee that unaligned transactions do not share
> the IOMMU granule with another device. This whole thing is weird,
> because swiotlb_tbl_map_single() is called like this:
> 
>                  aligned_size = iova_align(iovad, size);
>                  phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
>                                                iova_mask(iovad), dir, attrs);
> 
> Here:
> 
> * alloc_size = iova_align(iovad, size)
> * alloc_align_mask = iova_mask(iovad)
> 
> Now, iova_align() rounds up its argument to a multiple of iova granule
> and iova_mask() is simply "granule - 1". This works, because granule
> size must be a power of 2, and I assume it must also be >= PAGE_SIZE.

Not quite, the granule must *not* be larger than PAGE_SIZE (but can be 
smaller).
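As a rough illustration of what the two helpers boil down to, assuming a power-of-two granule no larger than PAGE_SIZE (a hypothetical stand-alone sketch, not the kernel implementation):

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's iova_mask()/iova_align()
 * helpers. The granule is assumed to be a power of two, which is why
 * "granule - 1" works as a mask and why masking rounds correctly.
 */
static size_t iova_mask(size_t granule)
{
	return granule - 1;
}

static size_t iova_align(size_t granule, size_t size)
{
	/* Round size up to the next multiple of the granule. */
	return (size + iova_mask(granule)) & ~iova_mask(granule);
}
```

So for a 9KB request with a 4KB granule, iova_align() yields 12KB and iova_mask() yields 0xfff, matching the aligned_size/alloc_align_mask pair passed to swiotlb_tbl_map_single() above.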
> In that case, the alloc_align_mask argument is not even needed if you
> adjust the code to match documentation---the resulting buffer will be
> aligned to a granule boundary by virtue of having a size that is a
> multiple of the granule size.

I think we've conflated two different notions of "allocation" here. The 
page-alignment which Christoph quoted applies for dma_alloc_coherent(), 
which nowadays *should* only be relevant to SWIOTLB code in the 
swiotlb_alloc() path for stealing coherent pages out of restricted DMA 
pools. Historically there was never any official alignment requirement 
for the DMA addresses returned by dma_map_{page,sg}(), until 
min_align_mask was introduced.
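The min_align_mask contract itself is small: the DMA address handed back to the device must agree with the original physical address in every bit covered by the mask (for NVMe, the low 12 bits). A minimal sketch of that check, with an illustrative helper name not taken from the kernel:

```c
#include <stdint.h>

/*
 * Sketch of the min_align_mask invariant: a bounce buffer is only
 * acceptable if the bits selected by min_align_mask are identical in
 * the original physical address and the address given to the device.
 */
static int min_align_ok(uint64_t orig_phys, uint64_t dma_addr,
			uint64_t min_align_mask)
{
	return (orig_phys & min_align_mask) == (dma_addr & min_align_mask);
}
```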

> To sum it up:
> 
> 1. min_align_mask is by far the most important constraint. Devices will
>     simply stop working if it is not met.
> 2. Alignment to the smallest PAGE_SIZE order which is greater than or
>     equal to the requested size has been documented, and some drivers
>     may rely on it.

Strictly it stopped being necessary since fafadcd16595 when the 
historical swiotlb_alloc() was removed, but 602d9858f07c implies that 
indeed the assumption of it for streaming mappings had already set in. 
Of course NVMe is now using min_align_mask, but I'm not sure if it's 
worth taking the risk to find out who else should also be.

> 3. alloc_align_mask is a misguided fix for a bug in the above.

No, alloc_align_mask is about describing the explicit requirement rather 
than relying on any implicit behaviour, and thus being able to do the 
optimal thing for, say, a 9KB mapping given a 4KB IOVA granule and 64KB 
PAGE_SIZE.

Thanks,
Robin.

> 
> Correct me if anything of the above is wrong.
> 
> HTH
> Petr T
