Message-ID: <20240223143613.1878beb6@meshulam.tesarici.cz>
Date: Fri, 23 Feb 2024 14:36:13 +0100
From: Petr Tesařík <petr@...arici.cz>
To: Will Deacon <will@...nel.org>
Cc: Michael Kelley <mhklinux@...look.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "kernel-team@...roid.com"
<kernel-team@...roid.com>, "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
Christoph Hellwig <hch@....de>, Marek Szyprowski
<m.szyprowski@...sung.com>, Robin Murphy <robin.murphy@....com>, Petr
Tesarik <petr.tesarik1@...wei-partners.com>, Dexuan Cui
<decui@...rosoft.com>, Nicolin Chen <nicolinc@...dia.com>
Subject: Re: [PATCH v4 1/5] swiotlb: Fix double-allocation of slots due to
broken alignment handling
On Fri, 23 Feb 2024 12:47:43 +0000
Will Deacon <will@...nel.org> wrote:
> On Wed, Feb 21, 2024 at 11:35:44PM +0000, Michael Kelley wrote:
> > From: Will Deacon <will@...nel.org> Sent: Wednesday, February 21, 2024 3:35 AM
> > >
> > > Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"),
> > > which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment
> > > checks"), causes a functional regression with vsock in a virtual machine
> > > using bouncing via a restricted DMA SWIOTLB pool.
> > >
> > > When virtio allocates the virtqueues for the vsock device using
> > > dma_alloc_coherent(), the SWIOTLB search can return page-unaligned
> > > allocations if 'area->index' was left unaligned by a previous allocation
> > > from the buffer:
> > >
> > > # Final address in brackets is the SWIOTLB address returned to the caller
> > > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800)
> > > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800)
> > > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800)
> > >
> > > This ends badly (typically buffer corruption and/or a hang) because
> > > swiotlb_alloc() is expecting a page-aligned allocation and so blindly
> > > returns a pointer to the 'struct page' corresponding to the allocation,
> > > therefore double-allocating the first half (2KiB slot) of the 4KiB page.
> > >
> > > Fix the problem by treating the allocation alignment separately to any
> > > additional alignment requirements from the device, using the maximum
> > > of the two as the stride to search the buffer slots and taking care
> > > to ensure a minimum of page-alignment for buffers larger than a page.
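[Editorial aside, not part of the patch: a minimal sketch of why a misaligned slot corrupts memory, assuming the simplified shape of swiotlb_alloc() in kernel/dma/swiotlb.c.]

    /* swiotlb_alloc() trusts the search to return a page-aligned slot: */
    tlb_addr = slot_addr(pool->start, index);    /* e.g. 0x98326800     */
    return pfn_to_page(PFN_DOWN(tlb_addr));      /* page at 0x98326000  */
    /*
     * PFN_DOWN() silently discards the low 2KiB: the returned page
     * spans 0x98326000-0x98326fff, whose first slot may still belong
     * to an earlier allocation -- the double-allocation described
     * above.
     */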
> >
> > Could you also add some text noting that this patch fixes the scenario I
> > described in the other email thread? Something like:
> >
> > The changes to page alignment handling also fix a problem when
> > the alloc_align_mask is zero. The page alignment handling added
> > in the two mentioned commits could force alignment to more bits
> > in orig_addr than specified by the device's DMA min_align_mask,
> > resulting in a larger offset. Since swiotlb_max_mapping_size()
> > is based only on the DMA min_align_mask, that larger offset
> > plus the requested size could exceed IO_TLB_SEGSIZE slots, and
> > the mapping could fail when it shouldn't.
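[Editorial aside: a hypothetical worked example of that failure, assuming 4KiB pages, 2KiB slots, IO_TLB_SEGSIZE = 128, and that an allocation of IO_TLB_SEGSIZE slots can only start on a segment boundary.]

    min_align_mask = 0  ->  swiotlb_max_mapping_size() = 128 * 2KiB = 256KiB
    caller maps 256KiB from an orig_addr with bit 11 set (e.g. ...0x800);
    the old code ORs ~PAGE_MASK into iotlb_align_mask, so every candidate
    slot must also have bit 11 set -- but segment boundaries have bit 11
    clear, so no slot ever qualifies and the mapping fails even though
    the caller honoured swiotlb_max_mapping_size().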
>
> Thanks, Michael. I can add that in.
>
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index b079a9a8e087..2ec2cc81f1a2 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> > > unsigned long max_slots = get_max_slots(boundary_mask);
> > > unsigned int iotlb_align_mask =
> > > - dma_get_min_align_mask(dev) | alloc_align_mask;
> > > + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > > unsigned int nslots = nr_slots(alloc_size), stride;
> > > unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > > unsigned int index, slots_checked, count = 0, i;
> > > @@ -993,19 +993,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > BUG_ON(!nslots);
> > > BUG_ON(area_index >= pool->nareas);
> > >
> > > + /*
> > > + * For mappings with an alignment requirement don't bother looping to
> > > + * unaligned slots once we found an aligned one.
> > > + */
> > > + stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask));
> > > +
> > > /*
> > > * For allocations of PAGE_SIZE or larger only look for page aligned
> > > * allocations.
> > > */
> > > if (alloc_size >= PAGE_SIZE)
> > > - iotlb_align_mask |= ~PAGE_MASK;
> > > - iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
> > > -
> > > - /*
> > > - * For mappings with an alignment requirement don't bother looping to
> > > - * unaligned slots once we found an aligned one.
> > > - */
> > > - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> > > + stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
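[Editorial aside: the resulting stride values, assuming 2KiB slots (IO_TLB_SHIFT = 11), 4KiB pages, and get_max_slots(mask) = nr_slots(mask + 1) as in kernel/dma/swiotlb.c.]

    alloc_align_mask = 0xFFF -> stride = nr_slots(0x1000) = 2
                                (probe every second slot, i.e. each
                                4KiB boundary)
    no alignment requested   -> stride = nr_slots(1) = 1
                                (probe every slot)
    alloc_size >= PAGE_SIZE  -> stride = umax(stride, 12 - 11 + 1) = 2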
> >
> > Is this special handling of alloc_size >= PAGE_SIZE really needed?
>
> I've been wondering that as well, but please note that this code (and the
> comment) are in the upstream code, so I was erring in favour of keeping
> that while fixing the bugs. We could have an extra patch dropping it if
> we can convince ourselves that it's not adding anything, though.
>
> > I think the comment is somewhat inaccurate. If orig_addr is non-zero, and
> > alloc_align_mask is zero, the requirement is for the alignment to match
> > the DMA min_align_mask bits in orig_addr, even if the allocation is
> > larger than a page. And with Patch 3 of this series, the swiotlb_alloc()
> > case passes in alloc_align_mask to handle page size and larger requests.
> > So it seems like this doesn't do anything useful unless orig_addr and
> > alloc_align_mask are both zero, and there aren't any cases of that
> > after this patch series. If the caller wants alignment, it should
> > specify it with alloc_align_mask.
>
> It's an interesting observation. Presumably the intention here is to
> reduce the cost of the linear search, but the code originates from a
> time when we didn't have iotlb_align_mask or alloc_align_mask and so I
> tend to agree that it should probably just be dropped. I'm also not even
> convinced that it works properly if the initial search index ends up
> being 2KiB (i.e. slot) aligned -- we'll end up jumping over the
> page-aligned addresses!
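[Editorial aside: a hypothetical trace of that failure mode, assuming PAGE_SIZE = 4KiB, 2KiB slots, and an allocation with orig_addr == 0 and alloc_align_mask == 0.]

    stride = 2;           /* bumped because alloc_size >= PAGE_SIZE     */
    index  = area->index; /* suppose a previous map left this at 1     */
    /*
     * Nothing rejects unaligned candidates (the iotlb_align_mask
     * check is skipped when orig_addr == 0), so the loop probes slots
     * 1, 3, 5, ... -- the stride keeps the index odd, every even
     * (page-aligned) slot is jumped over, and the first free odd run
     * is returned 2KiB-misaligned.
     */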
Originally, SWIOTLB was not used for allocations, so orig_addr was
never zero. The assumption was that if the bounce buffer should be
page-aligned, then the original buffer was also page-aligned, and the
check against iotlb_align_mask was sufficient.
> I'll add another patch to v5 which removes this check (and you've basically
> written the commit message for me, so thanks).
>
> > > spin_lock_irqsave(&area->lock, flags);
> > > if (unlikely(nslots > pool->area_nslabs - area->used))
> > > @@ -1015,11 +1014,14 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> > > index = area->index;
> > >
> > > for (slots_checked = 0; slots_checked < pool->area_nslabs; ) {
> > > - slot_index = slot_base + index;
> > > + phys_addr_t tlb_addr;
> > >
> > > - if (orig_addr &&
> > > - (slot_addr(tbl_dma_addr, slot_index) &
> > > - iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
> > > + slot_index = slot_base + index;
> > > + tlb_addr = slot_addr(tbl_dma_addr, slot_index);
> > > +
> > > + if ((tlb_addr & alloc_align_mask) ||
> > > + (orig_addr && (tlb_addr & iotlb_align_mask) !=
> > > + (orig_addr & iotlb_align_mask))) {
> >
> > It looks like these changes will cause a mapping failure in some
> > iommu_dma_map_page() cases that previously didn't fail.
>
> Hmm, it's really hard to tell. This code has been quite badly broken for
> some time, so I'm not sure how far back you have to go to find a kernel
> that would work properly (e.g. for Nicolin's case with 64KiB pages).
I believe it fails in exactly those cases where the search previously
returned an incorrectly aligned bounce buffer.
In any case, the "middle" bits of tlb_addr (the low bits, excluding
the offset within a slot) should indeed correspond to the same bits of
orig_addr.
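[Editorial aside: a sketch of that "middle bits" check as it appears in the hunks above, assuming IO_TLB_SIZE = 2KiB and a device min_align_mask of 0xFFF.]

    iotlb_align_mask = 0xFFF & ~(2048 - 1);    /* = 0x800: bit 11 only */
    /*
     * The search then demands
     *     (tlb_addr & 0x800) == (orig_addr & 0x800),
     * i.e. the bits above the in-slot offset but within min_align_mask
     * must match; bits 0-10 are reproduced separately via the offset
     * from swiotlb_align_offset().
     */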
>
> > Everything is made right by Patch 4 of your series, but from a
> > bisect standpoint, there will be a gap where things are worse.
> > In [1], I think Nicolin reported a crash with just this patch applied.
>
> In Nicolin's case, I think it didn't work without the patch either;
> this patch just triggered the failure earlier.
>
> > While the iommu_dma_map_page() case can already fail due to
> > "too large" requests because of not setting a max mapping size,
> > this patch can cause smaller requests to fail as well until Patch 4
> > gets applied. That might be a problem worth avoiding, perhaps by
> > merging the Patch 4 changes into this patch.
>
> I'll leave this up to Christoph. Personally, I'm keen to avoid having
> a giant patch trying to fix all the SWIOTLB allocation issues in one go,
> as it will inevitably get reverted due to a corner case that we weren't
> able to test properly, breaking the common cases at the same time.
I tend to think that more, smaller patches are better, even though this
patch alone does introduce some regressions.
Petr T