linux-kernel - Re: [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable limit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20251016160138.374c8cfb@shazbot.org>
Date: Thu, 16 Oct 2025 16:01:38 -0600
From: Alex Williamson <alex@...zbot.org>
To: Alex Mastro <amastro@...com>
Cc: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, Jason Gunthorpe
 <jgg@...pe.ca>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable
 limit

On Thu, 16 Oct 2025 14:19:53 -0700
Alex Mastro <amastro@...com> wrote:

> On Wed, Oct 15, 2025 at 05:25:14PM -0400, Alejandro Jimenez wrote:
> > 
> > 
> > On 10/15/25 3:24 PM, Alex Williamson wrote:  
> > > On Sun, 12 Oct 2025 22:32:23 -0700
> > > Alex Mastro <amastro@...com> wrote:
> > >   
> > > > This patch series aims to fix vfio_iommu_type.c to support
> > > > VFIO_IOMMU_MAP_DMA and VFIO_IOMMU_UNMAP_DMA operations targeting IOVA
> > > > ranges which lie against the addressable limit. i.e. ranges where
> > > > iova_start + iova_size would overflow to exactly zero.  
> > > 
> > > The series looks good to me and passes my testing.  Any further reviews
> > > from anyone?  I think we should make this v6.18-rc material.  Thanks,
> > >   
> > 
> > I haven't had a chance yet to closely review the latest patchset versions,
> > but I did test this v4 and confirmed that it solves the issue of not being
> > able to unmap an IOVA range extending up to the address space boundary. I
> > verified both with the simplified test case at:
> > https://gist.github.com/aljimenezb/f3338c9c2eda9b0a7bf5f76b40354db8
> > 
> > plus using QEMU's amd-iommu and a guest with iommu.passthrough=0
> > iommu.forcedac=1 (which is how I first found the problem).
> > 
> > So Alex Mastro, please feel free to add:
> > 
> > Tested-by: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
> > 
> > for the series. I'll try to find time to review the patches in detail.
> > 
> > Thank you,
> > Alejandro
> >   
> > > Alex  
> >   
> 
> Thanks all. I would like additional scrutiny around vfio_iommu_replay. It was
> the one block of affected code I have not been able to test, since I don't have
> / am not sure how to simulate a setup which can cause mappings to be replayed
> on a newly added IOMMU domain. My confidence in that code is from close review
> only.
> 
> I explicitly tested various combinations of the following with mappings up to
> the addressable limit:
> - VFIO_IOMMU_MAP_DMA
> - VFIO_IOMMU_UNMAP_DMA range-based, and VFIO_DMA_UNMAP_FLAG_ALL
> - VFIO_IOMMU_DIRTY_PAGES with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP
> 
> My understanding is that my changes to vfio_iommu_replay would be traversed
> when binding a group to a container with existing mappings where the group's
> IOMMU domain is not already one of the domains in the container.

I really appreciate you making note of it if you're uneasy about the
change.  I'll take a closer look there and hopefully others can as
well.

The legacy vfio container represents a single IOMMU context, which is
typically managed by a single domain.  The replay comes into play when
groups are under IOMMUs with different properties that prevent us from
re-using the domain.  The case that most comes to mind for this is
Intel platforms with integrated graphics where there's a separate IOMMU
for the GPU, which iirc has different coherency settings.

That mechanism for triggering replay requires a specific hardware
configuration, but we can easily trigger it through code
instrumentation, ex:

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5167bec14e36..2cb19ddbb524 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2368,7 +2368,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
                    d->enforce_cache_coherency ==
                            domain->enforce_cache_coherency) {
                        iommu_detach_group(domain->domain, group->iommu_group);
-                       if (!iommu_attach_group(d->domain,
+                       if (0 && !iommu_attach_group(d->domain,
                                                group->iommu_group)) {
                                list_add(&group->next, &d->group_list);
                                iommu_domain_free(domain->domain);

We might consider whether it's useful for testing purposes to expose a
mechanism to toggle this.  For a unit test, if we create a container,
add a group, and build up some suspect mappings, if we then add another
group to the container with the above bypass we should trigger the
replay.

In general though the replay shouldn't have a mechanism to trigger
overflows, we're simply iterating the current set of mappings that have
already been validated and applying them to a new domain.

In any case, we can all take a second look at the changes there.
Thanks,

Alex