Message-ID: <aPJu5sXw6v3DI8w8@devgpu012.nha5.facebook.com>
Date: Fri, 17 Oct 2025 09:29:26 -0700
From: Alex Mastro <amastro@...com>
To: Alex Williamson <alex@...zbot.org>
CC: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>,
    Jason Gunthorpe <jgg@...pe.ca>, <kvm@...r.kernel.org>,
    <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 0/3] vfio: handle DMA map/unmap up to the addressable limit
On Thu, Oct 16, 2025 at 04:01:38PM -0600, Alex Williamson wrote:
> The legacy vfio container represents a single IOMMU context, which is
> typically managed by a single domain. The replay comes into play when
> groups are under IOMMUs with different properties that prevent us from
> re-using the domain. The case that most comes to mind for this is
> Intel platforms with integrated graphics where there's a separate IOMMU
> for the GPU, which iirc has different coherency settings.
Thanks, this context is helpful and makes sense.
> That mechanism for triggering replay requires a specific hardware
> configuration, but we can easily trigger it through code
> instrumentation, ex:
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5167bec14e36..2cb19ddbb524 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2368,7 +2368,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> d->enforce_cache_coherency ==
> domain->enforce_cache_coherency) {
> iommu_detach_group(domain->domain, group->iommu_group);
> - if (!iommu_attach_group(d->domain,
> + if (0 && !iommu_attach_group(d->domain,
> group->iommu_group)) {
> list_add(&group->next, &d->group_list);
> iommu_domain_free(domain->domain);
>
> We might consider whether it's useful for testing purposes to expose a
> mechanism to toggle this. For a unit test, if we create a container,
> add a group, and build up some suspect mappings, if we then add another
> group to the container with the above bypass we should trigger the
> replay.
Thanks for the tip. I did this and validated, by tracing iommu_map with
bpftrace, that the container's mappings (one of which lies at the end of the
address space) are replayed correctly. Without the fix, the loop body
    while (iova < dma->iova + dma->size) { ... iommu_map() ... }
would never be entered for the end-of-address-space mapping, because the sum
wraps around:
    dma->iova + dma->size == 0
$ sudo bpftrace -e 'kprobe:iommu_map { printf("pid=%d comm=%s domain=%p iova=%p paddr=%p size=%p prot=%p gfp=%p\n", pid, comm, (void*)arg0, (void*)arg1, (void*)arg2, (void*)arg3, (void*)arg4, (void*)arg5); }'
Attached 1 probe
# original mappings
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0x10000000000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0x10000001000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dac210 iova=0xfffffffffffff000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
# replayed mappings
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0x10000000000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0x10000001000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
pid=616477 comm=test_dma_map_un domain=0xff11012805dab610 iova=0xfffffffffffff000 paddr=0x12ecfdd0000 size=0x1000 prot=0x7 gfp=0x400cc0
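The wrap described above can be reproduced in isolation. The following is a minimal userspace sketch, not the kernel code and not necessarily how the series fixes it: PAGE_SZ and both function names are hypothetical, and the inclusive-last-address loop is just one common way to avoid the overflow.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 0x1000ULL /* hypothetical page size for this sketch */

/*
 * Walk [iova, iova + size) one page at a time using an exclusive end.
 * When the range ends exactly at 2^64, iova + size wraps to 0 and the
 * loop body never runs.
 */
unsigned count_pages_exclusive_end(uint64_t iova, uint64_t size)
{
    uint64_t end = iova + size; /* unsigned wrap is well defined in C */
    unsigned n = 0;

    assert(size > 0);
    while (iova < end) {
        n++;
        iova += PAGE_SZ;
    }
    return n;
}

/*
 * Same walk, but compare against the inclusive last address, which
 * cannot wrap for a valid (non-empty, page-multiple) range.
 */
unsigned count_pages_inclusive_last(uint64_t iova, uint64_t size)
{
    uint64_t last;
    unsigned n = 0;

    assert(size > 0 && size % PAGE_SZ == 0);
    last = iova + size - 1;
    for (;;) {
        n++;
        if (last - iova < PAGE_SZ)
            break; /* that was the final page */
        iova += PAGE_SZ;
    }
    return n;
}
```

With iova=0xfffffffffffff000 and size=0x1000 (the third mapping in the trace), the exclusive-end version visits zero pages while the inclusive-last version visits one.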
> In general though the replay shouldn't have a mechanism to trigger
> overflows, we're simply iterating the current set of mappings that have
> already been validated and applying them to a new domain.
Agreed. An overflow there would mean that some other invariant has been broken
and nonsensical vfio_dma entries have made it into iommu->dma_list. The
combination of iommu->lock serialization and the overflow checks elsewhere
should prevent that.
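For illustration only, the kind of map-time check that keeps such entries out of the list could look like the sketch below. range_is_valid is a hypothetical helper, not a function from vfio_iommu_type1.c; it assumes a mapping is acceptable when it is non-empty and its inclusive last address does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: accept a range only if it is non-empty and its
 * inclusive last address (iova + size - 1) does not wrap past 2^64 - 1.
 * On success, *last holds that inclusive last address.
 */
bool range_is_valid(uint64_t iova, uint64_t size, uint64_t *last)
{
    if (size == 0)
        return false;
    *last = iova + size - 1;
    return *last >= iova;
}
```

A range ending exactly at the top of the address space passes this check, while one that runs past it is rejected before it can ever reach a replay.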
> In any case, we can all take a second look at the changes there.
Thanks!
Alex