Message-ID: <3ea54d65-7ccc-479a-8912-bccd79d678d4@linux.intel.com>
Date: Fri, 18 Jul 2025 10:56:24 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Joerg Roedel <joro@...tes.org>, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 02/11] iommu/vt-d: Optimize iotlb_sync_map for
non-caching/non-RWBF modes
On 7/17/25 19:55, Jason Gunthorpe wrote:
> On Thu, Jul 17, 2025 at 10:40:01AM +0800, Baolu Lu wrote:
>> On 7/16/25 22:12, Jason Gunthorpe wrote:
>>> On Mon, Jul 14, 2025 at 12:50:19PM +0800, Lu Baolu wrote:
>>>> @@ -1833,6 +1845,8 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>>>> if (ret)
>>>> goto out_block_translation;
>>>> + domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
>>>
>>> This has no locking and is in the wrong order anyhow :(
>>>
>>> Any change to how invalidation works has to be done before attaching
>>> the HW so that the required invalidations are already happening before
>>> the HW can walk the page table.
>>>
>>> And you need to serialize somehow with concurrent map/unmap as iommufd
>>> doesn't prevent userspace from racing attach with map/unmap.
>>
>> domain->iotlb_sync_map does not change the driver's behavior. It simply
>> indicates that there's no need to waste time calling
>> cache_tag_flush_range_np(), as it's just a no-op.
>
> Of course it changes the behavior, it changes what the invalidation
> callback does.
>
> Without locking you have a race situation where a PGD is visible to HW
> that requires extra flushing and the SW is not doing the extra
> flushing.
>
> Before any PGD is made visible to the HW the software must ensure all
> the required invalidations are happening.
Oh, I understand now. If there is no synchronization between attach/
detach and map/unmap operations, the cache invalidation behavior must be
determined when a domain is allocated.
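
For reference, the map path that consumes this flag would then look roughly
like the sketch below (illustrative only; it reuses the existing
intel_iommu_iotlb_sync_map() / cache_tag_flush_range_np() names from the
driver). Since the flag never changes after allocation, no locking is needed
and it only decides whether the cache-tag walk can be skipped:

static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
                                      unsigned long iova, size_t size)
{
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);

        /* Fixed at domain allocation time; purely a fast-path hint. */
        if (!dmar_domain->iotlb_sync_map)
                return 0;

        cache_tag_flush_range_np(dmar_domain, iova, iova + size - 1);
        return 0;
}
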
>
>> I previously discussed this with Kevin, and we agreed on a phase-by-
>> phase approach. As I mentioned, domain->iotlb_sync_map is merely a hint
>> for the driver, preventing it from looping through all cache tags to
>> determine if any cache invalidation work needs to be performed. We
>> already know it's predetermined that no work needs to be done.
>
> The iteration though the cache tags is done inside a lock so it
> doesn't have this race (it has the issue I mentioned setting up the
> cache tage list though).
>
>> RWBF is only required on some early implementations where memory
>> coherence was not yet implemented by the VT-d engine. It should be
>> difficult to find such systems in modern environments.
>
> Then I would set it at domain creation time, check it during attach,
> and remove this race.
How about the following changes (compiled but not tested)?
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8db8be9b7e7d..bb00dc14275d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1780,18 +1780,6 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
__pa(pgd), flags, old);
}
-static bool domain_need_iotlb_sync_map(struct dmar_domain *domain,
- struct intel_iommu *iommu)
-{
- if (cap_caching_mode(iommu->cap) && intel_domain_is_ss_paging(domain))
- return true;
-
- if (rwbf_quirk || cap_rwbf(iommu->cap))
- return true;
-
- return false;
-}
-
static int dmar_domain_attach_device(struct dmar_domain *domain,
struct device *dev)
{
@@ -1831,8 +1819,6 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
if (ret)
goto out_block_translation;
- domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
-
return 0;
out_block_translation:
@@ -3352,6 +3338,14 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
return ERR_CAST(dmar_domain);
dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
+ /*
+ * iotlb sync for map is only needed for legacy implementations that
+ * explicitly require flushing internal write buffers to ensure memory
+ * coherence.
+ */
+ if (rwbf_quirk || cap_rwbf(iommu->cap))
+ dmar_domain->iotlb_sync_map = true;
+
return &dmar_domain->domain;
}
@@ -3386,6 +3380,14 @@ intel_iommu_domain_alloc_second_stage(struct device *dev,
if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
dmar_domain->domain.dirty_ops = &intel_dirty_ops;
+ /*
+ * Besides the internal write buffer flush, the caching mode used for
+ * legacy nested translation (which utilizes shadowing page tables)
+ * also requires iotlb sync on map.
+ */
+ if (rwbf_quirk || cap_rwbf(iommu->cap) || cap_caching_mode(iommu->cap))
+ dmar_domain->iotlb_sync_map = true;
+
return &dmar_domain->domain;
}
@@ -3446,6 +3448,11 @@ static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain,
if (!cap_fl1gp_support(iommu->cap) &&
(dmar_domain->domain.pgsize_bitmap & SZ_1G))
return -EINVAL;
+
+ /* iotlb sync on map requirement */
+ if ((rwbf_quirk || cap_rwbf(iommu->cap)) && !dmar_domain->iotlb_sync_map)
+ return -EINVAL;
+
return 0;
}
@@ -3469,6 +3476,12 @@ paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain,
return -EINVAL;
if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
return -EINVAL;
+
+ /* iotlb sync on map requirement */
+ if ((rwbf_quirk || cap_rwbf(iommu->cap) || cap_caching_mode(iommu->cap)) &&
+ !dmar_domain->iotlb_sync_map)
+ return -EINVAL;
+
return 0;
}
--
2.43.0
Thanks,
baolu