[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0b9492bc-b033-46c3-acf2-6fca3d19148b@linux.intel.com>
Date: Thu, 28 Aug 2025 14:33:29 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: David Woodhouse <dwmw2@...radead.org>, Eugene Koira
<eugkoira@...zon.com>, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Cc: joro@...tes.org, will@...nel.org, robin.murphy@....com,
longpeng2@...wei.com, graf@...zon.de, nsaenz@...zon.com,
nh-open-source@...zon.com, stable@...r.kernel.org
Subject: Re: [PATCH] iommu/intel: Fix __domain_mapping()'s usage of
switch_to_super_page()
On 8/26/25 23:48, David Woodhouse wrote:
> On Tue, 2025-08-26 at 14:38 +0000, Eugene Koira wrote:
>> switch_to_super_page() assumes the memory range it's working on is aligned
>> to the target large page level. Unfortunately, __domain_mapping() doesn't
>> take this into account when using it, and will pass unaligned ranges
>> ultimately freeing a PTE range larger than expected.
>>
>> Take for example a mapping with the following iov_pfn range [0x3fe400,
>> 0x4c0600], which should be backed by the following mappings:
>
> The range is [0x3fe400, 0x4c0600) ?
>
>> iov_pfn [0x3fe400, 0x3fffff] covered by 2MiB pages
>> iov_pfn [0x400000, 0x4bffff] covered by 1GiB pages
>> iov_pfn [0x4c0000, 0x4c05ff] covered by 2MiB pages
>>
>> Under this circumstance, __domain_mapping() will pass [0x400000, 0x4c05ff]
>> to switch_to_super_page() at a 1 GiB granularity, which will in turn
>> free PTEs all the way to iov_pfn 0x4fffff.
>>
>> Mitigate this by rounding down the iov_pfn range passed to
>> switch_to_super_page() in __domain_mapping()
>> to the target large page level.
>>
>> Additionally add range alignment checks to switch_to_super_page.
>>
>> Fixes: 9906b9352a35 ("iommu/vt-d: Avoid duplicate removing in __domain_mapping()")
>> Signed-off-by: Eugene Koira <eugkoira@...zon.com>
>> Cc: stable@...r.kernel.org
>> ---
>> drivers/iommu/intel/iommu.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 9c3ab9d9f69a..dff2d895b8ab 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1575,6 +1575,10 @@ static void switch_to_super_page(struct dmar_domain *domain,
>> unsigned long lvl_pages = lvl_to_nr_pages(level);
>> struct dma_pte *pte = NULL;
>>
>> + if (WARN_ON(!IS_ALIGNED(start_pfn, lvl_pages) ||
>> + !IS_ALIGNED(end_pfn + 1, lvl_pages)))
>> + return;
>> +
>> while (start_pfn <= end_pfn) {
>> if (!pte)
>> pte = pfn_to_dma_pte(domain, start_pfn, &level,
>> @@ -1650,7 +1654,8 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>> unsigned long pages_to_remove;
>>
>> pteval |= DMA_PTE_LARGE_PAGE;
>> - pages_to_remove = min_t(unsigned long, nr_pages,
>> + pages_to_remove = min_t(unsigned long,
>> + round_down(nr_pages, lvl_pages),
>> nr_pte_to_next_page(pte) * lvl_pages);
>> end_pfn = iov_pfn + pages_to_remove - 1;
>> switch_to_super_page(domain, iov_pfn, end_pfn, largepage_lvl);
>
> I'm mildly entertained by the fact that the *only* comment in this
> block of code is a lie. Would you care to address that while you're
> here? Maybe the comment could look something like...
>
> /* If the new mapping is eligible for a large page, then
> * remove all smaller pages that the existing pte at this
> * level references.
> * XX: do we even need to bother calling switch_to_super_page()
> * if this PTE wasn't *present* before?
> */
>
> I bet it would benefit from one or two other one-line comments to make
> it clearer what's going on, too...
>
> But in general, I think this looks sane even though this code makes my
> brain hurt. Could do with a test case, in an ideal world. Maybe we can
> work on that as part of the generic pagetable support which is coming?
Agreed. The generic pagetable work will replace this code, so this will
be removed. Therefore, we need a fix patch that can be backported before
the generic pagetable for vt-d lands.
Thanks,
baolu
Powered by blists - more mailing lists