Message-ID: <BN9PR11MB52760702D919B524849F08F28C34A@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 15 Aug 2025 09:46:47 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>, Jason Gunthorpe <jgg@...dia.com>
CC: "Hansen, Dave" <dave.hansen@...el.com>, Joerg Roedel <joro@...tes.org>,
	Will Deacon <will@...nel.org>, Robin Murphy <robin.murphy@....com>, Jann Horn
	<jannh@...gle.com>, Vasant Hegde <vasant.hegde@....com>, Alistair Popple
	<apopple@...dia.com>, Peter Zijlstra <peterz@...radead.org>, Uladzislau Rezki
	<urezki@...il.com>, Jean-Philippe Brucker <jean-philippe@...aro.org>, "Andy
 Lutomirski" <luto@...nel.org>, "Lai, Yi1" <yi1.lai@...el.com>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, "security@...nel.org"
	<security@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "stable@...r.kernel.org"
	<stable@...r.kernel.org>
Subject: RE: [PATCH v3 1/1] iommu/sva: Invalidate KVA range on kernel TLB
 flush

> From: Baolu Lu <baolu.lu@...ux.intel.com>
> Sent: Friday, August 15, 2025 5:17 PM
> 
> On 8/8/2025 10:57 AM, Tian, Kevin wrote:
> >> From: Jason Gunthorpe <jgg@...dia.com>
> >> Sent: Friday, August 8, 2025 3:52 AM
> >>
> >> On Thu, Aug 07, 2025 at 10:40:39PM +0800, Baolu Lu wrote:
> >>> +static void kernel_pte_work_func(struct work_struct *work)
> >>> +{
> >>> +	struct page *page, *next;
> >>> +
> >>> +	iommu_sva_invalidate_kva_range(0, TLB_FLUSH_ALL);
> >>> +
> >>> +	guard(spinlock)(&kernel_pte_work.lock);
> >>> +	list_for_each_entry_safe(page, next, &kernel_pte_work.list, lru) {
> >>> +		list_del_init(&page->lru);
> >>
> >> Please don't add new usages of lru, we are trying to get rid of this. :(
> >>
> >> I think the memory should be struct ptdesc, use that..
> >>
> >
> > btw with this change we should also defer free of the pmd page:
> >
> > pud_free_pmd_page()
> > 	...
> > 	for (i = 0; i < PTRS_PER_PMD; i++) {
> > 		if (!pmd_none(pmd_sv[i])) {
> > 			pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
> > 			pte_free_kernel(&init_mm, pte);
> > 		}
> > 	}
> >
> > 	free_page((unsigned long)pmd_sv);
> >
> > Otherwise the risk still exists if the pmd page is repurposed before the
> > pte work is scheduled.
> 
> You're right that freeing high-level page table pages also requires an
> IOTLB flush before the pages are freed. But I question the practical
> risk of the race given the extremely small time window. If this is a

It's already extremely difficult to conduct a real attack even without this
fix. I'm not sure what criteria decide how small a window we consider
acceptable in this specific case, but leaving an incomplete fix in the code
doesn't sound clean...

> real concern, a potential mitigation would be to clear the U/S bits in
> all page table entries for kernel address space? But I am not confident
> in making that change at this time as I am unsure of the side effects it
> might cause.

I think there was already consensus that clearing the U/S bits in all entries
doesn't prevent the IOMMU from caching them or from setting A/D bits on
the freed page table.

> 
> >
> > another observation - pte_free_kernel is not used in remove_pagetable ()
> > and __change_page_attr(). Is it straightforward to put it in those paths
> > or do we need duplicate some deferring logic there?
> 
> The remove_pagetable() function is called in the path where memory is
> hot-removed from the system, right? If so, there should be no issue, as
> the threat model here is a page table page being freed and repurposed
> while it's still cached in the IOTLB. In the hot-remove case, the memory
> is removed and will not be reused, so that's fine as far as I can see.

what about the case where the page is hot-added back while a stale entry
pointing to it is still valid in the IOMMU, theoretically? 😊

> 
> The same to __change_page_attr(), which only changes the attributes of a
> page table entry while the underlying page remains in use.
> 

it may lead to cpa_collapse_large_pages() if changing the attribute results
in all adjacent 4k pages in a 2M range having the same attribute. Then the
page table might be freed:

cpa_collapse_large_pages():
        list_for_each_entry_safe(ptdesc, tmp, &pgtables, pt_list) {
                list_del(&ptdesc->pt_list);
                __free_page(ptdesc_page(ptdesc));
        }
