linux-kernel - Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZcOQ-0pzA16AEbct@casper.infradead.org>
Date: Wed, 7 Feb 2024 14:17:31 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Will Deacon <will@...nel.org>
Cc: Nanyong Sun <sunnanyong@...wei.com>,
	Catalin Marinas <catalin.marinas@....com>, muchun.song@...ux.dev,
	akpm@...ux-foundation.org, anshuman.khandual@....com,
	wangkefeng.wang@...wei.com, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

On Wed, Feb 07, 2024 at 12:11:25PM +0000, Will Deacon wrote:
> On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
> > The pte lock cannot be taken in irq context (which I think is what
> > you're asking?)  While it is not possible to reason about all users of
> > struct page, we are somewhat relieved of that work by noting that this is
> > only for hugetlbfs, so we don't need to reason about slab, page tables,
> > netmem or zsmalloc.
> 
> My concern is that an interrupt handler tries to access a 'struct page'
> which faults due to another core splitting a pmd mapping for the vmemmap.
> In this case, I think we'll end up trying to resolve the fault from irq
> context, which will try to take the spinlock.

Yes, this absolutely can happen (with this patch), and this patch should
be dropped for now.

While this array of ~512 pages have been allocated to hugetlbfs, and one
would think that there would be no way that there could still be
references to them, another CPU can have a pointer to this struct page
(eg attempting a speculative page cache reference or
get_user_pages_fast()).  That means it will try to call
atomic_add_unless(&page->_refcount, 1, 0);

Actually, I wonder if this isn't a problem on x86 too?  Do we need to
explicitly go through an RCU grace period before freeing the pages
for use by somebody else?