linux-kernel - Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <908066c7-b749-4f95-b006-ce9b5bd1a909@oracle.com>
Date: Wed, 7 Feb 2024 18:24:52 -0800
From: Jane Chu <jane.chu@...cle.com>
To: Matthew Wilcox <willy@...radead.org>, Will Deacon <will@...nel.org>
Cc: Nanyong Sun <sunnanyong@...wei.com>,
        Catalin Marinas <catalin.marinas@....com>, muchun.song@...ux.dev,
        akpm@...ux-foundation.org, anshuman.khandual@....com,
        wangkefeng.wang@...wei.com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize

On 2/7/2024 6:17 AM, Matthew Wilcox wrote:

> On Wed, Feb 07, 2024 at 12:11:25PM +0000, Will Deacon wrote:
>> On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
>>> The pte lock cannot be taken in irq context (which I think is what
>>> you're asking?)  While it is not possible to reason about all users of
>>> struct page, we are somewhat relieved of that work by noting that this is
>>> only for hugetlbfs, so we don't need to reason about slab, page tables,
>>> netmem or zsmalloc.
>> My concern is that an interrupt handler tries to access a 'struct page'
>> which faults due to another core splitting a pmd mapping for the vmemmap.
>> In this case, I think we'll end up trying to resolve the fault from irq
>> context, which will try to take the spinlock.
> Yes, this absolutely can happen (with this patch), and this patch should
> be dropped for now.
>
> While this array of ~512 pages have been allocated to hugetlbfs, and one
> would think that there would be no way that there could still be
> references to them, another CPU can have a pointer to this struct page
> (eg attempting a speculative page cache reference or
> get_user_pages_fast()).  That means it will try to call
> atomic_add_unless(&page->_refcount, 1, 0);
>
> Actually, I wonder if this isn't a problem on x86 too?  Do we need to
> explicitly go through an RCU grace period before freeing the pages
> for use by somebody else?
>
Sorry, not sure what I'm missing, please help.

 From hugetlb allocation perspective,  one of the scenarios is run time 
hugetlb page allocation (say 2M pages), starting from the buddy 
allocator returns compound pages, then the head page is set to frozen, 
then the folio(compound pages) is put thru the HVO process, one of which 
is vmemmap_split_pmd() in case a vmemmap page is a PMD page.

Until the HVO process completes, none of the vmemmap represented pages 
are available to any threads, so what are the causes for IRQ threads to 
access their vmemmap pages?

thanks!

-jane