Message-ID: <908066c7-b749-4f95-b006-ce9b5bd1a909@oracle.com>
Date: Wed, 7 Feb 2024 18:24:52 -0800
From: Jane Chu <jane.chu@...cle.com>
To: Matthew Wilcox <willy@...radead.org>, Will Deacon <will@...nel.org>
Cc: Nanyong Sun <sunnanyong@...wei.com>,
Catalin Marinas <catalin.marinas@....com>, muchun.song@...ux.dev,
akpm@...ux-foundation.org, anshuman.khandual@....com,
wangkefeng.wang@...wei.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
On 2/7/2024 6:17 AM, Matthew Wilcox wrote:
> On Wed, Feb 07, 2024 at 12:11:25PM +0000, Will Deacon wrote:
>> On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
>>> The pte lock cannot be taken in irq context (which I think is what
>>> you're asking?) While it is not possible to reason about all users of
>>> struct page, we are somewhat relieved of that work by noting that this is
>>> only for hugetlbfs, so we don't need to reason about slab, page tables,
>>> netmem or zsmalloc.
>> My concern is that an interrupt handler tries to access a 'struct page'
>> which faults due to another core splitting a pmd mapping for the vmemmap.
>> In this case, I think we'll end up trying to resolve the fault from irq
>> context, which will try to take the spinlock.
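[Spelling out the interleaving being worried about here -- my paraphrase of the above, not code from the series:

        /*
         * Illustrative timeline only.
         *
         *   CPU A (splitting the vmemmap PMD)   CPU B (interrupt handler)
         *   ---------------------------------   -------------------------
         *   take the spinlock guarding the
         *   vmemmap page table
         *   invalidate the PMD entry so it
         *   can be split/remapped
         *                                       dereferences a struct page
         *                                       whose vmemmap mapping is
         *                                       transiently gone
         *                                       -> vmemmap fault
         *                                       -> fault handler wants the
         *                                          same spinlock, from IRQ
         *                                          context, which the pte
         *                                          lock does not allow
         */
]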
> Yes, this absolutely can happen (with this patch), and this patch should
> be dropped for now.
>
> While this array of ~512 pages has been allocated to hugetlbfs, and one
> would think that there would be no way that there could still be
> references to them, another CPU can have a pointer to this struct page
> (eg attempting a speculative page cache reference or
> get_user_pages_fast()). That means it will try to call
> atomic_add_unless(&page->_refcount, 1, 0);
>
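The speculative paths mentioned here all funnel into a refcount bump of
roughly the shape below -- a simplified sketch, not the exact kernel
helpers (demo_try_get_speculative_ref() is a made-up name; folio_try_get_rcu()
and the gup-fast grab path reduce to the same atomic):

        /*
         * Simplified sketch of a speculative reference attempt.  The caller
         * (gup-fast, a page cache lookup, ...) still holds a possibly-stale
         * pointer to the struct page; the atomic below reads and writes
         * page->_refcount, i.e. it touches the vmemmap page backing that
         * struct page, even when the attempt ultimately fails.
         */
        static bool demo_try_get_speculative_ref(struct page *page)
        {
                /* Refuses to take a reference if _refcount is 0 (frozen). */
                return atomic_add_unless(&page->_refcount, 1, 0);
        }
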
> Actually, I wonder if this isn't a problem on x86 too? Do we need to
> explicitly go through an RCU grace period before freeing the pages
> for use by somebody else?
>
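If I read the RCU suggestion correctly, it would amount to waiting out a
grace period before the detached vmemmap pages are handed back, so that
anyone who picked up a struct page pointer under rcu_read_lock() is done
with it -- something like the sketch below (demo_free_detached_vmemmap()
is a placeholder name; the real free path lives in mm/hugetlb_vmemmap.c):

        /* Illustrative only -- not the actual hugetlb_vmemmap free path. */
        static void demo_free_detached_vmemmap(struct list_head *vmemmap_pages)
        {
                struct page *page, *next;

                /* Let speculative walkers under rcu_read_lock() drain out. */
                synchronize_rcu();

                list_for_each_entry_safe(page, next, vmemmap_pages, lru)
                        __free_page(page);
        }
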
Sorry, I'm not sure what I'm missing; please help.

From the hugetlb allocation perspective, one scenario is run-time hugetlb
page allocation (say, 2M pages): the buddy allocator returns a compound
page, the head page's refcount is frozen, and the folio (compound page) is
then put through the HVO process, one step of which is vmemmap_split_pmd()
in case a vmemmap page is mapped by a PMD -- roughly as sketched below.
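A minimal sketch of that sequence, with made-up names
(demo_alloc_fresh_hugetlb_folio() is not the real call chain; the actual
code is spread across mm/hugetlb.c and mm/hugetlb_vmemmap.c):

        /* Simplified sketch of the run-time 2M allocation path above. */
        static struct folio *demo_alloc_fresh_hugetlb_folio(struct hstate *h,
                                                            gfp_t gfp, int order)
        {
                struct page *page = alloc_pages(gfp | __GFP_COMP, order);
                struct folio *folio;

                if (!page)
                        return NULL;
                folio = page_folio(page);

                /* Freeze the head page: refcount drops to 0, nobody can grab it. */
                if (!folio_ref_freeze(folio, 1))
                        return NULL;    /* unexpected extra ref; error path omitted */

                /*
                 * HVO: remap the tail vmemmap pages to one read-only page,
                 * splitting a PMD-mapped vmemmap range first
                 * (vmemmap_split_pmd()) when necessary.
                 */
                hugetlb_vmemmap_optimize(h, &folio->page);

                return folio;
        }

(For 2M pages with a 4K base page size, order would be 9.)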
Until the HVO process completes, none of the pages represented by that
vmemmap are available to any thread, so what could cause code in IRQ
context to access their vmemmap pages?
thanks!
-jane