Message-ID: <9c8b047c-e4c9-4abb-88f2-f7e15b59bd9c@intel.com>
Date: Mon, 17 Feb 2025 13:40:57 +0200
From: Gwan-gyeong Mun <gwan-gyeong.mun@...el.com>
To: Dave Hansen <dave.hansen@...el.com>, "Harry (Hyeonggon) Yoo"
<42.hyeyoo@...il.com>
CC: <linux-kernel@...r.kernel.org>, <osalvador@...e.de>, <byungchul@...com>,
<dave.hansen@...ux.intel.com>, <luto@...nel.org>, <peterz@...radead.org>,
<akpm@...ux-foundation.org>, <max.byungchul.park@...com>,
<max.byungchul.park@...il.com>
Subject: Re: [RFC 1/1] x86/vmemmap: Add missing update of PML4 table / PML5
table entry
On 2/15/25 2:29 AM, Dave Hansen wrote:
> On 2/14/25 16:20, Harry (Hyeonggon) Yoo wrote:
>> On Fri, Feb 14, 2025 at 11:57:50AM -0800, Dave Hansen wrote:
>>> On 2/14/25 11:51, Gwan-gyeong Mun wrote:
>>>> When populating vmemmap, if the PML4/PML5 table entry covering the
>>>> target virtual address has never been updated, a page fault occurs when
>>>> memset(start) is called from the vmemmap_use_new_sub_pmd() execution
>>>> flow.
>>>
>>> "Page fault" meaning oops? Or something that we manage to handle and
>>> return from without oopsing?
>>
>> It means oops, because the kernel accesses part of vmemmap that's not
>> populated (yet) in the current process's page table.
>
> Your 0/1 cover letter got to me after this mail did. I see the oops
> there clear as day now.
>
>> This oops was observed after increasing the size of struct page (as a part of
>> developing a debug feature), but the real cause is that page table entries are
>> only installed in init_mm's page table and then sync'd later, but in the
>> meantime the process that triggered hot-plug accesses a new portion of vmemmap.
>>
>> If the process does not directly use the page table of init_mm (like swapper)
>> this oops can occur (e.g., I was able to trigger with `sudo modprobe hmm_test`
>> after increasing the size of struct page).
>
> Makes sense. Thanks for the explanation.
>
>>>> This fixes the problem of using the virtual address without updating the
>>>> entry in the PML4 table or PML5 table. But this is a temporary solution to
>>>> prevent page fault problems, and it requires improvement of the routine
>>>> that updates the missing entry in the PML4 table or PML5 table.
>>>
>>> Can we please skip past the band-aid and go to the real fix?
>>
>> Yes, of course it'd be best to skip a temporary fix.
>> The intention is to report/discuss the problem and a fix as a starting point.
>
> Do you have a better fix in mind?
>
Yes. The first thing that comes to mind for accessing the virtual address
safely is this: if the current top-level page table is not init_mm's page
table, translate the vmemmap-based virtual address to its direct-mapped
virtual address and use that for any access made before the page tables
are synced.
I will send a patch with this idea first.
If you have any better ideas, please let me know.
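
To make the idea concrete, here is a rough user-space model of it. This is
only a sketch and not the patch itself: every toy_* name is hypothetical,
the constants are illustrative, and a flat array stands in for the real
multi-level page table walk. It only shows the decision: use the vmemmap
address as-is when running on init_mm's page table, otherwise resolve the
backing physical address through init_mm and go through the direct map.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; assumptions for this model. */
#define TOY_PAGE_SHIFT    12
#define TOY_PAGE_SIZE     (1ULL << TOY_PAGE_SHIFT)
#define TOY_VMEMMAP_START 0xffffea0000000000ULL /* base of the vmemmap area */
#define TOY_PAGE_OFFSET   0xffff888000000000ULL /* base of the direct map */
#define TOY_NR_PAGES      8

/*
 * Hypothetical stand-in for a top-level page table: one slot per vmemmap
 * page, holding the physical address of the backing page (0 = not mapped).
 */
struct toy_mm {
	uint64_t vmemmap_phys[TOY_NR_PAGES];
};

/* Direct-map virtual address for a physical address. */
static uint64_t toy_phys_to_direct(uint64_t phys)
{
	return TOY_PAGE_OFFSET + phys;
}

/* Resolve a vmemmap virtual address in @mm; returns 0 if it is not mapped. */
static uint64_t toy_lookup(const struct toy_mm *mm, uint64_t vaddr)
{
	uint64_t idx;

	if (vaddr < TOY_VMEMMAP_START)
		return 0;
	idx = (vaddr - TOY_VMEMMAP_START) >> TOY_PAGE_SHIFT;
	if (idx >= TOY_NR_PAGES || !mm->vmemmap_phys[idx])
		return 0;
	return mm->vmemmap_phys[idx] + (vaddr & (TOY_PAGE_SIZE - 1));
}

/*
 * Pick an address that is safe to dereference before the page table sync:
 * the vmemmap address itself when running on init_mm, otherwise the
 * direct-map alias of the same physical memory.
 */
static uint64_t toy_safe_vaddr(const struct toy_mm *init_mm,
			       const struct toy_mm *current_mm,
			       uint64_t vaddr)
{
	uint64_t phys;

	if (current_mm == init_mm)
		return vaddr;

	phys = toy_lookup(init_mm, vaddr);
	assert(phys != 0); /* the entry must already be installed in init_mm */
	return toy_phys_to_direct(phys);
}
```

In this model a caller that is not on init_mm would pass the result of
toy_safe_vaddr() to memset() instead of the raw vmemmap address.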
Br,
G.G.