linux-kernel - Re: [PATCH v2 1/2] arm64: mm: vmemmap populate to page level if not section aligned

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <04de1444-b28b-4c16-aafa-2b04f0a59304@quicinc.com>
Date: Wed, 25 Dec 2024 17:59:25 +0800
From: Zhenhua Huang <quic_zhenhuah@...cinc.com>
To: Catalin Marinas <catalin.marinas@....com>
CC: <will@...nel.org>, <ardb@...nel.org>, <ryan.roberts@....com>,
        <mark.rutland@....com>, <joey.gouly@....com>,
        <dave.hansen@...ux.intel.com>, <akpm@...ux-foundation.org>,
        <chenfeiyang@...ngson.cn>, <chenhuacai@...nel.org>,
        <linux-mm@...ck.org>, <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Tingwei Zhang
	<quic_tingweiz@...cinc.com>
Subject: Re: [PATCH v2 1/2] arm64: mm: vmemmap populate to page level if not
 section aligned



On 2024/12/24 22:09, Catalin Marinas wrote:
> On Tue, Dec 24, 2024 at 05:32:06PM +0800, Zhenhua Huang wrote:
>> Thanks Catalin for review!
>> Merry Christmas.
> 
> Merry Christmas to you too!
> 
>> On 2024/12/21 2:30, Catalin Marinas wrote:
>>> On Mon, Dec 09, 2024 at 05:42:26PM +0800, Zhenhua Huang wrote:
>>>> Fixes: c1cc1552616d ("arm64: MMU initialisation")
>>>
>>> I wouldn't add a fix for the first commit adding arm64 support, we did
>>> not even have memory hotplug at the time (added later in 5.7 by commit
>>> bbd6ec605c0f ("arm64/mm: Enable memory hot remove")). IIUC, this hasn't
>>> been a problem until commit ba72b4c8cf60 ("mm/sparsemem: support
>>> sub-section hotplug"). That commit broke some arm64 assumptions.
>>
>> Shall we add ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
>> because it broke arm64 assumptions ?
> 
> Yes, I think that would be better. And a cc stable to 5.4 (the above
> commit appeared in 5.3).

Got it, Thanks.

> 
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index e2739b69e11b..fd59ee44960e 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -1177,7 +1177,9 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>>>    {
>>>>    	WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
>>>> -	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
>>>> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
>>>> +	!IS_ALIGNED(page_to_pfn((struct page *)start), PAGES_PER_SECTION) ||
>>>> +	!IS_ALIGNED(page_to_pfn((struct page *)end), PAGES_PER_SECTION))
>>>>    		return vmemmap_populate_basepages(start, end, node, altmap);
>>>>    	else
>>>>    		return vmemmap_populate_hugepages(start, end, node, altmap);
>>>
>>> An alternative would be to fix unmap_hotplug_pmd_range() etc. to avoid
>>> nuking the whole vmemmap pmd section if it's not empty. Not sure how
>>> easy that is, whether we have the necessary information (I haven't
>>> looked in detail).
>>>
>>> A potential issue - can we hotplug 128MB of RAM and only unplug 2MB? If
>>> that's possible, the problem isn't solved by this patch.
>>
>> Indeed, seems there is no guarantee that plug size must be equal to unplug
>> size...
>>
>> I have two ideas:
>> 1. Completely disable this PMD mapping optimization since there is no
>> guarantee we must align 128M memory for hotplug ..
> 
> I'd be in favour of this, at least if CONFIG_MEMORY_HOTPLUG is enabled.
> I think the only advantage here is that we don't allocate a full 2MB
> block for vmemmap when only plugging in a sub-section.

Yeah..
In my opinion, w/o subsection hotplugging support, it is definitely 
beneficial. However, w/ subsection hotplugging support, it may lead to 
memory overhead and necessitate special logic in codes since we always 
use a full 2MB block..

> 
>> 2. If we want to take this optimization.
>> I propose adding one argument to vmemmap_free to indicate if the entire
>> section is freed(based on subsection map). Vmemmap_free is a common function
>> and might affect other architectures... The process would be:
>> vmemmap_free
>> 	unmap_hotplug_range //In unmap_hotplug_pmd_range() as you mentioned：if
>> whole section is freed, proceed as usual. Otherwise, *just clear out struct
>> page content but do not free*.
>> 	free_empty_tables // will be called only if entire section is freed
>>
>> On the populate side,
>> else if (vmemmap_check_pmd(pmd, node, addr, next)) //implement this function
>> 	continue;	//Buffer still exists, just abort..
>>
>> Could you please comment further whether #2 is feasible ?
> 
> vmemmap_free() already gets start/end, so it could at least check the
> alignment and avoid freeing if it's not unplugging a full section. It
> does leave a 2MB vmemmap block in place when freeing the last subsection
> but it's safer than freeing valid struct page entries. In addition, it
> could query the memory hotplug state with something like
> find_memory_block() and figure out whether the section is empty.
> 

Do you mean that we need not clear struct page entries of subsection 
until the entire section fully unplugged ? That seems feasible.

BTW, You're right, I went through codes again, only export 
is_subsection_map_empty() for query is another option..
	page_to_pfn() to get pfn
	__nr_to_section() to get mem_section
	last call is_subsection_map_empty() we can get subsection hotplug 
status per section
w/ this approach, we need not to do changes for func vmemmap_free

> Anyway, I'll be off until the new year, maybe I get other ideas by then.
> 

Sure, Happy Holiday! I will prepare both of patches and wait for your 
further comments :