Message-ID: <2619166b-13ef-4daa-82c7-1d44035a8d6c@arm.com>
Date: Tue, 20 Jan 2026 08:56:12 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Yeoreum Yun <yeoreum.yun@....com>, Will Deacon <will@...nel.org>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-rt-devel@...ts.linux.dev, catalin.marinas@....com,
 akpm@...ux-foundation.org, david@...nel.org, kevin.brodsky@....com,
 quic_zhenhuah@...cinc.com, dev.jain@....com, yang@...amperecomputing.com,
 chaitanyas.prakash@....com, bigeasy@...utronix.de, clrkwllms@...nel.org,
 rostedt@...dmis.org, lorenzo.stoakes@...cle.com, ardb@...nel.org,
 jackmanb@...gle.com, vbabka@...e.cz, mhocko@...e.com
Subject: Re: [PATCH v5 2/3] arm64: mmu: avoid allocating pages while splitting
 the linear mapping

On 19/01/2026 21:24, Yeoreum Yun wrote:
> Hi Will,
> 
>> On Mon, Jan 05, 2026 at 08:23:27PM +0000, Yeoreum Yun wrote:
>>> +static int __init linear_map_prealloc_split_pgtables(void)
>>> +{
>>> +	int ret, i;
>>> +	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
>>> +	unsigned long lend = PAGE_END;
>>> +	unsigned long kstart = (unsigned long)lm_alias(_stext);
>>> +	unsigned long kend = (unsigned long)lm_alias(__init_begin);
>>> +
>>> +	const struct mm_walk_ops collect_to_split_ops = {
>>> +		.pud_entry	= collect_to_split_pud_entry,
>>> +		.pmd_entry	= collect_to_split_pmd_entry
>>> +	};
>>
>> Why do we need to rewalk the page-table here instead of collating the
>> number of block mappings we put down when creating the linear map in
>> the first place?

That's a good point; perhaps we can reuse the counters that this series introduces?

https://lore.kernel.org/all/20260107002944.2940963-1-yang@os.amperecomputing.com/
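
Something like the sketch below is what I had in mind -- purely illustrative,
with invented counter names (I haven't checked what that series actually calls
them):

/*
 * Count block mappings as they are written when the linear map is
 * first created, instead of rewalking the page tables afterwards.
 */
static unsigned long nr_linear_block_puds __initdata;
static unsigned long nr_linear_block_pmds __initdata;

static void __init note_linear_block(int level)
{
	if (level == 1)		/* PUD block */
		nr_linear_block_puds++;
	else if (level == 2)	/* PMD block */
		nr_linear_block_pmds++;
}

linear_map_prealloc_split_pgtables() could then size split_pgtables from those
counters (minus whatever the kernel-text alias contributes) without the extra
walk.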

> 
> First, the linear alias of [_text, __init_begin) is not a target for
> the split, and it also seems strange to me to add code inside alloc_init_XXX()
> that both checks an address range and counts block mappings.
> 
> Second, for a future feature, I hope to add some code to split a
> specific area, e.g. to set a specific pkey for that area.

Could you give more detail on this? My working assumption is that either the
system supports BBML2 or it doesn't. If it doesn't, we need to split the whole
linear map. If it does, we already have logic to split parts of the linear map
when needed.
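
(In my head the decision is just the below; split_linear_map_range() and
split_linear_map_all() are invented names, though I believe
system_supports_bbml2_noabort() is the real helper:

	if (system_supports_bbml2_noabort())
		split_linear_map_range(start, end);	/* lazily, on demand */
	else
		split_linear_map_all();			/* once, up front */

...so I'm not sure where a third, range-specific prealloc case fits in.)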

> 
> In this case, it's useful to rewalk the page-table over the specific
> range to get the number of block mappings.
> 
>>
>>> +	split_pgtables_idx = 0;
>>> +	split_pgtables_count = 0;
>>> +
>>> +	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
>>> +						    &collect_to_split_ops,
>>> +						    NULL, NULL);
>>> +	if (!ret)
>>> +		ret = walk_kernel_page_table_range_lockless(kend, lend,
>>> +							    &collect_to_split_ops,
>>> +							    NULL, NULL);
>>> +	if (ret || !split_pgtables_count)
>>> +		goto error;
>>> +
>>> +	ret = -ENOMEM;
>>> +
>>> +	split_pgtables = kvmalloc(split_pgtables_count * sizeof(struct ptdesc *),
>>> +				  GFP_KERNEL | __GFP_ZERO);
>>> +	if (!split_pgtables)
>>> +		goto error;
>>> +
>>> +	for (i = 0; i < split_pgtables_count; i++) {
>>> +		/* The page table will be filled during splitting, so zeroing it is unnecessary. */
>>> +		split_pgtables[i] = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
>>> +		if (!split_pgtables[i])
>>> +			goto error;
>>
>> This looks potentially expensive on the boot path and only gets worse as
>> the amount of memory grows. Maybe we should predicate this preallocation
>> on preempt-rt?
> 
> Agree. Then I'll apply pre-allocation with PREEMPT_RT only.

I guess I'm missing something obvious, but I don't understand the problem here...
We are only deferring the allocation of all these pgtables, so the cost is
neutral, surely? Had we correctly guessed earlier that the system doesn't
support BBML2, we would have had to allocate all these pgtables earlier anyway.

Another way to look at it is that we are still allocating the same number of
pgtables in the existing fallback path, it's just that we are doing it inside
the stop_machine().
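
To make that concrete, here's a sketch of my mental model (the function names
are invented; pagetable_alloc() is real):

/*
 * Either way, one table page is consumed per block mapping that gets
 * split; this series only changes when, and in what context, the
 * pages are allocated.
 */

/* Existing fallback: allocate per entry inside the stop_machine()
 * callback, i.e. in atomic context with all CPUs held. */
static int split_block_inline(void)
{
	struct ptdesc *ptdesc = pagetable_alloc(GFP_ATOMIC, 0);

	if (!ptdesc)
		return -ENOMEM;
	/* ... install the new table ... */
	return 0;
}

/* With this series: the same number of pages, allocated up front with
 * GFP_KERNEL, then only consumed inside stop_machine(). */
static int split_block_prealloc(void)
{
	struct ptdesc *ptdesc = split_pgtables[split_pgtables_idx++];

	/* ... install the new table via ptdesc ... */
	return 0;
}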

My vote would be _not_ to have a separate path for PREEMPT_RT, since that path
would end up getting significantly less testing...

Thanks,
Ryan

> 
> Thanks for your review.
> 
> --
> Sincerely,
> Yeoreum Yun

