linux-kernel - Re: [PATCH v5 2/3] arm64: mmu: avoid allocating pages while splitting the linear mapping

Message-ID: <aW9LCX92T9t8iDco@e129823.arm.com>
Date: Tue, 20 Jan 2026 09:29:45 +0000
From: Yeoreum Yun <yeoreum.yun@....com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Will Deacon <will@...nel.org>, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
	catalin.marinas@....com, akpm@...ux-oundation.org, david@...nel.org,
	kevin.brodsky@....com, quic_zhenhuah@...cinc.com, dev.jain@....com,
	yang@...amperecomputing.com, chaitanyas.prakash@....com,
	bigeasy@...utronix.de, clrkwllms@...nel.org, rostedt@...dmis.org,
	lorenzo.stoakes@...cle.com, ardb@...nel.org, jackmanb@...gle.com,
	vbabka@...e.cz, mhocko@...e.com
Subject: Re: [PATCH v5 2/3] arm64: mmu: avoid allocating pages while
 splitting the linear mapping

Hi Ryan
> On 19/01/2026 21:24, Yeoreum Yun wrote:
> > Hi Will,
> >
> >> On Mon, Jan 05, 2026 at 08:23:27PM +0000, Yeoreum Yun wrote:
> >>> +static int __init linear_map_prealloc_split_pgtables(void)
> >>> +{
> >>> +	int ret, i;
> >>> +	unsigned long lstart = _PAGE_OFFSET(vabits_actual);
> >>> +	unsigned long lend = PAGE_END;
> >>> +	unsigned long kstart = (unsigned long)lm_alias(_stext);
> >>> +	unsigned long kend = (unsigned long)lm_alias(__init_begin);
> >>> +
> >>> +	const struct mm_walk_ops collect_to_split_ops = {
> >>> +		.pud_entry	= collect_to_split_pud_entry,
> >>> +		.pmd_entry	= collect_to_split_pmd_entry
> >>> +	};
> >>
> >> Why do we need to rewalk the page-table here instead of collating the
> >> number of block mappings we put down when creating the linear map in
> >> the first place?
>
> That's a good point; perhaps we can reuse the counters that this series introduces?
>
> https://lore.kernel.org/all/20260107002944.2940963-1-yang@os.amperecomputing.com/
>
> >
> > First, the linear alias of [_text, __init_begin) is not a target for
> > the split, and it also seems strange to me to add code inside
> > alloc_init_XXX() that both checks an address range and counts the
> > block mappings.
> >
> > Second, for a future feature, I hope to add code that splits only a
> > specific area, e.g. to set a specific pkey for that area.
>
> Could you give more detail on this? My working assumption is that either the
> system supports BBML2 or it doesn't. If it doesn't, we need to split the whole
> linear map. If it does, we already have logic to split parts of the linear map
> when needed.

This is not for the linear mapping case but for the kernel text area.
As a draft, I want to mark some kernel code as executable by both the
kernel and eBPF programs. (I'm trying to use the POE feature so that
eBPF programs cannot execute kernel code directly.)
For such an area, executable by both the kernel and eBPF programs (a
typical example is the exception entry code), we need to split that
specific range and mark it with a special POE index.

>
> >
> > In this case, it's useful to rewalk the page table over that specific
> > range to get the number of block mappings.
> >
> >>
> >>> +	split_pgtables_idx = 0;
> >>> +	split_pgtables_count = 0;
> >>> +
> >>> +	ret = walk_kernel_page_table_range_lockless(lstart, kstart,
> >>> +						    &collect_to_split_ops,
> >>> +						    NULL, NULL);
> >>> +	if (!ret)
> >>> +		ret = walk_kernel_page_table_range_lockless(kend, lend,
> >>> +							    &collect_to_split_ops,
> >>> +							    NULL, NULL);
> >>> +	if (ret || !split_pgtables_count)
> >>> +		goto error;
> >>> +
> >>> +	ret = -ENOMEM;
> >>> +
> >>> +	split_pgtables = kvmalloc(split_pgtables_count * sizeof(struct ptdesc *),
> >>> +				  GFP_KERNEL | __GFP_ZERO);
> >>> +	if (!split_pgtables)
> >>> +		goto error;
> >>> +
> >>> +	for (i = 0; i < split_pgtables_count; i++) {
> >>> +		/* The page table will be filled during splitting, so zeroing it is unnecessary. */
> >>> +		split_pgtables[i] = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
> >>> +		if (!split_pgtables[i])
> >>> +			goto error;
> >>
> >> This looks potentially expensive on the boot path and only gets worse as
> >> the amount of memory grows. Maybe we should predicate this preallocation
> >> on preempt-rt?
> >
> > Agreed. Then I'll apply the preallocation only for PREEMPT_RT.
>
> I guess I'm missing something obvious but I don't understand the problem here...
> We are only deferring the allocation of all these pgtables, so the cost is
> neutral surely? Had we correctly guessed that the system doesn't support BBML2
> earlier, we would have had to allocate all these pgtables earlier.
>
> Another way to look at it is that we are still allocating the same number of
> pgtables in the existing fallback path, it's just that we are doing it inside
> the stop_machine().
>
> My vote would be _not_ to have a separate path for PREEMPT_RT, which will end up
> with significantly less testing...

IIUC, Will's concern is the additional memory allocation for the
"split_pgtables" array, which holds the preallocated page tables.
As memory grows, this array, and hence the cost, definitely grows too.

And !PREEMPT_RT need not bear this cost, since it can allocate
memory inside stop_machine() with GFP_ATOMIC.

But I also agree that if the cost is not that large, your argument
is convincing. Additionally, as I mentioned in another thread, it would
be good not to give the false impression that GFP_ATOMIC is fine
everywhere, even under PREEMPT_RT.

--
Sincerely,
Yeoreum Yun