linux-kernel - Re: [PATCH 2/2] arm64: mmu: use pagetable_alloc_nolock() while stop

Message-ID: <aUFduAz0BxYFQtc+@e129823.arm.com>
Date: Tue, 16 Dec 2025 13:25:12 +0000
From: Yeoreum Yun <yeoreum.yun@....com>
To: Brendan Jackman <jackmanb@...gle.com>
Cc: akpm@...ux-foundation.org, david@...nel.org, lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org,
	surenb@...gle.com, mhocko@...e.com, ast@...nel.org,
	daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
	eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev,
	john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
	haoluo@...gle.com, jolsa@...nel.org, hannes@...xchg.org,
	ziy@...dia.com, bigeasy@...utronix.de, clrkwllms@...nel.org,
	rostedt@...dmis.org, catalin.marinas@....com, will@...nel.org,
	ryan.roberts@....com, kevin.brodsky@....com, dev.jain@....com,
	yang@...amperecomputing.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
	linux-rt-devel@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH 2/2] arm64: mmu: use pagetable_alloc_nolock() while
 stop_machine()

> On Tue Dec 16, 2025 at 12:01 PM UTC, Yeoreum Yun wrote:
> >> On Tue Dec 16, 2025 at 11:03 AM UTC, Yeoreum Yun wrote:
> >> > Hi Brendan,
> >> >
> >> >> On Mon Dec 15, 2025 at 10:06 AM UTC, Yeoreum Yun wrote:
> >> >> [snip]
> >> >> >> Overall I am feeling a bit uncomfortable about this use of _nolock, but
> >> >> >> I am also feeling pretty ignorant about PREEMPT_RT and also about this
> >> >> >> arm64 code, so I am hesitant to suggest alternatives, I hope someone
> >> >> >> else can offer some input here...
> >> >> >
> >> >> > I understand. However, as I mentioned earlier,
> >> >> > my main intention was to hear opinions specifically about memory contention.
> >> >> >
> >> >> > That said, if there is no memory contention,
> >> >> > I don’t think using the _nolock API is necessarily a bad approach.
> >> >>
> >> >>
> >> >> > In fact, I believe a bigger issue is that, under PREEMPT_RT,
> >> >> > code that uses the regular memory allocation APIs may give users the false impression
> >> >> > that those APIs are “safe to use,” even though they are not.
> >> >>
> >> >> Yeah, I share this concern. I would bet I have written code that's
> >> >> broken under PREEMPT_RT (luckily only in Google's kernel fork). The
> >> >> comment for GFP_ATOMIC says:
> >> >>
> >> >>  * %GFP_ATOMIC users can not sleep and need the allocation to succeed. A lower
> >> >>  * watermark is applied to allow access to "atomic reserves".
> >> >>  * The current implementation doesn't support NMI and few other strict
> >> >>  * non-preemptive contexts (e.g. raw_spin_lock). The same applies to %GFP_NOWAIT.
> >> >>
> >> >> It kinda sounds like it's supposed to be OK to use GFP_ATOMIC in a
> >> >> normal preempt_disable() context. So do you know exactly why it's
> >> >> invalid to use it in this stop_machine() context here? Maybe we need to
> >> >> update this comment.
> >> >
> >> > In non-PREEMPT_RT configurations, this is fine to use.
> >> > However, in PREEMPT_RT, it should not be used because
> >> > spin_lock becomes a sleepable lock backed by an rt-mutex.
> >> >
> >> > From Documentation/locking/locktypes.rst:
> >> >
> >> >   The fact that PREEMPT_RT changes the lock category of spinlock_t and
> >> >   rwlock_t from spinning to sleeping.
> >> >
> >> > As you know, all locks related to memory allocation
> >> > (e.g., zone_lock, PCP locks, etc.) use spin_lock,
> >> > which becomes sleepable under PREEMPT_RT.
> >> >
> >> > The callback of stop_machine() is executed in a preemption-disabled context
> >> > (see cpu_stopper_thread()). In this context, if it fails to acquire a spinlock
> >> > during memory allocation,
> >> > the task would be able to go to sleep while preemption is disabled,
> >> > which is an obviously problematic situation.
> >>
> >> But this is what I mean, doesn't this sound like the GFP_ATOMIC comment
> >> I quoted is wrong (or at least, it implies things which are wrong)? The
> >> comment refers specifically to raw_spin_lock() and "strict
> >> non-preemptive contexts". Which sounds like it is being written with
> >> PREEMPT_RT in mind. But that doesn't really match what you've said.
> >
> > No. I think the comment of GFP_ATOMIC is right.
> > It definitely said:
> >   The current implementation *doesn't support* NMI and few other strict
> >   *non-preemptive contexts (e.g. raw_spin_lock)*.
>
> But this phrasing sounds like there are other non-preemptive contexts
> that it _does_ support. I would definitely read this as implying that
> plain old preempt_disable() is OK. I don't understand what those "few
> other strict contexts" are, nor why the stop_machine() context is
> included in them.

I think this phrasing is considering the non-preemptive case in terms of
priority or scheduling policy, but it still confuses me too.
What I want to say is that the stop_machine() callback context
(i.e. the stopper thread context) is the same as the raw_spin_lock() case,
where preemption is explicitly disabled by calling preempt_disable().

The reason the raw_spin_lock() context cannot use GFP_ATOMIC is that
it explicitly disables preemption by calling preempt_disable().

The stop_machine() callback context (the stopper thread's context) is the same.
This is how the stopper invokes the callback (see cpu_stopper_thread()):

  ...
  preempt_count_inc();
  ret = fn(arg);
  ...
  preempt_count_dec();
  ...

Preemption is explicitly disabled, just as with raw_spin_lock().

So it seems to be included in the "few other strict non-preemptive contexts".
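
To make the point concrete, here is a rough userspace model (not kernel
code). Every model_* name is made up for illustration. It assumes that a
contended PREEMPT_RT spinlock_t would have to sleep, and that a _nolock
style allocation only ever trylocks and fails instead of waiting. It only
counts how often a path would sleep while preemption is disabled:

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_task {
	int preempt_count;	/* > 0 means preemption is disabled */
	int sleep_bugs;		/* "would sleep while atomic" events */
};

struct model_lock {
	bool held;		/* true: another context owns the lock */
};

static void model_preempt_disable(struct model_task *t) { t->preempt_count++; }
static void model_preempt_enable(struct model_task *t) { t->preempt_count--; }

/*
 * Models spinlock_t under PREEMPT_RT: a contended acquire would block
 * on an rt-mutex, i.e. sleep. The model records the bug and gives up.
 */
static bool model_rt_spin_lock(struct model_task *t, struct model_lock *l)
{
	if (l->held) {
		if (t->preempt_count > 0)
			t->sleep_bugs++;
		return false;
	}
	l->held = true;
	return true;
}

/* Models a trylock: never waits, so it can never sleep. */
static bool model_trylock(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Regular allocation path: takes the (sleepable on RT) lock. */
static int model_alloc_locking(struct model_task *t, struct model_lock *zone_lock)
{
	if (!model_rt_spin_lock(t, zone_lock))
		return -1;
	zone_lock->held = false;	/* unlock */
	return 0;
}

/* Assumed _nolock behaviour: trylock only, fail on contention. */
static int model_alloc_nolock(struct model_task *t, struct model_lock *zone_lock)
{
	(void)t;
	if (!model_trylock(zone_lock))
		return -1;
	zone_lock->held = false;	/* unlock */
	return 0;
}

/* Mirrors the stopper snippet: the callback runs with preemption off. */
static int model_run_stopper(struct model_task *t,
			     int (*fn)(struct model_task *, struct model_lock *),
			     struct model_lock *l)
{
	int ret;

	model_preempt_disable(t);
	ret = fn(t, l);
	model_preempt_enable(t);
	return ret;
}
```

With a contended lock, running model_alloc_locking through
model_run_stopper() bumps sleep_bugs, while model_alloc_nolock just
fails and leaves sleep_bugs untouched.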

--
Sincerely,
Yeoreum Yun