Message-ID: <20260122075747.uSLrSJez@linutronix.de>
Date: Thu, 22 Jan 2026 08:57:47 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Waiman Long <llong@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mike Rapoport <rppt@...nel.org>,
Clark Williams <clrkwllms@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-rt-devel@...ts.linux.dev,
Wei Yang <richard.weiyang@...il.com>,
David Hildenbrand <david@...nel.org>
Subject: Re: [PATCH] mm/mm_init: Don't call cond_resched() in
deferred_init_memmap_chunk() if rcu_preempt_depth() set
On 2026-01-21 13:27:32 [-0800], Paul E. McKenney wrote:
> > > > --- a/mm/mm_init.c
> > > > +++ b/mm/mm_init.c
> > > > @@ -2085,7 +2085,12 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> > > > spfn = chunk_end;
> > > > - if (irqs_disabled())
> > > > + /*
> > > > + * pgdat_resize_lock() only disables irqs in non-RT
> > > > + * kernels but calls rcu_read_lock() in a PREEMPT_RT
> > > > + * kernel.
> > > > + */
> > > > + if (irqs_disabled() || rcu_preempt_depth())
> > > > touch_nmi_watchdog();
> > > rcu_preempt_depth() seems a fairly internal low-level thing - it's
> > > rarely used.
If you acquire a lock from time to time and pass a bool to let the
function below know whether scheduling is fine or not, then it is
obvious. If you choose to check for symptoms of an acquired lock, then
you also have to use the rarely used functions ;)
> > That is true. Besides the scheduler, workqueue also uses rcu_preempt_depth().
> > This API is declared in "include/linux/rcupdate.h", which is included
> > directly or indirectly by many kernel files. So even though it is rarely
> > used, it is still a public API.
>
> It is a bit tricky, for example, given a kernel built with both
> CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_DYNAMIC=y, it will never
> invoke touch_nmi_watchdog(), even if it really is in an RCU read-side
> critical section. This is because it was intended for lockdep-like use,
> where (for example) you don't want to complain about sleeping in an RCU
> read-side critical section unless you are 100% sure that you are in fact
> in an RCU read-side critical section.
>
> Maybe something like this?
>
> if (irqs_disabled() || !IS_ENABLED(CONFIG_PREEMPT_RCU) || rcu_preempt_depth())
> touch_nmi_watchdog();
I don't understand the PREEMPT_NONE+DYNAMIC reasoning. irqs_disabled()
should not be affected by this and rcu_preempt_depth() will be 0 for
!CONFIG_PREEMPT_RCU so I don't think this is required.
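To make the difference between the two checks concrete, here is a small
user-space sketch (not kernel code). The struct and all function names are
hypothetical stand-ins; the only behaviour taken from this thread is that
rcu_preempt_depth() reports 0 when CONFIG_PREEMPT_RCU is not set.

```c
#include <stdbool.h>

/* Hypothetical model of one kernel configuration plus task state. */
struct ctx {
	bool irqs_off;     /* stands in for irqs_disabled() */
	bool preempt_rcu;  /* stands in for IS_ENABLED(CONFIG_PREEMPT_RCU) */
	int rcu_nesting;   /* real rcu_read_lock() nesting of the task */
};

/* Models rcu_preempt_depth(): always 0 when PREEMPT_RCU is off. */
int model_rcu_preempt_depth(const struct ctx *c)
{
	return c->preempt_rcu ? c->rcu_nesting : 0;
}

/* Check from the original patch. */
bool touch_watchdog_v1(const struct ctx *c)
{
	return c->irqs_off || model_rcu_preempt_depth(c) != 0;
}

/* Check with the additional !IS_ENABLED(CONFIG_PREEMPT_RCU) term. */
bool touch_watchdog_v2(const struct ctx *c)
{
	return c->irqs_off || !c->preempt_rcu ||
	       model_rcu_preempt_depth(c) != 0;
}
```

In this model the two variants only disagree when preempt_rcu is false and
interrupts are enabled: v1 falls through to the cond_resched() branch, v2
always touches the watchdog.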
> This would *always* invoke touch_nmi_watchdog() for such kernels, which
> might or might not be OK.
>
> I freely confess that I am not sure which of these is appropriate in
> this setting.
What about a more straightforward and obvious approach?
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fc2a6f1e518f1..0b283fd48b282 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2059,7 +2059,7 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
*/
static unsigned long __init
deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
- struct zone *zone)
+ struct zone *zone, bool may_schedule)
{
int nid = zone_to_nid(zone);
unsigned long nr_pages = 0;
@@ -2085,10 +2085,10 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
spfn = chunk_end;
- if (irqs_disabled())
- touch_nmi_watchdog();
- else
+ if (may_schedule)
cond_resched();
+ else
+ touch_nmi_watchdog();
}
}
@@ -2101,7 +2101,7 @@ deferred_init_memmap_job(unsigned long start_pfn, unsigned long end_pfn,
{
struct zone *zone = arg;
- deferred_init_memmap_chunk(start_pfn, end_pfn, zone);
+ deferred_init_memmap_chunk(start_pfn, end_pfn, zone, true);
}
static unsigned int __init
@@ -2216,7 +2216,7 @@ bool __init deferred_grow_zone(struct zone *zone, unsigned int order)
for (spfn = first_deferred_pfn, epfn = SECTION_ALIGN_UP(spfn + 1);
nr_pages < nr_pages_needed && spfn < zone_end_pfn(zone);
spfn = epfn, epfn += PAGES_PER_SECTION) {
- nr_pages += deferred_init_memmap_chunk(spfn, epfn, zone);
+ nr_pages += deferred_init_memmap_chunk(spfn, epfn, zone, false);
}
/*
Wouldn't this work?
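
Reduced to a user-space sketch, the idea is that the caller, which knows
whether it holds a lock, states it explicitly instead of the callee probing
for symptoms. The counters and the function name below are hypothetical and
only stand in for cond_resched() and touch_nmi_watchdog():

```c
#include <stdbool.h>

/* Hypothetical counters replacing the two kernel calls. */
struct counters {
	int resched;   /* times cond_resched() would have been called */
	int watchdog;  /* times touch_nmi_watchdog() would have been called */
};

/* Models the loop tail of deferred_init_memmap_chunk(). */
void init_chunk_model(int nr_steps, bool may_schedule, struct counters *c)
{
	for (int i = 0; i < nr_steps; i++) {
		/* ... per-step page initialisation would go here ... */
		if (may_schedule)
			c->resched++;   /* caller said scheduling is fine */
		else
			c->watchdog++;  /* caller holds a lock, keep NMI watchdog quiet */
	}
}
```

A caller like deferred_init_memmap_job() would pass true, while a caller
like deferred_grow_zone() would pass false.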
> Thanx, Paul
Sebastian