Message-ID: <87d0eae3-e16e-4820-adde-afb519c5dcfc@redhat.com>
Date: Thu, 22 Jan 2026 15:56:39 -0500
From: Waiman Long <llong@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Mike Rapoport <rppt@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Clark Williams <clrkwllms@...nel.org>, Steven Rostedt <rostedt@...dmis.org>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-rt-devel@...ts.linux.dev, Wei Yang <richard.weiyang@...il.com>,
David Hildenbrand <david@...nel.org>, "Paul E . McKenney"
<paulmck@...nel.org>
Subject: Re: [PATCH v3] mm/mm_init: Don't cond_resched() in
deferred_init_memmap_chunk() if called from deferred_grow_zone()
On 1/22/26 2:29 PM, Andrew Morton wrote:
> On Thu, 22 Jan 2026 13:43:43 -0500 Waiman Long <longman@...hat.com> wrote:
>
>> Commit 3acb913c9d5b ("mm/mm_init: use deferred_init_memmap_chunk()
>> in deferred_grow_zone()") made deferred_grow_zone() call
>> deferred_init_memmap_chunk() within a pgdat_resize_lock() critical
>> section with irqs disabled.
>>
>> It did check for irqs_disabled() in
>> deferred_init_memmap_chunk() to avoid calling cond_resched(). For a
>> PREEMPT_RT kernel build, however, spin_lock_irqsave() does not disable
>> interrupt but rcu_read_lock() is called. This leads to the following
>> bug report.
>>
>> BUG: sleeping function called from invalid context at mm/mm_init.c:2091
>> in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>> preempt_count: 0, expected: 0
>>
>> @@ -2085,10 +2085,10 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
>>
>> spfn = chunk_end;
>>
>> - if (irqs_disabled())
>> - touch_nmi_watchdog();
>> - else
>> + if (can_resched)
>> cond_resched();
>> + else
>> + touch_nmi_watchdog();
>> }
>> }
> Disables the cond_resched() in some situations. Can this reintroduce
> the watchdog warnings which that cond_resched() was intended to
> prevent?
cond_resched() is skipped only when deferred_init_memmap_chunk() is
called from deferred_grow_zone(). On a non-RT kernel, that call is made
with a spinlock held and irqs disabled; on an RT kernel, it is made
inside an rcu_read_lock() critical section. In either case, scheduling
out must not be allowed or something bad may happen. I assume that
iterating over the pfns in deferred_grow_zone() requires
pgdat_resize_lock() protection.
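
To make the caller-decides pattern from the quoted hunk concrete, here is
a minimal user-space sketch. It is not the kernel code: the function name
init_chunks_sketch(), the chunk_pfns parameter, and the mock_*() helpers
are hypothetical stand-ins so that it compiles outside the kernel. Only
the can_resched branch mirrors the patch.

```c
#include <stdbool.h>

/* User-space stand-ins (hypothetical) for the kernel helpers. */
static unsigned long nr_resched_calls;   /* mock cond_resched() calls */
static unsigned long nr_watchdog_calls;  /* mock touch_nmi_watchdog() calls */

static void mock_cond_resched(void)       { nr_resched_calls++; }
static void mock_touch_nmi_watchdog(void) { nr_watchdog_calls++; }

/*
 * Walk [start_pfn, end_pfn) in chunks of chunk_pfns pages. The caller,
 * not the callee, states whether sleeping is allowed, so the callee no
 * longer has to guess from irqs_disabled(), which is 0 on PREEMPT_RT
 * even while the lock is held.
 */
static unsigned long init_chunks_sketch(unsigned long start_pfn,
                                        unsigned long end_pfn,
                                        unsigned long chunk_pfns,
                                        bool can_resched)
{
	unsigned long nr_pages = 0;
	unsigned long spfn = start_pfn;

	if (chunk_pfns == 0)
		return 0;	/* avoid an endless loop in the sketch */

	while (spfn < end_pfn) {
		unsigned long chunk_end = spfn + chunk_pfns;

		if (chunk_end > end_pfn)
			chunk_end = end_pfn;

		/* Stand-in for the real per-chunk page initialization. */
		nr_pages += chunk_end - spfn;
		spfn = chunk_end;

		if (can_resched)
			mock_cond_resched();		/* may sleep */
		else
			mock_touch_nmi_watchdog();	/* never sleeps */
	}

	return nr_pages;
}
```

In this sketch a caller in the position of deferred_grow_zone() would
pass can_resched = false, while one in the position of
deferred_init_memmap() would pass true.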
>
> The cond_resched() was added by <dig, dig> da97f2d56bbd ("mm: call
> cond_resched() from deferred_init_memmap()").
>
> Pasha's 2020 patch replaced touch_nmi_watchdog() with cond_resched() to
> prevent RCU stall warnings. So I think the answer to my question is
> yes, going back to touch_nmi_watchdog() could reintroduce those RCU
> warnings.
deferred_init_memmap() will still call cond_resched() in its iteration
loop. It had the RCU stall problem before cond_resched() was added
because it has to iterate over all the available memory, which can take
a long time when we are talking about TBs of memory.

For deferred_grow_zone(), as long as the number of pfns iterated is not
huge, an RCU stall warning shouldn't happen.
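
For a rough sense of scale, the arithmetic can be sketched as below. The
4 KiB page size and the 128 MiB "small request" figure are assumptions
chosen for illustration (typical x86-64 values), and the helper name is
hypothetical. How many pfns deferred_grow_zone() actually walks per call
is not stated in this thread.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4 KiB pages */
#define SKETCH_SMALL_SHIFT	27	/* assumption: a 128 MiB request */

/* Number of page frames covering the given number of bytes. */
static uint64_t bytes_to_pfns(uint64_t bytes)
{
	return bytes >> SKETCH_PAGE_SHIFT;
}

/* 1 TiB of memory under the assumptions above. */
static uint64_t pfns_per_tib(void)
{
	return bytes_to_pfns(UINT64_C(1) << 40);
}

/* One 128 MiB request under the assumptions above. */
static uint64_t pfns_per_small_request(void)
{
	return bytes_to_pfns(UINT64_C(1) << SKETCH_SMALL_SHIFT);
}
```

Under these assumptions 1 TiB is 268435456 pfns, while a 128 MiB request
is 32768 pfns, a factor of 8192 fewer iterations without a scheduling
point.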
Cheers,
Longman