lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <h33pscn2orhpb5dapttmo4vy4s3drfsjaahmjp4arsjjfngzno@bzbacqjvfe6r>
Date: Wed, 7 Jan 2026 21:19:20 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>, 
	Brendan Jackman <jackmanb@...gle.com>, Johannes Weiner <hannes@...xchg.org>, Zi Yan <ziy@...dia.com>, 
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Clark Williams <clrkwllms@...nel.org>, 
	Steven Rostedt <rostedt@...dmis.org>, linux-mm@...ck.org, linux-kernel@...r.kernel.org, 
	linux-rt-devel@...ts.linux.dev, stable@...r.kernel.org, 
	kernel test robot <oliver.sang@...el.com>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH mm-hotfixes] mm/page_alloc: prevent pcp corruption with
 SMP=n

On Mon, Jan 05, 2026 at 04:08:56PM +0100, Vlastimil Babka wrote:
> The kernel test robot has reported:
> 
>  BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28
>   lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0
>  CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT  8cc09ef94dcec767faa911515ce9e609c45db470
>  Call Trace:
>   <IRQ>
>   __dump_stack (lib/dump_stack.c:95)
>   dump_stack_lvl (lib/dump_stack.c:123)
>   dump_stack (lib/dump_stack.c:130)
>   spin_dump (kernel/locking/spinlock_debug.c:71)
>   do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?)
>   _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138)
>   __free_frozen_pages (mm/page_alloc.c:2973)
>   ___free_pages (mm/page_alloc.c:5295)
>   __free_pages (mm/page_alloc.c:5334)
>   tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290)
>   ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289)
>   ? rcu_core (kernel/rcu/tree.c:?)
>   rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861)
>   rcu_core_si (kernel/rcu/tree.c:2879)
>   handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623)
>   __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725)
>   irq_exit_rcu (kernel/softirq.c:741)
>   sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052)
>   </IRQ>
>   <TASK>
>  RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
>   free_pcppages_bulk (mm/page_alloc.c:1494)
>   drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632)
>   __drain_all_pages (mm/page_alloc.c:2731)
>   drain_all_pages (mm/page_alloc.c:2747)
>   kcompactd (mm/compaction.c:3115)
>   kthread (kernel/kthread.c:465)
>   ? __cfi_kcompactd (mm/compaction.c:3166)
>   ? __cfi_kthread (kernel/kthread.c:412)
>   ret_from_fork (arch/x86/kernel/process.c:164)
>   ? __cfi_kthread (kernel/kthread.c:412)
>   ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
>   </TASK>
> 
> Matthew has analyzed the report and identified that in drain_page_zone()
> we are in a section protected by spin_lock(&pcp->lock) and then get an
> interrupt that attempts spin_trylock() on the same lock. The code is
> designed to work this way without disabling IRQs and occasionally fail
> the trylock with a fallback. However, the SMP=n spinlock implementation
> assumes spin_trylock() will always succeed, and thus it's normally a
> no-op. Here the enabled lock debugging catches the problem, but
> otherwise it could cause a corruption of the pcp structure.
> 
> The problem has been introduced by commit 574907741599 ("mm/page_alloc:
> leave IRQs enabled for per-cpu page allocations"). The pcp locking
> scheme recognizes the need for disabling IRQs to prevent nesting
> spin_trylock() sections on SMP=n, but the need to prevent the nesting in
> spin_lock() has not been recognized. Fix it by introducing local
> wrappers that change the spin_lock() to spin_lock_iqsave() with SMP=n
> and use them in all places that do spin_lock(&pcp->lock).
> 

Bah, correct.

> Fixes: 574907741599 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations")
> Cc: stable@...r.kernel.org
> Reported-by: kernel test robot <oliver.sang@...el.com>
> Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com
> Analyzed-by: Matthew Wilcox <willy@...radead.org>
> Link: https://lore.kernel.org/all/aUW05pyc9nZkvY-1@casper.infradead.org/
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
> ---
> This fix is intentionally made self-contained and not trying to expand
> upon the existing pcp[u]_spin() helpers. This is to make stable
> backports easier due to recent cleanups to that helpers.
> 
> We could follow up with a proper helpers integration going forward.
> However I think the assumptions SMP=n of the spinlock UP implementation
> are just wrong. It should be valid to do a spin_lock() without disabling
> irq's and rely on a nested spin_trylock() to fail. I will thus try
> proposing the remove the UP implementation first. It should be within
> the current trend of removing stuff that's optimized for a minority
> configuration if it makes maintainability of the majority worse.
> (c.f. recent scheduler SMP=n removal)

It would be fair. Maybe it'll take a performance hit because from a
maintenance perspective, it would be preferable. It's true that
spin_trylock within a lock protected region on UP is somewhat bogus, but
not impossible either. Even if the resulting code is buggy anyway, it
would be preferable to fail early than hide.

> ---
>  mm/page_alloc.c | 45 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 37 insertions(+), 8 deletions(-)
> 

With or without the renaming on top;

Acked-by: Mel Gorman <mgorman@...hsingularity.net>

Thanks.

-- 
Mel Gorman
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ