linux-kernel - Re: [Bug] "BUG: soft lockup in perf_event_open" in Linux kernel v6.14

Message-ID: <CAHC9VhSD0VHyNK59991PJFp8qkzM=sAsNE9nG7M42xhER13csw@mail.gmail.com>
Date: Wed, 21 May 2025 13:26:01 -0400
From: Paul Moore <paul@...l-moore.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: John <john.cs.hey@...il.com>, Peter Zijlstra <peterz@...radead.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org, 
	Stephen Smalley <stephen.smalley.work@...il.com>
Subject: Re: [Bug] "BUG: soft lockup in perf_event_open" in Linux kernel v6.14

On Wed, May 21, 2025 at 11:57 AM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> On Wed, May 21, 2025 at 09:49:41PM +0800, John wrote:
> > Dear Linux Kernel Maintainers,
> >
> > I hope this message finds you well.
> >
> > I am writing to report a potential vulnerability I encountered during
> > testing of Linux kernel v6.14.
> >
> > Git Commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557 (tag: v6.14)
> >
> > Bug Location: 0010:orc_find arch/x86/kernel/unwind_orc.c:217
> >
> > Bug report: https://pastebin.com/QzuTF9kT
> >
> > Complete log: https://pastebin.com/XjZYbiCH
> >
> > Entire kernel config: https://pastebin.com/MRWGr3nv
> >
> > Root Cause Analysis:
> >
> > A soft lockup occurred on CPU#0 in the unwind_next_frame() function
> > during stack unwinding triggered by arch_stack_walk().
> > This was called in the middle of __kasan_slab_free() as part of the RCU
> > reclamation path (rcu_do_batch()), likely triggered by a slab object
> > being freed in SELinux's avc_reclaim_node().
> > The system was under heavy AVC pressure due to continuous audit and
> > avc_has_perm() calls (e.g., from selinux_perf_event_open), leading to
> > repeated avc_node allocations and reclamations under spinlocks.
>
> I'm out of the office but I couldn't help myself glancing at it.
>
> It looks like a deadlock in the selinux code.  Two of the CPUs are
> waiting for a spinlock in avc_reclaim_node().  A third CPU is running in
> avc code (currently context_struct_compute_av).
>
> Adding a few selinux folks.
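
The allocation-and-reclaim cycle described in the report can be modeled, very roughly, as a hash cache whose miss path allocates a node and reclaims a small batch once a live-node threshold is exceeded. This is a hypothetical single-threaded user-space sketch, not the SELinux AVC code: the names `cache_check`, `cache_insert`, `cache_reclaim` and the constants `NSLOTS`, `THRESHOLD`, `BATCH` are illustrative assumptions, and locking and RCU are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy sizes, chosen for illustration only. */
#define NSLOTS    8  /* hash buckets */
#define THRESHOLD 16 /* reclaim once live nodes exceed this */
#define BATCH     4  /* nodes freed per reclaim pass */

struct node {
	unsigned key;
	struct node *next;
};

struct cache {
	struct node *slots[NSLOTS];
	unsigned active;   /* live nodes, counting one being allocated */
	unsigned lru_hint; /* bucket where the next reclaim scan starts */
	unsigned reclaims; /* running total of nodes reclaimed */
};

/* Free up to BATCH nodes, visiting buckets round-robin from lru_hint. */
static unsigned cache_reclaim(struct cache *c)
{
	unsigned freed = 0;

	for (unsigned tries = 0; tries < NSLOTS && freed < BATCH; tries++) {
		unsigned h = c->lru_hint++ % NSLOTS;

		while (c->slots[h] != NULL && freed < BATCH) {
			struct node *n = c->slots[h];

			c->slots[h] = n->next;
			free(n);
			c->active--;
			freed++;
		}
	}
	c->reclaims += freed;
	return freed;
}

static struct node *cache_lookup(const struct cache *c, unsigned key)
{
	for (struct node *n = c->slots[key % NSLOTS]; n != NULL; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

/* Miss path: count the new node, reclaim a batch if over threshold, insert. */
static struct node *cache_insert(struct cache *c, unsigned key)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return NULL;
	n->key = key;
	if (++c->active > THRESHOLD)
		cache_reclaim(c);
	n->next = c->slots[key % NSLOTS];
	c->slots[key % NSLOTS] = n;
	return n;
}

/* Permission-check stand-in: a hit reuses the node, a miss allocates one. */
static struct node *cache_check(struct cache *c, unsigned key)
{
	struct node *n = cache_lookup(c, key);

	return n != NULL ? n : cache_insert(c, key);
}
```

For example, calling `cache_check()` on a zero-initialized `struct cache` with 100 distinct keys never leaves more than `THRESHOLD` nodes live, because every insert that pushes the count past the threshold first frees `BATCH` nodes. In the report, the actual free is attributed to the RCU callback path (`rcu_do_batch()`), which this model omits.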

Thanks Josh, although looking at the three CPU backtraces you
mentioned, I'm not sure it's a SELinux deadlock.  The two CPUs that
are in avc_reclaim_node() are in the process of dropping their
spinlocks (they are calling spin_unlock_irqrestore()), and the other
CPU doing the permission lookup, i.e. the
context_struct_compute_av() CPU, shouldn't be holding any of those
spinlocks, although it should be in an RCU critical section.
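
The locking argument above can be illustrated with a hypothetical user-space sketch, assuming that reclaimers take per-bucket spinlocks with a trylock and skip any bucket that is already held, while lookups take no bucket lock at all (relying on RCU instead). Under those assumptions a reclaimer never spins waiting for a bucket lock, so two reclaimers and a reader have no lock-wait cycle to form. The names `table_init`, `bucket_trylock`, `bucket_unlock` and `reclaim` are illustrative, not kernel functions, and a C11 `atomic_flag` stands in for a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4 /* hypothetical bucket count */

struct bucket {
	atomic_flag lock; /* per-bucket spinlock, taken only by writers */
	int nodes;        /* stand-in for the bucket's node list */
};

static void table_init(struct bucket *tbl, int per_bucket)
{
	for (size_t i = 0; i < NSLOTS; i++) {
		atomic_flag_clear(&tbl[i].lock);
		tbl[i].nodes = per_bucket;
	}
}

/* Non-blocking acquire: returns true only if the lock was free. */
static bool bucket_trylock(struct bucket *b)
{
	return !atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire);
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/*
 * Reclaim up to 'want' nodes.  A bucket whose lock is already held is
 * skipped rather than waited on, so this function never spins.
 */
static int reclaim(struct bucket *tbl, int want)
{
	int freed = 0;

	for (size_t i = 0; i < NSLOTS && freed < want; i++) {
		if (!bucket_trylock(&tbl[i]))
			continue; /* held elsewhere: move on */
		while (tbl[i].nodes > 0 && freed < want) {
			tbl[i].nodes--;
			freed++;
		}
		bucket_unlock(&tbl[i]);
	}
	return freed;
}
```

For example, with four buckets of three nodes each and bucket 0 held (as if by another CPU), `reclaim(tbl, 8)` returns 8 immediately, taking nodes from buckets 1 to 3 and leaving bucket 0 untouched; every lock it took is released before it returns.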

-- 
paul-moore.com