Message-ID: <CAHC9VhRyQZRQ+2zWUZzYZyp64FUnVPiL8_rDq3VOowwu+yFB_w@mail.gmail.com>
Date: Wed, 21 May 2025 17:15:52 -0400
From: Paul Moore <paul@...l-moore.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: John <john.cs.hey@...il.com>, Peter Zijlstra <peterz@...radead.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org, 
	Stephen Smalley <stephen.smalley.work@...il.com>
Subject: Re: [Bug] "BUG: soft lockup in perf_event_open" in Linux kernel v6.14

On Wed, May 21, 2025 at 3:05 PM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> On Wed, May 21, 2025 at 01:26:01PM -0400, Paul Moore wrote:
> > On Wed, May 21, 2025 at 11:57 AM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
> > > On Wed, May 21, 2025 at 09:49:41PM +0800, John wrote:
> > > > Dear Linux Kernel Maintainers,
> > > >
> > > > I hope this message finds you well.
> > > >
> > > > I am writing to report a potential vulnerability I encountered during
> > > > testing of the Linux Kernel version v6.14.
> > > >
> > > > Git Commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557 (tag: v6.14)
> > > >
> > > > Bug Location: 0010:orc_find arch/x86/kernel/unwind_orc.c:217
> > > >
> > > > Bug report: https://pastebin.com/QzuTF9kT
> > > >
> > > > Complete log: https://pastebin.com/XjZYbiCH
> > > >
> > > > Entire kernel config: https://pastebin.com/MRWGr3nv
> > > >
> > > > Root Cause Analysis:
> > > >
> > > > A soft lockup occurred on CPU#0 in the unwind_next_frame() function
> > > > during stack unwinding triggered by arch_stack_walk().
> > > > This was called in the middle of __kasan_slab_free() as part of
> > > > the RCU reclamation path (rcu_do_batch()), likely triggered by a
> > > > SLAB object free in SELinux's avc_reclaim_node().
> > > > The system was under heavy AVC pressure due to continuous audit and
> > > > avc_has_perm() calls (e.g., from selinux_perf_event_open), leading to
> > > > repeated avc_node allocations and reclamations under spinlocks.
> > >
> > > I'm out of the office but I couldn't help glancing at it.
> > >
> > > It looks like a deadlock in the selinux code.  Two of the CPUs are
> > > waiting for a spinlock in avc_reclaim_node().  A third CPU is running in
> > > avc code (currently context_struct_compute_av).
> > >
> > > Adding a few selinux folks.
> >
> > Thanks Josh, although I'm looking at the three CPU backtraces you
> > mentioned and I'm not sure it's a SELinux deadlock.  The two CPUs that
> > are in avc_reclaim_node() are in the process of dropping their
> > spinlocks (they are calling spin_unlock_irqrestore()), and the other
> > CPU, the one doing the permission lookup in
> > context_struct_compute_av(), shouldn't be holding any of those
> > spinlocks, although it should be in an RCU critical section.
>
> Maybe it's not a deadlock, but avc_reclaim_node() does do a tight loop
> of short-lived spinlock acquisitions with some condition checks in
> between, with the IRQs firing in spin_unlock_irqrestore() because
> interrupts are briefly re-enabled there.
>
> I don't pretend to understand that code, but it does look suspicious
> that two of the CPUs are running in that same avc reclaim loop (and on
> one of them the IRQ is doing avc_node_free(); could that be a race?).

The two CPUs in avc_reclaim_node() don't seem that odd; the system's
SELinux access vector/decision cache is full and the code is purging
some old entries so new entries can be added.  There is a window where
the AVC on one CPU can decide it needs to free some room in the cache
and start that process, but before it can make progress and actually
free some space, another CPU also hits the AVC and decides it needs
more room.
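
For reference, the reclaim loop in question looks roughly like this (a
simplified sketch from memory of security/selinux/avc.c, not verbatim
kernel code, so check the tree in question for the exact details):

  /* simplified sketch of avc_reclaim_node() */
  for (try = 0, ecx = 0; try < AVC_CACHE_SLOTS; try++) {
          /* rotate through the hash buckets */
          hvalue = atomic_inc_return(&selinux_avc.avc_cache.lru_hint) &
                   (AVC_CACHE_SLOTS - 1);
          head = &selinux_avc.avc_cache.slots[hvalue];
          lock = &selinux_avc.avc_cache.slots_lock[hvalue];

          /* skip busy buckets rather than spinning on them */
          if (!spin_trylock_irqsave(lock, flags))
                  continue;

          rcu_read_lock();
          hlist_for_each_entry(node, head, list) {
                  avc_node_delete(node);
                  if (++ecx >= AVC_CACHE_RECLAIM) {
                          rcu_read_unlock();
                          spin_unlock_irqrestore(lock, flags);
                          goto out;
                  }
          }
          rcu_read_unlock();
          /* IRQs are briefly re-enabled here on every pass */
          spin_unlock_irqrestore(lock, flags);
  }
  out:
          return ecx;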

The system does seem to be undergoing churn with respect to the AVC,
and if KASAN is enabled that likely adds an additional burden to all
of the AVC operations.  The avc_reclaim_node() code tries to evict up
to AVC_CACHE_RECLAIM (16) cache entries at a time, presumably to limit
the amount of time the per-bucket spinlock is held.  Without knowing
more about the reporter's system it is hard to say for certain, but it
is possible that the AVC_CACHE_RECLAIM limit is too high given the
workload, KASAN, and other factors.  The fact that the lockup hits
while the CPU is trying to drop the spinlock and the IRQs are firing
tells me the reclaim limit isn't too far off; it would be interesting
to see how much time the CPU spent processing the IRQs, as I wonder if
that is a significant contributor.
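
For anyone following along without the source handy, the relevant
constants in security/selinux/avc.c look like this (from memory, and
the values may differ between kernel versions):

  #define AVC_CACHE_SLOTS            512  /* hash buckets */
  #define AVC_DEF_CACHE_THRESHOLD    512  /* nodes before reclaim starts */
  #define AVC_CACHE_RECLAIM           16  /* max nodes evicted per pass */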

We could make the reclaim limit smaller, but that would potentially
create more work for systems that can handle the current
AVC_CACHE_RECLAIM value, and it is worth noting that we haven't seen
widespread reports of problems like this, so I'm somewhat hesitant to
start messing with the limit.  Has anyone seen similar reports lately?

I imagine we could also do something to limit the number of CPUs
simultaneously trying to reclaim AVC space, but I don't think that's
the real problem here.
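
To illustrate the idea, purely hypothetically (the names below are
made up, this is not actual kernel code or a proposal), it could be as
simple as an atomic flag so that a CPU which loses the race skips the
reclaim and lets the winner do the work:

  /* hypothetical sketch only */
  static atomic_t avc_reclaim_active = ATOMIC_INIT(0);

  static int avc_reclaim_node(void)
  {
          int ecx = 0;

          /* another CPU is already reclaiming, don't pile on */
          if (atomic_cmpxchg(&avc_reclaim_active, 0, 1) != 0)
                  return 0;

          /* ... the existing per-bucket trylock/evict loop ... */

          atomic_set(&avc_reclaim_active, 0);
          return ecx;
  }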

> ... and the third CPU is also in AVC code.

It is passing through the AVC/cache code and is querying the SELinux
policy directly because the requested access decision result is not
currently cached in the AVC.  As mentioned previously, this CPU
shouldn't be holding any SELinux/AVC-related locks, although it is in
an RCU critical section.
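
In rough terms the miss path looks like this (simplified from
avc_has_perm_noaudit(), again from memory, so treat it as a sketch
rather than the exact code):

  rcu_read_lock();
  node = avc_lookup(ssid, tsid, tclass);
  if (unlikely(!node))
          /* cache miss: compute the decision from the loaded policy;
           * this is where context_struct_compute_av() is reached */
          node = avc_compute_av(ssid, tsid, tclass, &avd, &xp_node);
  ...
  rcu_read_unlock();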

> So if not a deadlock, maybe some race condition.  Or maybe things slowed
> to a crawl thanks to KASAN and the AVC-centric workload.
>
> Regardless, I do think it's unlikely to be the unwinder's fault here,
> as KASAN happens to do a lot of unwinds for allocations and frees.

-- 
paul-moore.com
