Open Source and information security mailing list archives
 
Message-ID: <20240731053341.GQ6352@frogsfrogsfrogs>
Date: Tue, 30 Jul 2024 22:33:41 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Chandan Babu R <chandanbabu@...nel.org>,
	Matthew Wilcox <willy@...radead.org>,
	xfs <linux-xfs@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>, x86@...nel.org,
	tglx@...utronix.de
Subject: Re: Are jump labels broken on 6.11-rc1?

On Tue, Jul 30, 2024 at 08:10:33PM -0700, Darrick J. Wong wrote:
> On Tue, Jul 30, 2024 at 05:19:50PM -0700, Darrick J. Wong wrote:
> > On Tue, Jul 30, 2024 at 03:26:26PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jul 30, 2024 at 01:00:02PM +0530, Chandan Babu R wrote:
> > > > On Mon, Jul 29, 2024 at 08:38:49 PM -0700, Darrick J. Wong wrote:
> > > > > Hi everyone,
> > > > >
> > > > > I got the following splat on 6.11-rc1 when I tried to QA xfs online
> > > > > fsck.  Does this ring a bell for anyone?  I'll try bisecting in the
> > > > > morning to see if I can find the culprit.
> > > > 
> > > > xfs/566 on v6.11-rc1 would consistently cause the oops mentioned below.
> > > > However, I was able to get xfs/566 to execute successfully five times on a
> > > > v6.11-rc1 kernel with the following commits reverted,
> > > > 
> > > > 83ab38ef0a0b2407d43af9575bb32333fdd74fb2
> > > > 695ef796467ed228b60f1915995e390aea3d85c6
> > > > 9bc2ff871f00437ad2f10c1eceff51aaa72b478f
> > > > 
> > > > Reinstating commit 83ab38ef0a0b2407d43af9575bb32333fdd74fb2 causes the kernel
> > > > to oops once again.
> > > 
> > > Durr, does this help?
> > 
> > Yes, it does!  After ~8, a full fstests run completes without incident.
> > 
> > (vs. before where it would blow up within 2 minutes)
> > 
> > Thanks for the fix; you can add
> > Tested-by: Darrick J. Wong <djwong@...nel.org>
> 
> Ofc as soon as I push it to the whole fleet, things start
> failing again. :(

Sooooo... it turns out that somehow your patch got mismerged on the
first go-round, and that worked.  The second time, there was no
mismerge, which meant that the wrong atomic_cmpxchg() callsite was
tested.

Looking back at the mismerge, it actually changed
__static_key_slow_dec_cpuslocked, which in 6.10 had:

	if (atomic_dec_and_test(&key->enabled))
		jump_label_update(key);

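atomic_dec_and_test() decrements, then returns true only if the new
value is zero.  With the 6.11 code, it looks like we want to exchange
a 1 with a 0, and act only if the previous value had been 1.  But
atomic_cmpxchg() returns the old value, not a success flag, so the bare
truth test also fires for any other nonzero old value, i.e. when the
exchange didn't happen at all.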
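
To make the difference concrete, here is a small userspace sketch in C11.
It assumes only that the kernel's atomic_cmpxchg() returns the value that
was stored before the call; cmpxchg_ret_old(), dec_and_test(),
buggy_should_update(), fixed_should_update() and check_cmpxchg_semantics()
are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's atomic_cmpxchg(): try to swap
 * 'old' for 'new_val' and return whatever *v held beforehand, whether
 * or not the swap happened.
 */
static int cmpxchg_ret_old(atomic_int *v, int old, int new_val)
{
	/* On failure, C11 copies the current value of *v into 'old'. */
	(void)atomic_compare_exchange_strong(v, &old, new_val);
	return old;
}

/* Hypothetical stand-in for atomic_dec_and_test(): true iff new value is 0. */
static bool dec_and_test(atomic_int *v)
{
	/* atomic_fetch_sub() returns the previous value; 1 means we hit 0. */
	return atomic_fetch_sub(v, 1) == 1;
}

/* 6.11-rc1 style test: true for ANY nonzero old value. */
static bool buggy_should_update(atomic_int *v)
{
	return cmpxchg_ret_old(v, 1, 0) != 0;
}

/* Proposed test: true only for a real 1 -> 0 transition. */
static bool fixed_should_update(atomic_int *v)
{
	return cmpxchg_ret_old(v, 1, 0) == 1;
}

/* Self-check of the behavior described above. */
static void check_cmpxchg_semantics(void)
{
	atomic_int a = 2, b = 2, c = 1, d = 1;

	assert(buggy_should_update(&a));   /* true, yet nothing was swapped */
	assert(atomic_load(&a) == 2);
	assert(!fixed_should_update(&b));  /* correctly declines */
	assert(atomic_load(&b) == 2);
	assert(fixed_should_update(&c));   /* genuine 1 -> 0 transition */
	assert(atomic_load(&c) == 0);
	assert(dec_and_test(&d));          /* 6.10 style: reaching 0 is true */
	assert(atomic_load(&d) == 0);
}
```

In the 6.10 code only the count reaching zero triggered jump_label_update();
comparing the cmpxchg result against 1 restores that property for the
exchange-based version.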

So perhaps we really want this change?  I'll send it out to the fleet
and we'll see what it reports tomorrow morning.

--D

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4ad5ed8adf96..5f80c128e90e 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -289,7 +289,7 @@ static void __static_key_slow_dec_cpuslocked(struct static_key *key)
 		return;
 
 	guard(mutex)(&jump_label_mutex);
-	if (atomic_cmpxchg(&key->enabled, 1, 0))
+	if (atomic_cmpxchg(&key->enabled, 1, 0) == 1)
 		jump_label_update(key);
 	else
 		WARN_ON_ONCE(!static_key_slow_try_dec(key));
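
A single-threaded userspace model of the decrement path may help show what
the "== 1" buys.  It is a sketch under stated assumptions, not the kernel
code: struct model_key and the model_* functions are hypothetical,
model_try_dec() assumes static_key_slow_try_dec() only decrements while the
count is greater than one, jump_label_mutex is left out because nothing runs
concurrently, and the "concurrent increment moved enabled from 1 to 2 before
the locked step" scenario is assumed for illustration.  It only models the
counting; it does not try to reproduce the oops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of a static key.  'updates' counts calls to a
 * stand-in for jump_label_update(); 'warnings' counts WARN_ON_ONCE() hits.
 */
struct model_key {
	atomic_int enabled;
	int updates;
	int warnings;
};

/* Stand-in for atomic_cmpxchg(): returns the previous value. */
static int model_cmpxchg(atomic_int *v, int old, int new_val)
{
	(void)atomic_compare_exchange_strong(v, &old, new_val);
	return old;
}

/* Assumed behavior of static_key_slow_try_dec(): decrement only while > 1. */
static bool model_try_dec(struct model_key *key)
{
	int v = atomic_load(&key->enabled);

	do {
		if (v <= 1)
			return false;
	} while (!atomic_compare_exchange_weak(&key->enabled, &v, v - 1));
	return true;
}

/* The part that runs under jump_label_mutex in the kernel. */
static void model_locked_dec(struct model_key *key, bool fixed)
{
	int old = model_cmpxchg(&key->enabled, 1, 0);
	bool act = fixed ? old == 1 : old != 0;

	if (act)
		key->updates++;
	else if (!model_try_dec(key))
		key->warnings++;
}

/* Whole path: lock-free fast path first, then the locked slow path. */
static void model_slow_dec(struct model_key *key, bool fixed)
{
	if (model_try_dec(key))
		return;
	model_locked_dec(key, fixed);
}

/* Locked step entered with enabled == 2 (assumed racing increment). */
static void check_model(void)
{
	struct model_key buggy = { .enabled = 2 };
	struct model_key fixed = { .enabled = 2 };

	/* Old test: "updates" the key and silently drops the decrement. */
	model_locked_dec(&buggy, false);
	assert(buggy.updates == 1 && atomic_load(&buggy.enabled) == 2);

	/* New test: falls back to try_dec and ends at 1, no update. */
	model_locked_dec(&fixed, true);
	assert(fixed.updates == 0 && atomic_load(&fixed.enabled) == 1);
}
```

Calling model_slow_dec() on a key with enabled == 1 performs the 1 -> 0
transition and counts one update; with enabled == 0 it counts one warning,
mirroring the WARN_ON_ONCE() branch in the patch.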
