Message-ID: <20250618170924.Z34OXf1E@linutronix.de>
Date: Wed, 18 Jun 2025 19:09:24 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Calvin Owens <calvin@...nvd.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
	"Lai, Yi" <yi1.lai@...ux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local
 hash

On 2025-06-18 09:49:18 [-0700], Calvin Owens wrote:
> Didn't get much out of lockdep unfortunately.
> 
> It notices the corruption in the spinlock:
> 
>     BUG: spinlock bad magic on CPU#2, cargo/4129172
>      lock: 0xffff8881410ecdc8, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1

Yes, which is what I assumed when I suggested this. But it complains
about bad magic while reporting the magic as 0xdead4ead, and that is
SPINLOCK_MAGIC, the expected value. I was expecting any value but this one.
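To make the "bad magic" oddity concrete, here is a minimal userspace sketch in C. It assumes, for illustration, that the report re-reads the magic field after the check has already failed. The struct `dbg_lock` and the helpers `check_magic()`/`report_magic()` are hypothetical names, not kernel APIs; only the constant SPINLOCK_MAGIC (0xdead4ead) comes from the message above. Under that assumption, a write landing between the check and the print would cause a genuinely bad value to be reported as the expected one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Magic value used by debug spinlocks, as quoted in the splat above. */
#define SPINLOCK_MAGIC 0xdead4eadU

/* Hypothetical stand-in for a debug spinlock; only the magic field matters.
 * volatile forces every access to be a real load, loosely like READ_ONCE(). */
struct dbg_lock {
	volatile uint32_t magic;
};

/* Check step: compares a single snapshot of the magic.
 * 'seen' receives exactly the value that was compared. */
static bool check_magic(const struct dbg_lock *lock, uint32_t *seen)
{
	assert(lock != NULL && seen != NULL);
	*seen = lock->magic;            /* first read: used for the comparison */
	return *seen == SPINLOCK_MAGIC;
}

/* Report step, modelled (by assumption) as re-reading the field.
 * If another writer stored to it after the check, the printed value
 * may be one that would have passed the check. */
static uint32_t report_magic(const struct dbg_lock *lock)
{
	uint32_t now = lock->magic;     /* second, independent read */
	printf("lock: %p, .magic: %08x\n", (const void *)lock, (unsigned)now);
	return now;
}
```

For example, a lock whose magic reads 0 fails `check_magic()`; if something stores SPINLOCK_MAGIC before `report_magic()` runs, the printed line shows `.magic: dead4ead`, which would look like the splat quoted above. Printing the snapshot captured in `seen` instead would remove that ambiguity.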

> That was followed by this WARN:
> 
>     ------------[ cut here ]------------
>     rcuref - imbalanced put()
>     WARNING: CPU: 2 PID: 4129172 at lib/rcuref.c:266 rcuref_put_slowpath+0x55/0x70

This is "reasonable". If the lock is broken, the remaining memory is
probably garbage anyway. It complains there that the reference put is
imbalanced because the counter is invalid.
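The "imbalanced put()" warning fires when a reference is dropped from a counter that no longer holds one. The C sketch below is a deliberately simplified model: it uses a plain signed atomic counter rather than the kernel's biased `rcuref_t` encoding, and `struct simple_ref`/`simple_ref_put()` are hypothetical names. It shows why a counter overwritten with garbage trips the same check as a genuine extra put().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical simplified reference count; NOT the rcuref_t encoding. */
struct simple_ref {
	atomic_int refs;   /* number of live references */
};

enum put_result { PUT_KEPT, PUT_LAST, PUT_IMBALANCED };

/* Drop one reference and classify the result based on the old value. */
static enum put_result simple_ref_put(struct simple_ref *r)
{
	assert(r != NULL);
	int old = atomic_fetch_sub(&r->refs, 1);

	if (old > 1)
		return PUT_KEPT;        /* other references remain */
	if (old == 1)
		return PUT_LAST;        /* caller would free the object here */

	/* old <= 0: more puts than gets, or the counter was overwritten */
	fprintf(stderr, "simple_ref - imbalanced put()\n");
	return PUT_IMBALANCED;
}
```

With a count of 2, two puts return PUT_KEPT and then PUT_LAST; a third put, or a put on a counter holding an arbitrary negative value, reports the imbalance.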

…
> The oops after that is from a different task this time, but it just
> looks like slab corruption:
> 
…

The previous one complained about an invalid free from within exec.

> No lock/rcu splats at all.

It exploded before that could happen.

> > If it still explodes without LTO, would you mind trying gcc?
> 
> Will do.

Thank you.

> Haven't had much luck isolating what triggers it, but if I run two copies
> of these large build jobs in a loop, it reliably triggers in 6-8 hours.
> 
> Just to be clear, I can only trigger this on the one machine. I ran it
> through memtest86+ yesterday and it passed, FWIW, but I'm a little
> suspicious of the hardware right now too. I double checked that
> everything in the BIOS related to power/perf is at factory settings.

But then it is kind of odd that it happens only with the futex code.

Sebastian
