Message-ID: <aFMoDcWy-OzE3yoV@mozart.vkv.me>
Date: Wed, 18 Jun 2025 13:56:45 -0700
From: Calvin Owens <calvin@...nvd.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org, "Lai, Yi" <yi1.lai@...ux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local
 hash

( Dropping linux-tip-commits from Cc )

On Wednesday 06/18 at 19:09 +0200, Sebastian Andrzej Siewior wrote:
> On 2025-06-18 09:49:18 [-0700], Calvin Owens wrote:
> > Didn't get much out of lockdep unfortunately.
> > 
> > It notices the corruption in the spinlock:
> > 
> >     BUG: spinlock bad magic on CPU#2, cargo/4129172
> >      lock: 0xffff8881410ecdc8, .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
> 
> Yes. Which is what I assumed while I suggested this. But it complains
> about bad magic. It says the magic is 0xdead4ead but this is
> SPINLOCK_MAGIC. I was expecting any value but this one.
> 
> > That was followed by this WARN:
> > 
> >     ------------[ cut here ]------------
> >     rcuref - imbalanced put()
> >     WARNING: CPU: 2 PID: 4129172 at lib/rcuref.c:266 rcuref_put_slowpath+0x55/0x70
> 
> This is "reasonable". If the lock is broken, the remaining memory is
> probably garbage anyway. It complains there about the reference put
> because of an invalid counter.
> 
> …
> > The oops after that is from a different task this time, but it just
> > looks like slab corruption:
> > 
> …
> 
> The previous one complained about an invalid free from within the exec.
> 
> > No lock/rcu splats at all.
> It exploded before that could happen.
> 
> > > If it still explodes without LTO, would you mind trying gcc?
> > 
> > Will do.
> 
> Thank you.
> 
> > Haven't had much luck isolating what triggers it, but if I run two copies
> > of these large build jobs in a loop, it reliably triggers in 6-8 hours.
> > 
> > Just to be clear, I can only trigger this on the one machine. I ran it
> > through memtest86+ yesterday and it passed, FWIW, but I'm a little
> > suspicious of the hardware right now too. I double checked that
> > everything in the BIOS related to power/perf is at factory settings.
> 
> But then it is kind of odd that it happens only with the futex code.

I think the missing ingredient was PREEMPT: the 2nd machine had been trying
to reproduce this for over a day without success, but I rebuilt its kernel
with PREEMPT_FULL this morning (still LLVM), and it just hit a similar oops.

    Oops: general protection fault, probably for non-canonical address 0x74656d2f74696750: 0000 [#1] SMP
    CPU: 10 UID: 1000 PID: 542469 Comm: cargo Not tainted 6.16.0-rc2-00045-g4663747812d1 #1 PREEMPT 
    Hardware name: Gigabyte Technology Co., Ltd. A620I AX/A620I AX, BIOS F3 07/10/2023
    RIP: 0010:futex_hash+0x23/0x90
    Code: 1f 84 00 00 00 00 00 41 57 41 56 53 48 89 fb e8 b3 04 fe ff 48 89 df 31 f6 e8 79 00 00 00 48 8b 78 18 49 89 c6 48 85 ff 74 55 <80> 7f 21 00 75 4f f0 83 07 01 79 49 e8 fc 17 37 00 84 c0 75 40 e8
    RSP: 0018:ffffc9002e46fcd8 EFLAGS: 00010202
    RAX: ffff888a68e25c40 RBX: ffffc9002e46fda0 RCX: 0000000036616534
    RDX: 00000000ffffffff RSI: 0000000910180c00 RDI: 74656d2f7469672f
    RBP: 00000000000000b0 R08: 000000000318dd0d R09: 000000002e117cb0
    R10: 00000000318dd0d0 R11: 000000000000001b R12: 0000000000000000
    R13: 000055e79b431170 R14: ffff888a68e25c40 R15: ffff8881ea0ae900
    FS:  00007f1b6037b580(0000) GS:ffff8898a528b000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000555830170098 CR3: 0000000d73e93000 CR4: 0000000000350ef0
    Call Trace:
     <TASK>
     futex_wait_setup+0x7e/0x1d0
     __futex_wait+0x63/0x120
     ? __futex_wake_mark+0x40/0x40
     futex_wait+0x5b/0xd0
     ? hrtimer_dummy_timeout+0x10/0x10
     do_futex+0x86/0x120
     __x64_sys_futex+0x10a/0x180
     do_syscall_64+0x48/0x4f0
     entry_SYSCALL_64_after_hwframe+0x4b/0x53
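
A reading aid for the register dump above, offered as a sketch rather than a
diagnosis. The faulting instruction bytes (<80> 7f 21 00) decode as
cmpb $0x0,0x21(%rdi), and the reported fault address is exactly RDI + 0x21.
RDI itself consists entirely of printable ASCII bytes when read in x86-64
memory order, which would be consistent with string data sitting where a
pointer was expected, though the dump alone does not prove how it got there.
The helper names below are made up for illustration, and the canonical check
assumes 4-level paging (48-bit virtual addresses).

```python
import struct

# Values copied from the oops above.
RDI = 0x74656d2f7469672f
FAULT_ADDR = 0x74656d2f74696750
RAX = 0xffff888a68e25c40  # a normal-looking kernel pointer, for comparison


def as_ascii(value):
    """Return the 8 bytes of a 64-bit register in little-endian memory order."""
    return struct.pack("<Q", value).decode("ascii", errors="replace")


def is_canonical(addr):
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


print(as_ascii(RDI))             # bytes of RDI as they would sit in memory
print(hex(FAULT_ADDR - RDI))     # displacement used by the faulting cmpb
print(is_canonical(FAULT_ADDR))  # why the CPU raised #GP instead of #PF
print(is_canonical(RAX))
```

Run as-is, this prints "/git/met", 0x21, False and True, i.e. RDI looks like
a fragment of a path string rather than a kernel address.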

I also enabled DEBUG_PREEMPT, but that didn't print any additional info.

I'm testing a GCC kernel on both machines now.

Thanks,
Calvin

> Sebastian
