Message-ID: <20250620103110.xd5CEFDs@linutronix.de>
Date: Fri, 20 Jun 2025 12:31:10 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Calvin Owens <calvin@...nvd.org>
Cc: linux-kernel@...r.kernel.org, "Lai, Yi" <yi1.lai@...ux.intel.com>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local
 hash

On 2025-06-19 14:07:30 [-0700], Calvin Owens wrote:
> > Machine #2 oopsed with the GCC kernel after just over an hour:
> > 
> >     BUG: unable to handle page fault for address: ffff88a91eac4458
> >     RIP: 0010:futex_hash+0x16/0x90
…
> >     Call Trace:
> >      <TASK>
> >      futex_wait_setup+0x51/0x1b0
…

The futex_hash_bucket pointer has an invalid ->priv pointer.
This could be a use-after-free or a double free. I've been looking through
your config and you don't have CONFIG_SLAB_FREELIST_* set. I don't
remember which one, but one of the two has a "primitive" double-free
detection.

…
> I am not able to reproduce the oops at all with these options:
> 
>     * DEBUG_PAGEALLOC_ENABLE_DEFAULT
>     * SLUB_DEBUG_ON

SLUB_DEBUG_ON is something that would "reliably" notice a double free.
If you drop SLUB_DEBUG_ON (but keep SLUB_DEBUG) then you can boot with
slab_debug=f, keeping only the consistency checks. The "poison" checks
would be excluded, for instance. The allocation is done via kvzalloc(),
but it should be small enough on your machine to avoid vmalloc() and use
only kmalloc().

> I'm also experimenting with stress-ng as a reproducer, no luck so far.

Not sure what you are using there. I think cargo does:
- lock/unlock in several threads
- create a new thread, which triggers auto-resize
- auto-resize gets delayed due to lock/unlock in other threads (the
  reference is held)

And now something happens leading to what we see.
_Maybe_ the cargo application terminates/execs before the new struct is
assigned in an unexpected way.
The regular hash bucket has reference counting so it should raise
warnings if it goes wrong. I haven't seen those.
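
For illustration only, the lock/unlock plus thread-creation pattern
described above could be sketched in userspace roughly like this. It is
not a known reproducer and not what cargo actually does. The helper name
stress_lock_while_spawning and all counts are made up, and it assumes
that a contended threading.Lock ends up in private futex wait/wake calls
on Linux and that creating threads is what requests the auto-resize.

```python
import threading


def stress_lock_while_spawning(workers=4, spawns=16, iterations=2000):
    """Hypothetical helper: contend on one lock while new threads keep appearing.

    Returns the number of lock/unlock cycles that were completed.
    """
    lock = threading.Lock()
    counter = 0

    def hammer():
        nonlocal counter
        for _ in range(iterations):
            # Contended acquire/release; assumed to map to futex wait/wake.
            with lock:
                counter += 1

    # Step 1: threads that keep doing lock/unlock.
    threads = [threading.Thread(target=hammer) for _ in range(workers)]
    for t in threads:
        t.start()

    # Step 2: create new threads while the others are still locking.
    # Assumption: each thread creation may ask the kernel to resize the
    # private futex hash, and the resize is delayed while it is in use.
    for _ in range(spawns):
        t = threading.Thread(target=hammer)
        t.start()
        threads.append(t)

    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(stress_lock_while_spawning())
```

Larger counts, or running several such processes in a loop that exit
early, might get closer to the terminate/exec case mentioned above, but
that is a guess.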

> A third machine with an older Skylake CPU died overnight, but nothing
> was logged over netconsole. Luckily it actually has a serial header on
> the motherboard, so that's wired up and it's running again, maybe it
> dies in a different way that might be a better clue...

So far I *think* that cargo does something that I don't expect and this
leads to a double free. SLUB_DEBUG_ON probably slows things down enough
that the double free does not trigger.

I think I'm going to look for a random Rust package that is using cargo
for building (unless you have a recommendation) and look at what it is
doing. It was always cargo after all. Maybe this sheds some light.
 
> > > Thanks,
> > > Calvin

Sebastian
