Message-ID: <20250618164756.9CeqXYlG@linutronix.de>
Date: Wed, 18 Jun 2025 18:47:56 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Calvin Owens <calvin@...nvd.org>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org,
"Lai, Yi" <yi1.lai@...ux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>, x86@...nel.org
Subject: Re: [tip: locking/urgent] futex: Allow to resize the private local hash
On 2025-06-17 19:15:37 [-0700], Calvin Owens wrote:
> It takes longer with LTO disabled, but I'm still seeing some crashes.
>
> First this WARN:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 1866190 at mm/slub.c:4753 free_large_kmalloc+0xa5/0xc0
> CPU: 2 UID: 1000 PID: 1866190 Comm: python3 Not tainted 6.16.0-rc2-nolto-00024-g9afe652958c3 #1 PREEMPT
…
> RIP: 0010:free_large_kmalloc+0xa5/0xc0
…
> Call Trace:
> <TASK>
> futex_hash_free+0x10/0x40
This points me to kernel/futex/core.c:1535, which is futex_phash_new.
Thanks for the provided vmlinux.
This is odd. The assignment happens only under &mm->futex_hash_lock, and
yet it is a bad pointer. The kvmalloc() pointer is stored there and only
remains there if a rehash did not happen before the task ended.
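To make the lifecycle described above concrete, here is a rough userspace
model of it. This is not the kernel code: struct mm_model, stage_new_hash(),
rehash() and teardown() are hypothetical names, malloc()/free() stand in for
kvmalloc()/kvfree(), and a pthread mutex stands in for mm->futex_hash_lock.
It only illustrates that the staged pointer is written under the lock and
is still set at teardown only if no rehash consumed it first.

```c
/* Userspace sketch only; all names are made up for illustration. */
#include <pthread.h>
#include <stdlib.h>

struct mm_model {
	pthread_mutex_t hash_lock;	/* stands in for mm->futex_hash_lock */
	void *phash;			/* currently active hash */
	void *phash_new;		/* staged replacement, waiting for a rehash */
};

void mm_model_init(struct mm_model *mm)
{
	pthread_mutex_init(&mm->hash_lock, NULL);
	mm->phash = NULL;
	mm->phash_new = NULL;
}

/* Stage a new hash. The pointer is only ever assigned under the lock. */
int stage_new_hash(struct mm_model *mm, size_t size)
{
	void *fresh = malloc(size);

	if (!fresh)
		return -1;

	pthread_mutex_lock(&mm->hash_lock);
	free(mm->phash_new);		/* simplification: drop an older staged hash */
	mm->phash_new = fresh;
	pthread_mutex_unlock(&mm->hash_lock);
	return 0;
}

/* A rehash consumes the staged pointer and clears the slot. */
void rehash(struct mm_model *mm)
{
	pthread_mutex_lock(&mm->hash_lock);
	if (mm->phash_new) {
		free(mm->phash);
		mm->phash = mm->phash_new;
		mm->phash_new = NULL;
	}
	pthread_mutex_unlock(&mm->hash_lock);
}

/*
 * Teardown frees whatever is left. Returns 1 if a staged hash was still
 * pending, i.e. no rehash happened before the task ended.
 */
int teardown(struct mm_model *mm)
{
	int pending = mm->phash_new != NULL;

	free(mm->phash_new);
	free(mm->phash);
	mm->phash_new = NULL;
	mm->phash = NULL;
	pthread_mutex_destroy(&mm->hash_lock);
	return pending;
}
```

In this model the pointer freed at teardown is always either NULL or
something the allocator handed out, which is why a bad pointer at that
spot is surprising.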
> __mmput+0xb4/0xd0
> exec_mmap+0x1e2/0x210
> begin_new_exec+0x491/0x6c0
> load_elf_binary+0x25d/0x1050
…
> ...and then it oopsed (same stack as my last mail) about twenty minutes
> later when I hit Ctrl+C to stop the build:
>
…
> I enabled lockdep and I've got it running again.
>
> I set up a little git repo with a copy of all the traces so far, and the
> kconfigs I'm running:
>
> https://github.com/jcalvinowens/lkml-debug-616
>
> ...and I pushed the actual vmlinux binaries here:
>
> https://github.com/jcalvinowens/lkml-debug-616/releases/tag/20250617
>
> There were some block warnings on another machine running the same
> workload, but of course they aren't necessarily related.
I have no explanation so far.
Sebastian