linux-kernel - Re: [GIT PULL] locking/urgent for v6.17-rc1


Message-ID: <20250825160406.ZVcVPStz@linutronix.de>
Date: Mon, 25 Aug 2025 18:04:06 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Borislav Petkov <bp@...en8.de>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>, x86-ml <x86@...nel.org>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] locking/urgent for v6.17-rc1

On 2025-08-22 17:28:02 [-0700], Sean Christopherson wrote:
> > > https://lore.kernel.org/all/aJ_vEP2EHj6l0xRT@google.com
> > 
> > I somehow missed it. Can you try rc2 with the patch I just sent? 
> 
> No dice, fails with the same signature.
> 
> I got a trimmed-down reproducer.  Load KVM, run this in the background (in a loop)
> to constantly trigger try_to_wake_up() on relevant tasks (needs to be run as root):
> 
>   echo Y > /sys/module/kvm/parameters/nx_huge_pages
>   echo N > /sys/module/kvm/parameters/nx_huge_pages
>   sleep .2
> 
> and then run the hardware_disable_test KVM selftest (from
> tools/testing/selftests/kvm/hardware_disable_test.c).

With this information I was able to reproduce what you had in the link at
the top. I don't know why it happens. It hangs and lockdep isn't happy
with the lock - it seems to be a valid task_struct::pi_lock lock for one
of the kvm-nx-lpage-recovery threads.

I got rid of all free_percpu() and kvfree() in futex/core.c (and leak
memory, yes) and this still happens.
At first it looked as if skipping the assignment of the second private
hash avoided the crash, but it turned out that I was just not patient
enough.

The strange part here is that the private hash is not used. The private
hash gets allocated and resized because hardware_disable_test creates a
lot of threads. But it is not used, it just sits around and waits to be
cleared.

And it also seems to happen if I tell futex_hash_allocate_default() not
to do anything at all.

kvm-nx-lpage-recovery shares the mm but it grabs a reference.
It might be a coincidence but the task, on which the wakeup chokes,
seems to be gone according to my traces. And with

diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -75,7 +84,10 @@ static int vhost_task_fn(void *data)
  */
 void vhost_task_wake(struct vhost_task *vtsk)
 {
-	wake_up_process(vtsk->task);
+	mutex_lock(&vtsk->exit_mutex);
+	if (!test_bit(VHOST_TASK_FLAGS_KILLED, &vtsk->flags))
+		wake_up_process(vtsk->task);
+	mutex_unlock(&vtsk->exit_mutex);
 }
 EXPORT_SYMBOL_GPL(vhost_task_wake);

it doesn't crash anymore. Could it be attempting to wake a task that is gone?
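
For illustration only, the guard in the diff above can be modelled in
userspace. This is a minimal sketch, not kernel code: struct worker,
worker_init(), worker_mark_killed() and worker_wake() are hypothetical
names, a pthread mutex stands in for exit_mutex, a bool stands in for
VHOST_TASK_FLAGS_KILLED, and a counter stands in for wake_up_process().
It assumes the exit path sets the flag while holding the same mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct vhost_task. */
struct worker {
	pthread_mutex_t exit_mutex;	/* serializes wake against exit */
	bool killed;			/* set once the worker is going away */
	int wakeups;			/* number of wakeups actually delivered */
};

void worker_init(struct worker *w)
{
	assert(w != NULL);
	pthread_mutex_init(&w->exit_mutex, NULL);
	w->killed = false;
	w->wakeups = 0;
}

/* Exit path (assumed): mark the worker as gone under the mutex. */
void worker_mark_killed(struct worker *w)
{
	pthread_mutex_lock(&w->exit_mutex);
	w->killed = true;
	pthread_mutex_unlock(&w->exit_mutex);
}

/* Wake path: returns true if a wakeup was delivered, false if skipped. */
bool worker_wake(struct worker *w)
{
	bool woken = false;

	pthread_mutex_lock(&w->exit_mutex);
	if (!w->killed) {
		w->wakeups++;		/* stand-in for wake_up_process() */
		woken = true;
	}
	pthread_mutex_unlock(&w->exit_mutex);
	return woken;
}
```

Because both paths take the same mutex, a wake either completes before
the worker is marked as killed or is skipped entirely. It never touches
the task after the exit path has run.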

> Strace on hardware_disable_test spewed a whole pile of these
> 
>   wait4(32861, 0x7ffc66475dec, WNOHANG, NULL) = 0
>   futex(0x7fb735c43000, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)

That is a shared FUTEX and is probably part of pthread_join().
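
The "shared" reading presumably follows from the op in the strace line:
it carries FUTEX_CLOCK_REALTIME but not FUTEX_PRIVATE_FLAG. A small
sketch of that check, using the constants from <linux/futex.h>; the
helper names futex_op_is_private() and futex_op_cmd() are made up for
this example:

```c
#include <assert.h>
#include <linux/futex.h>
#include <stdbool.h>

/* The _PRIVATE variants are the base command plus FUTEX_PRIVATE_FLAG. */
static_assert(FUTEX_WAIT_BITSET_PRIVATE ==
	      (FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG),
	      "unexpected futex op encoding");

/* True if the op asks for a process-private futex. */
bool futex_op_is_private(int op)
{
	return (op & FUTEX_PRIVATE_FLAG) != 0;
}

/* Strip the modifier flags and return the base command. */
int futex_op_cmd(int op)
{
	return op & FUTEX_CMD_MASK;
}
```

For the op seen above, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME,
futex_op_is_private() returns false and futex_op_cmd() returns
FUTEX_WAIT_BITSET.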

> immediately before the crash.  I assume it corresponds to this:
> 
> 		/* Child is still running, keep waiting. */
> 		if (pid != waitpid(pid, &status, WNOHANG))
> 			continue;
> 
> I also got a new splat on the "WARN_ON_ONCE(ret < 0);" at the end of __futex_ref_atomic_end().
> This happened during boot; AFAICT our userspace was setting up cgroups.  In this
> case, the system hung and I had to reboot.

This is odd.

>   ------------[ cut here ]------------
>   WARNING: CPU: 45 PID: 0 at kernel/futex/core.c:1604 futex_ref_rcu+0xbf/0xf0
…
> Heh, and two more when booting a different system.  Guess it's my lucky day.
> This time whatever went sideways didn't appear to be fatal as the system booted
> and I could ssh in.  One is the same WARN as above, and the second WARN on the
> system hit the
> 
>   WARN_ON_ONCE(atomic_long_read(&mm->futex_atomic) != 0);
> 
> in futex_hash_allocate().

This means the counters don't add up after the switch. Not sure how. This
seems to be a random task but it might be part of the previous splat.

Sebastian