Message-ID: <CANDhNCo+esbJpwqq0boTaKbEL5WBjwtuynH+jcNW1rzj65jJJw@mail.gmail.com>
Date: Wed, 30 Jul 2025 12:11:54 -0700
From: John Stultz <jstultz@...gle.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
syzbot <syzbot+602c4720aed62576cd79@...kaller.appspotmail.com>, airlied@...il.com,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
mripard@...nel.org, simona@...ll.ch, syzkaller-bugs@...glegroups.com,
tzimmermann@...e.de, Valentin Schneider <valentin.schneider@....com>,
"Connor O'Brien" <connoro@...gle.com>, "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [syzbot] [dri?] WARNING in __ww_mutex_wound

On Wed, Jul 30, 2025 at 2:50 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
> On 7/30/2025 1:57 PM, Maarten Lankhorst wrote:
> > Hey,
> >
> > This warning is introduced in linux-next as a4f0b6fef4b0 ("locking/mutex: Add p->blocked_on wrappers for correctness checks")
> > Adding relevant people from that commit.
> >
...
> >> ------------[ cut here ]------------
> >> WARNING: ./include/linux/sched.h:2173 at __clear_task_blocked_on include/linux/sched.h:2173 [inline], CPU#1: syz.1.8698/395
> >> WARNING: ./include/linux/sched.h:2173 at __ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346, CPU#1: syz.1.8698/395
> >> Modules linked in:
> >> CPU: 1 UID: 0 PID: 395 Comm: syz.1.8698 Not tainted 6.16.0-rc6-next-20250718-syzkaller #0 PREEMPT(full)
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
> >> RIP: 0010:__clear_task_blocked_on include/linux/sched.h:2173 [inline]
> >> RIP: 0010:__ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346
>
> When wounding the lock owner, could it be possible that the lock
> owner is blocked on a different nested lock? Being the lock owner
> implies it is not blocked on the current lock we are trying to
> wound, right?
>
> I remember John mentioning seeing circular chains in find_proxy_task()
> which required this, but looking at this call-chain I'm wondering if
> only __ww_mutex_check_waiters() (or some other path) requires
> __clear_task_blocked_on() for that case.

So yeah, I have tripped over this a few times (fixing and often later
re-introducing the problem), but usually later in my full proxy-exec
series, and somehow missed that the single-rq case hit this.

Obviously with __ww_mutex_die() we are clearing the blocked_on
relationship for the lock waiter, but in __ww_mutex_wound() we are
waking the lock *owner*, who might be waiting on a different lock, so
passing the held lock to the __clear_task_blocked_on() checks trips
these warnings.
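
To illustrate, here is roughly what the two call sites look like
(paraphrased from memory and trimmed down, so treat this as a sketch
rather than the exact lines in the tree):

  /* kernel/locking/ww_mutex.h, __ww_mutex_die() (sketch): */
          ...
          /*
           * The waiter is, by definition, blocked on *this* lock, so
           * sanity-checking blocked_on against @lock is fine here.
           */
          __clear_task_blocked_on(waiter->task, lock);
          wake_q_add(wake_q, waiter->task);

  /* kernel/locking/ww_mutex.h, __ww_mutex_wound() (sketch): */
          ...
          if (owner != current) {
                  /*
                   * The owner holds @lock, but may itself be blocked
                   * on some *other* nested mutex, so owner->blocked_on
                   * can legitimately differ from @lock, and the
                   * WARN_ON_ONCE() in __clear_task_blocked_on() fires.
                   */
                  __clear_task_blocked_on(owner, lock);
                  wake_q_add(wake_q, owner);
          }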

Passing NULL instead of the lock is the right call here; I'll just
need to loosen the __clear_task_blocked_on() check to tolerate a NULL
mutex as well.
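
Something like the following is what I'm thinking (completely
untested sketch, just to show the shape of it - the actual patch may
end up looking a bit different):

  /* include/linux/sched.h (sketch) */
  static inline void __clear_task_blocked_on(struct task_struct *p,
                                             struct mutex *m)
  {
          if (m) {
                  /*
                   * Currently we serialize blocked_on under the
                   * mutex::wait_lock.
                   */
                  lockdep_assert_held_once(&m->wait_lock);
                  /*
                   * We may re-clear an already-cleared relationship,
                   * but make sure we are not clearing the relationship
                   * with a *different* lock.
                   */
                  WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
          }
          p->blocked_on = NULL;
  }

  /* kernel/locking/ww_mutex.h, in __ww_mutex_wound() (sketch) */
          if (owner != current) {
                  /*
                   * The owner may be blocked on a different nested
                   * mutex, so don't sanity-check blocked_on against
                   * this lock - just clear it.
                   */
                  __clear_task_blocked_on(owner, NULL);
                  wake_q_add(wake_q, owner);
          }
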
I'll spin up a quick patch.

thanks
-john