Message-ID: <CANDhNCo+esbJpwqq0boTaKbEL5WBjwtuynH+jcNW1rzj65jJJw@mail.gmail.com>
Date: Wed, 30 Jul 2025 12:11:54 -0700
From: John Stultz <jstultz@...gle.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
syzbot <syzbot+602c4720aed62576cd79@...kaller.appspotmail.com>, airlied@...il.com,
dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
mripard@...nel.org, simona@...ll.ch, syzkaller-bugs@...glegroups.com,
tzimmermann@...e.de, Valentin Schneider <valentin.schneider@....com>,
"Connor O'Brien" <connoro@...gle.com>, "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [syzbot] [dri?] WARNING in __ww_mutex_wound

On Wed, Jul 30, 2025 at 2:50 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
> On 7/30/2025 1:57 PM, Maarten Lankhorst wrote:
> > Hey,
> >
> > This warning is introduced in linux-next as a4f0b6fef4b0 ("locking/mutex: Add p->blocked_on wrappers for correctness checks")
> > Adding relevant people from that commit.
> >
...
> >> ------------[ cut here ]------------
> >> WARNING: ./include/linux/sched.h:2173 at __clear_task_blocked_on include/linux/sched.h:2173 [inline], CPU#1: syz.1.8698/395
> >> WARNING: ./include/linux/sched.h:2173 at __ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346, CPU#1: syz.1.8698/395
> >> Modules linked in:
> >> CPU: 1 UID: 0 PID: 395 Comm: syz.1.8698 Not tainted 6.16.0-rc6-next-20250718-syzkaller #0 PREEMPT(full)
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
> >> RIP: 0010:__clear_task_blocked_on include/linux/sched.h:2173 [inline]
> >> RIP: 0010:__ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346
>
> When wounding the lock owner, could it be possible that the lock
> owner is blocked on a different nested lock? Being the lock owner
> implies it is not blocked on the current lock we are trying to
> wound, right?
>
> I remember John mentioning seeing circular chains in find_proxy_task()
> which required this, but looking at this call-chain I'm wondering if
> only __ww_mutex_check_waiters() (or some other path) requires
> __clear_task_blocked_on() for that case.

So yeah, I have tripped over this a few times (fixing and often later
re-introducing the problem), but usually later in my full proxy-exec
series, and somehow missed that the single-rq case hit this.

Obviously with __ww_mutex_die() we are clearing the blocked_on
relationship for the lock waiter, but in __ww_mutex_wound() we are
waking the lock *owner*, who might be waiting on a different lock, so
passing the held lock to the __clear_task_blocked_on() checks trips
these warnings.
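
To illustrate, here is roughly what the two call sites look like
(paraphrased from memory and trimmed down, so treat this as a sketch
rather than the exact lines in the tree):

  /* kernel/locking/ww_mutex.h, __ww_mutex_die() (sketch): */
          ...
          /*
           * The waiter is, by definition, blocked on *this* lock, so
           * sanity-checking blocked_on against @lock is fine here.
           */
          __clear_task_blocked_on(waiter->task, lock);
          wake_q_add(wake_q, waiter->task);

  /* kernel/locking/ww_mutex.h, __ww_mutex_wound() (sketch): */
          ...
          if (owner != current) {
                  /*
                   * The owner holds @lock, but may itself be blocked
                   * on some *other* nested mutex, so owner->blocked_on
                   * can legitimately differ from @lock, and the
                   * WARN_ON_ONCE() in __clear_task_blocked_on() fires.
                   */
                  __clear_task_blocked_on(owner, lock);
                  wake_q_add(wake_q, owner);
          }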

Passing NULL instead of the lock is the right call here; I'll just
need to loosen the __clear_task_blocked_on() check to tolerate a NULL
mutex as well.
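
Something like the following is what I'm thinking (completely
untested sketch, just to show the shape of it - the actual patch may
end up looking a bit different):

  /* include/linux/sched.h (sketch) */
  static inline void __clear_task_blocked_on(struct task_struct *p,
                                             struct mutex *m)
  {
          if (m) {
                  /*
                   * Currently we serialize blocked_on under the
                   * mutex::wait_lock.
                   */
                  lockdep_assert_held_once(&m->wait_lock);
                  /*
                   * We may re-clear an already-cleared relationship,
                   * but make sure we are not clearing the relationship
                   * with a *different* lock.
                   */
                  WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
          }
          p->blocked_on = NULL;
  }

  /* kernel/locking/ww_mutex.h, in __ww_mutex_wound() (sketch) */
          if (owner != current) {
                  /*
                   * The owner may be blocked on a different nested
                   * mutex, so don't sanity-check blocked_on against
                   * this lock - just clear it.
                   */
                  __clear_task_blocked_on(owner, NULL);
                  wake_q_add(wake_q, owner);
          }
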
I'll spin up a quick patch.

thanks
-john