Message-ID: <cad83f32-d1ed-4885-8ed1-c65e5683237e@redhat.com>
Date: Fri, 11 Jul 2025 23:16:43 -0400
From: Waiman Long <llong@...hat.com>
To: Boqun Feng <boqun.feng@...il.com>, Waiman Long <llong@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
 Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH] locking/mutex: Add debug code to help catching violation
 of mutex lifetime rule

On 7/11/25 10:24 PM, Boqun Feng wrote:
> On Fri, Jul 11, 2025 at 09:48:13PM -0400, Waiman Long wrote:
>> On 7/11/25 8:42 PM, Waiman Long wrote:
>>> On 7/11/25 7:28 PM, Boqun Feng wrote:
>>>> On Fri, Jul 11, 2025 at 03:30:05PM -0700, Linus Torvalds wrote:
>>>>> On Fri, 11 Jul 2025 at 15:20, Boqun Feng <boqun.feng@...il.com> wrote:
>>>>>> Meta question: are we able to construct a case that shows this can
>>>>>> help detect the issue?
>>>>> Well, the thing that triggered this was hopefully fixed by
>>>>> 8c2e52ebbe88 ("eventpoll: don't decrement ep refcount while still
>>>>> holding the ep mutex"), but I think Jann figured that one out by code
>>>>> inspection.
>>>>>
>>>>> I doubt it can be triggered in real life without something like
>>>>> Waiman's patch, but *with* Waiman's patch, and commit 8c2e52ebbe88
>>>>> reverted (and obviously with CONFIG_KASAN and CONFIG_DEBUG_MUTEXES
>>>>> enabled), doing lots of concurrent epoll closes would hopefully then
>>>>> trigger the warning.
>>>>>
>>>>> Of course, to then find *other* potential bugs would be the whole
>>>>> point, and some of these kinds of bugs are definitely of the kind
>>>>> where the race condition doesn't actually trigger in any real load,
>>>>> because it's unlikely that real loads end up doing that kind of
>>>>> "release all these objects concurrently".
>>>>>
>>>>> But it might be interesting to try that "can you even recreate the bug
>>>>> fixed by 8c2e52ebbe88" with this. Because if that one *known* bug
>>>>> can't be found by this, then it's obviously unlikely to help find
>>>>> others.
>>>>>
>>>> Yeah, I guess I asked the question because there is no clear link from
>>>> the bug scenario to an extra context switch; that is, even if the
>>>> context switch didn't happen, the bug would still trigger if
>>>> __mutex_unlock_slowpath() took too long after giving ownership to
>>>> someone else. So my instinct was: would cond_resched() be slow
>>>> enough? ;-)
>>>>
>>>> But I agree it's a trivial thing to do, and I think another thing we
>>>> can do is to add a kasan_check_byte(lock) at the end of
>>>> __mutex_unlock_slowpath(), because conceptually the mutex should be
>>>> valid throughout the whole __mutex_unlock_slowpath() function, i.e.
>>>>
>>>>      void __mutex_unlock_slowpath(...)
>>>>      {
>>>>          ...
>>>>          raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags,
>>>>                                          &wake_q);
>>>>          // <- conceptually "lock" should still be valid here.
>>>>          // so if anyone frees the memory of the mutex, it's
>>>>          // going to be a problem.
>>>>          kasan_check_byte(lock);
>>>>      }
>>>>
>>>> I think this may also give us a good chance of finding more bugs; one
>>>> of the reasons is that raw_spin_unlock_irqrestore_wake() ends with a
>>>> preempt_enable(), which may trigger a context switch.
>>>>
>>>> Regards,
>>>> Boqun
>>> I think this is a good idea. We should extend that to add the check in
>>> rwsem as well. Will post a patch to do that.
>> Digging into it some more, I think adding kasan_check_byte() may not be
>> necessary. If KASAN is enabled, it will instrument the locking code,
>> including __mutex_unlock_slowpath(). I checked the generated assembly
>> code; it has 2 __kasan_check_read() and 4 __kasan_check_write() calls.
>> Adding an
> The point is we want to check the memory at the end of
> __mutex_unlock_slowpath(), so it's an extra check.

It is likely that the instrumented kasan_check*() calls are invoked 
near the beginning, when the lock is first accessed, as I don't see any 
kasan_check*() call around the inlined raw_spin_unlock_irqrestore_wake().

So if we want a check at the end, we may have to manually add one.
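
Roughly something like the following (untested, modulo the exact
function signature in the current tree):

	static noinline void __mutex_unlock_slowpath(struct mutex *lock,
						     unsigned long ip)
	{
		...
		raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags,
						&wake_q);
		/*
		 * The mutex must remain valid until the slowpath has fully
		 * finished; let KASAN flag a use-after-free of the lock here.
		 */
		kasan_check_byte(lock);
	}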

>
> Also, since kasan will instrument all memory accesses, what you saw may
> not be the instrumentation on "lock" but something else, for example,
> wake_q_init() in raw_spin_unlock_irqrestore_wake().

The wake_q memory is on the stack, and I don't believe the compiler 
will generate kasan_check*() calls for that. I also don't see any 
kasan_check*() call where the wake_q is being manipulated.

> Actually, I have 3 extensions to the idea:
>
> First, it occurs to me that we could just put the kasan_check_byte() at
> the outermost level, for example, in mutex_unlock().
>
> Second, I wonder whether kasan has a way to tag a pointer parameter of a
> function, for example for mutex_unlock():
>
> 	void mutex_unlock(struct mutex * __ref lock)
> 	{
> 		...
> 	}
>
> so that a kasan_check_byte(lock) is auto-generated whenever the function
> returns.
>
> I actually tried to use __cleanup to implement __ref, like
>
> 	#define __ref __cleanup(kasan_check_byte)
>
> but it seems the "cleanup" attribute doesn't work on function parameters ;(
>
> Third, I went ahead and implemented an always_alive():
>
> 	/* __cleanup() passes &var, hence the deref in the helper */
> 	static inline void always_alive_check(void *p) { kasan_check_byte(*(void **)p); }
> 	#define always_alive(ptr)                                                        \
> 		typeof(ptr) __UNIQUE_ID(always_alive_guard) __cleanup(always_alive_check) = (ptr)
>
> and you can use it in mutex_unlock():
>
> 	void mutex_unlock(struct mutex *lock)
> 	{
> 		always_alive(lock);
> 		...
> 	}
>
> This also guarantees we emit a kasan_check_byte() at the very end.

Adding a kasan_check_byte() test at the end of unlock addresses a 
locking-specific problem, and there aren't that many places where such 
a check is needed. So it may not be worth the effort to devise a 
special mechanism just for that; adding a simple macro to abstract it 
may be enough, e.g. something like the sketch below.
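
A completely untested sketch (the wrapper name is made up):

	/* Ask KASAN to verify that the lock memory is still live */
	#define lock_check_alive(lock)	kasan_check_byte(lock)

	void __sched mutex_unlock(struct mutex *lock)
	{
		...
		__mutex_unlock_slowpath(lock, _RET_IP_);
		lock_check_alive(lock);
	}
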
Anyway, it is your call.

Cheers,
Longman

