Message-ID: <aHHipJuS1yyhhkbT@tardis-2.local>
Date: Fri, 11 Jul 2025 21:20:52 -0700
From: Boqun Feng <boqun.feng@...il.com>
To: Waiman Long <llong@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
linux-kernel@...r.kernel.org, Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH] locking/mutex: Add debug code to help catching violation
of mutex lifetime rule
On Fri, Jul 11, 2025 at 11:16:43PM -0400, Waiman Long wrote:
> > > > > I think this may also give us a good chance of finding more bugs, one of
> > > > > the reasons is that raw_spin_unlock_irqrestore_wake() has a
> > > > > preempt_enable() at last, which may trigger a context switch.
> > > > >
> > > > > Regards,
> > > > > Boqun
> > > > I think this is a good idea. We should extend that to add the check in
> > > > rwsem as well. Will post a patch to do that.
> > > Digging into it some more, I think adding kasan_check_byte() may not be
> > > necessary. If KASAN is enabled, it will instrument the locking code
> > > including __mutex_unlock_slowpath(). I checked the generated assembly code,
> > > it has 2 __kasan_check_read() and 4 __kasan_check_write() calls. Adding an
> > The point is that we want to check the memory at the end of
> > __mutex_unlock_slowpath(), so it's an extra check.
>
> It is likely that the instrumented kasan_check* calls can be invoked near
> the beginning when the lock is first accessed, as I don't see any
> kasan_check*() around the inlined raw_spin_unlock_irqrestore_wake() call.
>
One thing to notice is that the __kasan_check_{read,write}() calls are the
manual instrumentation of the atomic operations (the compiler cannot
generate instrumentation for asm): try_cmpxchg() instruments two
kasan_check_write()s and atomic_long_read() instruments one
kasan_check_read(), and we have 2 try_cmpxchg() and 2 atomic_long_read()
in __mutex_unlock_slowpath(). That's why you saw 4 kasan_check_write()
and 2 kasan_check_read().
Compiler instrumentation uses functions like:
__asan_report_store8_noabort()
__asan_report_load8_noabort()
And I checked the assembly code of __mutex_unlock_slowpath(); it turns
out it does instrument the memory operations on the stack:
3b1c: 94000000 bl 0x3b1c <__mutex_unlock_slowpath+0x188>
0000000000003b1c: R_AARCH64_CALL26 wake_up_q
3b20: 52800028 mov w8, #0x1 // =1
3b24: f9000be8 str x8, [sp, #0x10]
3b28: 38776b08 ldrb w8, [x24, x23]
3b2c: 34000068 cbz w8, 0x3b38 <__mutex_unlock_slowpath+0x1a4>
3b30: aa1303e0 mov x0, x19
3b34: 94000000 bl 0x3b34 <__mutex_unlock_slowpath+0x1a0>
0000000000003b34: R_AARCH64_CALL26 __asan_report_store8_noabort
3b38: f9000ff4 str x20, [sp, #0x18]
^ this is the "head->lastp = &head->first;" in wake_q_init()
and x19 is sp + 0x18, if you believe me ;-)
> So if we want a check at the end, we may have to manually add one.
But yes, we want to add a conceptual check at the very end, as in "the
mutex must be valid for the whole of mutex_unlock()".
>
> >
> > Also since kasan will instrument all memory accesses, what you saw may
> > not be the instrument on "lock" but something else, for example,
> > wake_q_init() in raw_spin_unlock_irqrestore_wake().
>
> The wake_q memory is from stack which I don't believe the compiler will
> generate kasan_check for that. I also don't see any kasan_check*() call when
> the wake_q is being manipulated.
>
See above.
> > Actually, I have 3 extension to the idea:
> >
> > First it occurs to me that we could just put the kasan_check_byte() at
> > the outermost thing, for example, mutex_unlock().
> >
> > Second I wonder whether kasan has a way to tag a pointer parameter of a
> > function, for example for mutex_unlock():
> >
> > void mutex_unlock(struct mutex * __ref lock)
> > {
> > ...
> > }
> >
> > a kasan_check_byte(lock) will be generated automatically whenever the
> > function returns.
> >
I'm more curious whether kasan can support this.
> > I actually tried to use __cleanup to implement __ref, like
> >
> > #define __ref __cleanup(kasan_check_byte)
> >
> > but it seems the "cleanup" attribute doesn't work on function parameters ;(
> >
> > Third, I went ahead and implemented an always_alive():
> >
> > #define always_alive(ptr) \
> > typeof(ptr) __UNIQUE_ID(always_alive_guard) __cleanup(kasan_check_byte) = ptr;
> >
> > and you can use in mutex_unlock():
> >
> > void mutex_unlock(struct mutex *lock)
> > {
> > always_alive(lock);
> > ...
> > }
> >
> > This also guarantees we emit a kasan_check_byte() at the very end.
>
> Adding a kasan_check_byte() test at the end of unlock addresses a locking
> specific problem, and there aren't that many instances where such a check
> is needed. So it may not be worth the effort to devise a special mechanism
> just for that. Adding a simple macro to abstract it may be enough. Anyway,
> it is your call.
>
Well, we do have a lot of cases where a foo() takes a "struct bar *" and
the expectation is that the pointer stays valid throughout foo(), so maybe
there are other usages.
But sure, to start with, we can use always_alive() in the lock primitives
only.
Regards,
Boqun
> Cheers,
> Longman
>