Message-ID: <a6993bbd-ec8a-40e1-9ef2-74f920642188@redhat.com>
Date: Wed, 12 Feb 2025 11:57:28 -0500
From: Waiman Long <llong@...hat.com>
To: Marco Elver <elver@...gle.com>, Boqun Feng <boqun.feng@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>, linux-kernel@...r.kernel.org,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>, Dmitry Vyukov <dvyukov@...gle.com>,
Vincenzo Frascino <vincenzo.frascino@....com>, kasan-dev@...glegroups.com
Subject: Re: [PATCH v3 3/3] locking/lockdep: Disable KASAN instrumentation of
lockdep.c
On 2/12/25 6:30 AM, Marco Elver wrote:
> On Wed, 12 Feb 2025 at 06:57, Boqun Feng <boqun.feng@...il.com> wrote:
>> [Cc KASAN]
>>
>> A Reviewed-by or Acked-by from KASAN would be nice, thanks!
>>
>> Regards,
>> Boqun
>>
>> On Sun, Feb 09, 2025 at 11:26:12PM -0500, Waiman Long wrote:
>>> Both KASAN and LOCKDEP are commonly enabled when building a debug
>>> kernel, and each of them can significantly slow the kernel down.
>>> Enabling KASAN instrumentation of the LOCKDEP code will further slow
>>> things down.
>>>
>>> Since LOCKDEP is a high-overhead debugging tool, it will never be
>>> enabled in a production kernel. The LOCKDEP code is also pretty mature
>>> and is unlikely to see major changes. There is also a possibility of
>>> recursion similar to that seen with KCSAN.
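>>>
>>> The usual kbuild mechanism for this is a per-object KASAN_SANITIZE
>>> knob; a minimal sketch, assuming the patch takes the standard
>>> approach:
>>>
>>>   # kernel/locking/Makefile
>>>   # Build lockdep.o without KASAN instrumentation.
>>>   KASAN_SANITIZE_lockdep.o := n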
>>>
>>> To evaluate the performance impact of disabling KASAN instrumentation
>>> of lockdep.c, the time to do a parallel build of the Linux defconfig
>>> kernel was used as the benchmark. Two x86-64 systems (Skylake & Zen 2)
>>> and an arm64 system were used as test beds. Two sets of non-RT and RT
>>> kernels with similar configurations, differing mainly in
>>> CONFIG_PREEMPT_RT, were used for evaluation.
>>>
>>> For the Skylake system:
>>>
>>>   Kernel                        Run time          Sys time
>>>   ------                        --------          --------
>>>   Non-debug kernel (baseline)   0m47.642s          4m19.811s
>>>   Debug kernel                  2m11.108s (x2.8)  38m20.467s (x8.9)
>>>   Debug kernel (patched)        1m49.602s (x2.3)  31m28.501s (x7.3)
>>>   Debug kernel
>>>    (patched + mitigations=off)  1m30.988s (x1.9)  26m41.993s (x6.2)
>>>
>>>   RT kernel (baseline)          0m54.871s          7m15.340s
>>>   RT debug kernel               6m07.151s (x6.7) 135m47.428s (x18.7)
>>>   RT debug kernel (patched)     3m42.434s (x4.1)  74m51.636s (x10.3)
>>>   RT debug kernel
>>>    (patched + mitigations=off)  2m40.383s (x2.9)  57m54.369s (x8.0)
>>>
>>> For the Zen 2 system:
>>>
>>>   Kernel                        Run time           Sys time
>>>   ------                        --------           --------
>>>   Non-debug kernel (baseline)   1m42.806s          39m48.714s
>>>   Debug kernel                  4m04.524s (x2.4)  125m35.904s (x3.2)
>>>   Debug kernel (patched)        3m56.241s (x2.3)  127m22.378s (x3.2)
>>>   Debug kernel
>>>    (patched + mitigations=off)  2m38.157s (x1.5)   92m35.680s (x2.3)
>>>
>>>   RT kernel (baseline)          1m51.500s          14m56.322s
>>>   RT debug kernel              16m04.962s (x8.7)  244m36.463s (x16.4)
>>>   RT debug kernel (patched)     9m09.073s (x4.9)  129m28.439s (x8.7)
>>>   RT debug kernel
>>>    (patched + mitigations=off)  3m31.662s (x1.9)   51m01.391s (x3.4)
>>>
>>> For the arm64 system:
>>>
>>>   Kernel                        Run time           Sys time
>>>   ------                        --------           --------
>>>   Non-debug kernel (baseline)   1m56.844s           8m47.150s
>>>   Debug kernel                  3m54.774s (x2.0)   92m30.098s (x10.5)
>>>   Debug kernel (patched)        3m32.429s (x1.8)   77m40.779s (x8.8)
>>>
>>>   RT kernel (baseline)          4m01.641s          18m16.777s
>>>   RT debug kernel              19m32.977s (x4.9)  304m23.965s (x16.7)
>>>   RT debug kernel (patched)   16m28.354s (x4.1)  234m18.149s (x12.8)
>>>
>>> Turning the mitigations off doesn't seem to have any noticeable impact
>>> on the performance of the arm64 system, so the mitigations=off entries
>>> aren't included.
>>>
>>> For the x86 CPUs, CPU mitigations have a much bigger impact on
>>> performance, especially for the RT debug kernel. The SRSO mitigation
>>> on Zen 2 has an especially big impact on the debug kernel and accounts
>>> for the majority of the slowdown with mitigations on, because the
>>> patched ret instructions slow down function returns. A lot of helper
>>> functions that are normally compiled out or inlined become real
>>> function calls in the debug kernel, and the KASAN instrumentation
>>> inserts a lot of __asan_loadX*() and __kasan_check_read() function
>>> calls into the memory-access portions of the code. Lockdep's
>>> __lock_acquire() function, for instance, has 66 __asan_loadX*() and
>>> 6 __kasan_check_read() calls added with KASAN instrumentation. Of
>>> course, the actual numbers may vary depending on the compiler used
>>> and the exact version of the lockdep code.
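>>>
>>> Conceptually, each instrumented memory access gains a preceding check
>>> call in outline mode; an illustrative sketch only, not actual compiler
>>> output:
>>>
>>>   /* original C:  val = *p;  (an 8-byte load)                 */
>>>   __asan_load8((unsigned long)p); /* shadow check, may report */
>>>   val = *p;                       /* the original load        */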
> For completeness' sake, we'd also have to compare with
> CONFIG_KASAN_INLINE=y, which gets rid of the __asan_ calls (not the
> explicit __kasan_ checks). But I leave it up to you - I'm aware it
> results in slow-downs, too. ;-)
I just realized that my config file for the non-RT debug kernel does
have CONFIG_KASAN_INLINE=y set, though the RT debug kernel's does not.
For the non-RT debug kernel, the __asan_report_load* functions are
still being called because lockdep.c is very big (> 6k lines of code),
so the "call_threshold := 10000" in scripts/Makefile.kasan is probably
not enough for lockdep.c.
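For reference, this is roughly how that threshold is wired up (an
abridged sketch of scripts/Makefile.kasan; the exact flags vary by
kernel version and compiler):

  # scripts/Makefile.kasan (abridged sketch)
  ifdef CONFIG_KASAN_INLINE
          call_threshold := 10000
  else
          call_threshold := 0
  endif
  # A function whose number of instrumented accesses reaches the
  # threshold is instrumented with out-of-line __asan_* calls even
  # in inline mode:
  CFLAGS_KASAN += --param asan-instrumentation-with-call-threshold=$(call_threshold)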
>
>>> With the newly added rtmutex and lockdep lock events, the relevant
>>> event counts for the test runs with the Skylake system were:
>>>
>>>   Event type          Debug kernel    RT debug kernel
>>>   ----------          ------------    ---------------
>>>   lockdep_acquire    1,968,663,277      5,425,313,953
>>>   rtlock_slowlock                -        401,701,156
>>>   rtmutex_slowlock               -            139,672
>>>
>>> The number of __lock_acquire() calls in the RT debug kernel is 2.8
>>> times that of the non-RT debug kernel with the same workload. Since
>>> the __lock_acquire() function is a big hitter in terms of performance
>>> slowdown, this makes the RT debug kernel much slower than the non-RT
>>> one. The average lock nesting depth is likely to be higher in the RT
>>> debug kernel too, leading to longer execution time in the
>>> __lock_acquire() function.
>>>
>>> As the small advantage of enabling KASAN instrumentation to catch
>>> potential memory access errors in the lockdep debugging tool is
>>> probably not worth the drawback of further slowing down a debug
>>> kernel, disable KASAN instrumentation in the lockdep code to allow
>>> the debug kernels to regain some performance, especially the RT debug
>>> kernels.
> It's not about catching a bug in the lockdep code, but rather guarding
> against bugs in the code that allocated the storage for some
> synchronization object. Since lockdep state is embedded in each
> synchronization object, lockdep checking code may be passed a
> reference to garbage data, e.g. on a use-after-free (or even an
> out-of-bounds access if there's an array of sync objects). In that
> case, all bets are off and lockdep may produce random false reports.
> Sure, the system is already in a bad state at that point, but this is
> going to make debugging much harder.
>
> Our approach has always been to ensure that an error state is reported
> as soon as it is detected, before it results in random failures as
> execution continues (e.g. bad lock reports).
>
> To guard against that, I would propose adding carefully placed
> kasan_check_byte() calls in the lockdep code.
Will take a look at that.
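To make that concrete, here is a minimal sketch of what such a check
could look like (illustrative only; lockdep_map_accessible() is a
hypothetical helper, not from any posted patch):

  #include <linux/kasan.h>    /* kasan_check_byte() */
  #include <linux/lockdep.h>  /* struct lockdep_map */

  /*
   * Verify that the object holding the lockdep state is still live
   * before lockdep dereferences it. kasan_check_byte() returns false
   * (and emits a KASAN report) if the byte at *lock is poisoned, e.g.
   * after a use-after-free, so lockdep could bail out early instead
   * of feeding garbage into the dependency graph.
   */
  static bool lockdep_map_accessible(const struct lockdep_map *lock)
  {
          return kasan_check_byte(lock);
  }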
Cheers,
Longman