Message-ID: <Z6w4UlCQa_g1OHlN@Mac.home>
Date: Tue, 11 Feb 2025 21:57:38 -0800
From: Boqun Feng <boqun.feng@...il.com>
To: Waiman Long <longman@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>, linux-kernel@...r.kernel.org,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Alexander Potapenko <glider@...gle.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Vincenzo Frascino <vincenzo.frascino@....com>,
kasan-dev@...glegroups.com
Subject: Re: [PATCH v3 3/3] locking/lockdep: Disable KASAN instrumentation of
lockdep.c
[Cc KASAN]
A Reviewed-by or Acked-by from KASAN would be nice, thanks!
Regards,
Boqun
On Sun, Feb 09, 2025 at 11:26:12PM -0500, Waiman Long wrote:
> Both KASAN and LOCKDEP are commonly enabled when building a debug kernel.
> Each of them can significantly slow down a debug kernel. Enabling KASAN
> instrumentation of the LOCKDEP code will slow things down further.
>
> Since LOCKDEP is a high-overhead debugging tool, it will never be
> enabled in a production kernel. The LOCKDEP code is also fairly mature
> and is unlikely to get major changes. There is also a possibility of
> recursion, similar to the KCSAN case.
>
> To evaluate the performance impact of disabling KASAN instrumentation
> of lockdep.c, the time to do a parallel build of the Linux defconfig
> kernel was used as the benchmark. Two x86-64 systems (Skylake & Zen 2)
> and an arm64 system were used as test beds. Two sets of non-RT and RT
> kernels, with similar configurations differing mainly in
> CONFIG_PREEMPT_RT, were used for evaluation.
>
> For the Skylake system:
>
>   Kernel                        Run time           Sys time
>   ------                        --------           --------
>   Non-debug kernel (baseline)   0m47.642s          4m19.811s
>   Debug kernel                  2m11.108s (x2.8)   38m20.467s (x8.9)
>   Debug kernel (patched)        1m49.602s (x2.3)   31m28.501s (x7.3)
>   Debug kernel
>     (patched + mitigations=off) 1m30.988s (x1.9)   26m41.993s (x6.2)
>
>   RT kernel (baseline)          0m54.871s          7m15.340s
>   RT debug kernel               6m07.151s (x6.7)   135m47.428s (x18.7)
>   RT debug kernel (patched)     3m42.434s (x4.1)   74m51.636s (x10.3)
>   RT debug kernel
>     (patched + mitigations=off) 2m40.383s (x2.9)   57m54.369s (x8.0)
>
> For the Zen 2 system:
>
>   Kernel                        Run time           Sys time
>   ------                        --------           --------
>   Non-debug kernel (baseline)   1m42.806s          39m48.714s
>   Debug kernel                  4m04.524s (x2.4)   125m35.904s (x3.2)
>   Debug kernel (patched)        3m56.241s (x2.3)   127m22.378s (x3.2)
>   Debug kernel
>     (patched + mitigations=off) 2m38.157s (x1.5)   92m35.680s (x2.3)
>
>   RT kernel (baseline)          1m51.500s          14m56.322s
>   RT debug kernel               16m04.962s (x8.7)  244m36.463s (x16.4)
>   RT debug kernel (patched)     9m09.073s (x4.9)   129m28.439s (x8.7)
>   RT debug kernel
>     (patched + mitigations=off) 3m31.662s (x1.9)   51m01.391s (x3.4)
>
> For the arm64 system:
>
>   Kernel                        Run time           Sys time
>   ------                        --------           --------
>   Non-debug kernel (baseline)   1m56.844s          8m47.150s
>   Debug kernel                  3m54.774s (x2.0)   92m30.098s (x10.5)
>   Debug kernel (patched)        3m32.429s (x1.8)   77m40.779s (x8.8)
>
>   RT kernel (baseline)          4m01.641s          18m16.777s
>   RT debug kernel               19m32.977s (x4.9)  304m23.965s (x16.7)
>   RT debug kernel (patched)     16m28.354s (x4.1)  234m18.149s (x12.8)
>
> Turning the mitigations off doesn't seem to have any noticeable impact
> on the performance of the arm64 system, so the mitigations=off entries
> aren't included.
>
> For the x86 CPUs, CPU mitigations have a much bigger impact on
> performance, especially on the RT debug kernel. The SRSO mitigation on
> Zen 2 has an especially big impact on the debug kernel and accounts for
> the majority of the slowdown with mitigations on, because the patched
> ret instruction slows down function returns. Many helper functions
> that are normally compiled out or inlined may become real function
> calls in the debug kernel. KASAN instrumentation also inserts many
> __asan_loadX*() and __kasan_check_read() function calls into the
> memory access portions of the code. The lockdep __lock_acquire()
> function, for instance, has 66 __asan_loadX*() and 6 __kasan_check_read()
> calls added by KASAN instrumentation. Of course, the actual numbers may
> vary depending on the compiler used and the exact version of the lockdep
> code.
>
> With the newly added rtmutex and lockdep lock events, the relevant
> event counts for the test runs with the Skylake system were:
>
>   Event type          Debug kernel     RT debug kernel
>   ----------          ------------     ---------------
>   lockdep_acquire     1,968,663,277    5,425,313,953
>   rtlock_slowlock     -                401,701,156
>   rtmutex_slowlock    -                139,672
>
> The RT debug kernel makes 2.8 times as many __lock_acquire() calls as
> the non-RT debug kernel for the same workload. Since the
> __lock_acquire() function is a big contributor to the performance
> slowdown, this makes the RT debug kernel much slower than the non-RT
> one. The average lock nesting depth is also likely to be higher in the
> RT debug kernel, leading to longer execution time in __lock_acquire().
>
> As the small advantage of enabling KASAN instrumentation to catch
> potential memory access errors in the lockdep debugging tool is probably
> not worth the drawback of further slowing down a debug kernel, disable
> KASAN instrumentation in the lockdep code to allow the debug kernels
> to regain some performance, especially the RT debug kernels.
>
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
> kernel/locking/Makefile | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index 0db4093d17b8..a114949eeed5 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -5,7 +5,8 @@ KCOV_INSTRUMENT := n
>
> obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
>
> -# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
> +# Avoid recursion lockdep -> sanitizer -> ... -> lockdep & improve performance.
> +KASAN_SANITIZE_lockdep.o := n
> KCSAN_SANITIZE_lockdep.o := n
>
> ifdef CONFIG_FUNCTION_TRACER
> --
> 2.48.1
>