Message-ID: <CA+fCnZfaCGhZiHPm1wRMLv7oPsvZ-_dvR33mgYEtLY_ss+g4DQ@mail.gmail.com>
Date: Mon, 17 Feb 2025 17:53:06 +0100
From: Andrey Konovalov <andreyknvl@...il.com>
To: Waiman Long <longman@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>, Boqun Feng <boqun.feng@...il.com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>, Alexander Potapenko <glider@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>, Vincenzo Frascino <vincenzo.frascino@....com>,
Marco Elver <elver@...gle.com>, linux-kernel@...r.kernel.org, kasan-dev@...glegroups.com
Subject: Re: [PATCH v4 3/4] locking/lockdep: Disable KASAN instrumentation of lockdep.c
On Thu, Feb 13, 2025 at 9:02 PM Waiman Long <longman@...hat.com> wrote:
>
> Both KASAN and LOCKDEP are commonly enabled when building a debug kernel.
> Each of them can significantly slow down a debug kernel. Enabling KASAN
> instrumentation of the LOCKDEP code will slow things down further.
>
> Since LOCKDEP is a high-overhead debugging tool, it will never get
> enabled in a production kernel. The LOCKDEP code is also pretty mature
> and is unlikely to see major changes. There is also a possibility of
> recursion, as with KCSAN.
>
> To evaluate the performance impact of disabling KASAN instrumentation
> of lockdep.c, the time to do a parallel build of the Linux defconfig
> kernel was used as the benchmark. Two x86-64 systems (Skylake & Zen 2)
> and an arm64 system were used as test beds. Two sets of non-RT and RT
> kernels, whose configurations differ mainly in CONFIG_PREEMPT_RT, were
> used for evaluation.
>
> For the Skylake system:
>
> Kernel                          Run time            Sys time
> ------                          --------            --------
> Non-debug kernel (baseline)     0m47.642s           4m19.811s
>
> [CONFIG_KASAN_INLINE=y]
> Debug kernel                    2m11.108s (x2.8)    38m20.467s (x8.9)
> Debug kernel (patched)          1m49.602s (x2.3)    31m28.501s (x7.3)
> Debug kernel
>   (patched + mitigations=off)   1m30.988s (x1.9)    26m41.993s (x6.2)
>
> RT kernel (baseline)            0m54.871s           7m15.340s
>
> [CONFIG_KASAN_INLINE=n]
> RT debug kernel                 6m07.151s (x6.7)    135m47.428s (x18.7)
> RT debug kernel (patched)       3m42.434s (x4.1)    74m51.636s (x10.3)
> RT debug kernel
>   (patched + mitigations=off)   2m40.383s (x2.9)    57m54.369s (x8.0)
>
> [CONFIG_KASAN_INLINE=y]
> RT debug kernel                 3m22.155s (x3.7)    77m53.018s (x10.7)
> RT debug kernel (patched)       2m36.700s (x2.9)    54m31.195s (x7.5)
> RT debug kernel
>   (patched + mitigations=off)   2m06.110s (x2.3)    45m49.493s (x6.3)
>
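The "(xN.N)" multipliers are each entry's time divided by the matching baseline (non-debug baseline for the non-RT rows, RT baseline for the RT rows). A small Python sketch that reproduces the Skylake CONFIG_KASAN_INLINE=y debug kernel figures, assuming Python 3 and the "XmY.YYYs" time format used in the tables; the helpers to_seconds() and slowdown() are hypothetical names for illustration:

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a time(1)-style duration such as '2m11.108s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown(duration: str, baseline: str) -> float:
    """Return duration / baseline, i.e. the 'xN.N' multiplier in the tables."""
    return to_seconds(duration) / to_seconds(baseline)


# Skylake, CONFIG_KASAN_INLINE=y, relative to the non-debug kernel baseline.
baseline_run, baseline_sys = "0m47.642s", "4m19.811s"
for label, run, sys_time in [
    ("Debug kernel", "2m11.108s", "38m20.467s"),
    ("Debug kernel (patched)", "1m49.602s", "31m28.501s"),
]:
    print(f"{label}: run x{slowdown(run, baseline_run):.1f}, "
          f"sys x{slowdown(sys_time, baseline_sys):.1f}")
```

This prints x2.8/x8.9 for the unpatched debug kernel and x2.3/x7.3 for the patched one, matching the table to one decimal place.
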
> For the Zen 2 system:
>
> Kernel                          Run time            Sys time
> ------                          --------            --------
> Non-debug kernel (baseline)     1m42.806s           39m48.714s
>
> [CONFIG_KASAN_INLINE=y]
> Debug kernel                    4m04.524s (x2.4)    125m35.904s (x3.2)
> Debug kernel (patched)          3m56.241s (x2.3)    127m22.378s (x3.2)
> Debug kernel
>   (patched + mitigations=off)   2m38.157s (x1.5)    92m35.680s (x2.3)
>
> RT kernel (baseline)            1m51.500s           14m56.322s
>
> [CONFIG_KASAN_INLINE=n]
> RT debug kernel                 16m04.962s (x8.7)   244m36.463s (x16.4)
> RT debug kernel (patched)       9m09.073s (x4.9)    129m28.439s (x8.7)
> RT debug kernel
>   (patched + mitigations=off)   3m31.662s (x1.9)    51m01.391s (x3.4)
>
> For the arm64 system:
>
> Kernel                          Run time            Sys time
> ------                          --------            --------
> Non-debug kernel (baseline)     1m56.844s           8m47.150s
> Debug kernel                    3m54.774s (x2.0)    92m30.098s (x10.5)
> Debug kernel (patched)          3m32.429s (x1.8)    77m40.779s (x8.8)
>
> RT kernel (baseline)            4m01.641s           18m16.777s
>
> [CONFIG_KASAN_INLINE=n]
> RT debug kernel                 19m32.977s (x4.9)   304m23.965s (x16.7)
> RT debug kernel (patched)       16m28.354s (x4.1)   234m18.149s (x12.8)
>
> Turning the mitigations off doesn't seem to have any noticeable impact
> on the performance of the arm64 system, so the mitigations=off entries
> aren't included.
>
> For the x86 CPUs, CPU mitigations have a much bigger impact on
> performance, especially for the RT debug kernel with
> CONFIG_KASAN_INLINE=n. The SRSO mitigation on Zen 2 has an especially
> big impact on the debug kernel and accounts for the majority of the
> slowdown with mitigations on. This is because the patched ret
> instruction slows down function returns, and many helper functions that
> are normally compiled out or inlined may become real function calls in
> the debug kernel.
>
> With CONFIG_KASAN_INLINE=n, the KASAN instrumentation inserts a lot of
> __asan_loadX*() and __kasan_check_read() function calls into the
> memory-access portions of the code. The lockdep __lock_acquire()
> function, for instance, has 66 __asan_loadX*() and 6
> __kasan_check_read() calls added by KASAN instrumentation. Of course,
> the actual numbers may vary depending on the compiler used and the
> exact version of the lockdep code.
>
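For readers less familiar with outline instrumentation: each __asan_loadX() call checks KASAN shadow memory before the load it guards. Below is a toy Python model of the generic KASAN shadow encoding (one shadow byte per 8-byte granule; 0 means all 8 bytes are accessible, 1 to 7 means only that many leading bytes are, and a negative value means the granule is poisoned). It is only a sketch of the per-access work, not the kernel's implementation; the ToyShadow class, its methods, and treating never-unpoisoned memory as poisoned are assumptions of this example.

```python
GRANULE_SHIFT = 3                 # generic KASAN: one shadow byte per 8 bytes
GRANULE_SIZE = 1 << GRANULE_SHIFT


class ToyShadow:
    """Toy model of generic KASAN shadow memory (illustrative, not kernel code)."""

    def __init__(self) -> None:
        self.shadow: dict[int, int] = {}  # granule index -> shadow value
        self.checks = 0                   # number of simulated __asan_loadN() calls

    def unpoison(self, addr: int, size: int) -> None:
        """Mark [addr, addr + size) accessible; addr must be granule-aligned."""
        if addr % GRANULE_SIZE:
            raise ValueError("addr must be 8-byte aligned in this toy model")
        full, tail = divmod(size, GRANULE_SIZE)
        base = addr >> GRANULE_SHIFT
        for i in range(full):
            self.shadow[base + i] = 0        # 0: all 8 bytes accessible
        if tail:
            self.shadow[base + full] = tail  # k: only the first k bytes accessible

    def byte_ok(self, addr: int) -> bool:
        """Is a single byte accessible? Unknown granules count as poisoned (-1)."""
        value = self.shadow.get(addr >> GRANULE_SHIFT, -1)
        return value == 0 or (value > 0 and (addr & (GRANULE_SIZE - 1)) < value)

    def check_load(self, addr: int, size: int) -> bool:
        """Simulate the check that one outline __asan_loadN() call performs."""
        self.checks += 1
        return all(self.byte_ok(a) for a in range(addr, addr + size))


mem = ToyShadow()
mem.unpoison(0x1000, 13)        # a 13-byte object: one full granule plus 5 bytes
print(mem.check_load(0x1000, 8))  # True: first granule fully accessible
print(mem.check_load(0x1008, 4))  # True: bytes 0..3 of a granule with shadow 5
print(mem.check_load(0x1008, 8))  # False: bytes 5..7 are poisoned
print(mem.check_load(0x1010, 1))  # False: granule was never unpoisoned
```

With CONFIG_KASAN_INLINE=y the compiler emits an equivalent shadow check directly at each access instead of a function call, trading call overhead for larger kernel text.
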
> On the Skylake test system, the reductions in parallel kernel build
> time for the RT debug kernel with this patch are:
>
>   CONFIG_KASAN_INLINE=n: -37%
>   CONFIG_KASAN_INLINE=y: -22%
>
> The time reduction is smaller with CONFIG_KASAN_INLINE=y, but it is
> still significant.
>
> Setting CONFIG_KASAN_INLINE=y can result in a significant performance
> improvement. The major drawback is a significant increase in the size
> of kernel text. In the case of vmlinux, its text size increases from
> 45997948 to 67606807 bytes, a 47% increase (about 21 Mbytes). The
> relative size increase of other kernel modules should be similar.
>
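As a quick arithmetic check of the vmlinux text size figures, a Python sketch computing the absolute and relative growth (the size_increase() helper is a hypothetical name; both decimal MB and binary MiB are shown since "Mbytes" could mean either):

```python
def size_increase(before: int, after: int) -> tuple[int, float]:
    """Return (absolute growth in bytes, relative growth in percent)."""
    delta = after - before
    return delta, 100.0 * delta / before


delta, pct = size_increase(45_997_948, 67_606_807)
print(f"+{delta:,} bytes ({delta / 1e6:.1f} MB, {delta / 2**20:.1f} MiB), +{pct:.0f}%")
# prints: +21,608,859 bytes (21.6 MB, 20.6 MiB), +47%
```
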
> With the newly added rtmutex and lockdep lock events, the relevant
> event counts for the test runs with the Skylake system were:
>
> Event type          Debug kernel     RT debug kernel
> ----------          ------------     ---------------
> lockdep_acquire     1,968,663,277    5,425,313,953
> rtlock_slowlock     -                401,701,156
> rtmutex_slowlock    -                139,672
>
> The RT debug kernel makes 2.8 times as many __lock_acquire() calls as
> the non-RT debug kernel with the same workload. Since __lock_acquire()
> is a big hitter in terms of performance slowdown, this makes the RT
> debug kernel much slower than the non-RT one. The average lock nesting
> depth is also likely to be higher in the RT debug kernel, leading to
> longer execution time in __lock_acquire().
>
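The x2.8 figure follows directly from the lockdep_acquire event counts in the table above; a trivial Python check (parse_count() is a hypothetical helper for the comma-separated counts):

```python
def parse_count(text: str) -> int:
    """Parse an event count printed with thousands separators, e.g. '1,968,663,277'."""
    return int(text.replace(",", ""))


debug_acquires = parse_count("1,968,663,277")     # non-RT debug kernel
rt_debug_acquires = parse_count("5,425,313,953")  # RT debug kernel
ratio = rt_debug_acquires / debug_acquires
print(f"lockdep_acquire: RT debug / debug = x{ratio:.1f}")  # prints x2.8
```
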
> As the small advantage of enabling KASAN instrumentation to catch
> potential memory access errors in the lockdep debugging tool is
> probably not worth the drawback of further slowing down a debug kernel,
> disable KASAN instrumentation in the lockdep code to allow the debug
> kernels to regain some performance, especially the RT debug kernels.
>
> Signed-off-by: Waiman Long <longman@...hat.com>
> ---
> kernel/locking/Makefile | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index 0db4093d17b8..a114949eeed5 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -5,7 +5,8 @@ KCOV_INSTRUMENT := n
>
> obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
>
> -# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
> +# Avoid recursion lockdep -> sanitizer -> ... -> lockdep & improve performance.
> +KASAN_SANITIZE_lockdep.o := n
> KCSAN_SANITIZE_lockdep.o := n
>
> ifdef CONFIG_FUNCTION_TRACER
> --
> 2.48.1
>
Reviewed-by: Andrey Konovalov <andreyknvl@...il.com>