lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <CANpmjNPRZNTX2BKufHU16ybfcCvDaJmOSgihP7d0r9bgNZtGaQ@mail.gmail.com>
Date: Mon, 17 Feb 2025 08:00:00 +0100
From: Marco Elver <elver@...gle.com>
To: Waiman Long <longman@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
	Will Deacon <will.deacon@....com>, Boqun Feng <boqun.feng@...il.com>, 
	Andrey Ryabinin <ryabinin.a.a@...il.com>, Alexander Potapenko <glider@...gle.com>, 
	Andrey Konovalov <andreyknvl@...il.com>, Dmitry Vyukov <dvyukov@...gle.com>, 
	Vincenzo Frascino <vincenzo.frascino@....com>, linux-kernel@...r.kernel.org, 
	kasan-dev@...glegroups.com
Subject: Re: [PATCH v4 3/4] locking/lockdep: Disable KASAN instrumentation of lockdep.c

On Thu, 13 Feb 2025 at 21:02, Waiman Long <longman@...hat.com> wrote:
>
> Both KASAN and LOCKDEP are commonly enabled when building a debug kernel.
> Each of them can significantly slow down a debug kernel. Enabling KASAN
> instrumentation of the LOCKDEP code slows things down further.
>
> Since LOCKDEP is a high-overhead debugging tool, it will never be
> enabled in a production kernel. The LOCKDEP code is also fairly mature
> and is unlikely to see major changes. There is also a possibility of
> recursion, as with KCSAN.
>
> To evaluate the performance impact of disabling KASAN instrumentation
> of lockdep.c, the time to do a parallel build of the Linux defconfig
> kernel was used as the benchmark. Two x86-64 systems (Skylake & Zen 2)
> and an arm64 system were used as test beds. Two sets of non-RT and RT
> kernels, with similar configurations differing mainly in
> CONFIG_PREEMPT_RT, were used for evaluation.
>
> For the Skylake system:
>
>   Kernel                        Run time            Sys time
>   ------                        --------            --------
>   Non-debug kernel (baseline)   0m47.642s             4m19.811s
>
>   [CONFIG_KASAN_INLINE=y]
>   Debug kernel                  2m11.108s (x2.8)     38m20.467s (x8.9)
>   Debug kernel (patched)        1m49.602s (x2.3)     31m28.501s (x7.3)
>   Debug kernel
>   (patched + mitigations=off)   1m30.988s (x1.9)     26m41.993s (x6.2)
>
>   RT kernel (baseline)          0m54.871s             7m15.340s
>
>   [CONFIG_KASAN_INLINE=n]
>   RT debug kernel               6m07.151s (x6.7)    135m47.428s (x18.7)
>   RT debug kernel (patched)     3m42.434s (x4.1)     74m51.636s (x10.3)
>   RT debug kernel
>   (patched + mitigations=off)   2m40.383s (x2.9)     57m54.369s (x8.0)
>
>   [CONFIG_KASAN_INLINE=y]
>   RT debug kernel               3m22.155s (x3.7)     77m53.018s (x10.7)
>   RT debug kernel (patched)     2m36.700s (x2.9)     54m31.195s (x7.5)
>   RT debug kernel
>   (patched + mitigations=off)   2m06.110s (x2.3)     45m49.493s (x6.3)
>
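The (xN.N) factors in these tables are each row's time divided by the matching baseline row: the non-debug baseline for non-RT kernels, the RT baseline for RT kernels. A minimal Python sketch, assuming times are written in the `XmYY.YYYs` format used above (the helper names are illustrative), reproduces the Skylake run-time factors:

```python
import re

def to_seconds(t: str) -> float:
    """Convert a duration such as '2m11.108s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t)
    if m is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def slowdown(run: str, baseline: str) -> float:
    """Ratio of a run time to its baseline, i.e. the (xN.N) factor."""
    return to_seconds(run) / to_seconds(baseline)

# Skylake, non-RT, CONFIG_KASAN_INLINE=y, run-time column.
baseline = "0m47.642s"
for label, run in [("Debug kernel", "2m11.108s"),
                   ("Debug kernel (patched)", "1m49.602s"),
                   ("patched + mitigations=off", "1m30.988s")]:
    print(f"{label:28s} x{slowdown(run, baseline):.1f}")
```

The same helper applies to the sys-time column and to the RT rows, as long as the RT baseline is used as the denominator.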
> For the Zen 2 system:
>
>   Kernel                        Run time            Sys time
>   ------                        --------            --------
>   Non-debug kernel (baseline)   1m42.806s            39m48.714s
>
>   [CONFIG_KASAN_INLINE=y]
>   Debug kernel                  4m04.524s (x2.4)    125m35.904s (x3.2)
>   Debug kernel (patched)        3m56.241s (x2.3)    127m22.378s (x3.2)
>   Debug kernel
>   (patched + mitigations=off)   2m38.157s (x1.5)     92m35.680s (x2.3)
>
>   RT kernel (baseline)           1m51.500s           14m56.322s
>
>   [CONFIG_KASAN_INLINE=n]
>   RT debug kernel               16m04.962s (x8.7)   244m36.463s (x16.4)
>   RT debug kernel (patched)      9m09.073s (x4.9)   129m28.439s (x8.7)
>   RT debug kernel
>   (patched + mitigations=off)    3m31.662s (x1.9)    51m01.391s (x3.4)
>
> For the arm64 system:
>
>   Kernel                        Run time            Sys time
>   ------                        --------            --------
>   Non-debug kernel (baseline)   1m56.844s             8m47.150s
>   Debug kernel                  3m54.774s (x2.0)     92m30.098s (x10.5)
>   Debug kernel (patched)        3m32.429s (x1.8)     77m40.779s (x8.8)
>
>   RT kernel (baseline)           4m01.641s           18m16.777s
>
>   [CONFIG_KASAN_INLINE=n]
>   RT debug kernel               19m32.977s (x4.9)   304m23.965s (x16.7)
>   RT debug kernel (patched)     16m28.354s (x4.1)   234m18.149s (x12.8)
>
> Turning the mitigations off doesn't seem to have any noticeable impact
> on the performance of the arm64 system, so the mitigations=off entries
> aren't included.
>
> For the x86 CPUs, CPU mitigations have a much bigger impact on
> performance, especially for the RT debug kernel with
> CONFIG_KASAN_INLINE=n. The SRSO mitigation on Zen 2 has an especially
> big impact on the debug kernel and accounts for the majority of the
> slowdown with mitigations on, because the patched ret instruction slows
> down function returns. Many helper functions that are normally compiled
> out or inlined may become real function calls in the debug kernel, so
> there are many more returns to pay for.
>
> With CONFIG_KASAN_INLINE=n, KASAN instrumentation inserts many
> __asan_loadX*() and __kasan_check_read() function calls into the
> memory-access portions of the code. The lockdep __lock_acquire()
> function, for instance, has 66 __asan_loadX*() and 6 __kasan_check_read()
> calls added by KASAN instrumentation. The actual numbers may vary
> depending on the compiler used and the exact version of the lockdep code.
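With outline instrumentation (CONFIG_KASAN_INLINE=n), each instrumented load is preceded by a call into the KASAN runtime, which consults shadow memory before the access proceeds. Inline mode emits the same shadow check directly in the caller, which avoids the call but enlarges the kernel text. The sketch below is a toy Python model of the generic-KASAN shadow encoding (one shadow byte per 8-byte granule: 0 means all 8 bytes are valid, 1..7 means only the first k bytes are valid, negative means poisoned). It is not kernel code: the class and method names are hypothetical, and real KASAN uses a faster check than this per-byte loop.

```python
GRANULE = 8  # generic KASAN: one shadow byte per 8 bytes of memory

class ToyShadow:
    """Toy (hypothetical) model of generic-KASAN shadow memory.

    Shadow value per granule: 0 = all 8 bytes valid,
    1..7 = only the first k bytes valid, negative = poisoned.
    """

    def __init__(self, size: int):
        # Start with everything poisoned (-1 stands in for the
        # various negative poison codes real KASAN uses).
        self.shadow = [-1] * ((size + GRANULE - 1) // GRANULE)

    def unpoison(self, addr: int, length: int) -> None:
        # Assumes addr is granule-aligned, as allocator objects are.
        full, rem = divmod(length, GRANULE)
        base = addr // GRANULE
        for i in range(full):
            self.shadow[base + i] = 0
        if rem:
            self.shadow[base + full] = rem

    def byte_ok(self, a: int) -> bool:
        s = self.shadow[a // GRANULE]
        return s == 0 or (s > 0 and a % GRANULE < s)

    def check_load(self, addr: int, size: int) -> bool:
        # Conceptually what an outline check does before a load of
        # `size` bytes: every touched byte must be addressable.
        return all(self.byte_ok(a) for a in range(addr, addr + size))

mem = ToyShadow(64)
mem.unpoison(16, 13)            # a 13-byte object at offset 16
print(mem.check_load(16, 8))    # bytes 16..23, shadow 0
print(mem.check_load(24, 4))    # bytes 24..27, shadow 5
print(mem.check_load(26, 4))    # byte 29 lies past the object
```

In outline mode every such check is a real function call, which is why a function like __lock_acquire() with dozens of instrumented accesses pays a noticeable per-call cost.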
>
> On the Skylake test system, the reductions in parallel kernel build time
> for the RT debug kernel with this patch are:
>
>  CONFIG_KASAN_INLINE=n: -37%
>  CONFIG_KASAN_INLINE=y: -22%
>
> The time reduction is smaller with CONFIG_KASAN_INLINE=y, but it is
> still significant.
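The CONFIG_KASAN_INLINE=y figure matches the relative change in wall-clock run time between the unpatched and patched RT debug kernel rows of the Skylake table. A short sketch (helper names are illustrative; times assumed in the `XmYY.YYYs` format of the tables):

```python
import re

def to_seconds(t: str) -> float:
    """Convert a duration such as '3m22.155s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t)
    if m is None:
        raise ValueError(f"unrecognized duration: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def reduction(before: str, after: str) -> float:
    """Percentage change in time; negative means faster."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100

# Skylake, RT debug kernel, CONFIG_KASAN_INLINE=y, run-time column.
print(f"{reduction('3m22.155s', '2m36.700s'):.0f}%")
```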
>
> Setting CONFIG_KASAN_INLINE=y can result in a significant performance
> improvement. The major drawback is a significant increase in the size
> of the kernel text. For vmlinux, the text size increases from 45997948
> to 67606807 bytes, a 47% increase (about 21 Mbytes). Kernel modules
> should see a similar relative increase.
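The size figures can be checked directly from the two vmlinux text sizes quoted above (in bytes). Note that "about 21 Mbytes" holds in decimal megabytes; in binary units the increase is closer to 20.6 MiB:

```python
before, after = 45_997_948, 67_606_807  # vmlinux text size, bytes

delta = after - before
print(delta)                                   # absolute increase in bytes
print(f"{delta / before:.0%}")                 # relative increase
print(f"{delta / 1e6:.1f} MB, {delta / 2**20:.1f} MiB")
```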
>
> With the newly added rtmutex and lockdep lock events, the relevant
> event counts for the test runs on the Skylake system were:
>
>   Event type            Debug kernel    RT debug kernel
>   ----------            ------------    ---------------
>   lockdep_acquire       1,968,663,277   5,425,313,953
>   rtlock_slowlock            -            401,701,156
>   rtmutex_slowlock           -                139,672
>
> The number of __lock_acquire() calls in the RT debug kernel is 2.8 times
> that of the non-RT debug kernel for the same workload. Since
> __lock_acquire() is a major contributor to the slowdown, this makes the
> RT debug kernel much slower than the non-RT one. The average lock
> nesting depth is likely to be higher in the RT debug kernel too, leading
> to longer execution time in the __lock_acquire() function.
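The 2.8x figure is the ratio of the two lockdep_acquire counts in the table above. A small sketch (the `count` helper is illustrative) that parses the comma-grouped counts and computes it:

```python
def count(s: str) -> int:
    """Parse a comma-grouped event count such as '1,968,663,277'."""
    return int(s.replace(",", ""))

debug = count("1,968,663,277")     # lockdep_acquire, non-RT debug kernel
rt_debug = count("5,425,313,953")  # lockdep_acquire, RT debug kernel
ratio = rt_debug / debug
print(f"x{ratio:.1f}")
```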
>
> As the small advantage of enabling KASAN instrumentation to catch
> potential memory access errors in the lockdep debugging tool is probably
> not worth the drawback of further slowing down a debug kernel, disable
> KASAN instrumentation in the lockdep code to allow the debug kernels
> to regain some performance, especially the RT debug kernels.
>
> Signed-off-by: Waiman Long <longman@...hat.com>

Reviewed-by: Marco Elver <elver@...gle.com>

> ---
>  kernel/locking/Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index 0db4093d17b8..a114949eeed5 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -5,7 +5,8 @@ KCOV_INSTRUMENT         := n
>
>  obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
>
> -# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
> +# Avoid recursion lockdep -> sanitizer -> ... -> lockdep & improve performance.
> +KASAN_SANITIZE_lockdep.o := n
>  KCSAN_SANITIZE_lockdep.o := n
>
>  ifdef CONFIG_FUNCTION_TRACER
> --
> 2.48.1
>
