Message-ID: <20250205012411.1010817-3-longman@redhat.com>
Date: Tue,  4 Feb 2025 20:24:11 -0500
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Will Deacon <will.deacon@....com>,
	Boqun Feng <boqun.feng@...il.com>
Cc: linux-kernel@...r.kernel.org,
	Waiman Long <longman@...hat.com>
Subject: [PATCH v2 2/2] locking/lockdep: Disable KASAN instrumentation of lockdep.c

Both KASAN and LOCKDEP are commonly enabled when building a debug kernel.
Each of them can significantly slow down a debug kernel. Enabling KASAN
instrumentation of the LOCKDEP code slows things down even further.

Since LOCKDEP is a high-overhead debugging tool, it will never get
enabled in a production kernel. The LOCKDEP code is also pretty mature
and is unlikely to get major changes. There is also a possibility of
recursion similar to KCSAN. As the small advantage of enabling KASAN
instrumentation to catch potential memory access errors is probably not
worth the drawback of further slowing down a debug kernel, disable KASAN
instrumentation of lockdep.c so that a debug kernel can gain some
performance back.

For debug kernels with both LOCKDEP and KASAN enabled, running on a
2-socket 128-thread x86-64 system and an 80-core arm64 system, the times
(real and sys as reported by the time command) to do a parallel kernel
build are shown below.

  Kernel type			Real Time	Sys Time
  -----------			---------	--------
  x86-64:
  Non-debug kernel		 9m38.528s	 304m17.007s
  Debug kernel before patch	16m38.765s	1086m34.930s
  Debug kernel after patch	16m4.758s	1025m26.335s
  Before/after % change		  -3.4%		   -5.6%

  Non-debug RT kernel		11m32.804s	 121m52.835s
  Debug RT kernel before patch	59m29.618s	1772m30.699s
  Debug RT kernel after patch	37m47.089s	 937m56.856s
  Before/after % change		  -36.5%	  -47.1%

  arm64:
  Debug RT kernel before patch	46m9.385s	676m13.605s
  Debug RT kernel after patch	33m41.428s	436m3.430s
  Before/after % change		  -27.0%	  -35.5%

It looks like the KASAN instrumentation overhead is lower on arm64. While
the performance benefit for the non-RT debug kernel is modest, the
performance gain for the RT debug kernel is significant.
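
The percentage rows in the table above can be rechecked with a short
Python sketch. It is not part of the patch; the helper names to_seconds()
and pct_change() are hypothetical and exist only for this illustration.
It converts the time command's "XmY.ZZZs" strings to seconds and computes
the relative change from before to after.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a time(1)-style string such as '16m38.765s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", t.strip())
    if m is None:
        raise ValueError(f"unrecognized time string: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def pct_change(before: str, after: str) -> float:
    """Relative change from before to after, in percent (negative = faster)."""
    b, a = to_seconds(before), to_seconds(after)
    return (a - b) / b * 100.0


# (before, after) real-time pairs copied from the table above.
real_times = {
    "x86-64 debug": ("16m38.765s", "16m4.758s"),
    "x86-64 debug RT": ("59m29.618s", "37m47.089s"),
    "arm64 debug RT": ("46m9.385s", "33m41.428s"),
}

for name, (before, after) in real_times.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")
```

The same helper applies unchanged to the Sys Time column.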

Looking at the RT kernel locking event data for the x86-64 system, we have:

  Event type	    Non-debug   Debug before patch   Debug after patch
  ----------	    ---------   ------------------   -----------------
  rtlock_slowlock   66,593,828    2,868,760,165        2,832,990,386
  rtlock_slow_acq1  43,705,130    2,833,575,907        2,800,928,283
  rtlock_slow_acq2  22,888,698       35,177,418           32,055,592
  rtlock_slow_sleep 22,568,560       29,206,559           27,833,274

  rtmutex_slowlock     468,207          560,080              549,080
  rtmutex_slow_acq1     11,840           67,208               39,353
  rtmutex_slow_block   456,367          492,872              509,727
  rtmutex_slow_sleep   258,071          208,019              220,480

The profiles of the debug kernel before and after the patch are similar.
Compared with the non-debug kernel, the number of rtlock_slowlock() calls
has increased by more than 40x. That means the corresponding wait_lock
has to be acquired that many more times, with the associated lockdep
overhead. The average lock nesting depth will also be higher. The non-RT
debug kernel doesn't have this extra overhead.
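
The "more than 40x" figure follows directly from the event table. A
minimal Python sketch, with the counts copied from the table and a
hypothetical helper name ratio(), compares the debug-before-patch counts
against the non-debug counts:

```python
# Event counts copied from the table: (non-debug, debug before patch).
events = {
    "rtlock_slowlock": (66_593_828, 2_868_760_165),
    "rtlock_slow_acq1": (43_705_130, 2_833_575_907),
    "rtmutex_slowlock": (468_207, 560_080),
}


def ratio(name: str) -> float:
    """Debug-before-patch count divided by the non-debug count."""
    non_debug, debug_before = events[name]
    return debug_before / non_debug


for name in events:
    print(f"{name}: {ratio(name):.1f}x")
```

By the same arithmetic the rtmutex_slowlock count grows by only about
1.2x, so the blow-up is specific to the rtlock slow path.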

Signed-off-by: Waiman Long <longman@...hat.com>
---
 kernel/locking/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 0db4093d17b8..a114949eeed5 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,7 +5,8 @@ KCOV_INSTRUMENT		:= n
 
 obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
 
-# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep & improve performance.
+KASAN_SANITIZE_lockdep.o := n
 KCSAN_SANITIZE_lockdep.o := n
 
 ifdef CONFIG_FUNCTION_TRACER
-- 
2.48.1