Message-Id: <1538511560-10090-1-git-send-email-longman@redhat.com>
Date: Tue, 2 Oct 2018 16:19:15 -0400
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Will Deacon <will.deacon@....com>
Cc: linux-kernel@...r.kernel.org, Waiman Long <longman@...hat.com>
Subject: [PATCH v2 0/5] locking/lockdep: Improve lockdep performance
v1->v2:
 - Minor tweaks to incorporate Ingo's comments.
 - Move class->ops from the lock_class structure to a percpu array under
   CONFIG_DEBUG_LOCKDEP, so that the increased memory consumption is
   incurred only when CONFIG_DEBUG_LOCKDEP is enabled.
Enabling CONFIG_LOCKDEP and other related debug options will greatly
reduce system performance. This patchset aims to reduce the performance
slowdown caused by the lockdep code.
Patch 1 just removes an inline function that wasn't used.
Patches 2 and 3 are minor tweaks that optimize the code.
Patch 4 makes class->ops a per-cpu counter and moves the stat counter
under CONFIG_DEBUG_LOCKDEP again.
Patch 5 moves the lock_release() call outside of the lock critical
section.
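As a rough userspace illustration of the idea behind patch 4 (this is not
the kernel code; the array, the sizes and the function names below are
hypothetical), a per-cpu counter gives each CPU its own slot to increment
and sums all the slots only when the value is read:

```c
/*
 * Userspace sketch of a per-cpu style counter. All names and sizes are
 * hypothetical; the kernel would use its per-cpu infrastructure instead
 * of a plain 2-D array.
 */
#define SKETCH_NR_CPUS     4	/* assumed number of CPUs */
#define SKETCH_NR_CLASSES  8	/* assumed number of lock classes */

static unsigned long sketch_class_ops[SKETCH_NR_CPUS][SKETCH_NR_CLASSES];

/* Hot path: each CPU only touches its own row, so no cacheline is shared. */
static void sketch_class_ops_inc(int cpu, int class_idx)
{
	sketch_class_ops[cpu][class_idx]++;
}

/* Slow path (e.g. reporting statistics): sum the slots of all CPUs. */
static unsigned long sketch_class_ops_read(int class_idx)
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sum += sketch_class_ops[cpu][class_idx];
	return sum;
}
```

The trade-off is that a read becomes more expensive and the counters take
more memory, which is why the cover letter restricts them to
CONFIG_DEBUG_LOCKDEP.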
Parallel kernel compilation tests (make -j <#cpu>, best of 3 runs)
with gcc8 were performed on 2 different systems:
1) a 1-socket 22-core 44-thread Skylake system
2) a 4-socket 72-core 144-thread Broadwell system
Four different kernel variants based on the 4.19-rc5 kernel were used:
1) non-debug kernel (with minimal debug options enabled)
2) pre-patch debug kernel (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
3) post-patch debug kernel (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
4) post-patch debug kernel (CONFIG_LOCKDEP, CONFIG_DEBUG_LOCKDEP)
Note that the debug kernels had more debug options enabled than just
LOCKDEP.
The build times for the four kernel variants were:
  System      Kernel 1    Kernel 2    Kernel 3    Kernel 4
  ------      --------    --------    --------    --------
  1-socket    6m06.0s     8m54.7s     8m34.9s     9m28.1s
  4-socket    4m09.2s     7m36.0s     5m38.8s     6m17.8s
Using the non-debug kernel build times as the baseline, the percentage
runtime increases of the other 3 kernel variants were:
  System      Kernel 2    Kernel 3    Kernel 4
  ------      --------    --------    --------
  1-socket    +46.1%      +40.7%      +55.2%
  4-socket    +83.0%      +36.0%      +51.6%
Comparing just kernels 2 and 3, the patchset reduced the build times
by 3.7% and 25.7% for the 1-socket and 4-socket systems respectively.
I think the last 2 patches yield most of the performance improvement.
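For reference, the percentages above follow directly from the build
times. The helpers below are only a sketch of that arithmetic; their
names are made up for illustration and are not part of the patchset.

```c
/* Convert a build time given as minutes plus seconds into seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage by which time t exceeds the baseline (positive = slower). */
static double pct_increase(double base, double t)
{
	return (t - base) / base * 100.0;
}

/* Percentage by which "after" is shorter than "before". */
static double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

For example, on the 4-socket system kernel 1 took 4m09.2s (249.2s) and
kernel 2 took 7m36.0s (456.0s), so pct_increase(249.2, 456.0) is about
83.0; kernel 3 took 5m38.8s (338.8s), so pct_reduction(456.0, 338.8) is
about 25.7.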
Waiman Long (5):
  locking/lockdep: Remove add_chain_cache_classes()
  locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  locking/lockdep: Add a faster path in __lock_release()
  locking/lockdep: Make class->ops a percpu counter
  locking/lockdep: Call lock_release() after releasing the lock
 include/linux/lockdep.h            |   7 +-
 include/linux/rwlock_api_smp.h     |  16 ++--
 include/linux/spinlock_api_smp.h   |   8 +-
 kernel/locking/lockdep.c           | 113 ++++++++---------------------
 kernel/locking/lockdep_internals.h |  23 ++++++
 kernel/locking/lockdep_proc.c      |   2 +-
 6 files changed, 66 insertions(+), 103 deletions(-)
--
2.18.0