Message-Id: <1538511560-10090-1-git-send-email-longman@redhat.com>
Date:   Tue,  2 Oct 2018 16:19:15 -0400
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Will Deacon <will.deacon@....com>
Cc:     linux-kernel@...r.kernel.org, Waiman Long <longman@...hat.com>
Subject: [PATCH v2 0/5] locking/lockdep: Improve lockdep performance

 v1->v2:
  - Minor tweaks to incorporate Ingo's comments.
  - Move class->ops from the lock_class structure to a percpu array under
    CONFIG_DEBUG_LOCKDEP. This confines the increased memory consumption
    to CONFIG_DEBUG_LOCKDEP kernels only.

Enabling CONFIG_LOCKDEP and other related debug options will greatly
reduce system performance. This patchset aims to reduce the slowdown
caused by the lockdep code.

Patch 1 just removes an unused inline function, add_chain_cache_classes().

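Patches 2 and 3 are minor tweaks to optimize the code: patch 2 eliminates
a redundant irqs check in __lock_acquire(), and patch 3 adds a faster
path in __lock_release().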

Patch 4 makes class->ops a per-cpu counter and moves the stat counter
back under CONFIG_DEBUG_LOCKDEP.

Patch 5 moves the lock_release() call outside of the lock critical 
section.

Parallel kernel compilation tests (make -j <#cpu>, best of 3 runs)
with gcc8 were performed on 2 different systems:

 1) a 1-socket 22-core 44-thread Skylake system
 2) a 4-socket 72-core 144-thread Broadwell system

Four different kernel variants based on the 4.19-rc5 kernel were used:

 1) non-debug kernel (with minimal debug options enabled)
 2) pre-patch debug kernel  (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
 3) post-patch debug kernel (CONFIG_LOCKDEP, !CONFIG_DEBUG_LOCKDEP)
 4) post-patch debug kernel (CONFIG_LOCKDEP,  CONFIG_DEBUG_LOCKDEP)

Note that the debug kernels had more debug options enabled than just
LOCKDEP.

The build times of the four kernel variants were:

   System    Kernel 1    Kernel 2    Kernel 3    Kernel 4
   ------    --------    --------    --------    --------
  1-socket    6m06.0s     8m54.7s     8m34.9s     9m28.1s
  4-socket    4m09.2s     7m36.0s     5m38.8s     6m17.8s

Using the non-debug kernel execution times as the baseline, the
percentage runtime increases of the other 3 kernel variants were:

   System    Kernel 2    Kernel 3    Kernel 4
   ------    --------    --------    --------    
  1-socket    +46.1%      +40.7%      +55.2%
  4-socket    +83.0%      +36.0%      +51.6%

Comparing just kernels 2 and 3, the patchset reduced the execution
times by 3.7% and 25.7% for the 1-socket and 4-socket systems respectively.

I think the last 2 patches yield most of the performance improvement.

Waiman Long (5):
  locking/lockdep: Remove add_chain_cache_classes()
  locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  locking/lockdep: Add a faster path in __lock_release()
  locking/lockdep: Make class->ops a percpu counter
  locking/lockdep: Call lock_release() after releasing the lock

 include/linux/lockdep.h            |   7 +-
 include/linux/rwlock_api_smp.h     |  16 ++--
 include/linux/spinlock_api_smp.h   |   8 +-
 kernel/locking/lockdep.c           | 113 ++++++++---------------------
 kernel/locking/lockdep_internals.h |  23 ++++++
 kernel/locking/lockdep_proc.c      |   2 +-
 6 files changed, 66 insertions(+), 103 deletions(-)

-- 
2.18.0
