linux-kernel - [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <1372902738-30693-1-git-send-email-Waiman.Long@hp.com>
Date:	Wed,  3 Jul 2013 21:52:14 -0400
From:	Waiman Long <Waiman.Long@...com>
To:	Alexander Viro <viro@...iv.linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>
Cc:	Waiman Long <Waiman.Long@...com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>
Subject: [PATCH 0/4] seqlock: Add new blocking reader type & use rwlock

During some perf-record sessions of the kernel running the high_systime
workload of the AIM7 benchmark, it was found that quite a large portion
of the spinlock contention was due to the perf_event_mmap_event()
function itself. This perf kernel function calls d_path() which,
in turn, call path_get() and dput() indirectly. These 3 functions
were the hottest functions shown in the perf-report output of
the _raw_spin_lock() function in an 8-socket system with 80 cores
(hyperthreading off) with a 3.10-rc1 kernel:

-  13.91%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
   - _raw_spin_lock
      + 35.54% path_get
      + 34.85% dput
      + 19.49% d_path

In fact, the output of the "perf record -s -a" (without call-graph)
showed:

 13.37%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
  7.61%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
  3.54%            true  [kernel.kallsyms]     [k] _raw_spin_lock

Without using the perf monitoring tool, the actual execution profile
will be quite different. In fact, with this patch set and my other
lockless reference count update patch applied, the output of the same
"perf record -s -a" command became:

  2.82%           reaim  [kernel.kallsyms]     [k] _raw_spin_lock
  1.11%              ls  [kernel.kallsyms]     [k] _raw_spin_lock
  0.26%            true  [kernel.kallsyms]     [k] _raw_spin_lock

So the time spent on _raw_spin_lock() function went down from 24.52%
to 4.19%. It can be seen that the performance data collected by the
perf-record command can be heavily skewed in some cases on a system
with a large number of CPUs. This set of patches enables the perf
command to give a more accurate and reliable picture of what is really
happening in the system. At the same time, they can also improve the
general performance of systems especially those with a large number
of CPUs.

The d_path() function takes the following two locks:

1. dentry->d_lock [spinlock] from dget()/dget_parent()/dput()
2. rename_lock    [seqlock]  from d_path()

This set of patches address the rename_lock bottleneck by changing the
way seqlock is implemented so that we can optionally use a read/write
lock as the underlying implementation instead of the default spinlock.

Incidentally, this patch also provides slight 5% performance boost
over just the the lockless reference count update patch in the short
workload of the AIM7 benchmark running on a 8-socket 80-core DL980
machine on a 3.10-based kernel. There were still a few percentage
points of contention in d_path() and getcwd syscall left due to their
use of the rename_lock.

Signed-off-by: Waiman Long <Waiman.Long@...com>

Waiman Long (4):
  seqlock: Add a new blocking reader type
  dcache: Use blocking reader seqlock when protected data are not
    changed
  seqlock: Allow the use of rwlock in seqlock
  dcache: Use rwlock as the underlying lock in rename_lock

 fs/dcache.c             |   28 ++++----
 include/linux/seqlock.h |  167 ++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 158 insertions(+), 37 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/