lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1430428337-16802-1-git-send-email-Waiman.Long@hp.com>
Date:	Thu, 30 Apr 2015 17:12:15 -0400
From:	Waiman Long <Waiman.Long@...com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
Cc:	linux-kernel@...r.kernel.org, Jason Low <jason.low2@...com>,
	Davidlohr Bueso <dave@...olabs.net>,
	Scott J Norton <scott.norton@...com>,
	Douglas Hatch <doug.hatch@...com>,
	Waiman Long <Waiman.Long@...com>
Subject: [PATCH v4 0/2] locking/rwsem: optimize rwsem_wakeup()

 v3->v4:
   - Break out the active writer check into a separate patch and move
     it from __rwsem_do_wake() to rwsem_wake().
   - Use smp_rmb() instead of the incorrect smp_mb__after_atomic() as
     suggested by PeterZ.

 v2->v3:
   - Fix errors in commit log.

 v1->v2:
   - Add a memory barrier before calling spin_trylock for proper memory
     ordering.

This patch set aims to reduce spinlock contention in the wait_lock
due to excessive activity in the rwsem_wake code path. This, in turn,
reduces up_write/up_read latency and improve performance when the
rwsem is heavily contended.

On an 8-socket Westmere-EX server (80 cores, HT off), running AIM7's
high_systime workload (1000 users) on a vanilla 4.0 kernel produced
the following perf profile for spinlock contention:

  9.23%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
              |--97.39%-- rwsem_wake
              |--0.69%-- try_to_wake_up
              |--0.52%-- release_pages
               --1.40%-- [...]

  1.70%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irq
              |--96.61%-- rwsem_down_write_failed
              |--2.03%-- __schedule
              |--0.50%-- run_timer_softirq
               --0.86%-- [...]

Here the contended rwsems are the mmap_sem (mm_struct) and the
i_mmap_rwsem (address_space) with mostly write locking.  With a
patched 4.0 kernel, the perf profile became:

  1.87%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
              |--87.64%-- rwsem_wake
              |--2.80%-- release_pages
              |--2.56%-- try_to_wake_up
              |--1.10%-- __wake_up
              |--1.06%-- pagevec_lru_move_fn
              |--0.93%-- prepare_to_wait_exclusive
              |--0.71%-- free_pid
              |--0.58%-- get_page_from_freelist
              |--0.57%-- add_device_randomness
               --2.04%-- [...]

  0.80%    reaim  [kernel.kallsyms]    [k] _raw_spin_lock_irq
              |--92.49%-- rwsem_down_write_failed
              |--4.24%-- __schedule
              |--1.37%-- run_timer_softirq
               --1.91%-- [...]

The table below shows the % improvement in throughput (1100-2000 users)
in the various AIM7's workloads:

	Workload	% increase in throughput
	--------	------------------------
	custom			 3.8%
	five-sec		 3.5%
	fserver			 4.1%
	high_systime		22.2%
	shared			 2.1%
	short			10.1%

Waiman Long (2):
  locking/rwsem: reduce spinlock contention in wakeup after
    up_read/up_write
  locking/rwsem: check for active writer before wakeup

 include/linux/osq_lock.h    |    5 +++
 kernel/locking/rwsem-xadd.c |   65 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 2 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ