Message-Id: <cover.1308259496.git.jeremy.fitzhardinge@citrix.com>
Date:	Thu, 16 Jun 2011 14:40:47 -0700
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...e.hu>,
	the arch/x86 maintainers <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Nick Piggin <npiggin@...nel.dk>,
	Jeremy Fitzhardinge <jeremy.fitzhardinge@...rix.com>
Subject: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@...rix.com>

Hi all,

I'm proposing this series for 3[.0].1.

This is a repost of a series to clean up the x86 ticket lock
code by converting it to a mostly C implementation and removing
lots of duplicate code relating to the ticket size.

The last time I posted this series, the only significant comments
were from Nick Piggin, specifically relating to:

 1. A wrongly placed barrier on unlock, which may have allowed the
    compiler to move things out of the locked region.  I went
    belt-and-suspenders by having two barriers to prevent motion
    into or out of the locked region.

 2. With NR_CPUS < 256 the ticket size is 8 bits.  The compiler doesn't
    use the same trick as the hand-coded asm to directly compare the high
    and low bytes in the word, but does a bit of extra shuffling around.
    However, the Intel optimisation guide and several x86 experts have
    opined that it's best to avoid the high-byte operations anyway, since
    they will cause a partial word stall, and the gcc-generated code should
    be better.

    Overall the compiler-generated code is very similar to the hand-coded
    versions, with the partial byte operations being the only significant
    difference. (Curiously, gcc does generate a high-byte compare for me
    in trylock, so it can if it wants to.)
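
    For readers who want the scheme in front of them, here is a minimal
    userspace sketch of an 8-bit ticket lock in C11. It is not the code
    from this series: ticket_lock_t, TICKET_INC and the ticket_* functions
    are hypothetical names, C11 acquire/release atomics stand in for the
    kernel's xadd and barrier(), and unlock uses a compare-exchange loop
    where the kernel increments only the head byte.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace type: head ("now serving") lives in the low
 * byte and tail ("next ticket") in the high byte, mirroring the
 * NR_CPUS < 256 layout discussed above. */
typedef struct {
    _Atomic uint16_t slock;
} ticket_lock_t;

#define TICKET_INC 0x0100u  /* adds 1 to the tail byte only */

static inline void ticket_lock(ticket_lock_t *l)
{
    /* Take a ticket: atomically bump the tail, keep the old word. */
    uint16_t old = atomic_fetch_add_explicit(&l->slock, TICKET_INC,
                                             memory_order_acquire);
    uint8_t mine = (uint8_t)(old >> 8);

    /* Spin until the head byte reaches our ticket number. */
    while ((uint8_t)atomic_load_explicit(&l->slock,
                                         memory_order_acquire) != mine)
        ;  /* a real implementation would pause/relax the CPU here */
}

static inline bool ticket_trylock(ticket_lock_t *l)
{
    uint16_t old = atomic_load_explicit(&l->slock, memory_order_relaxed);

    /* Lock is free only when head == tail (low byte == high byte). */
    if ((uint8_t)(old >> 8) != (uint8_t)old)
        return false;

    /* Claim the next ticket only if nobody changed the word meanwhile. */
    uint16_t next = (uint16_t)(old + TICKET_INC);
    return atomic_compare_exchange_strong_explicit(&l->slock, &old, next,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void ticket_unlock(ticket_lock_t *l)
{
    uint16_t old = atomic_load_explicit(&l->slock, memory_order_relaxed);
    uint16_t next;

    /* Advance head by one without carrying into the tail byte.
     * Release ordering keeps earlier accesses inside the locked region. */
    do {
        next = (uint16_t)((old & 0xff00u) | ((old + 1u) & 0x00ffu));
    } while (!atomic_compare_exchange_weak_explicit(&l->slock, &old, next,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```

    The high/low byte comparison in ticket_trylock is the spot where a
    compiler may choose between a high-byte compare and shifting the
    value around first, which is the code-generation difference
    described in point 2.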

I've been running with this code in place for several months on 4-core
systems without any problems.

I couldn't measure a consistent performance difference between the two
implementations; there seemed to be about +/- 1% of variation, which is
the level I see from simply recompiling the kernel with slightly
different code alignment.

Overall, I think the large reduction in code size is a big win.

Thanks,
	J

Jeremy Fitzhardinge (7):
  x86/ticketlock: clean up types and accessors
  x86/ticketlock: convert spin loop to C
  x86/ticketlock: Use C for __ticket_spin_unlock
  x86/ticketlock: make large and small ticket versions of spin_lock the
    same
  x86/ticketlock: make __ticket_spin_lock common
  x86/ticketlock: make __ticket_spin_trylock common
  x86/ticketlock: prevent memory accesses from reordered out of lock
    region

 arch/x86/include/asm/spinlock.h       |  147 ++++++++++++---------------------
 arch/x86/include/asm/spinlock_types.h |   22 +++++-
 2 files changed, 74 insertions(+), 95 deletions(-)

-- 
1.7.5.4
