linux-kernel - Re: [rfc][patch 4a/6] brlock: "fast" brlocks

Date:	Thu, 15 Oct 2009 13:05:21 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Nick Piggin <npiggin@...e.de>
Cc:	linux-arch@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	Ian Kent <raven@...maw.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, David Miller <davem@...emloft.net>,
	Al Viro <viro@...IV.linux.org.uk>
Subject: Re: [rfc][patch 4a/6] brlock: "fast" brlocks

On Thu, 2009-10-15 at 08:58 +0200, Nick Piggin wrote:
> [Not for merge. Stop reading if you're not interested in locking minutiae.]
> 
> OK, this is untested but I think the theory is right. Basically it is taking
> the idea from Dave M's cool brlock optimisation stuff with one further
> optimisation in that the read locker does not check the spinlock but
> rather we keep another wlocked variable together in the same cacheline per
> CPU, so the read locker only has to touch one cacheline rather than 2.
> 
> This actually will reduce the number of atomics by 2 per path lookup,
> however we have an smp_mb() there now which is really nasty on some
> architectures (like ia64 and ppc64), and not that nice on x86 either.
> We can probably do something interesting on ia64 and ppc64 so that we
> take advantage of the fact rlocked and wlocked are in the same cacheline
> so cache coherency (rather than memory consistency) should always provide
> a strict ordering there. We still do need an acquire barrier -- but it is
> a much nicer lwsync or st.acq on ppc and ia64.
> 
> But: is the avoidance of the atomic RMW a big win? On x86 cores I've tested
> IIRC mfence is about as costly as a locked instruction which includes the
> mfence...
> 
> So long story short: it might be a small win but it is going to be very
> arch specific and will require arch specific code to do the barriers and
> things. The generic spinlock brlock isn't bad at all, so I'll just post
> this as a curiosity for the time being.
>  

fwiw, I rather like this implementation better, and adding lockdep
annotations to this one shouldn't be hard.
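
For readers following along, the scheme described in the quoted text can be
sketched in userspace C11 roughly as below. This is not the code from the
patch: the names (struct brlock_slot, br_read_trylock and friends), the fixed
NR_SLOTS standing in for per-CPU data, the 64-byte cacheline size, and the
simple writer-side serialisation are all assumptions made for illustration.
The seq_cst fence plays the role of the smp_mb() mentioned above, and each
slot is assumed to be used by only one reader at a time, as per-CPU data
would be with preemption disabled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS  4   /* hypothetical: stands in for the number of CPUs */
#define CACHELINE 64  /* assumed cacheline size */

/* rlocked and wlocked sit in the same cacheline, one pair per slot. */
struct brlock_slot {
	_Alignas(CACHELINE) atomic_int rlocked;
	atomic_int wlocked;
};

struct brlock {
	struct brlock_slot slot[NR_SLOTS];
	atomic_int writer;  /* assumption: serialises writers */
};

/* Reader: touches only its own slot, with no atomic RMW. */
static bool br_read_trylock(struct brlock *l, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	struct brlock_slot *s = &l->slot[slot];

	atomic_store_explicit(&s->rlocked, 1, memory_order_relaxed);
	/* Full barrier: the store above must be ordered before the load below. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&s->wlocked, memory_order_acquire)) {
		/* A writer is active: back out so it can make progress. */
		atomic_store_explicit(&s->rlocked, 0, memory_order_release);
		return false;
	}
	return true;
}

static void br_read_lock(struct brlock *l, int slot)
{
	while (!br_read_trylock(l, slot)) {
		/* Spin until the writer clears this slot's wlocked flag. */
		while (atomic_load_explicit(&l->slot[slot].wlocked,
					    memory_order_relaxed))
			;
	}
}

static void br_read_unlock(struct brlock *l, int slot)
{
	atomic_store_explicit(&l->slot[slot].rlocked, 0, memory_order_release);
}

/* Writer: flags every slot, then waits for every reader to drain. */
static void br_write_lock(struct brlock *l)
{
	while (atomic_exchange_explicit(&l->writer, 1, memory_order_acquire))
		;
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_store_explicit(&l->slot[i].wlocked, 1,
				      memory_order_relaxed);
	/* Pairs with the reader's fence: at least one side sees the other. */
	atomic_thread_fence(memory_order_seq_cst);
	for (int i = 0; i < NR_SLOTS; i++)
		while (atomic_load_explicit(&l->slot[i].rlocked,
					    memory_order_acquire))
			;
}

static void br_write_unlock(struct brlock *l)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_store_explicit(&l->slot[i].wlocked, 0,
				      memory_order_release);
	atomic_store_explicit(&l->writer, 0, memory_order_release);
}
```

The read side is a store, a full barrier and a load, all on one cacheline,
which is where the saving of atomics per path lookup would come from. The
write side pays for it by visiting every slot.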