linux-kernel - Re: [patch][rfc] rwsem: generic rwsem

Message-ID: <20061208022259.GB11551@wotan.suse.de>
Date:	Fri, 8 Dec 2006 03:22:59 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	David Howells <dhowells@...hat.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-arch@...r.kernel.org, Linus Torvalds <torvalds@...l.org>,
	Andrew Morton <akpm@...l.org>
Subject: Re: [patch][rfc] rwsem: generic rwsem

On Tue, Dec 05, 2006 at 08:36:22PM +0000, David Howells wrote:
> Nick Piggin <npiggin@...e.de> wrote:
> 
> > Either approach works, and one is better than the current two approaches.
> 
> From one point of view that's true.  But from other points of view, it isn't.
> 
> > > have be implemented by spinlock on some archs, and so your approach is
> > > really not optimal in such cases.  Not all archs have cmpxchg or any way
> > > of doing an
> > 
> > We do lots of things that aren't optimal for everyone.
> 
> In this case it can be particularly suboptimal.  The approach you've chosen to
> generalise on is specifically for XADD-based rwsems.  XADD is only available
> on i386, x86_64 and ia64 (FETCHADD), and of those only i386 and ia64 actually
> use the XADD-optimised rwsems.  Andi wants x86_64 to use the spinlock version.
> 
> Everywhere else, XADD is emulated.
> 
> If you have to emulate XADD by using CMPXCHG or LL/SC equivalents, then
> there're actually better ways of doing things than either of these approaches
> that the kernel currently has.  I just need to find the time to explore them.
> 
> Look at how the counter works in the XADD-based version.  That's the way it is
> *because* I'm using XADD.  That's quite limiting.

Not really. ll/sc architectures "emulate" xadd the same as they would
emulate a spinlock. There is nothing suboptimal about it.

Look at alpha or powerpc, for example.
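
To illustrate the point, here is a minimal userspace sketch in C11, not
kernel code. The name emulated_add_return() is hypothetical, and the
compare-and-swap retry loop only stands in for the load-locked/store-conditional
loop such architectures would actually emit. It shows that an "emulated" xadd
is a single short retry loop, the same shape as a spinlock acquire:

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically add 'delta' to *v and return the new
 * value, built only from a compare-and-swap retry loop.
 */
static int emulated_add_return(atomic_int *v, int delta)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    /*
     * If *v changed under us, the failed exchange reloads 'old' with the
     * current value and the loop retries with a fresh 'old + delta'.
     */
    while (!atomic_compare_exchange_weak(v, &old, old + delta))
        ;

    return old + delta;
}
```

The weak form is used because spurious failures are harmless inside a retry
loop, which is also how it maps onto ll/sc hardware.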

> > If it hurts, we can special case it. This really doesn't, considering the
> > rarity of rwsems.
> 
> If you're willing to special case it, then what's the point in generalising?

Because I don't like the way it is currently done.

> And in case you're wondering, FS-Cache uses a _lot_ of rwsem operations, so I
> do have an interest in making it quick, and that includes undoing what Ingo
> did when it's not necessary.

And I bet you couldn't measure a difference, even on FRV.

> > I honestly couldn't see why you went on this tangent last time and I
> > can't see your point now.  If FRV only has an SMP safe xchg, then it
> > will need to implement SMP atomics with spinlocks anyway, to get things
> > like add_return. 
> 
> That's _exactly_ my point.
> 
> Think!  If you _require_ the use of atomic ops in implementing rwsems, then
> you introduce _extra_, _general_ spinlocks into the XADD-based algorithm
> everywhere you call an atomic_xxx() op that has a side effect.

No, only where you also call down into the contended path. I never
claimed this is optimal for atomics-with-spinlocks architectures, and
I know how they are going to be impacted.

However, it is not 100% clear that they will be slower. For starters,
you are using 2 spinlocks, so if there is a lot of contention, you
can have concurrent down_read()s go through while the rwsem spinlock
is held for a long time while readers are being woken off the wait
list.

> And if you're going to have to be getting general spinlocks then the
> XADD-based rwsem really is worse by quite a bit than the current
> spinlock-based rwsem as the latter _doesn't_ need general spinlocks as it
> doesn't use atomic ops.

It is mostly worse in the contended case, in that it needs to take 2
spinlocks where 1 would do. But for *all* other SMP architectures (i.e.
all except sparc32 and parisc) the spinlock design is slower when there
is contention. I don't know how you could be advocating the spinlock
version on this basis.

FYI, x86-64 uses spinlocks because of the complexity and task limit
of the xadd algorithm, not because it was measured to be any faster.

> > And this is relevant how? Why do you imagine that atomic_cmpxchg
> > is so much harder than atomic_add_unless, atomic_add_return (or
> > even atomic_add) on an architecture like this?
> 
> It isn't, but atomic_xxx() will _have_ to be implemented with spinlocks.
> 
> My point here was that last time I believe you said that FRV managed to
> implement atomic ops without spinlocks, and so the case wasn't worth
> considering.  My points are that (a) FRV would have to in order to do SMP, and (b) we
> have another arch lurking in the wings that does have to use spinlocks because
> I can't do the clever tricks there that I can on FRV.

So let's take another look when these are in the kernel.

> > Nor did I understand this objection last time. You must have just read the
> > description, rather than the code. I *do* use xadd on x86 for down read (ie.
> > atomic_add_return)
> 
> I missed the atomic_add_return(), so I'll forgive you that bit.

You mean I'll forgive you ;) I've been trying to tell you this.

> > And the place where I use cmpxchg *already* uses cmpxchg on i386. Ditto for
> > alpha and powerpc (which are the ones implementing their own rwsem.h of your
> > above list).
> > 
> > Actually all these architectures will generate basically identical code
> > (ignoring ool vs inline). So will all the UP architectures that suppress
> > interrupts for atomic ops.
> 
> Alpha and powerpc emulate XADD, so see above.

They *implement* atomic_add_return.

> > Which is why I'd prefer not to use it. The current rwsem stuff is much
> > worse though, which is why I'd consider it.
> 
> No, it isn't.  It's much more straightforward; or at least it was until Ingo
> decided to rearrange things.

Well I'm working with the mainline kernel.

Anyway, IMO it is cleaner to have down_read be a generic function which calls
into an arch helper to perform some simple procedure, rather than have
any architecture able to implement a completely different algorithm.
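
As a rough userspace sketch of that structure (C11, not the actual patch; every
name here is hypothetical, and the "negative count means a writer is active or
waiting" convention is an assumption borrowed from the xadd-style counter), the
generic function owns the algorithm and the arch hook supplies only one atomic
primitive:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for an rwsem; not the kernel's struct. */
struct rwsem_sketch {
    atomic_long count;   /* assumed: < 0 while a writer holds or waits */
    int slow_calls;      /* instrumentation for this sketch only */
};

/*
 * The only piece an architecture would provide: increment and return
 * the new value. An arch with xadd would use it directly; others would
 * use an ll/sc loop or a lock.
 */
static long arch_rwsem_inc_return(struct rwsem_sketch *sem)
{
    return atomic_fetch_add(&sem->count, 1) + 1;
}

/* Placeholder for the contended path (queueing and sleeping omitted). */
static void rwsem_down_read_slowpath(struct rwsem_sketch *sem)
{
    sem->slow_calls++;
}

/* Generic fast path, identical on every architecture. */
static void down_read_sketch(struct rwsem_sketch *sem)
{
    if (arch_rwsem_inc_return(sem) < 0)
        rwsem_down_read_slowpath(sem);
}
```

With this split, an architecture can replace arch_rwsem_inc_return() but
cannot substitute a different locking algorithm.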

> > 2 designs, 8 or so implementations (some subtly different) implemented in
> > arch specific asm.
> 
> Yes, and?

Ah, that explains why you think this is pointless: you don't think that is
a problem. OK... if you think it is not then I won't be able to convince
you otherwise.

> > I moved to the implementation on the most widely used and tested arch,
> > as well as the ones which have the most parallelism and weakly ordered
> > memory (i386, powerpc, ia64). I figure this is likely to be adequately
> > scalable and have less chance of being buggy.
> 
> The spinlock-based approach is easier to verify and has been tested on more
> archs than the other.  I don't think either of them is more buggy than the
> other.  The spinlock-based rwsems are also used on i386 under some
> circumstances.

On UP architectures, maybe. That's not too interesting.

> > > Actually, the code isn't anywhere near as difficult to read as the mutex
> > > code or the spinlock code.  The fastcall "junk" is quite important wrt the
> > > i386 code and permitted a small speedup (though if we compile with
> > > regparms=3 nowadays, then the advantage is actually bypassed).
> > 
> > No objections, then?
> 
> WRONG!
> 
> Remove the fastcall and the i386 rwsems will cease to function if
> CONFIG_REGPARM=n.

WRONG! Because they aren't called by assembly anymore. But if it means
that much to you, the fastcalls can stay.

> > Of course the code is slower when it is in L0 icache. I'm sure you know
> > that smaller code is the whole point...
> 
> Not necessarily.  Faster code may also be the whole point; it depends on what

Do you have to argue everything? I mean the whole point of OOLing spinlocks
and mutexes is to generate smaller code (with associated benefits like
reuse of branch predictor).

Anyway, this is not a big deal. If you want to continue arguing the point,
we can if we get past the other points.
