Message-ID: <20080605042908.GC11811@wotan.suse.de>
Date:	Thu, 5 Jun 2008 06:29:08 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Ingo Molnar <mingo@...e.hu>, David Howells <dhowells@...hat.com>,
	Ulrich Drepper <drepper@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 0/3] 64-bit futexes: Intro

On Wed, Jun 04, 2008 at 08:08:37PM -0700, Linus Torvalds wrote:
> 
> 
> On Thu, 5 Jun 2008, Nick Piggin wrote:
> > 
> > I'd have thought that for a case like this, you'd simply hit the store
> > alias logic and store forwarding would stall because it doesn't have
> > the full data.
> 
> That's _one_ possible implementation. 
> 
> Quite frankly, I think it's the less likely one. It's much more likely 
> that the cache read access and the store buffer probe happen in parallel 
> (this is a really important hotpath for any CPU, but even more so x86 
> where more of the loads and stores are spills). And then the 
> store buffer logic would return the data and a bytemask (where the 
> mask would be all zeroes for a miss), and the returned value is just the 
> appropriate mix of the two.
> 
> > I'd like to know for sure.
> 
> You'd have to ask somebody very knowledgeable inside Intel and AMD, and it 
> is quite likely that different microarchitectures have different 
> approaches...

Well, it would just be nice to hear a "no we'll never do that", "we
already do", or "you can't rely on it" ;)
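
For illustration only, the parallel-probe scheme described in the quoted
text (store buffer returns data plus a bytemask, and the load result is a
per-byte mix with the cache data) could be modelled as below. This is a
minimal sketch under assumptions: the names sb_entry, sb_probe, sb_lookup
and merge_load are hypothetical, the 8-byte granule is arbitrary, and
nothing here claims to match any real Intel or AMD microarchitecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical store-buffer entry covering one 8-byte-aligned granule. */
struct sb_entry {
    uint64_t addr;     /* 8-byte-aligned address of the pending store */
    uint8_t  data[8];  /* bytes waiting to be written to the cache */
    uint8_t  mask;     /* bit i set => data[i] holds a pending byte */
};

/* Probe result: forwarded bytes plus a bytemask (all zeroes on a miss). */
struct sb_probe {
    uint8_t data[8];
    uint8_t mask;
};

/* Probe the store buffer; entries are ordered oldest to youngest, so a
 * younger store overrides an older one byte by byte. */
struct sb_probe sb_lookup(const struct sb_entry *sb, int n, uint64_t addr)
{
    struct sb_probe p = { {0}, 0 };

    assert((addr & 7) == 0);  /* the model only handles aligned granules */
    for (int e = 0; e < n; e++) {
        if (sb[e].addr != addr)
            continue;
        for (int i = 0; i < 8; i++) {
            if (sb[e].mask & (1u << i)) {
                p.data[i] = sb[e].data[i];
                p.mask |= (uint8_t)(1u << i);
            }
        }
    }
    return p;
}

/* Mix the two sources: store-buffer byte where the mask bit is set,
 * cache byte everywhere else. */
void merge_load(const uint8_t cache[8], struct sb_probe p, uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (p.mask & (1u << i)) ? p.data[i] : cache[i];
}
```

In this model a 32-bit store followed by a 64-bit load of the same granule
gets four bytes from the store buffer and four from the cache, without ever
stalling for "the full data".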

 
> > The other thing that could be possible, and I'd imagine maybe more likely
> > to be implemented in a real CPU because it should give more improvement
> > (and which does break my algorithm) is just that the load to the cacheline
> > may get to execute first, then if the cacheline gets evicted and
> > modified by another CPU before our store completes, we effectively see
> > store/load reordering again.
> 
> Oh, absolutely, the perfect algorithm would actually get the right answer, 
> notice that the cacheline got evicted, and retry the whole sequence 
> such that it is coherent. 
> 
> But we do know that Intel expressly documents loads and stores to pass 
> each other and documents the fact that the store buffer is there. So I bet 
> that this is visible in some micro-architecture, even if it's not 
> necessarily visible in _all_ of them.
> 
> The recent Intel memory ordering whitepaper makes it very clear that loads 
> can pass earlier stores and in particular that the store buffer allows 
> intra-processor forwarding to subsequent loads (2.4 in their whitepaper). 
> It _could_ be just a "for future CPUs", but quite frankly, I'm 100% sure 
> it isn't. The store->load forwarding is such a critical performance issue 
> that I can pretty much guarantee that it doesn't always hit the cacheline.

Well, I have a simple test case to show loads pass earlier non-conflicting
stores in the case that loads do not come from the store buffer (i.e.
*inter* processor forwarding).

And store forwarding, by definition, means that the load can complete before
the store can possibly be visible to another CPU, I'd say. So yes, I'm
sure this does happen too.
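
To make the two effects concrete, here is a toy single-threaded model of two
CPUs with FIFO store buffers. It is not the test case mentioned above and it
does not measure real hardware; struct cpu, cpu_store, cpu_load, cpu_drain
and run_litmus are hypothetical names, and the interleaving is fixed by hand.
It only shows why a load can pass an earlier non-conflicting store, and why a
forwarded load completes before the store is visible to the other CPU.

```c
#include <assert.h>
#include <stdint.h>

#define SB_DEPTH 4

/* Hypothetical CPU model: just a small FIFO of pending stores. */
struct cpu {
    int      addr[SB_DEPTH];
    uint32_t val[SB_DEPTH];
    int      count;
};

/* A store goes into the local buffer; shared memory is not touched yet. */
void cpu_store(struct cpu *c, int addr, uint32_t val)
{
    assert(c->count < SB_DEPTH);
    c->addr[c->count] = addr;
    c->val[c->count] = val;
    c->count++;
}

/* A load checks the local buffer first (youngest entry wins), which is
 * store forwarding; otherwise it reads shared memory. */
uint32_t cpu_load(const struct cpu *c, const uint32_t *mem, int addr)
{
    for (int i = c->count - 1; i >= 0; i--)
        if (c->addr[i] == addr)
            return c->val[i];
    return mem[addr];
}

/* Draining makes the buffered stores globally visible, in program order. */
void cpu_drain(struct cpu *c, uint32_t *mem)
{
    for (int i = 0; i < c->count; i++)
        mem[c->addr[i]] = c->val[i];
    c->count = 0;
}

/* One hand-picked interleaving of the classic store-buffer litmus test. */
void run_litmus(uint32_t r[3])
{
    enum { X = 0, Y = 1 };
    uint32_t mem[2] = { 0, 0 };
    struct cpu cpu0 = { .count = 0 }, cpu1 = { .count = 0 };

    cpu_store(&cpu0, X, 1);          /* CPU0: x = 1 (buffered) */
    cpu_store(&cpu1, Y, 1);          /* CPU1: y = 1 (buffered) */
    r[0] = cpu_load(&cpu0, mem, Y);  /* CPU0 reads y from memory */
    r[1] = cpu_load(&cpu1, mem, X);  /* CPU1 reads x from memory */
    r[2] = cpu_load(&cpu0, mem, X);  /* CPU0 reads x from its own buffer */
    cpu_drain(&cpu0, mem);
    cpu_drain(&cpu1, mem);
}
```

With this interleaving both cross loads return 0 even though each CPU's
store is earlier in program order, while CPU0's load of x returns 1 at a
point where CPU1 still observes x as 0.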
