Message-ID: <BANLkTimjiwxC8ryiLpmd=jCjBD62ZZ0G5A@mail.gmail.com>
Date:	Fri, 8 Apr 2011 13:59:29 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>, x86@...nel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: [RFT/PATCH v2 2/6] x86-64: Optimize vread_tsc's barriers

On Thu, Apr 7, 2011 at 5:26 PM, Andrew Lutomirski <luto@....edu> wrote:
> On Thu, Apr 7, 2011 at 2:30 PM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>> On Thu, Apr 7, 2011 at 11:15 AM, Andi Kleen <andi@...stfloor.org> wrote:
>>>
>>> I would prefer to be safe than sorry.
>>
>> There's a difference between "safe" and "making up theoretical
>> arguments for the sake of an argument".
>>
>> If Intel _documented_ the "barriers on each side", I think you'd have a point.
>>
>> As it is, we're not doing the "safe" thing, we're doing the "extra
>> crap that costs us and nobody has ever shown is actually worth it".
>
> Speaking as both a userspace programmer who wants to use clock_gettime
> and as the sucker who has to test this thing, I'd like to agree on
> what clock_gettime is *supposed* to do.  I propose:
>
> For the purposes of ordering, clock_gettime acts as though there is a
> volatile variable that contains the time and is kept up-to-date by
> some thread.  clock_gettime reads that variable.  This means that
> clock_gettime is not a barrier but is ordered at least as strongly* as
> a read to a volatile variable.  If code that calls clock_gettime needs
> stronger ordering, it should add additional barriers as appropriate.
>
> * Modulo errata, BIOS bugs, implementation bugs, etc.

As far as I can tell, on Sandy Bridge and Bloomfield, I can't get the
sequence lfence;rdtsc to violate the rule above.  That's the case even
if I stick random arithmetic and branches right before the lfence.  If
I remove the lfence, though, it starts to fail.  (This is without the
evil fake barrier.)

However, as expected, I can see stores getting reordered after
lfence;rdtsc and after rdtscp, but not after mfence;rdtsc.
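
For reference, the three sequences compared above could be written as
GCC-style inline asm roughly as follows.  This is a minimal sketch, not
the kernel's vread_tsc and not the test case mentioned in this message.
The helper names (rdtsc_lfence, rdtsc_mfence, rdtscp_read,
fenced_delta) are hypothetical, and the "memory" clobber is a
compiler-level choice made for this sketch only.  The non-x86 branch is
a placeholder so the file still compiles elsewhere.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)

/* lfence; rdtsc: the sequence reported above as not violating the rule. */
static inline uint64_t rdtsc_lfence(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("lfence\n\trdtsc"
			     : "=a"(lo), "=d"(hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
}

/* mfence; rdtsc: no store reordering was observed with this one. */
static inline uint64_t rdtsc_mfence(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("mfence\n\trdtsc"
			     : "=a"(lo), "=d"(hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
}

/* rdtscp: also writes ECX (IA32_TSC_AUX), which is discarded here. */
static inline uint64_t rdtscp_read(void)
{
	uint32_t lo, hi, aux;

	__asm__ __volatile__("rdtscp"
			     : "=a"(lo), "=d"(hi), "=c"(aux) : : "memory");
	(void)aux;
	return ((uint64_t)hi << 32) | lo;
}

#else

#include <time.h>

/* Placeholder for non-x86 builds; it says nothing about TSC ordering. */
static inline uint64_t fallback_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static inline uint64_t rdtsc_lfence(void) { return fallback_ns(); }
static inline uint64_t rdtsc_mfence(void) { return fallback_ns(); }
static inline uint64_t rdtscp_read(void)  { return fallback_ns(); }

#endif

/* Difference between two back-to-back fenced reads on the calling thread. */
static inline uint64_t fenced_delta(void)
{
	uint64_t t0 = rdtsc_lfence();
	uint64_t t1 = rdtsc_mfence();

	return t1 - t0;
}
```

A difference from fenced_delta() that wraps around to a huge value
would mean the second read returned an earlier count than the first.
That is only a single-threaded sanity check; observing the store
reordering described above needs a second thread watching the stores.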

So... do you think that the rule is sensible?
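
One single-threaded consequence of the rule, if it is adopted: two
clock_gettime calls in program order behave like two volatile reads of
a variable that only moves forward, so the second should never return
an earlier time than the first.  A minimal sketch of that check
follows.  It is not the test case mentioned in this message, the helper
names (now_ns, count_backward_steps) are hypothetical, and the use of
CLOCK_MONOTONIC is an assumption made for the sketch.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC and return it as nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Count how often a read returns an earlier time than the read that
 * precedes it in program order on the same thread.
 */
static unsigned long count_backward_steps(unsigned long iters)
{
	uint64_t prev = now_ns();
	unsigned long bad = 0;

	for (unsigned long i = 0; i < iters; i++) {
		uint64_t cur = now_ns();

		if (cur < prev)
			bad++;
		prev = cur;
	}
	return bad;
}
```

A nonzero return from count_backward_steps() would indicate a
violation.  A zero return proves little, since the interesting cases
involve ordering against loads and stores made by other threads.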

I'll post the test case somewhere when it's a little less ugly.  I'd
like to see test results on AMD.

--Andy