Date:	Thu, 7 Apr 2011 07:25:37 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	x86@...nel.org, Thomas Gleixner <tglx@...utronix.de>,
	Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org
Subject: Re: [RFT/PATCH v2 3/6] x86-64: Don't generate cmov in vread_tsc

On Thu, Apr 7, 2011 at 3:54 AM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Andy Lutomirski <luto@....edu> wrote:
>
>> vread_tsc checks whether rdtsc returns something less than
>> cycle_last, which is an extremely predictable branch.  GCC likes
>> to generate a cmov anyway, which is several cycles slower than
>> a predicted branch.  This saves a couple of nanoseconds.
>>
>> Signed-off-by: Andy Lutomirski <luto@....edu>
>> ---
>>  arch/x86/kernel/tsc.c |   19 +++++++++++++++----
>>  1 files changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>> index 858c084..69ff619 100644
>> --- a/arch/x86/kernel/tsc.c
>> +++ b/arch/x86/kernel/tsc.c
>> @@ -794,14 +794,25 @@ static cycle_t __vsyscall_fn vread_tsc(void)
>>        */
>>
>>       /*
>> -      * This doesn't multiply 'zero' by anything, which *should*
>> -      * generate nicer code, except that gcc cleverly embeds the
>> -      * dereference into the cmp and the cmovae.  Oh, well.
>> +      * This doesn't multiply 'zero' by anything, which generates
>> +      * very slightly nicer code than multiplying it by 8.
>>        */
>>       last = *( (cycle_t *)
>>                 ((char *)&VVAR(vsyscall_gtod_data).clock.cycle_last + zero) );
>>
>> -     return ret >= last ? ret : last;
>> +     if (likely(ret >= last))
>> +             return ret;
>> +
>> +     /*
>> +      * GCC likes to generate cmov here, but this branch is extremely
>> +      * predictable (it's just a function of time and the likely is
>> +      * very likely) and there's a data dependence, so force GCC
>> +      * to generate a branch instead.  I don't barrier() because
>> +      * we don't actually need a barrier, and if this function
>> +      * ever gets inlined it will generate worse code.
>> +      */
>> +     asm volatile ("");
>
> Hm, you have not addressed the review feedback i gave in:
>
>  Message-ID: <20110329061546.GA27398@...e.hu>

I can change that, but if anyone ever inlines this function (and Andi
suggested that as another future optimization), then they'd want to
undo it, because it will generate worse code.  (barrier() has the
unnecessary memory clobber.)

--Andy

>
> Thanks,
>
>        Ingo
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
