Date:	Wed, 9 Mar 2016 08:08:18 -0800
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Tom Herbert <tom@...bertland.com>
Cc:	Joe Perches <joe@...ches.com>,
	Alexander Duyck <aduyck@...antis.com>,
	Netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>
Subject: Re: [net-next PATCH] csum: Update csum_block_add to use rotate
 instead of byteswap

On Tue, Mar 8, 2016 at 10:31 PM, Tom Herbert <tom@...bertland.com> wrote:
> On Tue, Mar 8, 2016 at 10:08 PM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
>> On Tue, Mar 8, 2016 at 9:50 PM, Joe Perches <joe@...ches.com> wrote:
>>> On Tue, 2016-03-08 at 21:23 -0800, Alexander Duyck wrote:
>>>> On Tue, Mar 8, 2016 at 3:25 PM, Joe Perches <joe@...ches.com> wrote:
>>>> > On Tue, 2016-03-08 at 14:42 -0800, Alexander Duyck wrote:
>>>> > > The code for csum_block_add was doing a funky byteswap to swap the even and
>>>> > > odd bytes of the checksum if the offset was odd.  Instead of doing this we
>>>> > > can save ourselves some trouble and just rotate by 8 as this should have the
>>>> > > same effect in terms of the final checksum value and only requires one
>>>> > > instruction.
>>>> > 3 instructions?
>>>> I was talking about just the one ror vs. mov, shl, shr, and, and, add.
>>>>
>>>> I assume when you say 3 you are including the test and either some
>>>> form of conditional move or jump?
>>>
>>> Yeah, instruction count also depends on architecture (arm/x86/ppc...)
>>
>> Right.  But the general idea is that rotate is an instruction most
>> architectures have.  I haven't heard of an instruction that swaps even
>> and odd bytes of a 32 bit word.
>>
> Yes, I took a look at inlining these.
>
> #define rol32(V, X) ({                          \
>         int word = V;                           \
>         /* "+r": word is both read and written */ \
>         if (__builtin_constant_p(X))            \
>                 asm("roll $" #X ",%[word]\n\t"  \
>                     : [word] "+r" (word));      \
>         else                                    \
>                 asm("roll %%cl,%[word]\n\t"     \
>                     : [word] "+r" (word)        \
>                     : "c" (X));                 \
>         word;                                   \
> })
>
> With this I'm seeing a nice speedup in jhash which uses a lot of rol32s...

Is gcc really not converting the rol32 calls into rotates?
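
For comparison, here is a minimal sketch of the portable shift-and-or rotate idiom that compilers such as gcc typically recognize and turn into a single rotate instruction. The name rol32_portable is made up for this example and is not the kernel's definition; masking both shift counts with 31 is an assumption made here to keep the expression well defined for any shift value.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's rol32): rotate a 32-bit word left.
 * Both shift counts are masked to 0..31 so that shift == 0 or shift >= 32
 * never produces an out-of-range shift.
 */
static inline uint32_t rol32_portable(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}
```

Compiling a caller with optimization enabled and inspecting the generated assembly (for example with gcc -O2 -S) is one way to see whether a rotate instruction is actually emitted on a given architecture.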

If we need this type of code to get the rotates to occur as expected,
then maybe we should look at adding arch-specific versions of the
functions in bitops.h to improve performance, since I know these calls
are used in performance-critical paths such as crypto and hashing.

- Alex
