Date:	Tue, 8 Mar 2016 22:08:07 -0800
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Joe Perches <joe@...ches.com>
Cc:	Alexander Duyck <aduyck@...antis.com>,
	Netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>
Subject: Re: [net-next PATCH] csum: Update csum_block_add to use rotate
 instead of byteswap

On Tue, Mar 8, 2016 at 9:50 PM, Joe Perches <joe@...ches.com> wrote:
> On Tue, 2016-03-08 at 21:23 -0800, Alexander Duyck wrote:
>> On Tue, Mar 8, 2016 at 3:25 PM, Joe Perches <joe@...ches.com> wrote:
>> > On Tue, 2016-03-08 at 14:42 -0800, Alexander Duyck wrote:
>> > > The code for csum_block_add was doing a funky byteswap to swap the even and
>> > > odd bytes of the checksum if the offset was odd.  Instead of doing this we
>> > > can save ourselves some trouble and just rotate by 8 as this should have the
>> > > same effect in terms of the final checksum value and only requires one
>> > > instruction.
>> > 3 instructions?
>> I was talking about just the one ror vs. mov, shl, shr, and, and, add.
>>
>> I assume when you say 3 you are including the test and either some
>> form of conditional move or jump?
>
> Yeah, instruction count also depends on architecture (arm/x86/ppc...)

Right.  But the general idea is that rotate is an instruction most
architectures have.  I haven't heard of an instruction that swaps even
and odd bytes of a 32-bit word.
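The two adjustments are not bit-for-bit identical. The old code swaps the bytes inside each 16-bit half (0x11223344 becomes 0x22114433), while the rotate moves the low byte to the top (0x44112233). They still yield the same final checksum, because folding a 32-bit partial sum to 16 bits with end-around carry only preserves its value modulo 0xffff, and both results are congruent to the original value multiplied by 256 modulo 0xffff. The following is a minimal userspace sketch that checks this; the function names are hypothetical and do not come from include/net/checksum.h.

```c
#include <assert.h>
#include <stdint.h>

/* Old adjustment: swap the even and odd bytes in each 16-bit half. */
static uint32_t swap_even_odd(uint32_t sum)
{
	return ((sum & 0xFF00FF) << 8) + ((sum >> 8) & 0xFF00FF);
}

/* Patched adjustment: rotate the 32-bit partial sum right by 8 bits. */
static uint32_t rotate_by_8(uint32_t sum)
{
	return (sum << 24) + (sum >> 8);
}

/*
 * Two partial sums fold to the same 16-bit ones' complement checksum
 * when they are congruent modulo 0xffff (0x0000 and 0xffff are both
 * representations of zero).
 */
static int same_csum(uint32_t a, uint32_t b)
{
	return a % 0xffff == b % 0xffff;
}

/* Count inputs, drawn from a simple LCG, where the two adjustments disagree. */
static unsigned int count_mismatches(unsigned int count)
{
	uint32_t x = 12345u;
	unsigned int bad = 0;

	for (unsigned int i = 0; i < count; i++) {
		x = x * 1664525u + 1013904223u;
		if (!same_csum(swap_even_odd(x), rotate_by_8(x)))
			bad++;
	}
	return bad;
}
```

Comparing the raw 32-bit values instead of their residues modulo 0xffff would flag almost every input, which is why the check compares residues.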

>> > > diff --git a/include/net/checksum.h b/include/net/checksum.h
> []
>> > > @@ -88,8 +88,10 @@ static inline __wsum
>> > >  csum_block_add(__wsum csum, __wsum csum2, int offset)
>> > >  {
>> > >       u32 sum = (__force u32)csum2;
>> > > -     if (offset&1)
>> > > -             sum = ((sum&0xFF00FF)<<8)+((sum>>8)&0xFF00FF);
>> > > +
>> > > +     if (offset & 1)
>> > > +             sum = (sum << 24) + (sum >> 8);
>> > Maybe use ror32(sum, 8);
>> I was actually thinking I could use something like this.  I didn't
>> realize it was even available.
>
> Now you know: bitops.h
>
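The patch spells the rotate as `(sum << 24) + (sum >> 8)`. Using `+` instead of the more usual `|` is harmless because the two shifted values occupy disjoint bit ranges (bits 24 to 31 and bits 0 to 23), so the addition never carries. The sketch below compares it with a simplified userspace stand-in for `ror32()`. The kernel's own definition in bitops.h may differ in detail, so the precondition on `shift` here is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-in for the kernel's ror32(); this version
 * requires 0 < shift < 32 so that neither shift count reaches 32.
 */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	assert(shift > 0 && shift < 32);
	return (word >> shift) | (word << (32 - shift));
}

/* The expression from the patch: rotate right by 8, spelled with '+'. */
static inline uint32_t patch_rotate(uint32_t sum)
{
	return (sum << 24) + (sum >> 8);
}
```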
>> > or maybe something like:
>> >
>> > {
>> >         u32 sum;
>> >
>> >         /* rotated csum2 of odd offset will be the right checksum */
>> >         if (offset & 1)
>> >                 sum = ror32((__force u32)csum2, 8);
>> >         else
>> >                 sum = (__force u32)csum2;
>> >
>> Any specific reason for breaking it up like this?  It seems like it
>> was easier to just have sum be assigned first and then rotating it if
>> needed.  What is gained by splitting the assignment up over two
>> different calls?
>
> It's only for reader clarity where a comment could be useful.
> The compiler output shouldn't change.

Okay, well I can add a one-line comment about aligning to a 16-bit
boundary for clarity.

- Alex
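The suggestions in this thread (assign first, rotate only for odd offsets, add a short alignment comment) can be combined and checked end to end. The partial sum of a whole buffer should fold to the same 16-bit value as the partial sums of its head and tail combined at the split offset. This is a sketch under stated assumptions: `uint32_t` stands in for `__wsum`, `csum_add()` mirrors the end-around-carry addition in checksum.h, `csum_fold16()` folds without the final complement that the kernel's `csum_fold()` applies, `partial_sum()` is a hypothetical byte-by-byte stand-in for `csum_partial()` that reads 16-bit words little-endian, and the comment wording is only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement addition: add, then feed any carry-out back in. */
static uint32_t csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Fold a 32-bit partial sum down to 16 bits (no final complement). */
static uint16_t csum_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical stand-in for csum_partial(): sums 16-bit little-endian
 * words; a trailing odd byte is added as the low byte of a word.
 */
static uint32_t partial_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = csum_add(sum, (uint32_t)buf[i] | (uint32_t)buf[i + 1] << 8);
	if (len & 1)
		sum = csum_add(sum, buf[len - 1]);
	return sum;
}

/* csum_block_add() using the rotate from the patch. */
static uint32_t csum_block_add(uint32_t csum, uint32_t csum2, int offset)
{
	uint32_t sum = csum2;

	/* rotate sum to align it with a 16-bit boundary */
	if (offset & 1)
		sum = (sum << 24) + (sum >> 8);

	return csum_add(csum, sum);
}
```

The rotate is what makes odd splits work: a byte at an odd offset in the packet carries weight 256 in the packet's 16-bit sum but weight 1 in its own block's sum (or vice versa), and rotating a 32-bit partial sum right by 8 multiplies it by 256 modulo 0xffff.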
