linux-kernel - Re: [PATCH v8 1/5] asm-generic: Improve csum

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZTxQHVplimd4tquE@ghost>
Date:   Fri, 27 Oct 2023 17:04:45 -0700
From:   Charlie Jenkins <charlie@...osinc.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Palmer Dabbelt <palmer@...belt.com>,
        Conor Dooley <conor@...nel.org>,
        Samuel Holland <samuel.holland@...ive.com>,
        David Laight <David.Laight@...lab.com>,
        Xiao Wang <xiao.w.wang@...el.com>,
        Evan Green <evan@...osinc.com>,
        linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org,
        Paul Walmsley <paul.walmsley@...ive.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v8 1/5] asm-generic: Improve csum_fold

On Sat, Oct 28, 2023 at 12:10:36AM +0100, Al Viro wrote:
> On Fri, Oct 27, 2023 at 03:43:51PM -0700, Charlie Jenkins wrote:
> >  /*
> >   * computes the checksum of a memory block at buff, length len,
> >   * and adds in "sum" (32-bit)
> > @@ -31,9 +33,7 @@ extern __sum16 ip_fast_csum(const void *iph, unsigned int ihl);
> >  static inline __sum16 csum_fold(__wsum csum)
> >  {
> >  	u32 sum = (__force u32)csum;
> > -	sum = (sum & 0xffff) + (sum >> 16);
> > -	sum = (sum & 0xffff) + (sum >> 16);
> > -	return (__force __sum16)~sum;
> > +	return (__force __sum16)((~sum - ror32(sum, 16)) >> 16);
> >  }
> 
> Will (~(sum + ror32(sum, 16))>>16 produce worse code than that?
> Because at least with recent gcc this will generate the exact thing
> you get from arm inline asm...

Yes that will produce worse code because an out-of-order processor will be able to
leverage that ~sum and ror32(sum, 16) can be computed independently of
each other. There are more strict data dependencies in (~(sum +
ror32(sum, 16))>>16.

- Charlie