linux-kernel - Re: [PATCH crypto-stable] crypto: arch/lib - limit simd usage to PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHmME9rqG56i+TfOhY-yt52XZrFRRTv0Vwr27qn5yL0=OpQa-A@mail.gmail.com>
Date:   Wed, 22 Apr 2020 14:17:28 -0600
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Ard Biesheuvel <ardb@...nel.org>
Cc:     Eric Biggers <ebiggers@...nel.org>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH crypto-stable] crypto: arch/lib - limit simd usage to
 PAGE_SIZE chunks

On Wed, Apr 22, 2020 at 1:51 PM Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> On Wed, Apr 22, 2020 at 1:39 AM Ard Biesheuvel <ardb@...nel.org> wrote:
> >
> > On Wed, 22 Apr 2020 at 09:32, Jason A. Donenfeld <Jason@...c4.com> wrote:
> > >
> > > On Tue, Apr 21, 2020 at 10:04 PM Eric Biggers <ebiggers@...nel.org> wrote:
> > > > Seems this should just be a 'while' loop?
> > > >
> > > >         while (bytes) {
> > > >                 unsigned int todo = min_t(unsigned int, PAGE_SIZE, bytes);
> > > >
> > > >                 kernel_neon_begin();
> > > >                 chacha_doneon(state, dst, src, todo, nrounds);
> > > >                 kernel_neon_end();
> > > >
> > > >                 bytes -= todo;
> > > >                 src += todo;
> > > >                 dst += todo;
> > > >         }
> > >
> > > The for(;;) is how it's done elsewhere in the kernel (that this patch
> > > doesn't touch), because then we can break out of the loop before
> > > having to increment src and dst unnecessarily. Likely a pointless
> > > optimization as probably the compiler can figure out how to avoid
> > > that. But maybe it can't. If you have a strong preference, I can
> > > reactor everything to use `while (bytes)`, but if you don't care,
> > > let's keep this as-is. Opinion?
> > >
> >
> > Since we're bikeshedding, I'd prefer 'do { } while (bytes);' here,
> > given that bytes is guaranteed to be non-zero before we enter the
> > loop. But in any case, I'd prefer avoiding for(;;) or while(1) where
> > we can.
>
> Okay, will do-while it up for v2.

I just sent v2 containing do-while, and I'm fine with that going in
that way. But just in the interest of curiosity in the pan-tone
palette, check this out:

https://godbolt.org/z/VxXien

It looks like on mine, the compiler avoids unnecessarily calling those
adds on the last iteration, but on the other hand, it results in an
otherwise unnecessary unconditional jump for the < 4096 case. Sort of
interesting. Arm64 code is more or less the same difference too.