Message-ID: <CAHmME9qOZEBNr0yndDWmfGMwuFSsN16n4KCDUSdHCPtCVe+Afw@mail.gmail.com>
Date: Wed, 19 Sep 2018 04:02:51 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Netdev <netdev@...r.kernel.org>,
Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Samuel Neves <sneves@....uc.pt>,
Andrew Lutomirski <luto@...nel.org>,
Jean-Philippe Aumasson <jeanphilippe.aumasson@...il.com>
Subject: Re: [PATCH net-next v5 03/20] zinc: ChaCha20 generic C implementation and selftest
On Wed, Sep 19, 2018 at 3:08 AM Eric Biggers <ebiggers@...nel.org> wrote:
> Does this consistently perform as well as an implementation that organizes the
> operations such that the quarterrounds for all columns/diagonals are
> interleaved? As-is, there are tight dependencies in QUARTER_ROUND() (as well as
> in the existing chacha20_block() in lib/chacha20.c, for that matter), so we're
> heavily depending on the compiler to do the needed interleaving so as to not get
> potentially disastrous performance. Making it explicit could be a good idea.
It does perform as well, and the compiler generates good code, even
with older compilers. Notably, that's all a single statement (via the
comma operator).
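To illustrate the point about the comma operator, here is a minimal userspace sketch (not the patch's actual QUARTER_ROUND() macro; the names QR, rol32 and chacha20_double_round are local to this example). Each quarter round is one expression statement whose steps are sequenced by the comma operator, so the steps inside a single quarter round depend on each other. The four column rounds, however, operate on disjoint state words, as do the four diagonal rounds, which is what leaves the compiler free to interleave them:

```c
#include <stdint.h>

/* Local rotate-left helper, modeled on the kernel's rol32(); n in 1..31. */
static inline uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/*
 * Hypothetical quarter-round macro: the whole round is ONE expression
 * statement. The comma operator sequences the steps, so within one
 * quarter round each step depends on the previous one.
 */
#define QR(a, b, c, d) ( \
	a += b, d = rol32(d ^ a, 16), \
	c += d, b = rol32(b ^ c, 12), \
	a += b, d = rol32(d ^ a, 8), \
	c += d, b = rol32(b ^ c, 7))

/*
 * One ChaCha20 double round over the 16-word state. The four column
 * rounds touch disjoint words, as do the four diagonal rounds, so an
 * optimizing compiler may interleave their instructions even though
 * they are written one after another.
 */
static void chacha20_double_round(uint32_t x[16])
{
	QR(x[0], x[4], x[8],  x[12]);	/* columns */
	QR(x[1], x[5], x[9],  x[13]);
	QR(x[2], x[6], x[10], x[14]);
	QR(x[3], x[7], x[11], x[15]);
	QR(x[0], x[5], x[10], x[15]);	/* diagonals */
	QR(x[1], x[6], x[11], x[12]);
	QR(x[2], x[7], x[8],  x[13]);
	QR(x[3], x[4], x[9],  x[14]);
}
```

As a sanity check, applying QR to the RFC 7539 section 2.1.1 inputs a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567 yields a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.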
> > +}
> > +
> > +static void chacha20_generic(u8 *out, const u8 *in, u32 len, const u32 key[8],
> > + const u32 counter[4])
> > +{
> > + __le32 buf[CHACHA20_BLOCK_WORDS];
> > + u32 x[] = {
> > + EXPAND_32_BYTE_K,
> > + key[0], key[1], key[2], key[3],
> > + key[4], key[5], key[6], key[7],
> > + counter[0], counter[1], counter[2], counter[3]
> > + };
> > +
> > + if (out != in)
> > + memmove(out, in, len);
> > +
> > + while (len >= CHACHA20_BLOCK_SIZE) {
> > + chacha20_block_generic(buf, x);
> > + crypto_xor(out, (u8 *)buf, CHACHA20_BLOCK_SIZE);
> > + len -= CHACHA20_BLOCK_SIZE;
> > + out += CHACHA20_BLOCK_SIZE;
> > + }
> > + if (len) {
> > + chacha20_block_generic(buf, x);
> > + crypto_xor(out, (u8 *)buf, len);
> > + }
> > +}
>
> If crypto_xor_cpy() is used instead of crypto_xor(), and 'in' is incremented
> along with 'out', then the memmove() is not needed.
Nice idea, thanks. Implemented.
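A minimal userspace sketch of what the loop looks like after the suggested change, under stated assumptions: xor_cpy() is a hypothetical stand-in for the kernel's crypto_xor_cpy() (dst = src1 ^ src2), chacha20_block() is a plain RFC 7539 block function written for illustration rather than the patch's chacha20_block_generic(), and the block function is assumed to increment the 32-bit counter in x[12] after each block. Because each block is XOR-copied from 'in' to 'out' and both pointers advance together, the up-front memmove() disappears:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHACHA20_BLOCK_SIZE 64

static inline uint32_t rol32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

#define QR(a, b, c, d) ( \
	a += b, d = rol32(d ^ a, 16), \
	c += d, b = rol32(b ^ c, 12), \
	a += b, d = rol32(d ^ a, 8), \
	c += d, b = rol32(b ^ c, 7))

/* Illustrative RFC 7539 block function: writes 64 bytes of keystream for
 * state x, then increments the 32-bit block counter in x[12]. */
static void chacha20_block(uint8_t out[CHACHA20_BLOCK_SIZE], uint32_t x[16])
{
	uint32_t w[16];
	int i;

	memcpy(w, x, sizeof(w));
	for (i = 0; i < 10; i++) {	/* 10 double rounds = 20 rounds */
		QR(w[0], w[4], w[8],  w[12]);
		QR(w[1], w[5], w[9],  w[13]);
		QR(w[2], w[6], w[10], w[14]);
		QR(w[3], w[7], w[11], w[15]);
		QR(w[0], w[5], w[10], w[15]);
		QR(w[1], w[6], w[11], w[12]);
		QR(w[2], w[7], w[8],  w[13]);
		QR(w[3], w[4], w[9],  w[14]);
	}
	for (i = 0; i < 16; i++) {	/* add input, serialize little-endian */
		uint32_t v = w[i] + x[i];

		out[4 * i + 0] = (uint8_t)v;
		out[4 * i + 1] = (uint8_t)(v >> 8);
		out[4 * i + 2] = (uint8_t)(v >> 16);
		out[4 * i + 3] = (uint8_t)(v >> 24);
	}
	x[12]++;
}

/* Hypothetical stand-in for the kernel's crypto_xor_cpy(): dst = src1 ^ src2. */
static void xor_cpy(uint8_t *dst, const uint8_t *src1, const uint8_t *src2,
		    size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = src1[i] ^ src2[i];
}

/*
 * Same shape as chacha20_generic() in the patch, but 'in' advances with
 * 'out' and each block is XOR-copied, so no memmove() is needed.
 * Assumes out == in (in-place) or that the buffers do not overlap.
 */
static void chacha20_xor_stream(uint8_t *out, const uint8_t *in, size_t len,
				const uint32_t key[8], const uint32_t counter[4])
{
	uint8_t buf[CHACHA20_BLOCK_SIZE];
	uint32_t x[16] = {
		0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, /* "expand 32-byte k" */
		key[0], key[1], key[2], key[3],
		key[4], key[5], key[6], key[7],
		counter[0], counter[1], counter[2], counter[3]
	};

	while (len >= CHACHA20_BLOCK_SIZE) {
		chacha20_block(buf, x);
		xor_cpy(out, in, buf, CHACHA20_BLOCK_SIZE);
		len -= CHACHA20_BLOCK_SIZE;
		out += CHACHA20_BLOCK_SIZE;
		in += CHACHA20_BLOCK_SIZE;
	}
	if (len) {
		chacha20_block(buf, x);
		xor_cpy(out, in, buf, len);
	}
}
```

One trade-off worth noting: the memmove() version tolerated arbitrarily overlapping buffers, whereas this form is intended for exact in-place use or disjoint buffers, which covers the usual callers.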
Jason