Message-ID: <CAHmME9qOZEBNr0yndDWmfGMwuFSsN16n4KCDUSdHCPtCVe+Afw@mail.gmail.com>
Date:   Wed, 19 Sep 2018 04:02:51 +0200
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Eric Biggers <ebiggers@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Samuel Neves <sneves@....uc.pt>,
        Andrew Lutomirski <luto@...nel.org>,
        Jean-Philippe Aumasson <jeanphilippe.aumasson@...il.com>
Subject: Re: [PATCH net-next v5 03/20] zinc: ChaCha20 generic C implementation
 and selftest

On Wed, Sep 19, 2018 at 3:08 AM Eric Biggers <ebiggers@...nel.org> wrote:
> Does this consistently perform as well as an implementation that organizes the
> operations such that the quarterrounds for all columns/diagonals are
> interleaved?  As-is, there are tight dependencies in QUARTER_ROUND() (as well as
> in the existing chacha20_block() in lib/chacha20.c, for that matter), so we're
> heavily depending on the compiler to do the needed interleaving so as to not get
> potentially disastrous performance.  Making it explicit could be a good idea.

It does perform as well, and the compiler outputs good code, even on
older compilers. Notably, the whole quarter round is a single statement
(via the comma operator), which leaves the compiler free to interleave
the independent column/diagonal rounds.

> > +}
> > +
> > +static void chacha20_generic(u8 *out, const u8 *in, u32 len, const u32 key[8],
> > +                          const u32 counter[4])
> > +{
> > +     __le32 buf[CHACHA20_BLOCK_WORDS];
> > +     u32 x[] = {
> > +             EXPAND_32_BYTE_K,
> > +             key[0], key[1], key[2], key[3],
> > +             key[4], key[5], key[6], key[7],
> > +             counter[0], counter[1], counter[2], counter[3]
> > +     };
> > +
> > +     if (out != in)
> > +             memmove(out, in, len);
> > +
> > +     while (len >= CHACHA20_BLOCK_SIZE) {
> > +             chacha20_block_generic(buf, x);
> > +             crypto_xor(out, (u8 *)buf, CHACHA20_BLOCK_SIZE);
> > +             len -= CHACHA20_BLOCK_SIZE;
> > +             out += CHACHA20_BLOCK_SIZE;
> > +     }
> > +     if (len) {
> > +             chacha20_block_generic(buf, x);
> > +             crypto_xor(out, (u8 *)buf, len);
> > +     }
> > +}
>
> If crypto_xor_cpy() is used instead of crypto_xor(), and 'in' is incremented
> along with 'out', then the memmove() is not needed.

Nice idea, thanks. Implemented.

Jason
