netdev - Re: [PATCH] arm64: do_csum: implement accelerated scalar version

Message-ID: <CAKv+Gu-oH44z16com1+c__7UoipJA-1ZpThKuvTpLdR6kjgyDA@mail.gmail.com>
Date:   Thu, 28 Feb 2019 15:16:33 +0100
From:   Ard Biesheuvel <ard.biesheuvel@...aro.org>
To:     Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Catalin Marinas <catalin.marinas@....com>
Cc:     linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>,
        Will Deacon <will.deacon@....com>,
        Steve Capper <steve.capper@....com>,
        "<netdev@...r.kernel.org>" <netdev@...r.kernel.org>,
        "huanglingyan (A)" <huanglingyan2@...wei.com>
Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version

(+ Catalin)

On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas
<ilias.apalodimas@...aro.org> wrote:
>
> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
> > It turns out that the IP checksumming code is still exercised often,
> > even though one might expect that modern NICs with checksum offload
> > have no use for it. However, as Lingyan points out, there are
> > combinations of features where the network stack may still fall back
> > to software checksumming, and so it makes sense to provide an
> > optimized implementation in software as well.
> >
> > So provide an implementation of do_csum() in scalar assembler, which,
> > unlike C, gives direct access to the carry flag, making the code run
> > substantially faster. The routine uses overlapping 64 byte loads for
> > all input sizes > 64 bytes, in order to reduce the number of branches
> > and improve performance on cores with deep pipelines.
> >
> > On Cortex-A57, this implementation is on par with Lingyan's NEON
> > implementation, and roughly 7x as fast as the generic C code.
> >
> > Cc: "huanglingyan (A)" <huanglingyan2@...wei.com>
> > Signed-off-by: Ard Biesheuvel <ard.biesheuvel@...aro.org>
...
>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@...aro.org>

The full patch is here:

https://lore.kernel.org/linux-arm-kernel/20190218230842.11448-1-ard.biesheuvel@linaro.org/

This was a follow-up to some discussions about Lingyan's NEON code,
CC'ed to netdev@ so people could chime in as to whether we need
accelerated checksumming code in the first place.
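
For anyone wondering why the carry flag matters so much here: the ones'
complement sum needs every carry out of the accumulator to be added back
in, which in assembler is a plain adds/adcs chain, while portable C has to
detect the wrap by comparison. A minimal sketch of that C-side workaround
follows. It is not the code from the patch and not the kernel's generic
do_csum(); the names add64_carry(), fold64() and csum_sketch() are
hypothetical, and the result is the 16-bit sum in host byte order without
the final inversion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add with end-around carry. In assembler the carry flag provides this
 * directly; in C the wrap has to be detected with a comparison. */
static uint64_t add64_carry(uint64_t sum, uint64_t val)
{
	sum += val;
	return sum + (sum < val);	/* sum < val means the add wrapped */
}

/* Fold a 64-bit ones' complement sum down to 16 bits. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum >> 32) + (sum & 0xffffffffu);
	sum = (sum >> 32) + (sum & 0xffffffffu);
	sum = (sum >> 16) + (sum & 0xffff);
	sum = (sum >> 16) + (sum & 0xffff);
	return (uint16_t)sum;
}

/* Hypothetical helper: sum a buffer 8 bytes at a time, then fold.
 * The tail is zero padded, which leaves the ones' complement sum
 * unchanged. */
static uint16_t csum_sketch(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	uint64_t w;

	while (len >= 8) {
		memcpy(&w, buf, 8);	/* avoids unaligned access issues */
		sum = add64_carry(sum, w);
		buf += 8;
		len -= 8;
	}
	if (len) {
		w = 0;
		memcpy(&w, buf, len);
		sum = add64_carry(sum, w);
	}
	return fold64(sum);
}
```

The compare-and-add in add64_carry() is the per-word overhead that direct
access to the carry flag removes. The sketch makes no attempt at the
overlapping 64 byte loads described in the commit message.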