[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHmME9oL-pOWWXXFhJz1vSm5ftnSfmYrquGbH0acapgEL=c4Ew@mail.gmail.com>
Date: Thu, 3 Nov 2016 08:24:57 +0100
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Herbert Xu <herbert@...dor.apana.org.au>
Cc: "David S. Miller" <davem@...emloft.net>,
linux-crypto@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
Martin Willi <martin@...ongswan.org>
Subject: Re: [PATCH] poly1305: generic C can be faster on chips with slow
unaligned access
Hi Herbert,
On Thu, Nov 3, 2016 at 1:49 AM, Herbert Xu <herbert@...dor.apana.org.au> wrote:
> FWIW I'd rather live with a 6% slowdown than having two different
> code paths in the generic code. Anyone who cares about 6% would
> be much better off writing an assembly version of the code.
Please think twice before deciding that the generic C "is allowed to
be slow". It turns out to be used far more often than might be
obvious. For example, crypto is commonly done on the netdev layer --
like the case with mac80211-based drivers. At this layer, the FPU on
x86 isn't always available, depending on the path used. Some
combinations of drivers, packet family, and workload can result in the
generic C being used instead of the vectorized assembly for a massive
percentage of time. So, I think we do have a good motivation for
wanting the generic C to be as fast as possible.
In the particular case of poly1305, these are the only spots where
unaligned accesses take place, and they're rather small, and I think
it's pretty obvious what's happening in the two different cases of
code from a quick glance. This isn't the "two different paths case" in
which there's a significant future-facing maintenance burden.
Jason
Powered by blists - more mailing lists