Message-ID: <OFDEA4FB6B.CBB97DA3-ONC125791E.006E5FDA-C125791E.006F14E4@transmode.se>
Date:	Mon, 3 Oct 2011 22:13:18 +0200
From:	Joakim Tjernlund <joakim.tjernlund@...nsmode.se>
To:	djwong@...ibm.com
Cc:	linux-crypto <linux-crypto@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] crc32c: Implement CRC32c with slicing-by-8 algorithm

"Darrick J. Wong" <djwong@...ibm.com> wrote on 2011/10/03 18:00:36:
>
> On Sat, Oct 01, 2011 at 03:52:00PM +0200, Joakim Tjernlund wrote:
> > "Darrick J. Wong" <djwong@...ibm.com> wrote on 2011/09/30 18:12:23:
> > >
> > > [putting mailing lists on cc]

[SNIP]

> > >
> > > <shrug> I suppose I could make CRC32C_BITS configurable.  What is the hardware
> > > profile of your ppc32 processor?  How much L1D/L2 cache?  slice-by-8 does have
> > > a big cache footprint.  On the other hand it's faster than the slice-by-4
> > > (crc32) and Sarwate (crc32c) code in the kernel, even on old slow 32-bit x86
> > > processors (PII, PIII, P4).
> >
> > It is a low end embedded 333 MHz CPU with only L1 cache. How much faster
> > is slice by 8 than slice by 4 on these old x86 machines?
>
> How much L1 cache?  Or, if you'd rather not give away specifics, has the CPU
> more than 8KB L1 cache?  I'm willing to concede that with little cache the
> added memory pressure could be painful.
>
> As for the old x86 machines, please have a look at:
> http://djwong.org/docs/ext4_metadata_checksums.html#Benchmarking
>
> ~15% faster on a 2GHz Via C7
> ~20% faster on a 2.7GHz P4
> ~25% faster on a 500MHz P3
>
> I vaguely recall it was ~20% faster on a 400MHz P2, but all the kernel.org
> wikis are still down. :(
>
> So I suspect the key factor here is memory hierarchy, since all of those systems
> have at least 16K of L1 cache.  Slice by 8 might actually suck on a Pentium
> Pro or earlier.  Unfortunately I don't have anything older than a PII...

This CPU has 16KB of L1 cache. I don't know why it was so much slower. It could be a
gcc thing, as gcc does a fairly lame job of optimizing crc32. I still think making this
configurable is a good thing, at least until the verdict is in from other CPUs.

  Jocke
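
For readers following the cache-footprint argument, here is a minimal sketch of
the two table-driven approaches being compared. It is not the code from the
patch under review: the names crc32c_table, crc32c_init_tables, crc32c_sarwate
and crc32c_slice8 are made up for illustration, and the tables are built at run
time for brevity. It only assumes the standard bit-reflected CRC32c polynomial
(0x82F63B78). The point is the table sizes: Sarwate needs one 256-entry table
(1 KiB), while slice-by-8 needs eight of them (8 KiB), which is half of a 16KB
L1 data cache before any payload is touched.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC32c (Castagnoli) polynomial in bit-reflected form. */
#define CRC32C_POLY_LE 0x82F63B78u

/*
 * Hypothetical table layout for this sketch:
 * 8 tables x 256 entries x 4 bytes = 8 KiB in total.
 * crc32c_table[0] alone (1 KiB) is the classic Sarwate table.
 */
static uint32_t crc32c_table[8][256];

static void crc32c_init_tables(void)
{
    /* Table 0: CRC of each possible byte value, computed bit by bit. */
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = i;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? CRC32C_POLY_LE : 0);
        crc32c_table[0][i] = crc;
    }
    /* Table t: CRC of byte i followed by t zero bytes. */
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t crc = crc32c_table[0][i];
        for (int t = 1; t < 8; t++) {
            crc = (crc >> 8) ^ crc32c_table[0][crc & 0xff];
            crc32c_table[t][i] = crc;
        }
    }
}

/* Sarwate: one table lookup per input byte, 1 KiB of table data. */
static uint32_t crc32c_sarwate(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crc32c_table[0][(crc ^ *p++) & 0xff];
    return crc;
}

/* Slice-by-8: eight independent lookups per 8 input bytes, 8 KiB of tables. */
static uint32_t crc32c_slice8(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 8) {
        /* Assemble the words byte by byte so the result is endian-independent. */
        uint32_t lo = crc ^ ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                             (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
        uint32_t hi = (uint32_t)p[4] | (uint32_t)p[5] << 8 |
                      (uint32_t)p[6] << 16 | (uint32_t)p[7] << 24;

        crc = crc32c_table[7][lo & 0xff] ^
              crc32c_table[6][(lo >> 8) & 0xff] ^
              crc32c_table[5][(lo >> 16) & 0xff] ^
              crc32c_table[4][lo >> 24] ^
              crc32c_table[3][hi & 0xff] ^
              crc32c_table[2][(hi >> 8) & 0xff] ^
              crc32c_table[1][(hi >> 16) & 0xff] ^
              crc32c_table[0][hi >> 24];
        p += 8;
        len -= 8;
    }
    /* Fewer than 8 bytes left: finish one byte at a time. */
    return crc32c_sarwate(crc, p, len);
}
```

To use it, call crc32c_init_tables() once, then seed either function with
0xFFFFFFFF and invert the result, e.g.
crc32c_slice8(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu. Both functions should return
the same value for the same input; only the number of bytes consumed per loop
iteration and the amount of table data pulled through the cache differ.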
