Message-ID: <CA+rthh8TaYHJyuQ_kDz+DrXe2YK927dedpYTzSenuAhj7a8UNQ@mail.gmail.com>
Date:	Sun, 14 Aug 2011 21:06:43 +0200
From:	Mathias Krause <minipli@...glemail.com>
To:	"Locktyukhin, Maxim" <maxim.locktyukhin@...el.com>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>,
	"David S. Miller" <davem@...emloft.net>,
	"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andrew Lutomirski <luto@....edu>
Subject: Re: [PATCH v2 2/2] crypto, x86: SSSE3 based SHA1 implementation for x86-64

Hi Max,

2011/8/8 Locktyukhin, Maxim <maxim.locktyukhin@...el.com>:
> I'd like to note that at Intel we very much appreciate Mathias's effort to port/integrate this implementation into the Linux kernel!
>
>
> $0.02 re tcrypt perf numbers below: I believe something must be terribly broken with the tcrypt measurements
>
> 20 (and more) cycles per byte shown below are not reasonable numbers for SHA-1 - ~6 c/b (as can be seen in some of the results for Core2) is the expected result ... so, while relative improvement seen is sort of consistent, the absolute performance numbers are very much off (and yes Sandy Bridge on AVX code is expected to be faster than Core2/SSSE3 - ~5.2 c/b vs. ~5.8 c/b on the level of the sha1_update() call to be more precise)
>
> this does not affect the proposed patch in any way, it looks like tcrypt's timing problem to me - I'd even venture a guess that it may be due to the use of RDTSC (that gets affected significantly by Turbo/EIST, TSC is isotropic in time but not with the core clock domain, i.e. RDTSC cannot be used to measure core cycles without at least disabling EIST and Turbo, or doing runtime adjustment of actual bus/core clock ratio vs. the standard ratio always used by TSC - I could elaborate more if someone is interested)

I found the Sandy Bridge numbers odd too, but suspected it might be
because of the laptop platform. The SSSE3 numbers on this platform
were slightly lower than the AVX numbers, and both were still way off
the ones for the Core2 system. But your explanation fits well, too. It
might be EIST or Turbo mode that skewed the numbers. Another,
maybe more likely, cause might be the overhead Andy mentioned.
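
To illustrate the runtime adjustment Max describes, here is a minimal
sketch of rescaling a TSC delta into estimated core cycles before
computing cycles per byte. The function names and the frequency
parameters are hypothetical; this is not what tcrypt does, and how the
actual core frequency is obtained is left out entirely.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate core cycles from a TSC delta. The TSC ticks at a fixed
 * nominal rate (tsc_mhz), while the core may run slower (EIST) or
 * faster (Turbo) at core_mhz. Both frequencies are assumed inputs.
 */
static double tsc_to_core_cycles(uint64_t tsc_ticks, double core_mhz,
                                 double tsc_mhz)
{
    assert(tsc_mhz > 0.0);
    return (double)tsc_ticks * (core_mhz / tsc_mhz);
}

/* Cycles per byte for a hash update over 'bytes' bytes of input. */
static double cycles_per_byte(double cycles, uint64_t bytes)
{
    assert(bytes > 0);
    return cycles / (double)bytes;
}
```

With made-up numbers: 133120 TSC ticks for 8192 bytes look like 16.25
c/b when taken at face value, but if the core was clocked down to
800 MHz against a 2500 MHz TSC, the same run is about 5.2 core c/b.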

> thanks again,
> -Max
>

Mathias
