Date:	Thu, 17 Sep 2015 08:45:57 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Jaime Arrocha' <jarr@...ercoder.com>,
	'Austin S Hemmelgarn' <ahferroin7@...il.com>,
	Steve Calfee <stevecalfee@...il.com>,
	Eric Curtin <ericcurtin17@...il.com>
CC:	Valentina Manea <valentina.manea.m@...il.com>,
	"shuah.kh@...sung.com" <shuah.kh@...sung.com>,
	USB list <linux-usb@...r.kernel.org>,
	"Kernel development list" <linux-kernel@...r.kernel.org>
Subject: RE: First kernel patch (optimization)

From: Jaime Arrocha
> Sent: 17 September 2015 02:50
..
> One interesting observation I found was that in O0 and O2, it does make
> a call to strlen while in O1 it calculates
> the length of the string using:
> 

The setup is missing here: %rcx has to start at -1 (all ones) and %al at 0.
> repnz scas    %es:(%rdi),%al
> not                %rcx
> sub               $0x2,%rcx
> 
> Why does it do that? Is the code above faster? If yes, why not do it in
> O2 too?

Because 'repnz scasb' is slow, especially on some CPU types.
It may win for -Os on 32-bit systems.
Pentium 4 (NetBurst) has about 40 clocks of setup for all the 'rep' instructions;
later CPUs are better, but you might still be talking double figures.
On a 64-bit CPU there are much faster ways of detecting a zero byte in a
64-bit word by using shifts and masks, so the function call can be a win.
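
As a rough illustration of the zero-byte test mentioned above, here is a
minimal sketch in C. It uses the well-known subtract-and-mask trick rather
than whatever any particular libc actually does, and the names
has_zero_byte() and strlen_padded() are made up for this example. To keep
it free of out-of-bounds reads it assumes the caller passes a buffer whose
size is a multiple of 8; a real strlen() has to deal with alignment instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if any of the 8 bytes in v is zero.
 * Subtracting 1 from each byte sets its top bit when the byte was 0
 * (or >= 0x81); "& ~v" drops the bytes whose top bit was already set. */
static int has_zero_byte(uint64_t v)
{
	return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Hypothetical helper: length of the NUL-terminated string in buf.
 * Assumes buf holds cap bytes and cap is a multiple of 8.
 * Returns cap if no NUL byte is found. */
static size_t strlen_padded(const char *buf, size_t cap)
{
	size_t i, j;
	uint64_t w;

	for (i = 0; i + 8 <= cap; i += 8) {
		/* memcpy avoids alignment and aliasing problems */
		memcpy(&w, buf + i, sizeof(w));
		if (has_zero_byte(w)) {
			/* locate the exact byte; portable across endianness */
			for (j = 0; j < 8; j++)
				if (buf[i + j] == '\0')
					return i + j;
		}
	}
	return cap;
}
```

The loop checks 8 bytes per iteration with a handful of ALU operations and
only drops to a byte-by-byte scan once for the word that holds the
terminator.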

	David
