Date:	Sun, 07 Feb 2010 22:50:26 -0500
From:	Michael Breuer <mbreuer@...jas.com>
To:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc:	Mike Galbraith <efault@....de>,
	Arjan van de Ven <arjan@...radead.org>,
	Joerg Roedel <joro@...tes.org>
Subject: Re: x86 - cpu_relax - why nop vs. pause?

On 2/7/2010 4:15 PM, Michael Breuer wrote:
> On 02/07/2010 03:08 PM, Michael Breuer wrote:
>> On 2/7/2010 1:14 PM, Mike Galbraith wrote:
>> ...
>> Case1 - asm volatile("pause" ::: "memory");
>> 0000000000400480 <main>:
>>   400480:    f3 90                    pause
>>   400482:    c3                       retq
>>   400483:    90                       nop
>>
>> ...
>>
>> Case3 - asm volatile("rep;pause" ::: "memory")
>> 0000000000400480 <main>:
>>   400480:    f3 f3 90                 pause
>>   400483:    c3                       retq
>>   400484:    90                       nop
>> _______
>> Note the difference between opcodes case 1 and case 3, and the mess 
>> made by the compiler in case 2.
>>
>> As to benchmarks  - I've checked a few things, no formal or lasting 
>> stuff... but striking at first glance:
>>
>> 1) At idle, perf top shows time spent in _raw_spin_lock dropping from 
>> ~35% to ~25%.
>> 2) Running a media transcode (single core - handbrakecli): frame rate 
>> increased by about 5-10%.
>> 3) During file-intensive operations (#2, above, or copying large 
>> files - ext4 on software raid6) - latencytop shows a decrease on 
>> writing a page to disc from about 120ms to about 90ms.
>>
> Disregard case 2 - was missing -O3. With -O3 or -O2 rep;nop and pause 
> are identical. The interesting case is rep;pause which is different 
> and seems more efficient.
Just to move on from this... still perplexed, I retested a bit. 
It seems something else had gone wrong, which led me to try 'rep;pause' 
instead of 'pause'. The resulting opcode is f3 f3 90, as noted above.

I do see what seems to be a small but noticeable performance improvement 
- no idea if there's a downside, and also no idea what f3 f3 90 does vs. 
f3 90. Might be something interesting, or maybe not.
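
For anyone who wants to poke at the two encodings from user space, here is 
a minimal sketch. The helper names (cpu_relax_pause, cpu_relax_rep_pause, 
spin_for) are made up for illustration and are not the kernel's cpu_relax(). 
The second helper emits the f3 f3 90 bytes from the disassembly directly via 
.byte; the non-x86 fallback to a bare compiler barrier is my assumption. It 
only shows how to emit each form and says nothing about what the extra 
prefix does.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#define HAVE_X86_PAUSE 1
#else
#define HAVE_X86_PAUSE 0
#endif

/* Hypothetical helper: the plain form, which assembles to f3 90. */
static inline void cpu_relax_pause(void)
{
#if HAVE_X86_PAUSE
    __asm__ __volatile__("pause" ::: "memory");
#else
    /* Assumption: on other architectures, fall back to a compiler barrier. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical helper: the doubled-prefix form, emitted as raw bytes
 * f3 f3 90 so the result does not depend on how the assembler treats
 * "rep;pause". */
static inline void cpu_relax_rep_pause(void)
{
#if HAVE_X86_PAUSE
    __asm__ __volatile__(".byte 0xf3, 0xf3, 0x90" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical helper: call the given relax routine `iters` times and
 * return how many iterations were executed. */
static size_t spin_for(void (*relax)(void), size_t iters)
{
    size_t done = 0;

    while (done < iters) {
        relax();
        done++;
    }
    return done;
}
```

Calling spin_for(cpu_relax_pause, n) and spin_for(cpu_relax_rep_pause, n) 
under a timer would give a rough user-space comparison; it is not a 
substitute for measuring the kernel spinlock paths.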
Test scenario:

Boot clean to single-user mode, then run tiotest -8 five times.
%cpu is %usr + %sys as reported by tiotest.

Results:
Writes
pause:      1.14 sec; 72.01 MB/sec; 322.44 %cpu
rep;pause:  1.12 sec; 70.4 MB/sec; 311.58 %cpu
Random Writes
pause:      3.7 sec; 8.51 MB/sec; 66.48 %cpu
rep;pause:  3.46 sec; 9.04 MB/sec; 72.34 %cpu
Reads
pause:      11557.48 MB/sec; 6040.74 %cpu
rep;pause:  11620.15 MB/sec; 5974.90 %cpu
Random Reads
pause:      11416.9 MB/sec; 5330.50 %cpu
rep;pause:  11786.99 MB/sec; 5118.66 %cpu
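
To put the throughput numbers above on a common scale, a small sketch that 
computes the relative change of rep;pause versus pause for each test. The 
function and struct names (pct_change, struct result, 
print_throughput_deltas) are illustrative only; the figures are copied from 
the results as posted.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change of `after` versus `before`, in percent. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* One tiotest result pair, throughput in MB/sec (values from the post). */
struct result {
    const char *test;
    double pause_mbs;
    double rep_pause_mbs;
};

static const struct result results[] = {
    { "Writes",        72.01,    70.4     },
    { "Random Writes", 8.51,     9.04     },
    { "Reads",         11557.48, 11620.15 },
    { "Random Reads",  11416.9,  11786.99 },
};

/* Print the throughput delta for each test, one per line. */
static void print_throughput_deltas(void)
{
    size_t i;

    for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-14s %+.2f%%\n", results[i].test,
               pct_change(results[i].pause_mbs, results[i].rep_pause_mbs));
}
```

By this measure Random Writes gains roughly 6% and Random Reads roughly 3%, 
Reads is up by about half a percent, and sequential Writes is down by about 
2%.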

