Date:	Mon, 01 Jul 2013 07:30:14 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...nel.org>
CC:	Wedson Almeida Filho <wedsonaf@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] x86: Use asm-goto to implement mutex fast path on x86-64

Unconditional branches don't need prediction.  The branch predictor is used for conditional branches and in some hardware designs for indirect branches.  Unconditional direct branches never go through the branch predictor simply because the front end can know with 100% certainty where the flow of control will be.
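For reference, here is a minimal user-space sketch of the asm-goto pattern the patch relies on (not the actual kernel code; fastpath_lock(), slowpath() and the bare int counter are made up for illustration, and it assumes GCC with asm goto support on x86):

/*
 * Minimal user-space sketch of the asm-goto fast-path pattern (not the
 * actual kernel patch): fastpath_lock()/slowpath() and the bare int
 * counter are illustrative only.  Requires GCC with asm goto on x86.
 */
#include <stdio.h>

static void slowpath(int *v)
{
	/* stand-in for the out-of-line contention handler (fail_fn) */
	printf("slow path, counter = %d\n", *v);
}

static inline void fastpath_lock(int *v)
{
	/*
	 * Atomically decrement the counter; if it stays non-negative
	 * (uncontended), jump straight to the "done" label.  asm goto
	 * takes no output operands, so the counter is passed as a
	 * memory input with a "memory" clobber.
	 */
	asm goto("lock decl %0\n\t"
		 "jns %l[done]"
		 : /* no outputs */
		 : "m" (*v)
		 : "memory", "cc"
		 : done);
	slowpath(v);	/* only reached when the counter went negative */
done:
	return;
}

int main(void)
{
	int counter = 1;		/* 1 = unlocked, as in the mutex counter */

	fastpath_lock(&counter);	/* 1 -> 0: fast path */
	fastpath_lock(&counter);	/* 0 -> -1: falls through to slowpath() */
	return 0;
}

In the uncontended case the lock decl leaves the counter non-negative and the jns jumps straight to done:, so the call to slowpath() is reached only by falling through when the counter goes negative.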

Borislav Petkov <bp@...en8.de> wrote:

>On Mon, Jul 01, 2013 at 09:50:46AM +0200, Ingo Molnar wrote:
>> Not sure - the main thing we want to know is whether it gets faster.
>> The _amount_ will depend on things like precise usage patterns,
>> caching, etc. - but rarely does a real workload turn a win like this
>> into a loss.
>
>Yep, and it does get faster by a whopping 6 seconds!
>
>Almost all standard counters go down a bit.
>
>Interestingly, branch misses show a slight increase, and the asm-goto
>version does jump to fail_fn from within the asm, so maybe that puzzles
>the branch predictor a bit, although the instructions look the same and
>both jumps are forward.
>
>Oh well, we don't know where those additional misses happened so it
>could be somewhere else entirely, or it is simply noise.
>
>In any case, we're getting faster, so not worth investigating I guess.
>
>
>plain 3.10
>==========
>
> Performance counter stats for '../build-kernel.sh' (5 runs):
>
>   1312558.712266 task-clock                #    5.961 CPUs utilized            ( +-  0.02% )
>        1,036,629 context-switches          #    0.790 K/sec                    ( +-  0.24% )
>           55,118 cpu-migrations            #    0.042 K/sec                    ( +-  0.25% )
>       46,505,184 page-faults               #    0.035 M/sec                    ( +-  0.00% )
>4,768,420,289,997 cycles                    #    3.633 GHz                      ( +-  0.02% ) [83.79%]
>3,424,161,066,397 stalled-cycles-frontend   #   71.81% frontend cycles idle     ( +-  0.02% ) [83.78%]
>2,483,143,574,419 stalled-cycles-backend    #   52.07% backend  cycles idle     ( +-  0.04% ) [67.40%]
>3,091,612,061,933 instructions              #    0.65  insns per cycle
>                                            #    1.11  stalled cycles per insn  ( +-  0.01% ) [83.93%]
>  677,787,215,988 branches                  #  516.386 M/sec                    ( +-  0.01% ) [83.77%]
>   25,438,736,368 branch-misses             #    3.75% of all branches          ( +-  0.02% ) [83.78%]
>
>    220.191740778 seconds time elapsed                                          ( +-  0.32% )
>
> + patch
>========
>
> Performance counter stats for '../build-kernel.sh' (5 runs):
>
>   1309995.427337 task-clock                #    6.106 CPUs utilized            ( +-  0.09% )
>        1,033,446 context-switches          #    0.789 K/sec                    ( +-  0.23% )
>           55,228 cpu-migrations            #    0.042 K/sec                    ( +-  0.28% )
>       46,484,992 page-faults               #    0.035 M/sec                    ( +-  0.00% )
>4,759,631,961,013 cycles                    #    3.633 GHz                      ( +-  0.09% ) [83.78%]
>3,415,933,806,156 stalled-cycles-frontend   #   71.77% frontend cycles idle     ( +-  0.12% ) [83.78%]
>2,476,066,765,933 stalled-cycles-backend    #   52.02% backend  cycles idle     ( +-  0.10% ) [67.38%]
>3,089,317,073,397 instructions              #    0.65  insns per cycle
>                                            #    1.11  stalled cycles per insn  ( +-  0.02% ) [83.95%]
>  677,623,252,827 branches                  #  517.271 M/sec                    ( +-  0.01% ) [83.79%]
>   25,444,376,740 branch-misses             #    3.75% of all branches          ( +-  0.02% ) [83.79%]
>
>    214.533868029 seconds time elapsed                                          ( +-  0.36% )
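(The "(5 runs)" header and the "( +- ... )" columns are what perf prints when asked to repeat a workload, so the numbers above were presumably gathered with something like

    perf stat -r 5 -- ../build-kernel.sh

using perf's default event set.)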

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.
