Date:	Fri, 17 Apr 2009 10:09:46 -0700
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Steven Rostedt <rostedt@...dmis.org>
CC:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@....de>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: [patch 2/3] RCU move trace defines to rcupdate_types.h

Steven Rostedt wrote:
> I was talking with Arjan about this in San Francisco. The expense of doing 
> function calls. He told me (and he can correct me if I'm wrong here) that 
> function calls are like branch predictions. The branch part is the fact 
> that a retq is a jmp that can go to different locations. There's logic in 
> the CPU to match calls with retqs to speed this up.
>   

Right.  The call is to a fixed address, so there's no prediction needed 
at all; the CPU can immediately start fetching instructions at the call 
target without missing a beat.  When it hits the ret in the function, 
assuming nobody has been playing games with the stack pointer or 
modifying the return address on the stack, it can just look up the 
return address from its cache and start fetching from there, again with 
no bubbles.  It should be very close to a pair of jumps, aside from one 
extra memory write (for the return address on the stack) - and that 
shouldn't be too bad, because the chances are the cache is hot for the 
stack.
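
As a rough illustration of the call/ret matching, here is a toy model of 
a return-address predictor in C.  It is only a sketch: the names 
(struct ras, ras_call, ras_ret, ras_demo), the 16-entry depth and the 
addresses are all made up for the example, and real hardware differs in 
the details.  It just shows why a ret predicts correctly when calls and 
returns pair up, and mispredicts when the return address on the stack 
has been changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_DEPTH 16	/* assumed depth; varies by CPU */

/* Toy return-address stack: a small circular buffer of predicted returns. */
struct ras {
	uint64_t entries[RAS_DEPTH];
	unsigned int top;	/* number of outstanding calls */
};

/* On a call, remember where the matching ret is expected to go. */
static void ras_call(struct ras *r, uint64_t return_addr)
{
	assert(r != NULL);
	r->entries[r->top % RAS_DEPTH] = return_addr;
	r->top++;
}

/*
 * On a ret, pop the prediction and compare it with the address actually
 * found on the stack.  Returns true when the prediction was right.
 */
static bool ras_ret(struct ras *r, uint64_t actual_return_addr)
{
	assert(r != NULL);
	if (r->top == 0)
		return false;	/* nothing cached: no prediction possible */
	r->top--;
	return r->entries[r->top % RAS_DEPTH] == actual_return_addr;
}

/* Two well-behaved nested calls, then one with a modified return address. */
static int ras_demo(void)
{
	struct ras r = { .top = 0 };
	int mispredicts = 0;

	ras_call(&r, 0x1005);			/* outer call */
	ras_call(&r, 0x2005);			/* inner call */
	mispredicts += !ras_ret(&r, 0x2005);	/* inner ret: matches */
	mispredicts += !ras_ret(&r, 0x1005);	/* outer ret: matches */

	ras_call(&r, 0x3005);
	mispredicts += !ras_ret(&r, 0x4000);	/* return address was changed */

	return mispredicts;
}
```

Calling ras_demo() counts a single misprediction, from the last ret, 
where the address on the stack no longer matches what the call pushed.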

> He also told me that the "mcount" retq that I do is actually more 
> expensive. The logic does not expect a function to return immediately. 
> (for stubs, I'm not sure that was a good design).
>
> Hence,
>
> 	call mcount
>
> [...]
>
> mcount:
> 	retq
>
>
> is expensive, compared to a call to a function that actually does 
> something.
>
> Again, Arjan can correct me here, since I'm just trying to paraphrase what 
> he told me.
>   

Sounds reasonable; it takes a little while for the CPU to work out what 
the return address will be, even though it's cached, so doing an 
immediate ret will cause a bubble while it sorts itself out.  But that 
shouldn't be an issue for the calls I'm talking about.

    J