Message-Id: <1238014906.2132.373.camel@calx>
Date:	Wed, 25 Mar 2009 16:01:46 -0500
From:	Matt Mackall <mpm@...enic.com>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	Pekka Enberg <penberg@...helsinki.fi>,
	Hua Zhong <hzhong@...il.com>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Roland McGrath <roland@...hat.com>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Steven Rostedt <srostedt@...hat.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH 2/5] mm: remove unlikly NULL from kfree

On Wed, 2009-03-25 at 12:34 -0400, Steven Rostedt wrote:
> On Wed, 25 Mar 2009, Matt Mackall wrote:
> > > 
> > > I think the theory is that gcc and the CPU can handle normal branch 
> > > predictions well. But if you use one of the prediction macros, gcc 
> > > (and some archs) behaves differently, such that taking the wrong branch 
> > > can cost more than the time saved with all the other correct hits.
> > > 
> > > Again, I'm not sure. I haven't done the benchmarks. Perhaps someone else 
> > > is more familiar with the details here.
> > 
> > From first principles, we can make a reasonable model of branch
> > prediction success with a branch cache:
> > 
> >                hot cache     cold cache  cold cache  cold cache 
> >                w|w/o hint                good hint   bad hint
> > p near 0       +             +           +           -
> > p near .5      0             0           0           0
> > p near 1       +             -           +           -
> > 
> > (this assumes the CPU is biased against branching in the cold cache
> > case)
> > 
> > Branch prediction miss has a penalty measured in some smallish number of
> > cycles. So the impact in cycles/sec[1] is (p(miss) * penalty) * (calls /
> > sec). Because the branch cache kicks in and hides our unlikely hint with
> > a hot cache, we can't get a high calls/sec, so to have much impact,
> > we've got to have a very high probability of a missed branch (p near 1)
> > _and_ cold cache. 
> > 
> > So for CPUs with a branch cache, unlikely hints only make sense in
> > fairly extreme cases. And I think that includes most CPU families these
> > days as it's pretty much required to get much advantage from making the
> > CPU clock faster than the memory bus. 
> > 
> > We'd have a lot of trouble benchmarking this meaningfully as hot caches
> > kill the effect. And it would of course depend directly on a given CPU's
> > branch cache size and branch miss penalty, numbers that vary from model
> > to model. So I think unless we can confidently state that a branch is
> > rarely taken, there's very little upside to adding unlikely.
> > 
> > On the other hand, there's also very little downside until our hint is
> > grossly inaccurate. So there's a huge hysteresis here: if p is < .99,
> > not much point adding unlikely. If p is > .1, not much point removing
> > it.
> > 
> > [1] Note that cycles/sec is dimensionless as cycles and seconds are both
> > measures of time. So impact here is in units of very small fractions of
> > a percent.
> 
> Hi Matt,
> 
> Thanks for this info!
> 
> Although gcc plays a role too. That is, if we have
> 
> 	if (x)
> 		do something small;
> 
> 	do something large;
> 
> 
> this can be broken into:
> 
> 	cmp x
> 	beq 1f
> 	do something small
> 1:
> 	do something large
> 
> Which plays nice with the cache. But, by adding an unlikely(x), gcc will 
> probably choose to do:
> 
> 	cmp x
> 	bne 2f
> 
> 1:
> 	do something large
> 
> 	ret;
> 
> 2:
> 	do something small
> 	b 1b
> 
> which hurts in a number of ways.

The cost is an unconditional branch; the ret already exists. There's a
slightly larger cache footprint for the small branch and a slightly
smaller footprint for the large branch. If p is close to .5 and
calls/sec is high, the cache footprint is the sum of the footprints of
both branches. But if calls/sec is low, the cache footprint is also
low.

So, yeah, I think this is a good additional argument to err on the side
of not adding these things at all. And I certainly wasn't intending to
defend the ones in kfree. 

But I'm also skeptical of whether it's worth spending much time actively
rooting out the moderately incorrect instances. It's going to be nearly
immune to performance benchmarking. We should instead just actively
discourage using unlikely in new code.

-- 
http://selenic.com : development and support for Mercurial and Linux

