lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 21 Aug 2013 14:09:01 -0700
From:	Kent Overstreet <kmo@...erainc.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
	Christoph Lameter <cl@...two.org>,
	linux-kernel@...r.kernel.org, Oleg Nesterov <oleg@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	Andi Kleen <andi@...stfloor.org>, Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida

On Wed, Aug 21, 2013 at 07:59:41AM -0400, Tejun Heo wrote:
> Hello, Kent.
> 
> On Tue, Aug 20, 2013 at 07:31:51PM -0700, Kent Overstreet wrote:
> > All this for a performance improvement of 10x to 50x (or more), for the
> > ida sizes I measured.
> 
> That's misleading, isn't it? 

It's comparing it to the existing version that actually exists, instead
of comparing it to a hypothetical approach that doesn't exist yet. I
don't see how that's misleading.

> We should see large performance
> improvements even without the large pages.  What matters more is the
> leaf node performance for vast majority of cases and an extra radix
> tree layer on top would cover most of whatever is left.  Whether to
> use high order pages or not only affects the extreme niche use cases
> and I don't think going for high order pages to micro optimize those
> extreme use cases is the right trade off.
> 
> > So I could see your point if we were allocating gobs of vmalloc memory,
> > or high order allocations big enough to realistically be problematic (I
> > honestly don't think these will be) - but to me, this seems like a
> > pretty reasonable tradeoff for those performance gains.
> 
> The trade off is made up as the bulk of the performance benefit can be
> gained without resorting to high order allocations.

I'm more and more skeptical that that's true, and it's a given that it
wouldn't be as fast.

> > (And the performance gains do substantially come from using more
> > contiguous memory and treating the whole data structure as an array, and
> > doing less pointer chasing/looping)
> 
> I really have hard time buying that.  Let's say you go with single
> page leaf node and an extra single page layer on top.  How many IDs
> are we talking about?  For the cases which are most performance
> sensitive, this doesn't even matter a bit as percpu caching layer
> would be on top anyway.  I really don't think the micro optimization
> is called for at the cost of high order allocations from low level
> tool library.

These "micro optimizations" mean either less pointer chasing or less
branching in the _common_ case; you'd trade common case performance for
avoiding ever doing higher order allocations (and 2 with COMPACTION=n
and 4 with COMPACTION=y is not particularly high order!).

I don't buy that that's a good tradeoff. If you're convinced radix trees
are the way to go and it can be done without much performance cost, why
not code it up and show us?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists