Message-ID: <20140501224744.GA2285@kmo-pixel>
Date:	Thu, 1 May 2014 15:47:44 -0700
From:	Kent Overstreet <kmo@...erainc.com>
To:	Jens Axboe <axboe@...nel.dk>
Cc:	Ming Lei <tom.leiming@...il.com>,
	Alexander Gordeev <agordeev@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Shaohua Li <shli@...nel.org>,
	Nicholas Bellinger <nab@...ux-iscsi.org>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH RFC 0/2] percpu_ida: Take into account CPU topology when
 stealing tags

On Tue, Apr 29, 2014 at 03:13:38PM -0600, Jens Axboe wrote:
> On 04/29/2014 05:35 AM, Ming Lei wrote:
> > On Sat, Apr 26, 2014 at 10:03 AM, Jens Axboe <axboe@...nel.dk> wrote:
> >> On 2014-04-25 18:01, Ming Lei wrote:
> >>>
> >>> Hi Jens,
> >>>
> >>> On Sat, Apr 26, 2014 at 5:23 AM, Jens Axboe <axboe@...nel.dk> wrote:
> >>>>
> >>>> On 04/25/2014 03:10 AM, Ming Lei wrote:
> >>>>
> >>>> Sorry, I did run it the other day. It has little to no effect here, but
> >>>> that's mostly because there's so much other crap going on in there. The
> >>>> most effective way to currently make it work better, is just to ensure
> >>>> the caching pool is of a sane size.
> >>>
> >>>
> >>> Yes, that is just what the patch is doing, :-)
> >>
> >>
> >> But it's not enough.
> > 
> > Yes, the patch is only for cases of mutli hw queue and having
> > offline CPUs existed.
> > 
> >> For instance, my test case, it's 255 tags and 64 CPUs.
> >> We end up in cross-cpu spinlock nightmare mode.
> > 
> > IMO, the scaling problem for the above case might be
> > caused by either current percpu ida design or blk-mq's
> > usage on it.
> 
> That is pretty much my claim, yes. Basically I don't think per-cpu tag
> caching is ever going to be the best solution for the combination of
> modern machines and the hardware that is out there (limited tags).

Sorry for not being more active in the discussion earlier, but anyways - I'm in
100% agreement with this.

Percpu freelists are _fundamentally_ only _useful_ when you don't need to be
using all your available tags, because percpu sharding requires wasting your tag
space. I could write a mathematical proof of this if I cared enough.

Otherwise what happens is on alloc failure you're touching all the other
cachelines every single time and now you're bouncing _more_ cachelines than if
you just had a single global freelist.
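
As a rough illustration using the 255-tag, 64-CPU case quoted above, here is a
back-of-the-envelope sketch in C. The struct and function names are
hypothetical, and the model is an assumption rather than a description of
percpu_ida itself: tags are split evenly across CPUs, and a failed local
allocation may have to look at every other CPU's freelist cacheline.

```c
#include <assert.h>

/* Hypothetical helper type, not part of any kernel API. */
struct shard_estimate {
	unsigned per_cpu;            /* tags each CPU gets under an even split */
	unsigned leftover;           /* tags that do not divide evenly */
	unsigned worst_remote_lines; /* remote cachelines a full steal scan may touch */
};

/*
 * Assumed model: even sharding, one freelist cacheline per CPU, and a
 * steal that in the worst case visits every other CPU.
 */
static struct shard_estimate shard_estimate(unsigned nr_tags, unsigned nr_cpus)
{
	struct shard_estimate e;

	assert(nr_cpus > 0);
	e.per_cpu = nr_tags / nr_cpus;
	e.leftover = nr_tags % nr_cpus;
	e.worst_remote_lines = nr_cpus - 1;
	return e;
}
```

Under this model, shard_estimate(255, 64) gives 3 tags per CPU with 63 left
over, and up to 63 remote cachelines touched on a failed local allocation,
versus one cacheline for a single global freelist.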

So yeah, for small tag spaces just use a single simple bit vector on a single
cacheline.
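
A minimal userspace sketch of what such a bit vector could look like, using
C11 atomics. This is not the kernel's implementation: struct tag_pool,
tag_alloc(), tag_free() and NR_TAGS are hypothetical names, the tag space is
assumed to fit in one 64-bit word, the 64-byte cacheline size is assumed, and
__builtin_ctzll() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_TAGS 64	/* assumption: whole tag space fits in one 64-bit word */

struct tag_pool {
	/* Bit n set means tag n is allocated. Aligned to an assumed 64-byte line. */
	_Alignas(64) _Atomic(uint64_t) used;
};

/* Returns a free tag in [0, NR_TAGS), or -1 if every tag is in use. */
static int tag_alloc(struct tag_pool *p)
{
	uint64_t old = atomic_load_explicit(&p->used, memory_order_relaxed);

	for (;;) {
		uint64_t free_mask = ~old;
		int tag;

		if (!free_mask)
			return -1;

		/* Lowest clear bit in 'used' is the lowest set bit in free_mask. */
		tag = __builtin_ctzll(free_mask);

		if (atomic_compare_exchange_weak_explicit(&p->used, &old,
				old | (UINT64_C(1) << tag),
				memory_order_acquire, memory_order_relaxed))
			return tag;
		/* On failure 'old' was reloaded with the current value; retry. */
	}
}

/* Releases a tag previously returned by tag_alloc(). */
static void tag_free(struct tag_pool *p, int tag)
{
	assert(tag >= 0 && tag < NR_TAGS);
	atomic_fetch_and_explicit(&p->used, ~(UINT64_C(1) << tag),
				  memory_order_release);
}
```

Every allocation and free touches the same single cacheline, so contention is
bounded by that one line instead of growing with the number of per-CPU
freelists. A larger tag space (say 255 tags) would need several words, which
at four 64-bit words still fits in one 64-byte line.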

BTW, Shaohua Li's patch d835502f3dacad1638d516ab156d66f0ba377cf5, which changed
when steal_tags() runs, was fundamentally wrong and broken in this respect and
should be reverted. Whatever usage was expecting to be able to allocate the
entire tag space was the problem.