netdev - Re: [PATCH 7/7] slub: initial bulk free implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150616175231.427499ae@redhat.com>
Date:	Tue, 16 Jun 2015 17:52:31 +0200
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Joonsoo Kim <js1304@...il.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux-Netdev <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.duyck@...il.com>, brouer@...hat.com
Subject: Re: [PATCH 7/7] slub: initial bulk free implementation

On Tue, 16 Jun 2015 10:10:25 -0500 (CDT)
Christoph Lameter <cl@...ux.com> wrote:

> On Tue, 16 Jun 2015, Joonsoo Kim wrote:
> 
> > So, in your test, most of objects may come from one or two slabs and your
> > algorithm is well optimized for this case. But, is this workload normal case?
> 
> It is normal if the objects were bulk allocated because SLUB ensures that
> all objects are first allocated from one page before moving to another.

Yes, exactly.  Maybe SLAB is different? If so, then we can handle that
in the SLAB specific bulk implementation.


> > If most of objects comes from many different slabs, bulk free API does
> > enabling/disabling interrupt very much so I guess it work worse than
> > just calling __kmem_cache_free_bulk(). Could you test this case?
> 
> In case of SLAB this would be an issue since the queueing mechanism
> destroys spatial locality. This is much less an issue for SLUB.

I think Kim is worried about the cost of the enable/disable calls, when
the slowpath gets called.  But it is not a problem because the cost of
local_irq_{disable,enable} is very low (total cost 7 cycles).

It is very important that everybody realizes that the save+restore
variant is very expensive, this is key:

CPU: i7-4790K CPU @ 4.00GHz
 * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
 * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns

Even if EVERY object need to call slowpath/__slab_free() it will be
faster than calling the fallback.  Because I've demonstrated the call
this_cpu_cmpxchg_double() costs 9 cycles.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

p.s. for comparison[1] a function call cost is 5-6 cycles, and a function
pointer call cost is 6-10 cycles, depending on CPU.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html