Date:	Tue, 16 Jun 2015 17:52:31 +0200
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Joonsoo Kim <js1304@...il.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux-Netdev <netdev@...r.kernel.org>,
	Alexander Duyck <alexander.duyck@...il.com>, brouer@...hat.com
Subject: Re: [PATCH 7/7] slub: initial bulk free implementation

On Tue, 16 Jun 2015 10:10:25 -0500 (CDT)
Christoph Lameter <cl@...ux.com> wrote:

> On Tue, 16 Jun 2015, Joonsoo Kim wrote:
> 
> > So, in your test, most of objects may come from one or two slabs and your
> > algorithm is well optimized for this case. But, is this workload normal case?
> 
> It is normal if the objects were bulk allocated because SLUB ensures that
> all objects are first allocated from one page before moving to another.

Yes, exactly.  Maybe SLAB is different?  If so, we can handle that in
the SLAB-specific bulk implementation.


> > If most of objects comes from many different slabs, bulk free API does
> > enabling/disabling interrupt very much so I guess it work worse than
> > just calling __kmem_cache_free_bulk(). Could you test this case?
> 
> In case of SLAB this would be an issue since the queueing mechanism
> destroys spatial locality. This is much less an issue for SLUB.

I think Kim is worried about the cost of the enable/disable calls when
the slowpath gets called.  That is not a problem, because the cost of
local_irq_{disable,enable} is very low (7 cycles in total).

It is very important that everybody realizes that the save+restore
variant is much more expensive.  This is key:

CPU: i7-4790K CPU @ 4.00GHz
 * local_irq_{disable,enable}:  7 cycles(tsc) - 1.821 ns
 * local_irq_{save,restore}  : 37 cycles(tsc) - 9.443 ns
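
To put the two measurements side by side, here is a small userspace
sketch in C (not kernel code).  The constants are the cycle counts
listed above; the helper names are hypothetical and exist only for
this illustration.

```c
#include <stdint.h>

/* Cycle counts taken from the i7-4790K measurements above. */
enum {
	IRQ_DISABLE_ENABLE_CYCLES = 7,	/* local_irq_{disable,enable} */
	IRQ_SAVE_RESTORE_CYCLES   = 37,	/* local_irq_{save,restore}   */
};

/* Hypothetical helper: cycles spent on 'pairs' IRQ off/on pairs. */
uint64_t irq_pair_cycles(uint64_t pairs, uint64_t cycles_per_pair)
{
	return pairs * cycles_per_pair;
}

/*
 * Hypothetical helper: extra cycles paid when save+restore is used
 * where disable+enable would have been enough.
 */
uint64_t save_restore_extra_cycles(uint64_t pairs)
{
	return irq_pair_cycles(pairs, IRQ_SAVE_RESTORE_CYCLES) -
	       irq_pair_cycles(pairs, IRQ_DISABLE_ENABLE_CYCLES);
}
```

For a single pair this gives 30 extra cycles, so on this CPU
save+restore costs a bit more than five times as much as
disable+enable.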

Even if EVERY object needs to call the slowpath/__slab_free(), it will
still be faster than calling the fallback, because I've demonstrated
that a this_cpu_cmpxchg_double() call costs 9 cycles.
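
One way to read this argument is as per-object arithmetic.  The
userspace C sketch below is a simplified model, not kernel code.  It
assumes the worst case where the bulk path pays one
local_irq_{disable,enable} pair (7 cycles) for every object, and that
the fallback pays one this_cpu_cmpxchg_double() (9 cycles) per object.
All other work is left out of the model, and the function names are
hypothetical.

```c
#include <stdint.h>

enum {
	IRQ_DISABLE_ENABLE_CYCLES = 7,	/* local_irq_{disable,enable}   */
	CMPXCHG_DOUBLE_CYCLES     = 9,	/* this_cpu_cmpxchg_double()    */
};

/*
 * Assumed worst case for the bulk path: every object forces one
 * IRQ enable/disable pair around the slowpath call.
 */
uint64_t bulk_worst_case_overhead(uint64_t nr_objects)
{
	return nr_objects * IRQ_DISABLE_ENABLE_CYCLES;
}

/* Assumed fallback overhead: one this_cpu_cmpxchg_double() per object. */
uint64_t fallback_overhead(uint64_t nr_objects)
{
	return nr_objects * CMPXCHG_DOUBLE_CYCLES;
}

/* Returns 1 when the bulk worst case is still cheaper in this model. */
int bulk_still_wins(uint64_t nr_objects)
{
	return bulk_worst_case_overhead(nr_objects) <
	       fallback_overhead(nr_objects);
}
```

With 16 objects the model gives 112 cycles against 144, i.e. 2 cycles
saved per object even in the assumed worst case.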

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

p.s. for comparison[1], a function call costs 5-6 cycles, and a call
through a function pointer costs 6-10 cycles, depending on the CPU.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_sample.c