Date:	Sat, 5 Sep 2015 13:18:25 +0200
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Alexander Duyck <alexander.duyck@...il.com>,
	netdev@...r.kernel.org, akpm@...ux-foundation.org,
	linux-mm@...ck.org, aravinda@...ux.vnet.ibm.com,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	iamjoonsoo.kim@....com, brouer@...hat.com
Subject: Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache
 bulk free API.

On Fri, 4 Sep 2015 18:45:13 -0500 (CDT)
Christoph Lameter <cl@...ux.com> wrote:

> On Fri, 4 Sep 2015, Alexander Duyck wrote:
> > Right, but one of the reasons for Jesper to implement the bulk alloc/free is
> > to avoid the cmpxchg that is being used to get stuff into or off of the per
> > cpu lists.
> 
> There is no full cmpxchg used for the per cpu lists. It's a cmpxchg
> without lock semantics, which is very cheap.

The double_cmpxchg without a lock prefix still costs 9 cycles, which is
very fast but still a cost (add approx 19 cycles for a lock prefix).

It is slower than a local_irq_disable + local_irq_enable pair, which
costs only 7 cycles and is what the bulking call uses.  (That is the
reason bulk calls with a single object can almost compete with the
fastpath.)
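
To make the comparison concrete, here is a rough sketch (not the
actual slub code; get_freepointer() is a slub-internal helper, and the
slowpath refill is omitted) of how a bulk alloc can pay the irq
disable/enable cost once for the whole batch, instead of one cmpxchg
per object:

	/* Illustrative sketch only -- not the real slub fastpath. */
	int sketch_alloc_bulk(struct kmem_cache *s, gfp_t flags,
			      size_t nr, void **p)
	{
		struct kmem_cache_cpu *c;
		unsigned long irqflags;
		size_t i;

		local_irq_save(irqflags);  /* ~7 cycles, paid once */
		c = this_cpu_ptr(s->cpu_slab);
		for (i = 0; i < nr; i++) {
			void *object = c->freelist;

			if (unlikely(!object))
				break;  /* slowpath refill (using flags)
					 * omitted in this sketch */
			c->freelist = get_freepointer(s, object);
			p[i] = object;
		}
		local_irq_restore(irqflags);
		return i;
	}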


> > In the case of network drivers they are running in softirq context almost
> > exclusively.  As such it is useful to have a set of buffers that can be
> > acquired or freed from this context without the need to use any
> > synchronization primitives.  Then once the softirq context ends we can
> > free up some or all of the resources back to the slab allocator.
> 
> That is the case in the slab allocators.

There is a potential for taking advantage of this softirq context,
which is basically what my qmempool implementation did.

But we have now optimized the slub allocator to the extent that (with
slab tuning or slab_nomerge) it is faster than my qmempool
implementation.

Thus, I would like a smaller/slimmer layer than qmempool.  We do need
some per-CPU cache for allocations, like Alex suggests, but I'm not
sure we need one for the free side.  For now I'm returning
objects/skbs directly to slub, and am hoping enough objects can be
merged in a detached freelist, which allows me to return several
objects with a single locked double_cmpxchg.
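
The detached-freelist idea, roughly sketched below (set_freepointer()
is a slub-internal helper, and __slab_free_chain() is a hypothetical
stand-in for the actual slowpath call): objects in the batch that
belong to the same slab page get chained together through their free
pointers, so the whole chain can be handed back with one locked
cmpxchg instead of one per object:

	/* Rough sketch of the idea, not the final implementation. */
	void sketch_free_bulk(struct kmem_cache *s, size_t nr, void **p)
	{
		size_t i = 0;

		while (i < nr) {
			struct page *page = virt_to_head_page(p[i]);
			void *head = p[i];
			void *tail = p[i];
			int cnt = 1;

			/* Greedily chain objects from the same page */
			while (i + cnt < nr &&
			       virt_to_head_page(p[i + cnt]) == page) {
				set_freepointer(s, tail, p[i + cnt]);
				tail = p[i + cnt];
				cnt++;
			}
			/* One locked cmpxchg releases the whole chain
			 * (hypothetical helper for brevity). */
			__slab_free_chain(s, page, head, tail, cnt);
			i += cnt;
		}
	}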

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
