Date:	Sat, 5 Sep 2015 13:18:25 +0200
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Alexander Duyck <alexander.duyck@...il.com>,
	netdev@...r.kernel.org, akpm@...ux-foundation.org,
	linux-mm@...ck.org, aravinda@...ux.vnet.ibm.com,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	iamjoonsoo.kim@....com, brouer@...hat.com
Subject: Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache
 bulk free API.

On Fri, 4 Sep 2015 18:45:13 -0500 (CDT)
Christoph Lameter <cl@...ux.com> wrote:

> On Fri, 4 Sep 2015, Alexander Duyck wrote:
> > Right, but one of the reasons for Jesper to implement the bulk alloc/free is
> > to avoid the cmpxchg that is being used to get stuff into or off of the per
> > cpu lists.
> 
> There is no full cmpxchg used for the per cpu lists. It's a cmpxchg
> without lock semantics, which is very cheap.

The double_cmpxchg without a lock prefix still costs 9 cycles, which is
very fast but still a cost (add approx 19 cycles for a lock prefix).

It is slower than a local_irq_disable + local_irq_enable pair, which
costs only 7 cycles and is what the bulking call uses.  (That is the
reason bulk calls with a single object can almost compete with the
fastpath.)
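
To make the comparison concrete, here is a rough sketch (not the
actual slub code; get_freepointer() is a slub-internal helper, and the
slowpath refill is omitted) of how a bulk alloc can pay the irq
disable/enable cost once for the whole batch, instead of one cmpxchg
per object:

	/* Illustrative sketch only -- not the real slub fastpath. */
	int sketch_alloc_bulk(struct kmem_cache *s, gfp_t flags,
			      size_t nr, void **p)
	{
		struct kmem_cache_cpu *c;
		unsigned long irqflags;
		size_t i;

		local_irq_save(irqflags);  /* ~7 cycles, paid once */
		c = this_cpu_ptr(s->cpu_slab);
		for (i = 0; i < nr; i++) {
			void *object = c->freelist;

			if (unlikely(!object))
				break;  /* slowpath refill (using flags)
					 * omitted in this sketch */
			c->freelist = get_freepointer(s, object);
			p[i] = object;
		}
		local_irq_restore(irqflags);
		return i;
	}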


> > In the case of network drivers they are running in softirq context almost
> > exclusively.  As such it is useful to have a set of buffers that can be
> > acquired or freed from this context without the need to use any
> > synchronization primitives.  Then once the softirq context ends we can
> > free up some or all of the resources back to the slab allocator.
> 
> That is the case in the slab allocators.

There is a potential for taking advantage of this softirq context,
which is basically what my qmempool implementation did.

But we have now optimized the slub allocator to the extent that (with
slab tuning or slab_nomerge) it is faster than my qmempool
implementation.

Thus, I would like a smaller/slimmer layer than qmempool.  We do need
some per-CPU cache for allocations, like Alex suggests, but I'm not
sure we need one for the free side.  For now I'm returning
objects/skbs directly to slub, and am hoping enough objects can be
merged in a detached freelist, which allows me to return several
objects with a single locked double_cmpxchg.
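
The detached-freelist idea, roughly sketched below (set_freepointer()
is a slub-internal helper, and __slab_free_chain() is a hypothetical
stand-in for the actual slowpath call): objects in the batch that
belong to the same slab page get chained together through their free
pointers, so the whole chain can be handed back with one locked
cmpxchg instead of one per object:

	/* Rough sketch of the idea, not the final implementation. */
	void sketch_free_bulk(struct kmem_cache *s, size_t nr, void **p)
	{
		size_t i = 0;

		while (i < nr) {
			struct page *page = virt_to_head_page(p[i]);
			void *head = p[i];
			void *tail = p[i];
			int cnt = 1;

			/* Greedily chain objects from the same page */
			while (i + cnt < nr &&
			       virt_to_head_page(p[i + cnt]) == page) {
				set_freepointer(s, tail, p[i + cnt]);
				tail = p[i + cnt];
				cnt++;
			}
			/* One locked cmpxchg releases the whole chain
			 * (hypothetical helper for brevity). */
			__slab_free_chain(s, page, head, tail, cnt);
			i += cnt;
		}
	}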

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
