Message-ID: <55E9DE51.7090109@gmail.com>
Date: Fri, 4 Sep 2015 11:09:21 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>, netdev@...r.kernel.org,
akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, aravinda@...ux.vnet.ibm.com,
Christoph Lameter <cl@...ux.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
iamjoonsoo.kim@....com
Subject: Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk
free API.
On 09/04/2015 10:00 AM, Jesper Dangaard Brouer wrote:
> During TX DMA completion cleanup there exists an opportunity in the
> NIC drivers to perform bulk free, without introducing additional
> latency.
>
> For an IPv4 forwarding workload, the network stack hits the slowpath
> of the kmem_cache SLUB allocator. This slowpath can be mitigated by
> bulk free via the detached-freelists patchset.
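>
> As a rough illustration (not code from the patchset itself), the idea
> is that a driver's TX completion path collects the skb heads it has
> finished with and frees them in a single kmem_cache_free_bulk() call.
> The batch size, and using skb_release_all() plus skbuff_head_cache
> directly, are assumptions made for this sketch:
>
>   /* Sketch only: batch skb heads during TX DMA completion and free
>    * them with one kmem_cache_free_bulk() call (the bulk API from the
>    * dependency patchset). TX_BULK is an arbitrary batch size. */
>   #include <linux/skbuff.h>
>
>   #define TX_BULK 32
>
>   static void tx_clean_bulk(struct sk_buff **done, unsigned int n)
>   {
>         void *heads[TX_BULK];
>         unsigned int i, cnt = 0;
>
>         for (i = 0; i < n; i++) {
>                 skb_release_all(done[i]);  /* drop data, frags, dst */
>                 heads[cnt++] = done[i];
>                 if (cnt == TX_BULK) {
>                         kmem_cache_free_bulk(skbuff_head_cache, cnt, heads);
>                         cnt = 0;
>                 }
>         }
>         if (cnt)
>                 kmem_cache_free_bulk(skbuff_head_cache, cnt, heads);
>   }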
>
> Depends on patchset:
> http://thread.gmane.org/gmane.linux.kernel.mm/137469
>
> Kernel based on MMOTM tag 2015-08-24-16-12 from git repo:
> git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
> Also contains Christoph's patch "slub: Avoid irqoff/on in bulk allocation"
>
>
> Benchmarking: Single CPU IPv4 forwarding UDP (generator pktgen):
> * Before: 2043575 pps
> * After : 2090522 pps
> * Improvements: +46947 pps and -10.99 ns
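>
> (The ns figure is just the inverted rates: 1/2043575 pps = 489.3 ns
> and 1/2090522 pps = 478.4 ns per packet, i.e. ~10.99 ns saved.)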
>
> In the before case, perf report shows SLUB free hitting the slowpath:
> 1.98% ksoftirqd/6 [kernel.vmlinux] [k] __slab_free.isra.72
> 1.29% ksoftirqd/6 [kernel.vmlinux] [k] cmpxchg_double_slab.isra.71
> 0.95% ksoftirqd/6 [kernel.vmlinux] [k] kmem_cache_free
> 0.95% ksoftirqd/6 [kernel.vmlinux] [k] kmem_cache_alloc
> 0.20% ksoftirqd/6 [kernel.vmlinux] [k] __cmpxchg_double_slab.isra.60
> 0.17% ksoftirqd/6 [kernel.vmlinux] [k] ___slab_alloc.isra.68
> 0.09% ksoftirqd/6 [kernel.vmlinux] [k] __slab_alloc.isra.69
>
> After, the slowpath calls are almost gone:
> 0.22% ksoftirqd/6 [kernel.vmlinux] [k] __cmpxchg_double_slab.isra.60
> 0.18% ksoftirqd/6 [kernel.vmlinux] [k] ___slab_alloc.isra.68
> 0.14% ksoftirqd/6 [kernel.vmlinux] [k] __slab_free.isra.72
> 0.14% ksoftirqd/6 [kernel.vmlinux] [k] cmpxchg_double_slab.isra.71
> 0.08% ksoftirqd/6 [kernel.vmlinux] [k] __slab_alloc.isra.69
>
>
> Extra info: tuning SLUB's per-CPU structures gives further improvements:
> * slub-tuned: 2124217 pps
> * gain over patched: +33695 pps and -7.59 ns
> * gain over before:  +80642 pps and -18.58 ns
>
> Tuning done (raising the per-CPU and per-node partial-slab budgets,
> so more frees stay on the fastpath):
> echo 256 > /sys/kernel/slab/skbuff_head_cache/cpu_partial
> echo 9 > /sys/kernel/slab/skbuff_head_cache/min_partial
>
> Without SLUB tuning, the same performance is achieved with the kernel
> cmdline option "slab_nomerge":
> * slab_nomerge: 2121824 pps
>
> Test notes:
> * Note the very fast CPU: Intel Core i7-4790K @ 4.00GHz
> * gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC)
> * kernel 4.1.0-mmotm-2015-08-24-16-12+ #271 SMP
> * Generator pktgen UDP single flow (pktgen_sample03_burst_single_flow.sh)
> * Tuned for forwarding:
> - unloaded netfilter modules
> - Sysctl settings:
> - net/ipv4/conf/default/rp_filter = 0
> - net/ipv4/conf/all/rp_filter = 0
> - net/ipv4/ip_early_demux = 0
>   (forwarding performance is affected by early demux)
> - net.ipv4.ip_forward = 1
> - Disabled GRO, TSO and GSO on NICs:
> - ethtool -K ixgbe3 gro off tso off gso off
>
> ---
This is an interesting start. However, I feel like it might work better
if you were to create a per-cpu pool of skbs that could be freed and
allocated in NAPI context. For example, we already have napi_alloc_skb;
why not add a napi_free_skb, and then make the array of objects to be
freed part of a pool that can be used for either allocation or freeing?
If the pool runs empty you just bulk-allocate something like 8 or 16
new skb heads, and if you fill it you just free half of the list.
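
Roughly what I have in mind, as an untested sketch. napi_free_skb() and
the pool sizes are hypothetical, skbuff_head_cache is assumed to be
visible here, and re-initialising a recycled skb head is glossed over:

  #include <linux/skbuff.h>
  #include <linux/percpu.h>

  #define NAPI_POOL_SIZE 64   /* arbitrary cap */
  #define NAPI_POOL_BULK 16   /* refill batch, e.g. 8 or 16 heads */

  struct napi_skb_pool {
        unsigned int count;
        void *objs[NAPI_POOL_SIZE];
  };
  static DEFINE_PER_CPU(struct napi_skb_pool, napi_skb_pool);

  /* Both helpers run in NAPI (softirq) context, so this_cpu_ptr() is
   * safe without further protection. */
  static struct sk_buff *napi_pool_get(void)
  {
        struct napi_skb_pool *pool = this_cpu_ptr(&napi_skb_pool);

        if (unlikely(!pool->count)) {
                /* Pool empty: bulk-allocate a small batch of heads. */
                if (!kmem_cache_alloc_bulk(skbuff_head_cache, GFP_ATOMIC,
                                           NAPI_POOL_BULK, pool->objs))
                        return NULL;
                pool->count = NAPI_POOL_BULK;
        }
        return pool->objs[--pool->count];
  }

  /* The hypothetical napi_free_skb(): recycle the head into the pool,
   * bulk-freeing half of it in one call once it fills up. */
  static void napi_pool_put(struct sk_buff *skb)
  {
        struct napi_skb_pool *pool = this_cpu_ptr(&napi_skb_pool);

        if (unlikely(pool->count == NAPI_POOL_SIZE)) {
                kmem_cache_free_bulk(skbuff_head_cache, NAPI_POOL_SIZE / 2,
                                     &pool->objs[NAPI_POOL_SIZE / 2]);
                pool->count = NAPI_POOL_SIZE / 2;
        }
        pool->objs[pool->count++] = skb;
  }

That way one per-cpu array amortises both the alloc and the free side,
and the slab allocator only gets touched in batches.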
- Alex