linux-kernel - Re: hackbench regression due to commit 9dfc6e68bfe6e

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1270631267.2078.380.camel@ymzhang.sh.intel.com>
Date:	Wed, 07 Apr 2010 17:07:47 +0800
From:	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Christoph Lameter <cl@...ux-foundation.org>,
	netdev <netdev@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
	Pekka Enberg <penberg@...helsinki.fi>, alex.shi@...el.com,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Ma, Ling" <ling.ma@...el.com>,
	"Chen, Tim C" <tim.c.chen@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: hackbench regression due to commit 9dfc6e68bfe6e

On Wed, 2010-04-07 at 08:39 +0200, Eric Dumazet wrote:
> Le mercredi 07 avril 2010 à 10:34 +0800, Zhang, Yanmin a écrit :
> 
> > I collected retired instruction, dtlb miss and LLC miss.
> > Below is data of LLC miss.
> > 
> > Kernel 2.6.33:
> > # Samples: 11639436896 LLC-load-misses
> > #
> > # Overhead          Command                                           Shared Object  Symbol
> > # ........  ...............  ......................................................  ......
> > #
> >     20.94%        hackbench  [kernel.kallsyms]                                       [k] copy_user_generic_string
> >     14.56%        hackbench  [kernel.kallsyms]                                       [k] unix_stream_recvmsg
> >     12.88%        hackbench  [kernel.kallsyms]                                       [k] kfree
> >      7.37%        hackbench  [kernel.kallsyms]                                       [k] kmem_cache_free
> >      7.18%        hackbench  [kernel.kallsyms]                                       [k] kmem_cache_alloc_node
> >      6.78%        hackbench  [kernel.kallsyms]                                       [k] kfree_skb
> >      6.27%        hackbench  [kernel.kallsyms]                                       [k] __kmalloc_node_track_caller
> >      2.73%        hackbench  [kernel.kallsyms]                                       [k] __slab_free
> >      2.21%        hackbench  [kernel.kallsyms]                                       [k] get_partial_node
> >      2.01%        hackbench  [kernel.kallsyms]                                       [k] _raw_spin_lock
> >      1.59%        hackbench  [kernel.kallsyms]                                       [k] schedule
> >      1.27%        hackbench  hackbench                                               [.] receiver
> >      0.99%        hackbench  libpthread-2.9.so                                       [.] __read
> >      0.87%        hackbench  [kernel.kallsyms]                                       [k] unix_stream_sendmsg
> > 
> > 
> > 
> > 
> > Kernel 2.6.34-rc3:
> > # Samples: 13079611308 LLC-load-misses
> > #
> > # Overhead          Command                                                         Shared Object  Symbol
> > # ........  ...............  ....................................................................  ......
> > #
> >     18.55%        hackbench  [kernel.kallsyms]                                                     [k] copy_user_generic_str
> > ing
> >     13.19%        hackbench  [kernel.kallsyms]                                                     [k] unix_stream_recvmsg
> >     11.62%        hackbench  [kernel.kallsyms]                                                     [k] kfree
> >      8.54%        hackbench  [kernel.kallsyms]                                                     [k] kmem_cache_free
> >      7.88%        hackbench  [kernel.kallsyms]                                                     [k] __kmalloc_node_track_
> > caller
> >      6.54%        hackbench  [kernel.kallsyms]                                                     [k] kmem_cache_alloc_node
> >      5.94%        hackbench  [kernel.kallsyms]                                                     [k] kfree_skb
> >      3.48%        hackbench  [kernel.kallsyms]                                                     [k] __slab_free
> >      2.15%        hackbench  [kernel.kallsyms]                                                     [k] _raw_spin_lock
> >      1.83%        hackbench  [kernel.kallsyms]                                                     [k] schedule
> >      1.82%        hackbench  [kernel.kallsyms]                                                     [k] get_partial_node
> >      1.59%        hackbench  hackbench                                                             [.] receiver
> >      1.37%        hackbench  libpthread-2.9.so                                                     [.] __read
> > 
> > 
> 
> Please check values of /proc/sys/net/core/rmem_default
> and /proc/sys/net/core/wmem_default on your machines.
> 
> Their values can also change hackbench results, because increasing
> wmem_default allows af_unix senders to consume much more skbs and stress
> slab allocators (__slab_free), way beyond slub_min_order can tune them.
> 
> When 2000 senders are running (and 2000 receivers), we might consume
> something like 2000 * 100.000 bytes of kernel memory for skbs. TLB
> trashing is expected, because all these skbs can span many 2MB pages.
> Maybe some node imbalance happens too.
It's a good pointer. rmem_default and wmem_default are about 116k on my machine.
I changed them to 52K and it seems there is no improvement.

> 
> 
> 
> You could try to boot your machine with less ram per node and check :
> 
> # cat /proc/buddyinfo 
> Node 0, zone      DMA      2      1      2      2      1      1      1      0      1      1      3 
> Node 0, zone    DMA32    219    298    143    584    145     57     44     41     31     26    517 
> Node 1, zone    DMA32      4      1     17      1      0      3      2      2      2      2    123 
> Node 1, zone   Normal    126    169     83      8      7      5     59     59     49     28    459 
> 
> 
> One experiment on your Nehalem machine would be to change hackbench so
> that each group (20 senders/ 20 receivers) run on a particular NUMA
> node.
I expect process scheduler to work well in scheduling different groups
to different nodes.

I suspected dynamic percpu data didn't take care of NUMA, but kernel dump shows
it does take care of NUMA.

> 
> x86info -c ->
> 
> CPU #1
> EFamily: 0 EModel: 1 Family: 6 Model: 26 Stepping: 5
> CPU Model: Core i7 (Nehalem)
> Processor name string: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
> Type: 0 (Original OEM)	Brand: 0 (Unsupported)
> Number of cores per physical package=8
> Number of logical processors per socket=16
> Number of logical processors per core=2
> APIC ID: 0x10	Package: 0  Core: 1   SMT ID 0
> Cache info
>  L1 Instruction cache: 32KB, 4-way associative. 64 byte line size.
>  L1 Data cache: 32KB, 8-way associative. 64 byte line size.
>  L2 (MLC): 256KB, 8-way associative. 64 byte line size.
> TLB info
>  Data TLB: 4KB pages, 4-way associative, 64 entries
>  64 byte prefetching.
> Found unknown cache descriptors: 55 5a b2 ca e4 
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/