Date:	Fri, 11 Dec 2015 15:17:50 +0100
From:	Jirka Hladky <jhladky@...hat.com>
To:	linux-kernel@...r.kernel.org
Subject: sched : performance regression 24% between 4.4rc4 and 4.3 kernel

Hello,

we are doing performance testing of the new kernel scheduler (commit
53528695ff6d8b77011bc818407c13e30914a946). In most cases we see
performance improvements compared to the 4.3 kernel, with the exception
of the stream benchmark when running on a 4 NUMA node server.

When we run 4 stream benchmark processes on a 4 NUMA node server and
compare the total performance, we see a drop of about 24% compared to
the 4.3 kernel. The cause is that 2 stream benchmarks run on the same
NUMA node while 1 NUMA node runs no stream benchmark at all. With
kernel 4.3, the load is distributed evenly among all 4 NUMA nodes. When
two stream benchmarks run on the same NUMA node, the runtime is almost
twice as long as when one stream benchmark has a NUMA node to itself.
See log files [1] below.

Please see the graph comparing stream benchmark results between kernel
4.3 and 4.4rc4 (for the legend see [2] below).
https://jhladky.fedorapeople.org/sched_stream_kernel_4.3vs4.4rc4/Stream_benchmark_on_4_NUMA_node_server_4.3vs4.4rc4_kernel.png

Could you please help us identify the root cause of this regression?
We don't have the skills to fix the problem ourselves, but we will be
more than happy to test any proposed patch for this issue.

Thanks a lot for your help on that!
Jirka

Further details:

[1] Log files can be downloaded here:
https://jhladky.fedorapeople.org/sched_stream_kernel_4.3vs4.4rc4/4.4RC4_stream_log_files.tar.bz2

$ grep "User time" *log
stream.defaultRun.004streams.loop01.instance001.log:User time:  12.370 seconds
stream.defaultRun.004streams.loop01.instance002.log:User time:  10.560 seconds
stream.defaultRun.004streams.loop01.instance003.log:User time:  19.330 seconds
stream.defaultRun.004streams.loop01.instance004.log:User time:  17.820 seconds


$ grep "NUMA nodes:" *log
stream.defaultRun.004streams.loop01.instance001.log:NUMA nodes:     2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2
stream.defaultRun.004streams.loop01.instance002.log:NUMA nodes:     0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0
stream.defaultRun.004streams.loop01.instance003.log:NUMA nodes:     3
3 3 3 3 3 3 3 3 0 0 0 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3
stream.defaultRun.004streams.loop01.instance004.log:NUMA nodes:     3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 0 0 0 0 0 0 0 0 0 0 0 0

=> Please note that NO benchmark is running on NUMA node #1, and
instances #3 and #4 are both running on NUMA node #3. This has a huge
performance impact: the stream instances on node #3 need 19 and 17
seconds to finish, compared to 10 and 12 seconds for the instances that
each run alone on one NUMA node.
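
The same check can be automated when many log files have to be inspected. The following is only a minimal Python sketch: the function names are hypothetical, the input is assumed to be already parsed into one list of sampled node numbers per instance, and the sample lists are shortened, illustrative stand-ins modeled on the excerpt above, not the full logs.

```python
from collections import Counter


def dominant_node(samples):
    """Return the NUMA node that appears most often in one instance's samples."""
    return Counter(samples).most_common(1)[0][0]


def placement_report(samples_by_instance, num_nodes):
    """Map each instance to its dominant node, then list idle and shared nodes."""
    dominant = {name: dominant_node(s) for name, s in samples_by_instance.items()}
    per_node = Counter(dominant.values())
    idle = [n for n in range(num_nodes) if per_node[n] == 0]
    shared = sorted(n for n, count in per_node.items() if count > 1)
    return dominant, idle, shared


# Shortened, illustrative samples (assumption: NOT the complete log contents).
samples = {
    "instance001": [2] * 8 + [1] * 2 + [2] * 8,
    "instance002": [0] * 18,
    "instance003": [3] * 6 + [0] * 4 + [3] * 8,
    "instance004": [3] * 14 + [0] * 4,
}

dominant, idle, shared = placement_report(samples, num_nodes=4)
print("dominant node per instance:", dominant)
print("nodes without any benchmark:", idle)
print("nodes shared by several benchmarks:", shared)
```

With these shortened samples the sketch reports node 1 as idle and node 3 as shared, which matches the manual reading of the log files.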

[2] Graph:
https://jhladky.fedorapeople.org/sched_stream_kernel_4.3vs4.4rc4/Stream_benchmark_on_4_NUMA_node_server_4.3vs4.4rc4_kernel.png

Graph Legend:
GREEN line => kernel 4.3
BLUE line  => kernel 4.4rc4
x-axis     => number of parallel stream instances
y-axis     => Sum [1/runtime] over all stream instances
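
To make the y-axis metric concrete, here is a small Python sketch that computes Sum [1/runtime] for one run. It assumes that the "User time" values from [1] can stand in for the runtime used in the graph, which may not be exactly how the plotted numbers were produced; the function name is hypothetical.

```python
def aggregate_throughput(runtimes):
    """Sum of 1/runtime over all instances; higher means more total work per second."""
    return sum(1.0 / t for t in runtimes)


# "User time" in seconds of the four instances from the 4.4rc4 log excerpt [1].
# Assumption: user time is used here as a stand-in for the benchmark runtime.
observed = [12.370, 10.560, 19.330, 17.820]

score = aggregate_throughput(observed)
print(f"Sum [1/runtime] for 4 instances: {score:.4f}")
```

Because each instance contributes 1/runtime, two instances slowed down by sharing a node pull the sum down even when the other two run at full speed.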


Details on server: DELL PowerEdge R820, 4x E5-4607 0 @ 2.20GHz and 128GB RAM
http://ark.intel.com/products/64604
