linux-kernel - Re: [lkp-robot] [sched/fair] 6d46bd3d97: netperf.Throughput

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <1505448381.10814.64.camel@gmx.de>
Date:   Fri, 15 Sep 2017 06:06:21 +0200
From:   Mike Galbraith <efault@....de>
To:     Rik van Riel <riel@...hat.com>, Joel Fernandes <joelaf@...gle.com>
Cc:     kernel test robot <xiaolong.ye@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Josef Bacik <jbacik@...com>, Juri Lelli <Juri.Lelli@....com>,
        Brendan Jackman <brendan.jackman@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Ingo Molnar <mingo@...hat.com>, lkp@...org
Subject: Re: [lkp-robot] [sched/fair] 6d46bd3d97: netperf.Throughput_tps
 -11.3% regression

On Thu, 2017-09-14 at 11:56 -0400, Rik van Riel wrote:
> 
> On systems with SMT, it may make more sense for
> sync wakeups to look for idle threads of the same
> core, than to have the woken task end up on the 
> same thread, and wait for the current task to stop
> running.

Depends.

homer:/root # taskset -c 3 pipe-test
1.412185 usecs/loop -- avg 1.412185 1416.2 KHz
homer:/root # taskset -c 2,3 pipe-test
2.298820 usecs/loop -- avg 2.298820 870.0 KHz
homer:/root # taskset -c 3,7 pipe-test
1.899164 usecs/loop -- avg 1.899164 1053.1 KHz
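
The pipe-test figures above can be turned into relative costs with a few lines of arithmetic. This is a minimal sketch. It assumes that pipe-test's KHz column counts two transfers per loop, which reproduces the printed rates but is inferred from the numbers rather than taken from the tool. It also labels "3,7" as SMT siblings, following the shared-L2 discussion in this mail. The dictionary and helper names are illustrative only.

```python
# pipe-test results quoted above, in usecs per loop, keyed by taskset CPU list.
PIPE_TEST_USECS = {
    "3": 1.412185,    # both tasks on one CPU
    "2,3": 2.298820,  # two separate cores
    "3,7": 1.899164,  # SMT siblings sharing one core (and its L2)
}


def khz(usecs_per_loop: float) -> float:
    # Assumption: two transfers per loop, i.e. rate = 2 / usecs in MHz;
    # e.g. 2000 / 1.412185 ~= 1416.2, matching the printed KHz.
    return 2000.0 / usecs_per_loop


def slowdown_pct(cpus: str) -> float:
    # Extra time per loop relative to running both tasks on one CPU.
    return (PIPE_TEST_USECS[cpus] / PIPE_TEST_USECS["3"] - 1.0) * 100.0


for cpus, usecs in PIPE_TEST_USECS.items():
    print(f"cpus {cpus:>4}: {khz(usecs):7.1f} KHz, {slowdown_pct(cpus):+5.1f}% vs one CPU")
```

With these inputs, the SMT-sibling placement costs less per loop than the separate-core placement, though both are slower than a single CPU.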

For pipe-test, which has ~zero overlap as well as ~zero cache footprint,
that's a good choice, but..

homer:/root # taskset -c 3 tbench.sh 1 10 2>&1|grep Throughput
Throughput 844.04 MB/sec  1 clients  1 procs  max_latency=0.042 ms
homer:/root # taskset -c 2,3 tbench.sh 1 10 2>&1|grep Throughput
Throughput 713.25 MB/sec  1 clients  1 procs  max_latency=0.324 ms
homer:/root # taskset -c 3,7 tbench.sh 1 10 2>&1|grep Throughput
Throughput 512.866 MB/sec  1 clients  1 procs  max_latency=0.454 ms

..for tbench it's a different story.  My crusty ole Q6600 turns in a win
by scheduling the pair on separate cores that share an L2, but on the
more modern SMT-equipped i4790, targeting a shared L2 is the worst choice.
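
The two-CPU runs above can be compared side by side to show that the preferred placement flips between workloads. This is a small sketch using only the numbers quoted in this mail. It treats "2,3" as separate cores and "3,7" as SMT siblings sharing an L2, as described above. The function names are hypothetical.

```python
# Two-CPU placements from the runs quoted above:
#   "2,3" = separate cores, "3,7" = SMT siblings sharing a core (and its L2).
PIPE_USECS = {"2,3": 2.298820, "3,7": 1.899164}  # usecs/loop, lower is better
TBENCH_MBS = {"2,3": 713.25, "3,7": 512.866}     # MB/sec, higher is better
TBENCH_ONE_CPU = 844.04                          # MB/sec with both tasks on CPU 3


def best_pipe_placement() -> str:
    # Placement with the lowest time per loop.
    return min(PIPE_USECS, key=PIPE_USECS.get)


def best_tbench_placement() -> str:
    # Placement with the highest throughput.
    return max(TBENCH_MBS, key=TBENCH_MBS.get)


if __name__ == "__main__":
    print("pipe-test prefers:", best_pipe_placement())  # 3,7
    print("tbench prefers:   ", best_tbench_placement())  # 2,3
    for cpus, mbs in TBENCH_MBS.items():
        # Throughput change relative to running both tasks on one CPU.
        print(f"tbench {cpus}: {(mbs / TBENCH_ONE_CPU - 1.0) * 100.0:+.1f}%")
```

A fixed "prefer the idle SMT sibling" rule would pick the better option for pipe-test and the worse one for tbench on this machine.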

The bigger issue is that while microbenchmark behavior is consistent,
applications tend to process data and react to it (vs. merely batting it
about like playful kittens: cute, but not all that productive), likely
mucking up any heuristic anyone invents with depressing regularity.

	-Mike