linux-kernel - Re: [sched/fair] 8d86968ac3: netperf.Throughput

Message-ID: <20201126121351.GJ3371@techsingularity.net>
Date:   Thu, 26 Nov 2020 12:13:51 +0000
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     "Li, Aubrey" <aubrey.li@...ux.intel.com>
Cc:     kernel test robot <rong.a.chen@...el.com>,
        0day robot <lkp@...el.com>, Mel Gorman <mgorman@...e.de>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Qais Yousef <qais.yousef@....com>,
        Valentin Schneider <valentin.schneider@....com>,
        Jiang Biao <benbjiang@...il.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        ying.huang@...el.com, feng.tang@...el.com, zhengjun.xing@...el.com,
        mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        Aubrey Li <aubrey.li@...el.com>, yu.c.chen@...el.com
Subject: Re: [sched/fair] 8d86968ac3: netperf.Throughput_tps -29.5% regression

On Thu, Nov 26, 2020 at 02:57:07PM +0800, Li, Aubrey wrote:
> Hi Robot,
> 
> On 2020/11/25 17:09, kernel test robot wrote:
> > Greeting,
> > 
> > FYI, we noticed a -29.5% regression of netperf.Throughput_tps due to commit:
> > 
> > 
> > commit: 8d86968ac36ea5bff487f70b5ffc252a87d44c51 ("[RFC PATCH v4] sched/fair: select idle cpu from idle cpumask for task wakeup")
> > url: https://github.com/0day-ci/linux/commits/Aubrey-Li/sched-fair-select-idle-cpu-from-idle-cpumask-for-task-wakeup/20201118-115145
> > base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 09162bc32c880a791c6c0668ce0745cf7958f576
> 
> I tried to replicate this on my side on a 192-thread (with SMT) machine as well and didn't see the regression.
> 
> nr_threads   v5.9.8            +patch
> 96 (50%)     1 (+/- 2.499%)    1.007672 (+/- 3.0872%)
> 
> I also tested a 100% case and saw a similar improvement to what I saw on the uperf benchmark:
> 
> nr_threads   v5.9.8            +patch
> 192 (100%)   1 (+/- 45.32%)    1.864917 (+/- 23.29%)
> 
> My base is v5.9.8, BTW.
> 
> > 	ip: ipv4
> > 	runtime: 300s
> > 	nr_threads: 50%
> > 	cluster: cs-localhost
> > 	test: UDP_RR
> > 	cpufreq_governor: performance
> > 	ucode: 0x5003003
> > 

Note that I suspect regressions with this will be tricky to reproduce
because they'll depend on the timing of when the idle mask gets updated.
With this configuration, nr_threads is 50%, which likely translates into
one client/server pair per thread, i.e. 100% of CPUs active; but as it's a
ping-pong workload, the pairs are rapidly idling for very short periods.

If the idle mask is not getting cleared, then select_idle_cpu() is
probably returning immediately. select_idle_core() is almost certainly
failing, which leaves only select_idle_smt() to find a potentially idle
CPU. That's a limited search space, so tasks may be getting stacked and
missing CPUs that are idling for short periods.

On the flip side, I expect cases like hackbench to benefit because it
can saturate a machine to such a degree that select_idle_cpu() is a waste
of time.

That said, I haven't followed the different versions closely. I know v5
got a lot of feedback, so I will take a closer look at v6. Fundamentally,
though, I expect that using the idle mask will be a mixed bag. At low
utilisation or over-saturation, it'll be a benefit. At the point where
the machine is almost fully busy, some workloads will benefit (lightly
communicating workloads that occasionally migrate) and others will not
(ping-pong workloads looking for CPUs that are idle for very brief
periods).

It's tricky enough that it might benefit from a sched_feat() check that
defaults to true so that it gets tested. For regressions that show up,
it'll be easy enough to ask for the feature to be disabled to see if that
fixes them. Over time, that might give an idea of exactly what sort of
workloads benefit and what sort suffer.

Note that the cost of select_idle_cpu() can also be reduced by enabling
SIS_AVG_CPU, so it would be interesting to know whether the idle mask is
superior or inferior to SIS_AVG_CPU for the workloads that show regressions.

-- 
Mel Gorman
SUSE Labs