

Message-ID: <CAKfTPtBiOFXwV9SkZ=YBw16xoS6LSrKVR4sFX6r2hZPZ9_5-+A@mail.gmail.com>
Date:   Mon, 26 Oct 2020 09:39:32 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Chris Mason <clm@...com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Rik van Riel <riel@...riel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"

Hi Chris

On Sat, 24 Oct 2020 at 01:49, Chris Mason <clm@...com> wrote:
>
> Hi everyone,
>
> We’re validating a new kernel in the fleet, and compared with v5.2,

Which version are you using?
Several improvements have been added since v5.5 and the load_balance() rework.

> performance is ~2-3% lower for some of our workloads.  After some
> digging, Johannes found that our involuntary context switch rate was ~2x
> higher, and we were leaving a CPU idle a higher percentage of the time,
> even though the workload was trying to saturate the system.
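
For anyone who wants to compare the involuntary context switch rate between
kernels, here is a minimal sketch that reads the per-task counters from
/proc/<pid>/status. The field names voluntary_ctxt_switches and
nonvoluntary_ctxt_switches are the standard ones; the helper names are
hypothetical, and this is not necessarily how the numbers above were collected.

```python
def ctxt_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value.strip())
    return counts


def read_ctxt_switches(pid="self"):
    """Hypothetical helper: read the counters for one task (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return ctxt_switches(f.read())
```

Sampling read_ctxt_switches() twice for a worker and dividing the delta by the
elapsed time gives a rate that can be compared across kernels.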
>
> We were able to reproduce the problem with schbench, and Johannes
> bisected down to:
>
> commit 0b0695f2b34a4afa3f6e9aa1ff0e5336d8dad912
> Author: Vincent Guittot <vincent.guittot@...aro.org>
> Date:   Fri Oct 18 15:26:31 2019 +0200
>
>      sched/fair: Rework load_balance()
>
> Our working theory is the load balancing changes are leaving processes
> behind busy CPUs instead of moving them onto idle ones.  I made a few
> schbench modifications to make this easier to demonstrate:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git/
>
> My VM has 40 cpus (20 cores, 2 threads per core), and my schbench
> command line is:

What is the topology? Are they all part of the same LLC?
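
One way to check is to group the CPUs by the shared_cpu_list of their
highest-level cache in sysfs. A minimal sketch, assuming the usual
/sys/devices/system/cpu/cpuN/cache/indexM layout (a VM may not expose the
host cache topology accurately); the function name is hypothetical:

```python
import os
import re


def llc_groups(cpu_root="/sys/devices/system/cpu"):
    """Map each last-level-cache CPU list to the CPUs that report it."""
    groups = {}
    if not os.path.isdir(cpu_root):
        return groups
    for name in os.listdir(cpu_root):
        if not re.fullmatch(r"cpu\d+", name):
            continue
        cache_dir = os.path.join(cpu_root, name, "cache")
        if not os.path.isdir(cache_dir):
            continue
        best_level, best_list = -1, None
        for index in os.listdir(cache_dir):
            if not index.startswith("index"):
                continue
            try:
                with open(os.path.join(cache_dir, index, "level")) as f:
                    level = int(f.read().strip())
                with open(os.path.join(cache_dir, index, "shared_cpu_list")) as f:
                    shared = f.read().strip()
            except (OSError, ValueError):
                continue
            # Keep the highest cache level seen for this CPU.
            if level > best_level:
                best_level, best_list = level, shared
        if best_list is not None:
            groups.setdefault(best_list, []).append(int(name[3:]))
    return {key: sorted(cpus) for key, cpus in groups.items()}
```

A single key in the result means every CPU shares one LLC; several keys mean
the 40 CPUs are split across multiple LLC domains.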

>
> schbench -t 20 -r 0 -c 1000000 -s 1000 -i 30 -z 120
>
> This has two message threads, and 20 workers per message thread.  Once
> woken up, the workers think for a full second, which means you’ll have
> some long latencies if you’re stuck behind one of these workers in the
> runqueue.  The message thread does a little bit of work and then sleeps,
> so we end up with 40 threads hammering full blast on the CPU and 2
> threads popping in and out of idle.
>
> schbench times the delay from when a message thread wakes a worker to
> when the worker runs.  On a good kernel, the output looks like this:
>
> Latency percentiles (usec) runtime 1290 (s) (3280 total samples)
>          50.0th: 155 (1653 samples)
>          75.0th: 189 (808 samples)
>          90.0th: 216 (501 samples)
>          95.0th: 227 (163 samples)
>          *99.0th: 256 (123 samples)
>          99.5th: 1510 (16 samples)
>          99.9th: 3132 (13 samples)
>          min=21, max=3286
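
To make the measured quantity concrete, here is a rough userspace analogue of
"delay from wakeup to running", written with Python threads. It is only an
illustration of the idea, not schbench's implementation, and the function name
and parameters are hypothetical:

```python
import threading
import time


def measure_wakeup_latency_usec(rounds=5):
    """Return wakeup-to-run delays in microseconds, one per round."""
    latencies = []
    for _ in range(rounds):
        wake = threading.Event()
        stamp = {}

        def worker():
            wake.wait()                          # sleep until the waker signals
            stamp["ran"] = time.perf_counter()   # first thing done once running

        t = threading.Thread(target=worker)
        t.start()
        time.sleep(0.01)                         # let the worker block first
        stamp["woken"] = time.perf_counter()     # timestamp taken just before the wakeup
        wake.set()
        t.join()
        latencies.append((stamp["ran"] - stamp["woken"]) * 1e6)
    return latencies
```

On a saturated machine the delay grows when the woken thread has to queue
behind a long-running task instead of being placed on an idle CPU.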
>
> With 0b0695f2b34a, we get this:
>
> Latency percentiles (usec) runtime 1440 (s) (4480 total samples)
>          50.0th: 147 (2261 samples)
>          75.0th: 182 (1116 samples)
>          90.0th: 205 (671 samples)
>          95.0th: 224 (215 samples)
>          *99.0th: 12240 (173 samples) <-- much higher p99 and up
>          99.5th: 12752 (22 samples)
>          99.9th: 13104 (18 samples)
>          min=21, max=13172
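
To put a number on the difference between the two runs, a minimal sketch that
parses the percentile lines quoted above and computes the per-percentile
ratio. The regular expression is an assumption based only on the output shown
here, and the names are hypothetical:

```python
import re

# Matches lines such as "*99.0th: 256 (123 samples)".
PCT_RE = re.compile(r"\*?(\d+(?:\.\d+)?)th:\s+(\d+)\s+\((\d+) samples\)")

GOOD = """
50.0th: 155 (1653 samples)
75.0th: 189 (808 samples)
90.0th: 216 (501 samples)
95.0th: 227 (163 samples)
*99.0th: 256 (123 samples)
99.5th: 1510 (16 samples)
99.9th: 3132 (13 samples)
"""

BAD = """
50.0th: 147 (2261 samples)
75.0th: 182 (1116 samples)
90.0th: 205 (671 samples)
95.0th: 224 (215 samples)
*99.0th: 12240 (173 samples)
99.5th: 12752 (22 samples)
99.9th: 13104 (18 samples)
"""


def parse_percentiles(text):
    """Return {percentile: latency_usec} for every percentile line found."""
    result = {}
    for line in text.splitlines():
        m = PCT_RE.search(line)
        if m:
            result[float(m.group(1))] = int(m.group(2))
    return result


def ratios(good_text, bad_text):
    """Latency ratio bad/good for each percentile present in both runs."""
    good, bad = parse_percentiles(good_text), parse_percentiles(bad_text)
    return {p: bad[p] / good[p] for p in good if p in bad}


if __name__ == "__main__":
    for pct, ratio in sorted(ratios(GOOD, BAD).items()):
        print(f"{pct}th: {ratio:.2f}x")
```

With these figures the median is essentially unchanged while p99 is roughly
48 times higher, so the regression is confined to the tail.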
>
> Since the idea is to fully load the machine with schbench, use schbench
> -t <your_num_cpus/2>, and make sure the box doesn’t have other stuff
> running in the background.  I used a VM because it ended up giving more
> consistent results on our kernel test machines, which have some periodic
> noise running in the background.
>
> We’ve tried a few different approaches, but don’t quite have a solid
> fix yet.  I thought I’d kick off the discussion with my most useful
> hunks so far:
>
> diff a/kernel/sched/fair.c b/kernel/sched/fair.c
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
>
> -chris