Message-ID: <030b279a19a2c5929972b9b56cce40f4f919ed20.camel@gmx.de>
Date:   Tue, 12 Sep 2023 08:32:13 +0200
From:   Mike Galbraith <efault@....de>
To:     K Prateek Nayak <kprateek.nayak@....com>,
        Chen Yu <yu.c.chen@...el.com>
Cc:     Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [RFC PATCH 2/2] sched/fair: skip the cache hot CPU in
 select_idle_cpu()

On Mon, 2023-09-11 at 13:59 +0530, K Prateek Nayak wrote:
>
> Speaking of cache-hot idle CPU, is netperf actually more happy with
> piling on current CPU?

Some tests would be happier, others not at all; some numbers below.

I doubt much in the real world can perform better stacked. To be a win,
the service latency and utilization loss induced by stacked task overlap
have to be less than the cache population cost of an idle CPU, and
populating caches is something that modern CPUs have become darn good
at, making for a high bar.
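
Put as an inequality, the break-even condition above reads roughly as
follows. The symbols are shorthand introduced here for illustration, not
scheduler terms, and all three are assumed to be expressed as time lost
per wakeup:

```latex
% L_overlap : service latency added by the stacked tasks overlapping
% U_loss    : utilization lost by leaving another CPU idle
% C_fill    : cost of populating the cache of an idle CPU
L_{\text{overlap}} + U_{\text{loss}} < C_{\text{fill}}
```

Stacking only wins while the left side stays smaller, and the faster
hardware gets at filling caches, the smaller the right side becomes.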

> I ask this because the logic seems to be
> reserving the previous CPU for a task that dislikes migration but I
> do not see anything in the wake_affine_idle() path that would make the
> short sleeper proactively choose the previous CPU when the wakeup is
> marked with the WF_SYNC flag. Let me know if I'm missing something?

If select_idle_sibling() didn't intervene, the wake affine logic would
indeed routinely step all over working sets, and at one time briefly
did so due to a silly bug. (see kernel/sched/fair.c.today:7292)
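
For readers without the source at hand, here is a simplified Python
model of the wake_affine_idle() decision order, believed to match
kernels of roughly this vintage. It is a paraphrase, not the kernel
code: the idle set, the nr_running map, the share_cache callback and
the NO_CHOICE constant are stand-ins invented for the illustration.

```python
NO_CHOICE = -1  # stand-in for the kernel's "no preference" return value


def wake_affine_idle(this_cpu, prev_cpu, sync, idle, nr_running, share_cache):
    """Model of the decision order only.

    idle        -- set of CPU ids that are currently idle (assumed input)
    nr_running  -- dict mapping CPU id to its runnable task count
    share_cache -- callable(a, b) -> True if the two CPUs share a cache
    """
    # An idle waker CPU that shares cache with prev_cpu: prefer prev_cpu
    # if it is idle too, otherwise take the waker CPU.
    if this_cpu in idle and share_cache(this_cpu, prev_cpu):
        return prev_cpu if prev_cpu in idle else this_cpu

    # Sync wakeup where the waker is the only runnable task: the waker
    # CPU is chosen before prev_cpu is even looked at.
    if sync and nr_running.get(this_cpu, 0) == 1:
        return this_cpu

    # Otherwise fall back to an idle previous CPU.
    if prev_cpu in idle:
        return prev_cpu

    return NO_CHOICE
```

With the waker alone on CPU 0 and the wakee's previous CPU 1 idle, a
sync wakeup yields CPU 0 in this model while a non-sync one yields
CPU 1. The result is only a target hint; select_idle_sibling() still
gets to pick the final CPU.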

The sync hint stems from the bad old days of SMP, when cross-CPU latency
was horrid, and it has lost much of its value. Its bias toward the waker
CPU still helps reduce man-in-the-middle latency in a busy box, though,
and that latency can do even more damage than the stacking of not really
synchronous tasks that can be seen in the numbers below.

The TCP/UDP_RR tests are very close to synchronous, and the numbers
reflect that: stacking is unbeatable for them [1]. For the other tests,
which are hopefully doing something a bit more realistic than tiny ball
ping-pong, stacking is a demonstrable loser.
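
The placement labels used in the results (unbound, stacked, cross-smt,
cross-core) are not spelled out in this mail, and netperf.sh itself is
not shown. The Python sketch below is one plausible reading of them,
assuming stacked means both ends on one CPU, cross-smt means two SMT
siblings of one core, and cross-core means two different cores. The
function names and the example CPU numbers are hypothetical.

```python
import os


def placements(core_a, core_b):
    """Map each label to (server_cpus, client_cpus).

    core_a, core_b -- tuples of logical CPU ids that are SMT siblings of
    two distinct cores; core_a needs at least two siblings.
    A value of None means "leave placement to the scheduler".
    """
    a0, a1 = core_a[0], core_a[1]
    b0 = core_b[0]
    return {
        "unbound": (None, None),     # scheduler decides
        "stacked": ({a0}, {a0}),     # both ends on the same CPU
        "cross-smt": ({a0}, {a1}),   # SMT siblings of one core
        "cross-core": ({a0}, {b0}),  # one CPU on each of two cores
    }


def pin(pid, cpus):
    """Restrict pid to cpus (Linux only); None leaves the task unbound."""
    if cpus is not None:
        os.sched_setaffinity(pid, cpus)
```

On a box where, say, CPUs 0 and 4 are siblings of one core and 1 and 5
of another, placements((0, 4), (1, 5))["cross-smt"] gives ({0}, {4}),
and pin() would be applied to the netserver and netperf pids in turn.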

Not super carefully run script output:

homer:/root # netperf.sh
TCP_SENDFILE-1  unbound    Avg:  87889  Sum:    87889
TCP_SENDFILE-1  stacked    Avg:  62885  Sum:    62885
TCP_SENDFILE-1  cross-smt  Avg:  58887  Sum:    58887
TCP_SENDFILE-1  cross-core Avg:  90673  Sum:    90673

TCP_STREAM-1    unbound    Avg:  71858  Sum:    71858
TCP_STREAM-1    stacked    Avg:  58883  Sum:    58883
TCP_STREAM-1    cross-smt  Avg:  49345  Sum:    49345
TCP_STREAM-1    cross-core Avg:  72346  Sum:    72346

TCP_MAERTS-1    unbound    Avg:  73890  Sum:    73890
TCP_MAERTS-1    stacked    Avg:  60682  Sum:    60682
TCP_MAERTS-1    cross-smt  Avg:  49868  Sum:    49868
TCP_MAERTS-1    cross-core Avg:  73343  Sum:    73343

UDP_STREAM-1    unbound    Avg:  99442  Sum:    99442
UDP_STREAM-1    stacked    Avg:  85319  Sum:    85319
UDP_STREAM-1    cross-smt  Avg:  63239  Sum:    63239
UDP_STREAM-1    cross-core Avg:  99102  Sum:    99102

TCP_RR-1        unbound    Avg: 200833  Sum:   200833
TCP_RR-1        stacked    Avg: 243733  Sum:   243733
TCP_RR-1        cross-smt  Avg: 138507  Sum:   138507
TCP_RR-1        cross-core Avg: 210404  Sum:   210404

UDP_RR-1        unbound    Avg: 252575  Sum:   252575
UDP_RR-1        stacked    Avg: 273081  Sum:   273081
UDP_RR-1        cross-smt  Avg: 168448  Sum:   168448
UDP_RR-1        cross-core Avg: 264124  Sum:   264124

[1] Nearly unbeatable: CPUs sharing an L2 can beat stacking by a wee bit.
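
As a rough aid to reading the netperf.sh output above, this small Python
sketch expresses each bound placement as a percentage change against the
unbound run. The Avg figures are copied from the output; the helper name
is made up for the illustration, and single runs like these carry no
error bars.

```python
# Avg column from the netperf.sh output, in whatever units it reports.
RESULTS = {
    "TCP_SENDFILE": {"unbound": 87889, "stacked": 62885,
                     "cross-smt": 58887, "cross-core": 90673},
    "TCP_STREAM": {"unbound": 71858, "stacked": 58883,
                   "cross-smt": 49345, "cross-core": 72346},
    "TCP_MAERTS": {"unbound": 73890, "stacked": 60682,
                   "cross-smt": 49868, "cross-core": 73343},
    "UDP_STREAM": {"unbound": 99442, "stacked": 85319,
                   "cross-smt": 63239, "cross-core": 99102},
    "TCP_RR": {"unbound": 200833, "stacked": 243733,
               "cross-smt": 138507, "cross-core": 210404},
    "UDP_RR": {"unbound": 252575, "stacked": 273081,
               "cross-smt": 168448, "cross-core": 264124},
}


def relative_to_unbound(results):
    """Return {test: {placement: percent change versus unbound}}."""
    out = {}
    for test, runs in results.items():
        base = runs["unbound"]
        out[test] = {
            name: round(100.0 * (value - base) / base, 1)
            for name, value in runs.items()
            if name != "unbound"
        }
    return out


for test, deltas in relative_to_unbound(RESULTS).items():
    print(test, deltas)
```

Stacked comes out ahead of unbound only in the two RR tests (roughly
+21% for TCP_RR and +8% for UDP_RR) and behind in the other four, by
about 14% to 28%.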