linux-kernel - Re: [RFC PATCH v3 2/3] sched: Introduce cpus_share

Message-ID: <20230906063843.GA273182@ziqianlu-dell>
Date:   Wed, 6 Sep 2023 14:38:43 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Mel Gorman" <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        "Julien Desfossez" <jdesfossez@...italocean.com>, <x86@...nel.org>
Subject: Re: [RFC PATCH v3 2/3] sched: Introduce cpus_share_l2c

On Tue, Sep 05, 2023 at 08:46:42AM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 03:21, Aaron Lu wrote:
> > Looks like the reduction in task migration is due to SIS_UTIL, i.e.
> > select_idle_cpu() aborts a lot more after applying this series because
> > system utilization increased.
> > 
> > Here are some numbers:
> >                   @sis       @sic     @migrate_idle_cpu  @abort
> > vanilla:       24640640   15883958     11913588          4148649
> > this_series:   22345434   18597564      4294995         14319284
> > 
> > note:
> > - @sis: number of times select_idle_sibling() called;
> > - @sic: number of times select_idle_cpu() called;
> > - @migrate_idle_cpu: number of times a task migrated because
> >    select_idle_cpu() found an idle cpu that is different from prev_cpu;
> > - @abort: number of times select_idle_cpu() aborts the search due to
> >    SIS_UTIL.
> > 
> > All numbers are captured during a 5s window while running the below
> > workload on a 2-socket Intel SPR (56 cores, 112 threads per socket):
> > hackbench -g 20 -f 20 --pipe --threads -l 480000 -s 100
> > 
> > So for this workload, I think this series is doing something good: it
> > increased system utilization and due to SIS_UTIL, it also reduced task
> > migration where task migration isn't very useful since system is already
> > overloaded.
> 
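To make the table above easier to compare, here is a minimal sketch that
turns the raw counters into per-call ratios. The counter values are copied
from the table; the dictionary layout and the rates() helper are only
illustrative names, not part of any tracing script used for the capture.

```python
# Counters copied from the quoted table (captured over a 5 s window).
counters = {
    "vanilla": {
        "sis": 24640640, "sic": 15883958,
        "migrate_idle_cpu": 11913588, "abort": 4148649,
    },
    "this_series": {
        "sis": 22345434, "sic": 18597564,
        "migrate_idle_cpu": 4294995, "abort": 14319284,
    },
}


def rates(c):
    """Return ratios relative to the number of select_idle_cpu() calls.

    Hypothetical helper for illustration only.
    """
    return {
        # share of select_idle_cpu() calls aborted by SIS_UTIL
        "abort_per_sic": c["abort"] / c["sic"],
        # migrations to a different idle cpu per select_idle_cpu() call
        "migrate_per_sic": c["migrate_idle_cpu"] / c["sic"],
        # how often select_idle_sibling() reaches select_idle_cpu()
        "sic_per_sis": c["sic"] / c["sis"],
    }


if __name__ == "__main__":
    for name, c in counters.items():
        r = rates(c)
        print(f"{name:12s} "
              f"abort/sic={r['abort_per_sic']:.1%} "
              f"migrate/sic={r['migrate_per_sic']:.1%} "
              f"sic/sis={r['sic_per_sis']:.1%}")
```

By this arithmetic, the share of select_idle_cpu() calls aborted by SIS_UTIL
goes from roughly 26% on vanilla to roughly 77% with the series, while
migrations per select_idle_cpu() call drop from roughly 75% to roughly 23%.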
> This is interesting. Did you also profile the impact of the patches on
> wake_affine(), especially wake_affine_idle() ? Its behavior did change very

For the group=20 case, wake_affine() and wake_affine_idle() don't appear to
change much on this Intel machine: the target received by sis() is
mostly prev_cpu rather than the waker (this) cpu on both kernels.

But I do notice that for the group=32 case, on the vanilla kernel, the
chance of the target received by sis() being the waker cpu increased a lot,
while with this series the target remains mostly prev_cpu. That is why
migration dropped with this series for the group=32 case: when sis() falls
back to using target, this series has a higher chance of not migrating
the task. My profile also shows that on the vanilla kernel, when it chooses
the waker cpu as target, it's mostly due to wake_affine_weight(), not
wake_affine_idle().

Thanks,
Aaron

> significantly in my tests, and this impacts the target cpu number received
> by select_idle_sibling(). But independently of what wake_affine() returns as
> target (waker cpu or prev_cpu), if select_idle_cpu() is trigger-happy and
> finds idle cores near that target, this will cause lots of migrations.
> 
> Based on your metrics, the ttwu-queued-l2 approach (in addition to reducing
> lock contention) appears to decrease the SIS_UTIL idleness level of the cpus
> enough to completely change the runqueue selection and migration behavior.
> 
> I fear that we sweep a bad scheduler behavior under the rug by changing the
> idleness level of a specific workload pattern, while leaving the underlying
> root cause unfixed.
> 
> I'm currently working on a different approach: rate limit migrations.
> Basically, the idea is to detect when a task is migrated too often for its
> own good, and prevent the scheduler from migrating it for a short while. I
> get about 30% performance improvement with this approach as well (limit
> migration to 1 per 2ms window per task). I'll finish polishing my commit
> messages and send a series as RFC soon.
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
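
The rate-limiting idea described in the quoted text (at most one migration
per 2ms window per task) can be modelled in user space as below. This is
only a toy sketch under stated assumptions, not the actual patch: Task,
may_migrate() and place_task() are hypothetical names, and the policy of
simply keeping the task on its current cpu inside the window is an
assumption about how such a limit might behave.

```python
from dataclasses import dataclass
from typing import Optional

# 2 ms window taken from the quoted mail; everything else is assumed.
MIGRATION_WINDOW_NS = 2_000_000


@dataclass
class Task:
    cpu: int                                  # cpu the task last ran on
    last_migration_ns: Optional[int] = None   # time of the last migration


def may_migrate(task: Task, now_ns: int,
                window_ns: int = MIGRATION_WINDOW_NS) -> bool:
    """Allow a migration only if the previous one is at least window_ns old."""
    if task.last_migration_ns is None:
        return True
    return now_ns - task.last_migration_ns >= window_ns


def place_task(task: Task, candidate_cpu: int, now_ns: int) -> int:
    """Pick the cpu for a wakeup, refusing migrations inside the window."""
    if candidate_cpu == task.cpu:
        return task.cpu            # not a migration, nothing to limit
    if not may_migrate(task, now_ns):
        return task.cpu            # migrated too recently, stay put
    task.cpu = candidate_cpu
    task.last_migration_ns = now_ns
    return task.cpu


if __name__ == "__main__":
    t = Task(cpu=0)
    print(place_task(t, 1, now_ns=0))          # first migration is allowed
    print(place_task(t, 2, now_ns=500_000))    # 0.5 ms later: refused
    print(place_task(t, 2, now_ns=2_000_000))  # 2 ms later: allowed again
```

In this model a task that keeps being offered a new idle cpu on every wakeup
moves at most once per window, which is the effect the quoted proposal aims
for; a real implementation would have to decide what to do when the current
cpu is unavailable or heavily loaded.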