Message-ID: <20230905072141.GA253439@ziqianlu-dell>
Date:   Tue, 5 Sep 2023 15:21:41 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        "Mel Gorman" <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Swapnil Sapkal <Swapnil.Sapkal@....com>,
        "Julien Desfossez" <jdesfossez@...italocean.com>, <x86@...nel.org>
Subject: Re: [RFC PATCH v3 2/3] sched: Introduce cpus_share_l2c

On Fri, Sep 01, 2023 at 09:45:28PM +0800, Aaron Lu wrote:
> On Mon, Aug 28, 2023 at 07:19:45PM +0800, Aaron Lu wrote:
> > On Fri, Aug 25, 2023 at 09:51:19AM -0400, Mathieu Desnoyers wrote:
> > > On 8/25/23 02:49, Aaron Lu wrote:
> > > > On Thu, Aug 24, 2023 at 10:40:45AM -0400, Mathieu Desnoyers wrote:
> > > [...]
> > > > > > - task migrations dropped with this series for nr_group=20 and 32
> > > > > >     according to 'perf stat'. migration number didn't drop for nr_group=10
> > > > > >     but the two update functions' cost dropped which means fewer access to
> > > > > >     tg->load_avg and thus, fewer task migrations. This is contradictory
> > > > > >     and I can not explain yet;
> > > > > 
> > > > > Neither can I.
> > > > > 
> > > 
> > > [...]
> > > 
> > > > > 
> > > > > > It's not clear to me why this series can reduce task migrations. I doubt
> > > > > > it has something to do with more wakelist style wakeup because for this
> > > > > > test machine, only a single core with two SMT threads share L2 so more
> > > > > > wakeups are through wakelist. In wakelist style wakeup, the target rq's
> > > > > > ttwu_pending is set and that will make the target cpu as !idle_cpu();
> > > > > > This is faster than grabbing the target rq's lock and then increase
> > > > > > target rq's nr_running or set target rq's curr to something else than
> > > > > > idle. So wakelist style wakeup can make target cpu appear as non idle
> > > > > > faster, but I can't connect this with reduced migration yet, I just feel
> > > > > > this might be the reason why task migration reduced.
> > > > > 
> > > > 
> > > [...]
> > > > > I've tried adding checks for rq->ttwu_pending in those code paths on top of
> > > > > my patch and I'm still observing the reduction in number of migrations, so
> > > > > it's unclear to me how doing more queued wakeups can reduce migrations the
> > > > > way it does.
> > > > 
> > > > An interesting puzzle.
> > > 
> > > One metric that can help understand the impact of my patch: comparing
> > > hackbench from a baseline where only your load_avg patch is applied
> > > to a kernel with my l2c patch applied, I notice that the goidle
> > > schedstat is cut in half. For a given CPU (they are pretty much alike),
> > > it goes from 650456 to 353487.
> > > 
> > > So could it be that by doing queued wakeups, we end up batching
> > > execution of the woken up tasks for a given CPU, rather than going
> > > back and forth between idle and non-idle ? One important thing that
> > > this changes is to reduce the number of newidle balance triggered.
> > 
> > I noticed the majority(>99%) migrations are from wakeup path on this
> > Intel SPR when running hackbench: ttwu() -> set_task_cpu() ->
> > migrate_task_rq_fair(), so while I think it's a good finding that
> > newidle balance dropped, it's probably not the reason why migration
> > number dropped...
> 
> I profiled select_idle_sibling() and found that with this series,
> select_idle_cpu() tends to fail more and select_idle_sibling() falls back
> to using target in the end, which is very often equal to prev_cpu.
> 
> Initially I think the reason why select_idle_cpu() failed more with this
> series is because "wake_list style enqueue" can make the target cpu appear
> as busy earlier and thus, it will be harder for select_idle_cpu() to
> find an idle cpu overall. But I also suspect SIS_UTIL makes a difference
> here: in vanilla kernel, the idle% is 8% and with this series, the idle%
> is only 2% and SIS_UTIL may simply skip doing any search for idle cpu.
> Anyway, I think I'll also need to profile select_idle_cpu() to see
> what's going on there too.

Looks like the reduction in task migration is due to SIS_UTIL, i.e.
select_idle_cpu() aborts a lot more after applying this series because
system utilization increased.
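
For readers not familiar with SIS_UTIL, here is a rough user-space model
of how the scan budget shrinks as LLC utilization grows. It is only a
sketch under stated assumptions: the formula follows my recollection of
update_idle_cpu_scan() in kernel/sched/fair.c, the function names
nr_idle_scan() and search_aborts() are made up for illustration, 117 is
assumed as the LLC domain's imbalance_pct, and the input is the summed
PELT util_avg of the LLC, which is not the same thing as 100 - idle%.

```python
SCHED_CAPACITY_SCALE = 1024  # fixed-point value representing one fully busy CPU


def nr_idle_scan(sum_util, llc_weight, imbalance_pct=117):
    """Approximate number of CPUs the idle search may scan (hypothetical helper).

    sum_util:      summed util_avg of all CPUs in the LLC domain
    llc_weight:    number of CPUs in the LLC domain
    imbalance_pct: assumed default for the LLC sched domain
    """
    x = sum_util // llc_weight  # average utilization per CPU
    tmp = (x * x * imbalance_pct * imbalance_pct) // (10000 * SCHED_CAPACITY_SCALE)
    tmp = min(tmp, SCHED_CAPACITY_SCALE)
    y = SCHED_CAPACITY_SCALE - tmp  # remaining "headroom", quadratic in x
    return (y * llc_weight) // SCHED_CAPACITY_SCALE


def search_aborts(sum_util, llc_weight, imbalance_pct=117):
    """True when the modeled budget is zero, i.e. the idle search is skipped."""
    return nr_idle_scan(sum_util, llc_weight, imbalance_pct) == 0


if __name__ == "__main__":
    llc_weight = 112  # assumption: all threads of one socket share the LLC
    for pct_busy in (0, 50, 80, 92, 98):
        sum_util = (SCHED_CAPACITY_SCALE * pct_busy // 100) * llc_weight
        print(pct_busy, nr_idle_scan(sum_util, llc_weight),
              search_aborts(sum_util, llc_weight))
```

In this model the budget falls quadratically with average utilization and
reaches zero once it passes roughly 100/imbalance_pct of capacity, so a
modest rise in utilization near that point turns many searches into
aborts.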

Here are some numbers:
                   @sis       @sic  @migrate_idle_cpu     @abort
vanilla:       24640640   15883958           11913588    4148649
this_series:   22345434   18597564            4294995   14319284

note:
- @sis: number of times select_idle_sibling() was called;
- @sic: number of times select_idle_cpu() was called;
- @migrate_idle_cpu: number of times a task migrated because
  select_idle_cpu() found an idle cpu different from prev_cpu;
- @abort: number of times select_idle_cpu() aborted the search due to
  SIS_UTIL.
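
To make the shift easier to see, the counters above can be turned into
per-call ratios. This is just arithmetic on the numbers in the table; the
dictionary layout and the helper name ratios() are illustrative only.

```python
# Counters copied from the table above (5s capture window).
counters = {
    "vanilla": {
        "sis": 24640640, "sic": 15883958,
        "migrate_idle_cpu": 11913588, "abort": 4148649,
    },
    "this_series": {
        "sis": 22345434, "sic": 18597564,
        "migrate_idle_cpu": 4294995, "abort": 14319284,
    },
}


def ratios(c):
    """Return abort and migration counts relative to select_idle_cpu() calls."""
    return {
        "abort_per_sic": c["abort"] / c["sic"],
        "migrate_per_sic": c["migrate_idle_cpu"] / c["sic"],
    }


if __name__ == "__main__":
    for name, c in counters.items():
        r = ratios(c)
        print(f"{name}: abort/sic={r['abort_per_sic']:.1%} "
              f"migrate/sic={r['migrate_per_sic']:.1%}")
```

Relative to select_idle_cpu() calls, aborts go from roughly 26% to
roughly 77%, while migrations to a newly found idle cpu go from roughly
75% to roughly 23%.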

All numbers were captured during a 5s window while running the workload
below on a 2-socket Intel SPR (56 cores, 112 threads per socket):
hackbench -g 20 -f 20 --pipe --threads -l 480000 -s 100

So for this workload, I think this series is doing something good: it
increased system utilization and, due to SIS_UTIL, it also reduced task
migration in a case where migrating isn't very useful since the system
is already overloaded.

Thanks,
Aaron
