lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 16 Jun 2023 14:17:00 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     "Gautham R. Shenoy" <gautham.shenoy@....com>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        "K Prateek Nayak" <kprateek.nayak@....com>,
        Abel Wu <wuyun.abel@...edance.com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        "Yicong Yang" <yangyicong@...ilicon.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle
 balance based on system utilization

Hi Gautham,
On 2023-06-15 at 11:31:07 +0530, Gautham R. Shenoy wrote:
> Hello Chen Yu,
> 
> 
> On Tue, Jun 13, 2023 at 12:18:57AM +0800, Chen Yu wrote:
> > When CPU is about to enter idle, it invokes newidle_balance() to pull
> > some tasks from other runqueues. Although there is per domain
> > max_newidle_lb_cost to throttle the newidle_balance(), it would be
> > good to further limit the scan based on overall system utilization.
> > The reason is that there is no limitation for newidle_balance() to
> > launch this balance simultaneously on multiple CPUs. Since each
> > newidle_balance() has to traverse all the CPUs to calculate the
> > statistics one by one, this total time cost on newidle_balance()
> > could be O(n^2). This is not good for performance or power saving.
> > 
> > For example, sqlite has spent quite some time on newidle balance()
> > on Intel Sapphire Rapids, which has 2 x 56C/112T = 224 CPUs:
> > 6.69%    0.09%  sqlite3     [kernel.kallsyms]   [k] newidle_balance
> > 5.39%    4.71%  sqlite3     [kernel.kallsyms]   [k] update_sd_lb_stats
> > 
> > Based on this observation, limit the scan depth of newidle_balance()
> > by considering the utilization of the LLC domain. Let the number of
> > scanned groups be a linear function of the utilization ratio:
> >
> 
> Is there any particular reason why this is being limited only to the
> LLC domain ?
> 
Thank you for your interest in this change. The original thought as you
might know is that our LLC has a huge number of groups.
And since SIS_UTIL has done similar thing for LLC, we want to propose
a simple prototype to demonstrate the idea.
> On architectures where the LLC domain may not be so large (POWER9/10,
> AMD), the additional cost is usually paid at the higher domains where
> the number of groups is greater / equal to the number of groups in the
> LLC domain and where sd_span is pretty large. It would be good to
> explore avoiding the scan cost on those domains as well, right?
> 
Exactly.
> > nr_groups_to_scan = nr_groups * (1 - util_ratio)
> 
> If we can extend this logic to higher domains, on a Zen3 1 Socket
> server with 128 CPUs at the DIE domain containing 8 groups, we can
> expect a significant reduction in the time spent doing
> update_sg_lb_stats() at higher utilizations.
> 
> util_ratio     nr_groups_to_scan        nr_cpus_scanned
> ========================================================
> 0.9              1                       16     (-87.5%)
> 0.75             2                       32     (-75%)
> 0.5              4                       64     (-50%)
> 0.25             6                       96     (-25%)
> 0.1              7                      112     (-12.5%) 
> 
> 
> On a Zen 4 1 socket server with 192 CPUs at the DIE domain containing
> 12 groups, values will be:
> 
> util_ratio     nr_groups_to_scan        nr_cpus_scanned
> ========================================================
> 0.9              1                       16     (-91%)
> 0.75             3                       48     (-75%)
> 0.5              6                       96     (-50%)
> 0.25             9                      144     (-25%)
> 0.1             10                      160     (-16.7%)
> 
Thanks for this information, I think we can extend the scan limitation
for not only LLC(MC domain) but also higher domains. I'll think about
it.

thanks,
Chenyu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ