linux-kernel - Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZJWvpi5VZtxnyDg8@chenyu5-mobl2.ccr.corp.intel.com>
Date:   Fri, 23 Jun 2023 22:43:50 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        "Dietmar Eggemann" <dietmar.eggemann@....com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Abel Wu <wuyun.abel@...edance.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle
 balance based on system utilization

On 2023-06-23 at 22:33:23 +0800, Chen Yu wrote:
> Hi Peter,
> On 2023-06-21 at 13:17:21 +0200, Peter Zijlstra wrote:
> > On Tue, Jun 13, 2023 at 12:18:57AM +0800, Chen Yu wrote:
> > > When CPU is about to enter idle, it invokes newidle_balance() to pull
> > > some tasks from other runqueues. Although there is per domain
> > > max_newidle_lb_cost to throttle the newidle_balance(), it would be
> > > good to further limit the scan based on overall system utilization.
> > > The reason is that there is no limitation for newidle_balance() to
> > > launch this balance simultaneously on multiple CPUs. Since each
> > > newidle_balance() has to traverse all the CPUs to calculate the
> > > statistics one by one, this total time cost on newidle_balance()
> > > could be O(n^2). This is not good for performance or power saving.
> > 
> > Another possible solution is to keep struct sg_lb_stats in
> > sd->child->shared (below the NUMA domains) and put a lock around it.
> > 
> > Then have update_sd_lb_stats() do something like:
> > 
> > 	struct sg_lb_stats *sgs = &sds->sgs;
> > 
> > 	if (raw_spin_trylock(&sds->sg_lock)) {
> > 		struct sg_lb_stats tmp;
> > 
> > 		... collect tmp
> > 
> > 		sds->sgs = tmp;
> > 		raw_spin_unlock(&sds->sg_lock);
> > 	}
> > 
> > 	... use sgs
> > 
> > Then you know you've always got a 'recent' copy but avoid the concurrent
> > updates.
> Thanks for taking a look and gave the suggestions! Yes, this is a good idea, by
> doing this we can further limit the number of CPU to do update in parallel, and
> allow the newidle CPU to reuse the data for idle load balance from others.
> This lock only allow 1 CPU in that domain to iterate the whole group, and the
> bottleneck might reply on how fast the CPU who grabs the lock can finish
> collecting the tmp sgs data. For MC domain, it would not take too much time, and for
> higher domains between MC and NUMA domain, it depends on how many CPUs there are in that
> domain.
I just realized that it's a trylock, so it should not block other CPUs who launch
the idle balance, but just to let 1 CPUs update the 'snapshot' at one time.
I'll do some tests.

thanks,
Chenyu
> I'll create one prototype based on your suggestion and measure the test data.
> 
> thanks,
> Chenyu