linux-kernel - Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle balance based on system utilization


Message-ID: <ZJWtM1rgC52twqPS@chenyu5-mobl2.ccr.corp.intel.com>
Date:   Fri, 23 Jun 2023 22:33:23 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        "Dietmar Eggemann" <dietmar.eggemann@....com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Abel Wu <wuyun.abel@...edance.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle
 balance based on system utilization

Hi Peter,
On 2023-06-21 at 13:17:21 +0200, Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 12:18:57AM +0800, Chen Yu wrote:
> > When CPU is about to enter idle, it invokes newidle_balance() to pull
> > some tasks from other runqueues. Although there is per domain
> > max_newidle_lb_cost to throttle the newidle_balance(), it would be
> > good to further limit the scan based on overall system utilization.
> > The reason is that there is no limitation for newidle_balance() to
> > launch this balance simultaneously on multiple CPUs. Since each
> > newidle_balance() has to traverse all the CPUs to calculate the
> > statistics one by one, this total time cost on newidle_balance()
> > could be O(n^2). This is not good for performance or power saving.
> 
> Another possible solution is to keep struct sg_lb_stats in
> sd->child->shared (below the NUMA domains) and put a lock around it.
> 
> Then have update_sd_lb_stats() do something like:
> 
> 	struct sg_lb_stats *sgs = &sds->sgs;
> 
> 	if (raw_spin_trylock(&sds->sg_lock)) {
> 		struct sg_lb_stats tmp;
> 
> 		... collect tmp
> 
> 		sds->sgs = tmp;
> 		raw_spin_unlock(&sds->sg_lock);
> 	}
> 
> 	... use sgs
> 
> Then you know you've always got a 'recent' copy but avoid the concurrent
> updates.
Thanks for taking a look and for the suggestion! Yes, this is a good idea. By
doing this we can further limit the number of CPUs doing the update in parallel,
and allow a newidle CPU to reuse the idle load balance data collected by others.
The lock allows only one CPU in that domain to iterate over the whole group, so
the bottleneck might depend on how fast the CPU that grabs the lock can finish
collecting the tmp sgs data. For the MC domain this should not take much time;
for the higher domains between MC and NUMA, it depends on how many CPUs there
are in that domain. I'll create a prototype based on your suggestion and measure
the results.
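
To make sure I understand the pattern, here is a minimal userspace sketch of
it, not the kernel implementation. It assumes a pthread mutex in place of the
raw spinlock, and struct lb_stats, struct shared_stats, collect_stats() and
update_stats() are hypothetical stand-ins for sg_lb_stats, the shared domain
data and update_sd_lb_stats(); the fields are illustrative only.

```c
#include <pthread.h>

/* Hypothetical stand-in for struct sg_lb_stats; fields are illustrative. */
struct lb_stats {
	unsigned long total_load;
	unsigned int busy_cpus;
};

/* Hypothetical stand-in for the per-domain shared data. */
struct shared_stats {
	pthread_mutex_t lock;	/* plays the role of sds->sg_lock */
	struct lb_stats cached;	/* most recently published statistics */
	unsigned int nr_scans;	/* number of full scans actually performed */
};

/* The expensive part: walk every CPU in the group once. */
static void collect_stats(const unsigned long *load, unsigned int nr_cpus,
			  struct lb_stats *out)
{
	unsigned int i;

	out->total_load = 0;
	out->busy_cpus = 0;
	for (i = 0; i < nr_cpus; i++) {
		out->total_load += load[i];
		if (load[i])
			out->busy_cpus++;
	}
}

/*
 * Only the caller that wins the trylock rescans the group and publishes
 * the result. Callers that lose the trylock skip the scan and reuse the
 * cached copy. The unlocked read below can race with a concurrent
 * publish; a real implementation would need to decide whether that is
 * acceptable for a heuristic.
 */
static struct lb_stats update_stats(struct shared_stats *s,
				    const unsigned long *load,
				    unsigned int nr_cpus)
{
	if (pthread_mutex_trylock(&s->lock) == 0) {
		struct lb_stats tmp;

		collect_stats(load, nr_cpus, &tmp);
		s->cached = tmp;
		s->nr_scans++;
		pthread_mutex_unlock(&s->lock);
	}

	return s->cached;
}
```

In this sketch a caller never waits for the lock, so the cost of losing the
race is only the staleness of the cached copy, bounded by how long the lock
holder takes to finish its scan.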

thanks,
Chenyu