Date:   Fri, 23 Jun 2023 22:33:23 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        "Dietmar Eggemann" <dietmar.eggemann@....com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Abel Wu <wuyun.abel@...edance.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Len Brown <len.brown@...el.com>,
        Chen Yu <yu.chen.surf@...il.com>,
        Yicong Yang <yangyicong@...ilicon.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 3/4] sched/fair: Calculate the scan depth for idle
 balance based on system utilization

Hi Peter,
On 2023-06-21 at 13:17:21 +0200, Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 12:18:57AM +0800, Chen Yu wrote:
> > When CPU is about to enter idle, it invokes newidle_balance() to pull
> > some tasks from other runqueues. Although there is per domain
> > max_newidle_lb_cost to throttle the newidle_balance(), it would be
> > good to further limit the scan based on overall system utilization.
> > The reason is that there is no limitation for newidle_balance() to
> > launch this balance simultaneously on multiple CPUs. Since each
> > newidle_balance() has to traverse all the CPUs to calculate the
> > statistics one by one, this total time cost on newidle_balance()
> > could be O(n^2). This is not good for performance or power saving.
> 
> Another possible solution is to keep struct sg_lb_stats in
> sd->child->shared (below the NUMA domains) and put a lock around it.
> 
> Then have update_sd_lb_stats() do something like:
> 
> 	struct sg_lb_stats *sgs = &sds->sgs;
> 
> 	if (raw_spin_trylock(&sds->sg_lock)) {
> 		struct sg_lb_stats tmp;
> 
> 		... collect tmp
> 
> 		sds->sgs = tmp;
> 		raw_spin_unlock(&sds->sg_lock);
> 	}
> 
> 	... use sgs
> 
> Then you know you've always got a 'recent' copy but avoid the concurrent
> updates.
Thanks for taking a look and for the suggestion! Yes, this is a good idea. By
doing this we can further limit the number of CPUs updating the statistics in
parallel, and allow a newidle CPU to reuse the data collected by others for its
idle load balance. The lock allows only one CPU in that domain to iterate over
the whole group, so the bottleneck might depend on how fast the CPU that grabs
the lock can finish collecting the tmp sgs data. For the MC domain this should
not take much time; for higher domains between MC and NUMA, it depends on how
many CPUs there are in that domain. I'll create a prototype based on your
suggestion and collect test data.
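
To make the pattern concrete, here is a rough userspace sketch of the
"trylock, rescan, publish, otherwise reuse" idea. It is not kernel code and not
the prototype itself: struct sg_stats, struct shared_stats, collect_stats() and
update_stats() are hypothetical names, the statistics are reduced to two
fields, and a C11 atomic_flag stands in for sg_lock. The snapshot fields are
atomics so that a reader who loses the trylock never performs a racy read,
although it may still see a stale or mixed snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for the kernel's struct sg_lb_stats. */
struct sg_stats {
	unsigned long group_load;
	unsigned long nr_cpus;
};

/* Hypothetical stand-in for the per-domain shared state. */
struct shared_stats {
	atomic_flag busy;		/* plays the role of sg_lock */
	atomic_ulong group_load;	/* last published snapshot */
	atomic_ulong nr_cpus;
};

void shared_stats_init(struct shared_stats *sds)
{
	atomic_flag_clear(&sds->busy);
	atomic_init(&sds->group_load, 0);
	atomic_init(&sds->nr_cpus, 0);
}

/* The expensive part: visit every CPU in the group once. */
struct sg_stats collect_stats(const unsigned long *cpu_load, size_t ncpus)
{
	struct sg_stats tmp = { 0, 0 };

	for (size_t i = 0; i < ncpus; i++) {
		tmp.group_load += cpu_load[i];
		tmp.nr_cpus++;
	}
	return tmp;
}

/*
 * Returns 1 if the caller rescanned and published new stats, 0 if it
 * reused the last published snapshot because another updater was active.
 */
int update_stats(struct shared_stats *sds, const unsigned long *cpu_load,
		 size_t ncpus, struct sg_stats *out)
{
	assert(ncpus > 0);

	/* test_and_set returns the old value: false means we got the "lock". */
	if (!atomic_flag_test_and_set_explicit(&sds->busy,
					       memory_order_acquire)) {
		*out = collect_stats(cpu_load, ncpus);
		atomic_store_explicit(&sds->group_load, out->group_load,
				      memory_order_relaxed);
		atomic_store_explicit(&sds->nr_cpus, out->nr_cpus,
				      memory_order_relaxed);
		atomic_flag_clear_explicit(&sds->busy, memory_order_release);
		return 1;
	}

	/* Lock is taken: do not scan, reuse the (possibly stale) snapshot. */
	out->group_load = atomic_load_explicit(&sds->group_load,
					       memory_order_relaxed);
	out->nr_cpus = atomic_load_explicit(&sds->nr_cpus,
					    memory_order_relaxed);
	return 0;
}
```

In this sketch only the caller that wins the trylock pays the O(n) scan, and
everyone else returns after two loads, which is why the time spent by the lock
holder becomes the bottleneck in larger domains.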

thanks,
Chenyu
