Message-ID: <ZplJaymu/WQ7O5xC@chenyu5-mobl2>
Date: Fri, 19 Jul 2024 00:57:15 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Ingo Molnar
<mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>, Tim Chen
<tim.c.chen@...el.com>, Mel Gorman <mgorman@...hsingularity.net>, "Dietmar
Eggemann" <dietmar.eggemann@....com>, K Prateek Nayak
<kprateek.nayak@....com>, "Gautham R . Shenoy" <gautham.shenoy@....com>,
"Chen Yu" <yu.chen.surf@...il.com>, Aaron Lu <aaron.lu@...el.com>,
<linux-kernel@...r.kernel.org>, <void@...ifault.com>
Subject: Re: [RFC PATCH 0/7] Optimization to reduce the cost of newidle
balance
Hi Peter,
On 2024-07-17 at 14:17:45 +0200, Peter Zijlstra wrote:
> On Thu, Jul 27, 2023 at 10:33:58PM +0800, Chen Yu wrote:
> > Hi,
> >
> > This is the second version of the newidle balance optimization[1].
> > It aims to reduce the cost of newidle balance which is found to
> > occupy noticeable CPU cycles on some high-core count systems.
> >
> > For example, when running sqlite on Intel Sapphire Rapids, which has
> > 2 x 56C/112T = 224 CPUs:
> >
> > 6.69% 0.09% sqlite3 [kernel.kallsyms] [k] newidle_balance
> > 5.39% 4.71% sqlite3 [kernel.kallsyms] [k] update_sd_lb_stats
> >
> > To mitigate this cost, the optimization is inspired by the question
> > raised by Tim:
> > Do we always have to find the busiest group and pull from it? Would
> > a relatively busy group be enough?
>
> So doesn't this basically boil down to recognising that new-idle might
> not be the same as regular load-balancing -- we need any task, fast,
> rather than we need to make equal load.
>
Yes, exactly.
> David's shared runqueue patches did the same, they re-imagined this very
> path.
>
> Now, David's thing went side-ways because of some regression that wasn't
> further investigated.
>
> But it occurs to me this might be the same thing that Prateek chased
> down here:
>
> https://lkml.kernel.org/r/20240710090210.41856-1-kprateek.nayak@amd.com
>
> Hmm ?
>
Thanks for the patch link. I took a look, and if I understand correctly,
Prateek's patches fix three issues related to TIF_POLLING_NRFLAG.
The following two of them might cause aggressive newidle balance:
1. The normal idle load balance does not get a chance to be triggered
when exiting the idle loop. Since the normal idle load balance does not
work, we have to count on newidle balance to do more work.
2. Newidle balance is incorrectly triggered when exiting from idle
due to send_ipi(), even when there is no task about to sleep.
Issue 2 will increase the frequency of invoking newidle balance, but
issue 1 would not. Issue 1 mainly impacts the success ratio of each
newidle balance; it should not by itself increase how often newidle
balance is triggered - that mainly depends on task runtime duration.
Please correct me if I'm wrong.
All three of Prateek's patches fix existing newidle balance issues; I'll
apply his patch set and re-test.
> Supposing that is indeed the case, I think it makes more sense to
> proceed with that approach. That is, completely redo the sub-numa new
> idle balance.
>
I did not quite follow this. Prateek's patch set does not redo the
sub-NUMA newidle balance, I suppose? Or do you mean further work on top
of his patch set?
thanks,
Chenyu