Message-ID: <20250218153039.GD547103@pauld.westford.csb>
Date: Tue, 18 Feb 2025 10:30:39 -0500
From: Phil Auld <pauld@...hat.com>
To: Waiman Long <llong@...hat.com>
Cc: Vishal Chourasia <vishalc@...ux.ibm.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [CHANGE 1/2] sched/isolation: Make use of more than one
housekeeping cpu
On Tue, Feb 18, 2025 at 10:23:50AM -0500 Waiman Long wrote:
>
> On 2/18/25 10:00 AM, Phil Auld wrote:
> > Hi Vishal.
> >
> > On Fri, Feb 14, 2025 at 11:08:19AM +0530 Vishal Chourasia wrote:
> > > Hi Phil, Vineeth
> > >
> > > On Thu, Feb 13, 2025 at 09:26:53AM -0500, Phil Auld wrote:
> > > > On Thu, Feb 13, 2025 at 10:14:04AM +0530 Madadi Vineeth Reddy wrote:
> > > > > Hi Phil Auld,
> > > > >
> > > > > On 11/02/25 19:31, Phil Auld wrote:
> > > > > > The existing code uses housekeeping_any_cpu() to select a cpu for
> > > > > > a given housekeeping task. However, this often ends up calling
> > > > > > cpumask_any_and(), which is defined as cpumask_first_and() and thus
> > > > > > has the effect of always using the first cpu among those available.
> > > > > >
> > > > > > The same applies when multiple NUMA nodes are involved. In that
> > > > > > case the first cpu in the local node is chosen, which does provide
> > > > > > a bit of spreading, but with multiple HK cpus per node the same
> > > > > > issues arise.
> > > > > >
> > > > > > Spread the HK work out by having housekeeping_any_cpu() and
> > > > > > sched_numa_find_closest() use cpumask_any_and_distribute()
> > > > > > instead of cpumask_any_and().
> > > > > >
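For illustration only, here is a minimal userspace sketch (hypothetical
code, not the kernel implementation) of the behavioural difference the
patch describes: cpumask_any_and() is defined as cpumask_first_and() and
always returns the first matching cpu, while cpumask_any_and_distribute()
remembers the previous pick and starts the next search after it, so
successive calls rotate through the housekeeping cpus.

	#include <stdio.h>

	#define NR_CPUS 8

	/* toy cpumask: one bit per cpu */
	typedef unsigned int cpumask_t;

	/* mimics cpumask_any_and()/cpumask_first_and(): always the first set bit */
	static int pick_first(cpumask_t hk, cpumask_t online)
	{
		cpumask_t mask = hk & online;

		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (mask & (1u << cpu))
				return cpu;
		return -1;
	}

	/* mimics cpumask_any_and_distribute(): resume after the previous pick */
	static int pick_distribute(cpumask_t hk, cpumask_t online)
	{
		static int prev = -1;	/* stands in for the kernel's previous-pick state */
		cpumask_t mask = hk & online;

		for (int i = 1; i <= NR_CPUS; i++) {
			int cpu = (prev + i) % NR_CPUS;

			if (mask & (1u << cpu)) {
				prev = cpu;
				return cpu;
			}
		}
		return -1;
	}

	int main(void)
	{
		cpumask_t hk = 0x0f;		/* cpus 0-3 are housekeeping */
		cpumask_t online = 0xff;	/* all 8 cpus online */

		for (int i = 0; i < 6; i++)
			printf("first=%d distribute=%d\n",
			       pick_first(hk, online),
			       pick_distribute(hk, online));
		/* first=0 every time; distribute cycles 0,1,2,3,0,1 */
		return 0;
	}
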
> > > > > I get the overall intent of the patch: better load distribution of
> > > > > housekeeping work. However, one potential drawback is that spreading
> > > > > HK work across multiple CPUs might reduce the time that some cores
> > > > > can spend in deeper idle states, which can be beneficial for
> > > > > power-sensitive systems.
> > > > >
> > > > > Thoughts?
> > > > NOHZ_full setups are not generally used in power sensitive systems I think.
> > > > They aren't in our use cases at least.
> > > >
> > > > In cases with many cpus a single housekeeping cpu can not keep up. Having
> > > > other HK cpus in deep idle states while the one in use is overloaded is
> > > > not a win.
> > > To me, an overloaded CPU is one where more than one task is ready
> > > to run, and an HK CPU is one receiving periodic scheduling clock
> > > ticks, so an HK CPU is bound to come out of any power-saving state it is in.
> > If the overload is caused by HK work and interrupts, there is nothing in
> > the system to help. Tasks, sure, can get load balanced.
> >
> > And as you say, the HK cpus will generally have ticks happening anyway.
> >
> > > > If your single HK cpu can keep up then only configure that one HK cpu.
> > > > The others will go idle and stay there. And since they are nohz_full
> > > > they might get to stay idle even longer.
> > > While it is good to distribute the load across the HK CPUs in the HK
> > > cpumask (queuing jobs on different CPUs each time), this can cause
> > > jitter in virtualized environments by unnecessarily evicting other
> > > tenants, when it may be better to overload one VP than to wake up a
> > > tenant's other VPs.
> > >
> > Sorry, I'm not sure I understand your setup. Are you running virtual
> > tenants on the HK cpus? nohz_full in the guests? Maybe you only need
> > one HK cpu; then it won't matter.
> >
> > My concern is that currently there is no point in having more than
> > one HK cpu (per node in a NUMA case). The code as currently implemented
> > is just not doing what it needs to.
> >
> > We have numerous cases where a single HK cpu just cannot keep up and
> > the remote_tick warning fires. It can also lead to the other things
> > (orchestration sw, HA keepalives, etc.) on the HK cpus getting starved,
> > which leads to further issues. In these cases we recommend increasing
> > the number of HK cpus. But... that only helps the userspace tasks
> > somewhat. It does not help the actual housekeeping part.
>
> That is the part that should go into the commit log as well, as it is
> the rationale behind your patch.
>
Sure, I can add that piece and resend.
Cheers,
Phil
> Cheers,
> Longman
>
--