Message-ID: <9380df9c-a358-4466-91f4-b5b2c0cfcbbb@redhat.com>
Date: Tue, 18 Feb 2025 10:33:47 -0500
From: Waiman Long <llong@...hat.com>
To: Phil Auld <pauld@...hat.com>, Waiman Long <llong@...hat.com>
Cc: Vishal Chourasia <vishalc@...ux.ibm.com>,
Madadi Vineeth Reddy <vineethr@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Frederic Weisbecker <frederic@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [CHANGE 1/2] sched/isolation: Make use of more than one
housekeeping cpu
On 2/18/25 10:30 AM, Phil Auld wrote:
> On Tue, Feb 18, 2025 at 10:23:50AM -0500 Waiman Long wrote:
>> On 2/18/25 10:00 AM, Phil Auld wrote:
>>> Hi Vishal.
>>>
>>> On Fri, Feb 14, 2025 at 11:08:19AM +0530 Vishal Chourasia wrote:
>>>> Hi Phil, Vineeth
>>>>
>>>> On Thu, Feb 13, 2025 at 09:26:53AM -0500, Phil Auld wrote:
>>>>> On Thu, Feb 13, 2025 at 10:14:04AM +0530 Madadi Vineeth Reddy wrote:
>>>>>> Hi Phil Auld,
>>>>>>
>>>>>> On 11/02/25 19:31, Phil Auld wrote:
>>>>>>> The existing code uses housekeeping_any_cpu() to select a cpu for
>>>>>>> a given housekeeping task. However, this often ends up calling
>>>>>>> cpumask_any_and(), which is defined as cpumask_first_and() and thus
>>>>>>> has the effect of always using the first cpu among those available.
>>>>>>>
>>>>>>> The same applies when multiple NUMA nodes are involved. In that
>>>>>>> case the first cpu in the local node is chosen, which does provide
>>>>>>> a bit of spreading, but with multiple HK cpus per node the same
>>>>>>> issues arise.
>>>>>>>
>>>>>>> Spread the HK work out by having housekeeping_any_cpu() and
>>>>>>> sched_numa_find_closest() use cpumask_any_and_distribute()
>>>>>>> instead of cpumask_any_and().
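>>>>>>>
>>>>>>> For both call sites the change boils down to something like the
>>>>>>> following (illustrative sketch only, not the exact hunks; mask1 and
>>>>>>> mask2 stand in for the masks used at each call site):
>>>>>>>
>>>>>>> -	cpu = cpumask_any_and(mask1, mask2);
>>>>>>> +	cpu = cpumask_any_and_distribute(mask1, mask2);
>>>>>>>
>>>>>>> cpumask_any_and() is an alias for cpumask_first_and(), so it always
>>>>>>> returns the lowest-numbered matching cpu, while
>>>>>>> cpumask_any_and_distribute() keeps a rotating cursor and returns a
>>>>>>> different matching cpu on successive calls.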
>>>>>>>
>>>>>> I get the overall intent of the patch: better load distribution of
>>>>>> housekeeping tasks. However, one potential drawback is that spreading
>>>>>> HK work across multiple CPUs might reduce the time that some cores
>>>>>> can spend in deeper idle states, which can be beneficial for
>>>>>> power-sensitive systems.
>>>>>>
>>>>>> Thoughts?
>>>>> NOHZ_full setups are not generally used in power-sensitive systems, I think.
>>>>> They aren't in our use cases at least.
>>>>>
>>>>> In cases with many cpus a single housekeeping cpu cannot keep up. Having
>>>>> other HK cpus in deep idle states while the one in use is overloaded is
>>>>> not a win.
>>>> To me, an overloaded CPU sounds like one where more than one task is ready
>>>> to run, and an HK CPU is one receiving periodic scheduling clock
>>>> ticks, so an HK CPU is bound to come out of any power-saving state it is in.
>>> If the overload is caused by HK work and interrupts, there is nothing in the
>>> system to help. Tasks, sure, can get load balanced.
>>>
>>> And as you say, the HK cpus will generally have ticks happening anyway.
>>>
>>>>> If your single HK cpu can keep up then only configure that one HK cpu.
>>>>> The others will go idle and stay there. And since they are nohz_full
>>>>> might get to stay idle even longer.
>>>> While it is good to distribute the load across each HK CPU in the HK
>>>> cpumask (queuing jobs on different CPUs each time), this can cause
>>>> jitter in virtualized environments, unnecessarily evicting other
>>>> tenants when it's better to overload a VP than to wake up other VPs of a
>>>> tenant.
>>>>
>>> Sorry, I'm not sure I understand your setup. Are you running virtual
>>> tenants on the HK cpus? nohz_full in the guests? Maybe you only need
>>> one HK cpu, then it won't matter.
>>>
>>> My concern is that currently there is no point in having more than
>>> one HK cpu (per node in a NUMA case). The code as currently implemented
>>> is just not doing what it needs to.
>>>
>>> We have numerous cases where a single HK cpu just cannot keep up and
>>> the remote_tick warning fires. It can also lead to the other things
>>> (orchestration sw, HA keepalives, etc.) on the HK cpus getting starved,
>>> which leads to other issues. In these cases we recommend increasing
>>> the number of HK cpus. But... that only helps the userspace tasks
>>> somewhat. It does not help the actual housekeeping part.
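>>>
>>> (As a made-up example of the shape of such a setup: booting a 48-cpu
>>> box with "nohz_full=4-47" leaves cpus 0-3 as housekeeping cpus, yet
>>> with the current first-cpu selection the kernel-side HK work still
>>> lands almost entirely on cpu 0.)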
>> That is the part that should go into the commit log as well, since it is
>> the rationale behind your patch.
>>
> Sure, I can add that piece and resend.
While at it, you can also add some text to address the other concerns
that reviewers have raised so far.
Cheers,
Longman
>
>
> Cheers,
> Phil
>
>
>> Cheers,
>> Longman
>>