Message-ID: <15328540-0c0a-4076-8ec8-77661b984fba@linux.ibm.com>
Date: Thu, 26 Jun 2025 20:16:36 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: Hillf Danton <hdanton@...a.com>
Cc: peterz@...radead.org, kprateek.nayak@....com, linux-kernel@...r.kernel.org
Subject: Re: [RFC v2 1/9] sched/docs: Document avoid_cpu_mask and avoid CPU
 concept

Hi Hillf.

> On Thu, 26 Jun 2025 00:41:00 +0530 Shrikanth Hegde wrote
>> This describes what avoid CPU means and what scheduler aims to do
>> when a CPU is marked as avoid.
>>
>> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
>> ---
>>   Documentation/scheduler/sched-arch.rst | 25 +++++++++++++++++++++++++
>>   1 file changed, 25 insertions(+)
>>
>> diff --git a/Documentation/scheduler/sched-arch.rst b/Documentation/scheduler/sched-arch.rst
>> index ed07efea7d02..d32755298fca 100644
>> --- a/Documentation/scheduler/sched-arch.rst
>> +++ b/Documentation/scheduler/sched-arch.rst
>> @@ -62,6 +62,31 @@ Your cpu_idle routines need to obey the following rules:
>>   arch/x86/kernel/process.c has examples of both polling and
>>   sleeping idle functions.
>>   
>> +CPU Avoid
>> +=========
>> +
>> +Under paravirt conditions it is possible to overcommit CPU resources.
>> +i.e sum of virtual CPU(vCPU) of all VM is greater than number of physical
>> +CPUs(pCPU). Under such conditions when all or many VM have high utilization,
>> +hypervisor won't be able to satisfy the requirement and has to context switch
>> +within or across VM. VM level context switch is more expensive compared to
>> +task context switch within the VM.
>> +
> Sounds like VMs not well configured (or pCPUs not well partitioned).

No. That's how VMs are configured in the paravirtualized case, as I understand it.
Correct me if I am wrong.

On powerpc, we have Shared Processor Logical Partitions (SPLPAR), which allow overcommit.
When other LPARs (VMs) are idle, overcommit lets one get more work done. It also allows one
to configure more VMs. The said issue happens only when all or most VMs ask for
CPU at the same time.
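
To make the distinction concrete, here is a rough user-space sketch (not code
from this series). It assumes that "overcommit" means the configured vCPUs sum
to more than the pCPUs, and that contention appears only when the vCPUs that
are busy at a given moment also sum to more than the pCPUs. The helper name
demand_exceeds_pcpus() and the example numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the vCPU counts of all VMs add up to
 * more than the number of physical CPUs, 0 otherwise.
 *
 * Pass the configured vCPU counts to check for overcommit, or the counts
 * of currently busy vCPUs to check for contention.
 */
static int demand_exceeds_pcpus(const unsigned int *vcpus_per_vm,
                                size_t nr_vms, unsigned int nr_pcpus)
{
    unsigned long total = 0;

    /* An empty VM list is fine; a NULL array with entries is not. */
    assert(vcpus_per_vm != NULL || nr_vms == 0);

    for (size_t i = 0; i < nr_vms; i++)
        total += vcpus_per_vm[i];

    return total > nr_pcpus;
}
```

With three VMs of 8 vCPUs each on 16 pCPUs, the configured counts {8, 8, 8}
exceed the pCPUs, so the system is overcommitted. If only {8, 2, 0} vCPUs are
busy, demand fits and there is no contention; if all {8, 8, 8} are busy, the
hypervisor has to context switch between VMs.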

> 
>> +In such cases it is better that VM's co-ordinate among themselves and ask for
>> +less CPU request by not using some of the vCPUs. Such vCPUs where workload
>> +can be avoided at the moment are called as "Avoid CPUs". Note that when the
>> +pCPU contention goes away, these vCPUs can be used again by the workload.
>> +
> In the car cockpit scenario for example with type1 hypervisor, there is app
> kicking watchdog bound to every vCPU, so no vCPU should be avoided.

I don't understand what is meant here. Are there any reference links? Also, in such cases
the arch shouldn't mark any CPU as avoid, though it then won't get the benefit of this feature.

> 
>> +Arch need to set/unset the vCPU as avoid in cpu_avoid_mask. When set, avoid
>> +the CPU and when unset, use it as usual.
>> +
>> +Scheduler will try to avoid those CPUs as much as it can.
>> +This is achieved by
>> +1. Not selecting those CPU at wakeup.
>> +2. Push the task away from avoid CPU at tick.
>> +3. Not selecting avoid CPU at load balance.
>> +
>> +This works only for SCHED_RT and SCHED_NORMAL.
>>   
> Sounds like forcing a pill down through Peter's throat because Steve's headache.

I meant that this series so far addresses only RT and NORMAL. It could be made to work for other classes too,
but I didn't see a point.

Since the mask is available, SCHED_EXT users could design their BPF hooks accordingly, and SCHED_DL isn't designed to
work under such conditions. I don't know of any user/workload that deploys SCHED_DL in CPU-overcommitted cases.
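
For readers following the thread, the wakeup and tick behaviour described in
the quoted documentation can be sketched in user space roughly as below. This
is only an illustration under stated assumptions, not the code in the series:
the masks are plain 64-bit words rather than kernel cpumasks, pick_cpu() and
should_push() are hypothetical names, and the fallback to an avoid CPU when
nothing else is allowed is my reading of "as much as it can".

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 64 /* hypothetical: one bit per vCPU in a 64-bit word */

/*
 * Hypothetical wakeup-time selection: pick the lowest-numbered CPU that the
 * task is allowed to run on and that is not marked as avoid.
 * Assumption: avoiding is best effort, so if every allowed CPU is marked as
 * avoid, fall back to the lowest allowed CPU. Returns -1 if allowed is empty.
 */
static int pick_cpu(uint64_t allowed, uint64_t avoid)
{
    uint64_t preferred = allowed & ~avoid;
    uint64_t candidates = preferred ? preferred : allowed;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        if (candidates & (UINT64_C(1) << cpu))
            return cpu;

    return -1;
}

/*
 * Hypothetical tick-time check: push the task away only if it is running on
 * an avoid CPU and at least one allowed, non-avoid CPU exists to take it.
 */
static int should_push(int cpu, uint64_t allowed, uint64_t avoid)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

    uint64_t bit = UINT64_C(1) << cpu;

    return (avoid & bit) && (allowed & ~avoid) != 0;
}
```

With CPUs 0-3 allowed and CPUs 0-1 marked as avoid, pick_cpu(0xF, 0x3) selects
CPU 2, and a task found on CPU 0 at tick would be pushed. When the arch clears
the mask, the same calls select CPU 0 again and nothing is pushed.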

