linux-kernel - Re: [RFC v2 1/9] sched/docs: Document avoid_cpu


Message-ID: <20250628220230.2052-1-hdanton@sina.com>
Date: Sun, 29 Jun 2025 06:02:29 +0800
From: Hillf Danton <hdanton@...a.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
Cc: peterz@...radead.org,
	kprateek.nayak@....com,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC v2 1/9] sched/docs: Document avoid_cpu_mask and avoid CPU concept

On Fri, 27 Jun 2025 10:07:22 +0530 Shrikanth Hegde wrote
> On 6/27/25 05:57, Hillf Danton wrote:
> > On Thu, 26 Jun 2025 20:16:36 +0530 Shrikanth Hegde wrote
> >>> On Thu, 26 Jun 2025 00:41:00 +0530 Shrikanth Hegde wrote
> >>>> This describes what avoid CPU means and what the scheduler aims to do
> >>>> when a CPU is marked as avoid.
> >>>>
> >>>> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> >>>> ---
> >>>>    Documentation/scheduler/sched-arch.rst | 25 +++++++++++++++++++++++++
> >>>>    1 file changed, 25 insertions(+)
> >>>>
> >>>> diff --git a/Documentation/scheduler/sched-arch.rst b/Documentation/scheduler/sched-arch.rst
> >>>> index ed07efea7d02..d32755298fca 100644
> >>>> --- a/Documentation/scheduler/sched-arch.rst
> >>>> +++ b/Documentation/scheduler/sched-arch.rst
> >>>> @@ -62,6 +62,31 @@ Your cpu_idle routines need to obey the following rules:
> >>>>    arch/x86/kernel/process.c has examples of both polling and
> >>>>    sleeping idle functions.
> >>>>    
> >>>> +CPU Avoid
> >>>> +=========
> >>>> +
> >>>> +Under paravirt conditions it is possible to overcommit CPU resources,
> >>>> +i.e. the sum of virtual CPUs (vCPUs) of all VMs is greater than the number of
> >>>> +physical CPUs (pCPUs). Under such conditions, when all or many VMs have high
> >>>> +utilization, the hypervisor won't be able to satisfy the demand and has to
> >>>> +context switch within or across VMs. A VM-level context switch is more
> >>>> +expensive than a task context switch within the VM.
> >>>> +
> >>> Sounds like VMs are not well configured (or pCPUs not well partitioned).
> >>
> >> No. As I understand it, that's how VMs are configured in the paravirtualized case.
> >> Correct me if I am wrong.
> >>
> >> On powerpc, we have Shared Processor Logical Partitions (SPLPAR), which allow overcommit.
> >> When other LPARs (VMs) are idle, overcommit lets one get more work done. It also allows one
> >> to configure more VMs. The issue described happens only when all or most VMs ask for
> >> CPU at the same time.
> >>
> > Putting virtualization aside, let's look at a simpler case where more
> > than 1024 apps are bound to a single CPU (on a ppc with 4 CPUs, for instance):
> > what can we do in the kernel with respect to app responsiveness?
> 
> In this case you will likely not have vCPU preemption; you will have
> task preemption, and that is OK. The patch doesn't aim to solve the case
> you mentioned above.
> 
It is a case of overcommit caused by misconfiguration, where the scheduler
cannot help, simply because the kernel is not a pill that cures every pain.

> In the generic SPLPAR configuration there are usually a large number of
> vCPUs, and powerpc systems are fairly large in terms of CPU count as
> well.
>
Overcommit is specific neither to SPLPAR nor to PPC, because it would be a bug
for the scheduler to create overcommit on either PPC or Arm64.
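The overcommit condition defined in the quoted patch text (the sum of vCPUs across all VMs exceeding the number of pCPUs) can be sketched in C as below. This is a minimal illustration under stated assumptions: `total_vcpus()`, `vcpus_overcommitted()` and `overcommit_pct()` are hypothetical helpers, not kernel or hypervisor APIs, and the per-VM vCPU counts are assumed to be known up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helpers (not kernel APIs) illustrating the overcommit
 * condition from the quoted patch: the sum of vCPUs over all VMs
 * exceeds the number of physical CPUs (pCPUs).
 */

/* Total vCPUs configured across nr_vms VMs. */
static unsigned long total_vcpus(const unsigned int *vcpus_per_vm, size_t nr_vms)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < nr_vms; i++)
		sum += vcpus_per_vm[i];
	return sum;
}

/* True when the VMs together are configured with more vCPUs than pCPUs. */
static bool vcpus_overcommitted(const unsigned int *vcpus_per_vm, size_t nr_vms,
				unsigned int nr_pcpus)
{
	return total_vcpus(vcpus_per_vm, nr_vms) > nr_pcpus;
}

/*
 * Overcommit ratio as a percentage, rounded down
 * (100 means exactly one vCPU per pCPU).
 */
static unsigned long overcommit_pct(const unsigned int *vcpus_per_vm, size_t nr_vms,
				    unsigned int nr_pcpus)
{
	assert(nr_pcpus > 0);
	return total_vcpus(vcpus_per_vm, nr_vms) * 100 / nr_pcpus;
}
```

For example, three VMs configured with 16, 16 and 32 vCPUs on a 32-pCPU host give 64 vCPUs, a 200% ratio, so the host is overcommitted on paper. Whether that actually forces the hypervisor to preempt vCPUs depends on how many of those vCPUs are busy at the same time, which is the scenario the patch targets; the 1024-apps-on-one-CPU case is task overcommit inside a single kernel and is not captured by these helpers.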