linux-kernel - Re: [RFC v2 1/9] sched/docs: Document avoid_cpu

Message-ID: <20250626062749.1854-1-hdanton@sina.com>
Date: Thu, 26 Jun 2025 14:27:40 +0800
From: Hillf Danton <hdanton@...a.com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>
Cc: peterz@...radead.org,
	kprateek.nayak@....com,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC v2 1/9] sched/docs: Document avoid_cpu_mask and avoid CPU concept

On Thu, 26 Jun 2025 00:41:00 +0530 Shrikanth Hegde wrote
> This describes what avoid CPU means and what scheduler aims to do 
> when a CPU is marked as avoid. 
> 
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> ---
>  Documentation/scheduler/sched-arch.rst | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/Documentation/scheduler/sched-arch.rst b/Documentation/scheduler/sched-arch.rst
> index ed07efea7d02..d32755298fca 100644
> --- a/Documentation/scheduler/sched-arch.rst
> +++ b/Documentation/scheduler/sched-arch.rst
> @@ -62,6 +62,31 @@ Your cpu_idle routines need to obey the following rules:
>  arch/x86/kernel/process.c has examples of both polling and
>  sleeping idle functions.
>  
> +CPU Avoid
> +=========
> +
> +Under paravirt conditions it is possible to overcommit CPU resources.
> +i.e. the sum of virtual CPUs (vCPUs) across all VMs is greater than the number of physical
> +CPUs (pCPUs). Under such conditions, when all or many VMs have high utilization,
> +hypervisor won't be able to satisfy the requirement and has to context switch
> +within or across VM. VM level context switch is more expensive compared to
> +task context switch within the VM.
> +
Sounds like the VMs are not well configured (or the pCPUs are not well partitioned).

> +In such cases it is better that VM's co-ordinate among themselves and ask for
> +less CPU request by not using some of the vCPUs. Such vCPUs where workload
> +can be avoided at the moment are called as "Avoid CPUs". Note that when the
> +pCPU contention goes away, these vCPUs can be used again by the workload.
> +
In the car cockpit scenario with a type-1 hypervisor, for example, there is an
app kicking the watchdog bound to every vCPU, so no vCPU should be avoided.

> +Arch need to set/unset the vCPU as avoid in cpu_avoid_mask. When set, avoid
> +the CPU and when unset, use it as usual.
> +
> +Scheduler will try to avoid those CPUs as much as it can.
> +This is achieved by
> +1. Not selecting those CPU at wakeup.
> +2. Push the task away from avoid CPU at tick.
> +3. Not selecting avoid CPU at load balance.
> +
> +This works only for SCHED_RT and SCHED_NORMAL.
>  
Sounds like forcing a pill down Peter's throat because of Steve's headache.
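
For illustration only, the wakeup-time behaviour described in the quoted patch
("not selecting those CPUs at wakeup", on a best-effort basis) could be modelled
in userspace roughly as below. This is a sketch, not kernel code and not the
patch's implementation: avoid_mask, set_cpu_avoid(), cpu_is_avoided(),
select_cpu() and the NR_CPUS value are all hypothetical names chosen here, and
the fallback to an avoided CPU when nothing else is allowed is an assumption
drawn from "as much as it can".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8 /* hypothetical: a small 8-vCPU guest */

/* Hypothetical stand-in for the mask in the patch: one bit per vCPU. */
static uint64_t avoid_mask;

/* The arch side would set or clear the bit as pCPU contention changes. */
static void set_cpu_avoid(unsigned int cpu, bool avoid)
{
	assert(cpu < NR_CPUS);
	if (avoid)
		avoid_mask |= UINT64_C(1) << cpu;
	else
		avoid_mask &= ~(UINT64_C(1) << cpu);
}

static bool cpu_is_avoided(unsigned int cpu)
{
	return (avoid_mask & (UINT64_C(1) << cpu)) != 0;
}

/*
 * Wakeup-time selection, best effort:
 *  - consider only allowed CPUs that are not marked avoid;
 *  - if every allowed CPU is marked avoid, fall back to the allowed set
 *    (assumption: affinity wins over avoidance);
 *  - prefer prev_cpu when it is in the resulting pool.
 * Returns -1 if no CPU is allowed at all.
 */
static int select_cpu(uint64_t allowed, unsigned int prev_cpu)
{
	uint64_t usable = allowed & ~avoid_mask;
	uint64_t pool = usable ? usable : allowed;
	unsigned int cpu;

	if (prev_cpu < NR_CPUS && (pool & (UINT64_C(1) << prev_cpu)))
		return (int)prev_cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (pool & (UINT64_C(1) << cpu))
			return (int)cpu;

	return -1;
}
```

With CPU 2 marked avoid and a task allowed on CPUs 2 and 3, select_cpu(0x0C, 2)
picks CPU 3. A task pinned to CPU 2 alone (the per-vCPU watchdog case) still
lands on CPU 2 in this model, because the fallback keeps affinity ahead of
avoidance.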