linux-kernel - [RFC PATCH 0/5] sched: cpu parked and push current task mechanism

Message-ID: <20250523181448.3777233-1-sshegde@linux.ibm.com>
Date: Fri, 23 May 2025 23:44:43 +0530
From: Shrikanth Hegde <sshegde@...ux.ibm.com>
To: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, tglx@...utronix.de, yury.norov@...il.com,
        maddy@...ux.ibm.com
Cc: sshegde@...ux.ibm.com, vschneid@...hat.com, dietmar.eggemann@....com,
        rostedt@...dmis.org, jstultz@...gle.com, kprateek.nayak@....com,
        huschle@...ux.ibm.com, srikar@...ux.ibm.com,
        linux-kernel@...r.kernel.org, linux@...musvillemoes.dk
Subject: [RFC PATCH 0/5] sched: cpu parked and push current task mechanism

In a para-virtualised environment, there could be multiple
overcommitted VMs, i.e. the sum of virtual CPUs (vCPUs) > physical CPUs (pCPUs).
When all such VMs request CPU cycles at the same time, it is not possible
to serve all of them. This leads to VM level preemptions and hence
steal time.

Introduce the notion of a CPU parked state, which implies that the underlying
pCPU may not be available for use at this time. This means it is better to
avoid this vCPU. So when a CPU is marked as parked, tasks should vacate it as
soon as they can. The parked state is dynamic at runtime and can change
often.
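
To make the parked state concrete, here is a minimal userspace sketch of
such a mask. It is not code from this series: every name in it
(MODEL_NR_CPUS, model_parked_mask, model_set_cpu_parked, model_cpu_parked,
model_first_unparked) is hypothetical, and a single 64-bit word stands in
for a real kernel cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical: one 64-bit word is enough here */

/* Hypothetical userspace stand-in for a kernel cpumask of parked CPUs. */
static uint64_t model_parked_mask;

/* Mark or unmark a CPU as parked; this may flip often at runtime. */
static void model_set_cpu_parked(unsigned int cpu, bool parked)
{
	assert(cpu < MODEL_NR_CPUS);
	if (parked)
		model_parked_mask |= UINT64_C(1) << cpu;
	else
		model_parked_mask &= ~(UINT64_C(1) << cpu);
}

static bool model_cpu_parked(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return (model_parked_mask & (UINT64_C(1) << cpu)) != 0;
}

/* Return the first non-parked CPU allowed by 'allowed', or -1 if none. */
static int model_first_unparked(uint64_t allowed)
{
	uint64_t usable = allowed & ~model_parked_mask;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (usable & (UINT64_C(1) << cpu))
			return (int)cpu;
	return -1;
}
```

In this model, a selection path would call model_first_unparked() with the
task's allowed CPUs, so a parked CPU is never picked while any non-parked
CPU is allowed.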

In general, task level preemption (driven by the VM) is less expensive than VM
level preemption (driven by the hypervisor). So packing onto fewer CPUs helps to
improve the overall workload throughput/latency.

The architecture needs to decide which CPUs are parked. Currently we are
exploring getting the hint from stolen time and hypervisor provided
statistics. There is a simple powerpc debug patch which shows how one can
make use of the cpu parked feature.
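
As an illustration of a steal-time based hint, here is a minimal userspace
sketch. It is an assumption, not what this series does (the powerpc patch
uses a manual hint): the struct, the function name and the proportional
policy are all hypothetical. It scales the number of vCPUs to keep
unparked by the fraction of time that was not stolen between two samples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample of per-VM time counters, in arbitrary ticks. */
struct model_time_sample {
	uint64_t total;	/* time accounted across all vCPUs */
	uint64_t steal;	/* part of 'total' during which vCPUs were preempted */
};

/*
 * Suggest how many of 'nr_vcpus' to keep unparked, proportional to the
 * non-stolen share of time between 'prev' and 'now'. Always keeps at
 * least one vCPU. This policy is an assumption made for illustration.
 */
static unsigned int model_usable_vcpus(struct model_time_sample prev,
				       struct model_time_sample now,
				       unsigned int nr_vcpus)
{
	assert(now.total >= prev.total && now.steal >= prev.steal);

	uint64_t dt = now.total - prev.total;
	uint64_t ds = now.steal - prev.steal;

	if (dt == 0)
		return nr_vcpus;	/* no new information, park nothing */
	if (ds > dt)
		ds = dt;		/* clamp inconsistent samples */

	uint64_t usable = (uint64_t)nr_vcpus * (dt - ds) / dt;

	return usable ? (unsigned int)usable : 1;
}
```

For example, with 25% of the interval stolen on a 480 vCPU guest, this
sketch suggests keeping 360 vCPUs unparked.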

CPU parking and the need for it have been explained here as well [1]. Much
of the context explained in the cover letter there applies to this
problem as well.
[1]: https://lore.kernel.org/all/20250512115325.30022-1-huschle@linux.ibm.com/

While trying the above method on a large system (480 vCPUs), it was taking
around 8-10 seconds for the workload to move. That is a long time,
hence this approach, where the workload moves within 1-2 seconds.

Pros:
- Once tasks move, no load balancer overheads
- Less need for stats. Minimal load balancer changes.
- Faster, since it is based on sched_tick
- The system maintains a state of parked CPUs. Other subsystems may find it
  useful.

Cons:
- Moving the current task is stop machine based, so a task can't be moved
  before it gets scheduled.
- Depends on CONFIG_HOTPLUG_CPU since it relies on __balance_push_cpu_stop
  (might not be a big concern)
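
The tick-driven push can be pictured with a small userspace model. This is
only a sketch under stated assumptions, not the kernel implementation: all
names (SIM_NR_CPUS, sim_rq, sim_parked, sim_tick, ...) are hypothetical,
each CPU holds at most one task, and the target is simply the first idle
non-parked CPU, whereas the series relies on the stopper and
__balance_push_cpu_stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NR_CPUS 8	/* hypothetical small system */

struct sim_rq {
	int curr;	/* task id running on this CPU, -1 when idle */
};

static struct sim_rq sim_rqs[SIM_NR_CPUS];
static uint8_t sim_parked;	/* bit n set: CPU n is parked */

static void sim_init(void)
{
	sim_parked = 0;
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_rqs[cpu].curr = -1;
}

/* Simplification: only an idle, non-parked CPU is a valid target. */
static int sim_pick_target(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (!(sim_parked & (1u << cpu)) && sim_rqs[cpu].curr == -1)
			return cpu;
	return -1;
}

/*
 * Model of the per-CPU tick: if this CPU is parked and runs a task,
 * push that task to a non-parked CPU. Returns true if a task moved.
 */
static bool sim_tick(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);

	if (!(sim_parked & (1u << cpu)))
		return false;		/* not parked, nothing to do */
	if (sim_rqs[cpu].curr < 0)
		return false;		/* parked but already idle */

	int dst = sim_pick_target();

	if (dst < 0)
		return false;		/* nowhere to push to */

	sim_rqs[dst].curr = sim_rqs[cpu].curr;
	sim_rqs[cpu].curr = -1;
	return true;
}
```

In this model a task can only be pushed once it is the current task on the
parked CPU at tick time, which mirrors the first limitation listed above.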

Sending this out to get feedback on the idea. This mechanism
seems lightweight and fast. There are other push task related patches
sent for EAS[2] and newidle balance[3]. Maybe it is time to come up with a push
task framework that each one can make use of. Need to dig more into it[4].
RT, DL, IRQ and taskset concerns still need to be addressed. There may be
subtle races too (no warnings/bugs on the console while testing CFS tasks).

[2]: https://lore.kernel.org/all/20250302210539.1563190-1-vincent.guittot@linaro.org/
[3]: https://lore.kernel.org/lkml/20250409111539.23791-1-kprateek.nayak@amd.com/
[4]: https://lore.kernel.org/all/xhsmh1putoxbz.mognet@vschneid-thinkpadt14sgen2i.remote.csb/

Based on tip/master at fa95dea97bd1 (Merge branch into tip/master: 'perf/core')

Shrikanth Hegde (5):
  cpumask: Introduce cpu parked mask
  sched/core: Don't use parked cpu for selection
  sched/fair: Don't use parked cpu for load balancing
  sched/core: Push current task when cpu is parked
  powerpc: Use manual hint for cpu parking

 arch/powerpc/kernel/smp.c | 45 +++++++++++++++++++++++++++++++++++++++
 include/linux/cpumask.h   | 14 ++++++++++++
 kernel/cpu.c              |  3 +++
 kernel/sched/core.c       | 43 +++++++++++++++++++++++++++++++++++--
 kernel/sched/fair.c       |  1 +
 kernel/sched/sched.h      |  1 +
 6 files changed, 105 insertions(+), 2 deletions(-)

-- 
2.39.3