linux-kernel - Re: [RFC PATCH v3 07/10] sched/core: Push current task from paravirt CPU

Message-ID: <7227822a-0b4a-47cc-af7f-190f6d3b3e07@amd.com>
Date: Thu, 11 Sep 2025 11:10:22 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>, <mingo@...hat.com>,
	<peterz@...radead.org>, <juri.lelli@...hat.com>,
	<vincent.guittot@...aro.org>, <tglx@...utronix.de>, <yury.norov@...il.com>,
	<maddy@...ux.ibm.com>, <linux-kernel@...r.kernel.org>,
	<linuxppc-dev@...ts.ozlabs.org>, <gregkh@...uxfoundation.org>
CC: <vschneid@...hat.com>, <iii@...ux.ibm.com>, <huschle@...ux.ibm.com>,
	<rostedt@...dmis.org>, <dietmar.eggemann@....com>, <vineeth@...byteword.org>,
	<jgross@...e.com>, <pbonzini@...hat.com>, <seanjc@...gle.com>
Subject: Re: [RFC PATCH v3 07/10] sched/core: Push current task from paravirt
 CPU

Hello Shrikanth,

On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
> Actively push out any task running on a paravirt CPU. Since the task is
> running on the CPU need to spawn a stopper thread and push the task out.
> 
> If task is sleeping, when it wakes up it is expected to move out. In
> case it still chooses a paravirt CPU, next tick will move it out.
> However, if the task in pinned only to paravirt CPUs, it will continue
> running there.
> 
> Though code is almost same as __balance_push_cpu_stop and quite close to
> push_cpu_stop, it provides a cleaner implementation w.r.t to PARAVIRT
> config.
> 
> Add push_task_work_done flag to protect pv_push_task_work buffer. This has
> been placed at the empty slot available considering 64/128 byte
> cacheline.
> 
> This currently works only FAIR and RT.

sched_ext schedulers can perhaps use the ops->cpu_{release,acquire}()
callbacks if they are interested in this.

> 
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> ---
>  kernel/sched/core.c  | 84 ++++++++++++++++++++++++++++++++++++++++++++
>  kernel/sched/sched.h |  9 ++++-
>  2 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 279b0dd72b5e..1f9df5b8a3a2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5629,6 +5629,10 @@ void sched_tick(void)
>  
>  	sched_clock_tick();
>  
> +	/* push the current task out if a paravirt CPU */
> +	if (is_cpu_paravirt(cpu))
> +		push_current_from_paravirt_cpu(rq);

Does this mean a paravirt CPU is capable of handling an interrupt but may
not be continuously available to run a task? Or is the VMM expected to set
the CPU in the paravirt mask and give the vCPU sufficient time to move the
task before yanking it away from the pCPU?

> +
>  	rq_lock(rq, &rf);
>  	donor = rq->donor;
>  
> @@ -10977,4 +10981,84 @@ void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
>  struct cpumask __cpu_paravirt_mask __read_mostly;
>  EXPORT_SYMBOL(__cpu_paravirt_mask);
>  DEFINE_STATIC_KEY_FALSE(cpu_paravirt_push_tasks);
> +
> +static DEFINE_PER_CPU(struct cpu_stop_work, pv_push_task_work);
> +
> +static int paravirt_push_cpu_stop(void *arg)
> +{
> +	struct task_struct *p = arg;

Can we move all pushable tasks at once instead of just the rq->curr at
the time of the tick? That would also avoid holding a reference to "p"
and pushing only that one task. Thoughts?
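
To make the question concrete, here is a rough userspace toy model of
"push everything pushable in one pass". It is not kernel code: struct
toy_task, toy_pick_cpu() and toy_push_all() are hypothetical names made
up for illustration, the CPU masks are plain 64-bit bitmaps, and the
destination choice (lowest-numbered allowed non-paravirt CPU) is only a
stand-in for whatever select_fallback_rq() would pick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_TASKS 8

/* Toy stand-in for a task: only the fields this sketch needs. */
struct toy_task {
	uint64_t allowed;	/* bitmask of CPUs the task may run on */
	int cpu;		/* CPU the task is currently queued on */
	bool per_cpu_kthread;	/* models kthread_is_per_cpu() */
};

/*
 * Pick the lowest-numbered allowed CPU that is not in the paravirt
 * mask. Returns -1 if the task is affine only to paravirt CPUs.
 */
static int toy_pick_cpu(uint64_t allowed, uint64_t paravirt)
{
	uint64_t usable = allowed & ~paravirt;

	for (int cpu = 0; cpu < 64; cpu++)
		if (usable & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Move every pushable task queued on src_cpu in a single pass and
 * return how many were moved. No per-task reference is kept across
 * calls because nothing is deferred for one specific task.
 */
static int toy_push_all(struct toy_task *tasks, int nr, int src_cpu,
			uint64_t paravirt)
{
	int moved = 0;

	assert(nr >= 0 && nr <= TOY_MAX_TASKS);

	for (int i = 0; i < nr; i++) {
		struct toy_task *p = &tasks[i];
		int dst;

		if (p->cpu != src_cpu || p->per_cpu_kthread)
			continue;

		dst = toy_pick_cpu(p->allowed, paravirt);
		if (dst < 0)
			continue;	/* pinned to paravirt CPUs: stays put */

		p->cpu = dst;
		moved++;
	}
	return moved;
}
```

In the real scheduler the walk would have to happen under the rq lock
and go through the per-class pushable lists, so take this only as a
picture of the control flow, not a proposal for the actual code.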

> +	struct rq *rq = this_rq();
> +	struct rq_flags rf;
> +	int cpu;
> +
> +	raw_spin_lock_irq(&p->pi_lock);
> +	rq_lock(rq, &rf);
> +	rq->push_task_work_done = 0;
> +
> +	update_rq_clock(rq);
> +
> +	if (task_rq(p) == rq && task_on_rq_queued(p)) {
> +		cpu = select_fallback_rq(rq->cpu, p);
> +		rq = __migrate_task(rq, &rf, p, cpu);
> +	}
> +
> +	rq_unlock(rq, &rf);
> +	raw_spin_unlock_irq(&p->pi_lock);
> +	put_task_struct(p);
> +
> +	return 0;
> +}
> +
> +/* A CPU is marked as Paravirt when there is contention for underlying
> + * physical CPU and using this CPU will lead to hypervisor preemptions.
> + * It is better not to use this CPU.
> + *
> + * In case any task is scheduled on such CPU, move it out. In
> + * select_fallback_rq a non paravirt CPU will be chosen and henceforth
> + * task shouldn't come back to this CPU
> + */
> +void push_current_from_paravirt_cpu(struct rq *rq)
> +{
> +	struct task_struct *push_task = rq->curr;
> +	unsigned long flags;
> +	struct rq_flags rf;
> +
> +	if (!is_cpu_paravirt(rq->cpu))
> +		return;
> +
> +	/* Idle task can't be pused out */
> +	if (rq->curr == rq->idle)
> +		return;
> +
> +	/* Do for only SCHED_NORMAL AND RT for now */
> +	if (push_task->sched_class != &fair_sched_class &&
> +	    push_task->sched_class != &rt_sched_class)
> +		return;
> +
> +	if (kthread_is_per_cpu(push_task) ||
> +	    is_migration_disabled(push_task))
> +		return;
> +
> +	/* Is it affine to only paravirt cpus? */
> +	if (cpumask_subset(push_task->cpus_ptr, cpu_paravirt_mask))
> +		return;
> +
> +	/* There is already a stopper thread for this. Dont race with it */
> +	if (rq->push_task_work_done == 1)
> +		return;
> +
> +	local_irq_save(flags);
> +	preempt_disable();

Disabling IRQs implies preemption is disabled, so the
preempt_disable()/preempt_enable() pair is redundant here.

> +
> +	get_task_struct(push_task);
> +
> +	rq_lock(rq, &rf);
> +	rq->push_task_work_done = 1;
> +	rq_unlock(rq, &rf);
> +
> +	stop_one_cpu_nowait(rq->cpu, paravirt_push_cpu_stop, push_task,
> +			    this_cpu_ptr(&pv_push_task_work));
> +	preempt_enable();
> +	local_irq_restore(flags);
> +}
>  #endif
-- 
Thanks and Regards,
Prateek