Message-ID: <7227822a-0b4a-47cc-af7f-190f6d3b3e07@amd.com>
Date: Thu, 11 Sep 2025 11:10:22 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Shrikanth Hegde <sshegde@...ux.ibm.com>, <mingo@...hat.com>,
<peterz@...radead.org>, <juri.lelli@...hat.com>,
<vincent.guittot@...aro.org>, <tglx@...utronix.de>, <yury.norov@...il.com>,
<maddy@...ux.ibm.com>, <linux-kernel@...r.kernel.org>,
<linuxppc-dev@...ts.ozlabs.org>, <gregkh@...uxfoundation.org>
CC: <vschneid@...hat.com>, <iii@...ux.ibm.com>, <huschle@...ux.ibm.com>,
<rostedt@...dmis.org>, <dietmar.eggemann@....com>, <vineeth@...byteword.org>,
<jgross@...e.com>, <pbonzini@...hat.com>, <seanjc@...gle.com>
Subject: Re: [RFC PATCH v3 07/10] sched/core: Push current task from paravirt CPU
Hello Shrikanth,
On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
> Actively push out any task running on a paravirt CPU. Since the task is
> running on the CPU, a stopper thread needs to be spawned to push it out.
>
> If task is sleeping, when it wakes up it is expected to move out. In
> case it still chooses a paravirt CPU, next tick will move it out.
> However, if the task is pinned only to paravirt CPUs, it will continue
> running there.
>
> Though the code is almost the same as __balance_push_cpu_stop and quite
> close to push_cpu_stop, it provides a cleaner implementation w.r.t. the
> PARAVIRT config.
>
> Add push_task_work_done flag to protect pv_push_task_work buffer. This has
> been placed at the empty slot available considering 64/128 byte
> cacheline.
>
> This currently works only for FAIR and RT.
sched_ext schedulers can perhaps use the ops->cpu_{release,acquire}()
callbacks if they are interested in this.
>
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.ibm.com>
> ---
> kernel/sched/core.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/sched.h | 9 ++++-
> 2 files changed, 92 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 279b0dd72b5e..1f9df5b8a3a2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5629,6 +5629,10 @@ void sched_tick(void)
>
> sched_clock_tick();
>
> + /* push the current task out if a paravirt CPU */
> + if (is_cpu_paravirt(cpu))
> + push_current_from_paravirt_cpu(rq);
Does this mean a paravirt CPU is capable of handling an interrupt but may
not be continuously available to run a task? Or is the VMM expected to set
the CPU in the paravirt mask and give the vCPU sufficient time to move the
task before yanking it away from the pCPU?
> +
> rq_lock(rq, &rf);
> donor = rq->donor;
>
> @@ -10977,4 +10981,84 @@ void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
> struct cpumask __cpu_paravirt_mask __read_mostly;
> EXPORT_SYMBOL(__cpu_paravirt_mask);
> DEFINE_STATIC_KEY_FALSE(cpu_paravirt_push_tasks);
> +
> +static DEFINE_PER_CPU(struct cpu_stop_work, pv_push_task_work);
> +
> +static int paravirt_push_cpu_stop(void *arg)
> +{
> + struct task_struct *p = arg;
Can we move all pushable tasks at once instead of just the rq->curr at
the time of the tick? That would also avoid holding a reference to "p"
and selectively pushing only that one task. Thoughts?
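To make the idea a bit more concrete, here is a rough userspace C model of
"push everything pushable in one pass". It is only a sketch: model_task,
model_pick_fallback() and model_push_all() are made-up names that do not
exist in the kernel, CPU masks are plain bitmasks, and there is no locking.
A real version would have to walk the runqueue under the rq lock and use the
scheduler's own migration helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

/* Hypothetical userspace stand-in for a queued task; not a kernel type. */
struct model_task {
	unsigned int allowed;		/* bit n set: task may run on CPU n */
	bool migration_disabled;
	bool per_cpu_kthread;
	int cpu;			/* CPU the task is currently queued on */
};

/* Return the first allowed CPU that is not paravirt, or -1 if none exists. */
static int model_pick_fallback(const struct model_task *t, unsigned int pv_mask)
{
	unsigned int usable = t->allowed & ~pv_mask;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (usable & (1u << cpu))
			return cpu;
	return -1;
}

/*
 * Walk every task queued on src_cpu and move each pushable one in a
 * single pass. Returns the number of tasks moved.
 */
static int model_push_all(struct model_task *tasks, size_t nr,
			  int src_cpu, unsigned int pv_mask)
{
	int moved = 0;

	for (size_t i = 0; i < nr; i++) {
		struct model_task *t = &tasks[i];
		int dst;

		if (t->cpu != src_cpu)
			continue;
		if (t->migration_disabled || t->per_cpu_kthread)
			continue;

		dst = model_pick_fallback(t, pv_mask);
		if (dst < 0)
			continue;	/* affine only to paravirt CPUs */

		t->cpu = dst;
		moved++;
	}
	return moved;
}
```

With this shape the stopper callback would not need a task pointer as its
argument, so no task reference has to be taken at tick time.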
> + struct rq *rq = this_rq();
> + struct rq_flags rf;
> + int cpu;
> +
> + raw_spin_lock_irq(&p->pi_lock);
> + rq_lock(rq, &rf);
> + rq->push_task_work_done = 0;
> +
> + update_rq_clock(rq);
> +
> + if (task_rq(p) == rq && task_on_rq_queued(p)) {
> + cpu = select_fallback_rq(rq->cpu, p);
> + rq = __migrate_task(rq, &rf, p, cpu);
> + }
> +
> + rq_unlock(rq, &rf);
> + raw_spin_unlock_irq(&p->pi_lock);
> + put_task_struct(p);
> +
> + return 0;
> +}
> +
> +/* A CPU is marked as Paravirt when there is contention for underlying
> + * physical CPU and using this CPU will lead to hypervisor preemptions.
> + * It is better not to use this CPU.
> + *
> + * In case any task is scheduled on such CPU, move it out. In
> + * select_fallback_rq a non paravirt CPU will be chosen and henceforth
> + * task shouldn't come back to this CPU
> + */
> +void push_current_from_paravirt_cpu(struct rq *rq)
> +{
> + struct task_struct *push_task = rq->curr;
> + unsigned long flags;
> + struct rq_flags rf;
> +
> + if (!is_cpu_paravirt(rq->cpu))
> + return;
> +
> + /* Idle task can't be pushed out */
> + if (rq->curr == rq->idle)
> + return;
> +
> + /* Do for only SCHED_NORMAL AND RT for now */
> + if (push_task->sched_class != &fair_sched_class &&
> + push_task->sched_class != &rt_sched_class)
> + return;
> +
> + if (kthread_is_per_cpu(push_task) ||
> + is_migration_disabled(push_task))
> + return;
> +
> + /* Is it affine to only paravirt cpus? */
> + if (cpumask_subset(push_task->cpus_ptr, cpu_paravirt_mask))
> + return;
> +
> + /* There is already a stopper thread for this. Dont race with it */
> + if (rq->push_task_work_done == 1)
> + return;
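For readers following along, my understanding of the guard above can be
modelled in userspace C roughly as below. This is a single-threaded sketch
under my own assumptions: model_rq, model_queue_push() and model_run_push()
are hypothetical names, and the rq lock that protects the flag in the patch
is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one reusable work slot per CPU plus its guard flag. */
struct model_rq {
	bool push_work_pending;	/* plays the role of push_task_work_done */
	int queued_task;	/* payload stored in the single work slot */
};

/* Tick side: try to queue a push; refuse while the slot is still in use. */
static bool model_queue_push(struct model_rq *rq, int task_id)
{
	if (rq->push_work_pending)
		return false;

	rq->push_work_pending = true;
	rq->queued_task = task_id;
	return true;
}

/* Stopper side: consume the slot and release it for the next tick. */
static int model_run_push(struct model_rq *rq)
{
	int task_id = rq->queued_task;

	rq->push_work_pending = false;
	return task_id;
}
```

A second tick that arrives before the stopper has run is simply dropped,
which is what keeps the single per-CPU work buffer from being queued twice.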
> +
> + local_irq_save(flags);
> + preempt_disable();
Disabling IRQs implies preemption is disabled, so the
preempt_disable()/preempt_enable() pair here looks redundant.
> +
> + get_task_struct(push_task);
> +
> + rq_lock(rq, &rf);
> + rq->push_task_work_done = 1;
> + rq_unlock(rq, &rf);
> +
> + stop_one_cpu_nowait(rq->cpu, paravirt_push_cpu_stop, push_task,
> + this_cpu_ptr(&pv_push_task_work));
> + preempt_enable();
> + local_irq_restore(flags);
> +}
> #endif
--
Thanks and Regards,
Prateek