Message-ID: <CANRm+CzsjNyd9-QjUupszpULNkJ31U+wPWC81A5jaTFRFdPfMg@mail.gmail.com>
Date: Thu, 13 Nov 2025 20:00:21 +0800
From: Wanpeng Li <kernellwp@...il.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>, Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>, Steven Rostedt <rostedt@...dmis.org>,
Vincent Guittot <vincent.guittot@...aro.org>, Juri Lelli <juri.lelli@...hat.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Wanpeng Li <wanpengli@...cent.com>
Subject: Re: [PATCH 02/10] sched/fair: Add rate-limiting and validation helpers
Hi Prateek,
On Wed, 12 Nov 2025 at 14:40, K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello Wanpeng,
>
> On 11/10/2025 9:02 AM, Wanpeng Li wrote:
> > +/*
> > + * High-frequency yield gating to reduce overhead on compute-intensive workloads.
> > + * Returns true if the yield should be skipped due to frequency limits.
> > + *
> > + * Optimized: single threshold with READ_ONCE/WRITE_ONCE, refresh timestamp on every call.
> > + */
> > +static bool yield_deboost_rate_limit(struct rq *rq, u64 now_ns)
> > +{
> > + u64 last = READ_ONCE(rq->yield_deboost_last_time_ns);
> > + bool limited = false;
> > +
> > + if (last) {
> > + u64 delta = now_ns - last;
> > + limited = (delta <= 6000ULL * NSEC_PER_USEC);
> > + }
> > +
> > + WRITE_ONCE(rq->yield_deboost_last_time_ns, now_ns);
>
> We only look at local rq so READ_ONCE()/WRITE_ONCE() seems
> unnecessary.
You're right. Since we're under rq->lock and only accessing the local
rq's fields, READ_ONCE()/WRITE_ONCE() provide no benefit here. Will
simplify to direct access.
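
For illustration, the simplified helper might look roughly like the userspace sketch below. It assumes the caller already holds the rq lock, so plain loads and stores suffice. `struct rq_model` and `YIELD_DEBOOST_WINDOW_NS` are hypothetical names introduced for this example only; the 6000 us window is taken from the quoted patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
/* Hypothetical name; the 6000 us value comes from the quoted patch. */
#define YIELD_DEBOOST_WINDOW_NS (6000ULL * NSEC_PER_USEC)

/* Hypothetical stand-in for the kernel's struct rq (only the field used here). */
struct rq_model {
	uint64_t yield_deboost_last_time_ns;
};

/*
 * Returns true if the yield should be skipped because the previous call
 * happened within the window. Plain accesses only: the caller is assumed
 * to hold the lock protecting this rq.
 */
static bool yield_deboost_rate_limit(struct rq_model *rq, uint64_t now_ns)
{
	uint64_t last = rq->yield_deboost_last_time_ns;
	bool limited = false;

	/* A zero timestamp means "never called", so the first call passes. */
	if (last)
		limited = (now_ns - last) <= YIELD_DEBOOST_WINDOW_NS;

	/* Refresh the timestamp on every call, as in the original helper. */
	rq->yield_deboost_last_time_ns = now_ns;
	return limited;
}
```

The behaviour is intended to match the quoted version; only the access annotations are dropped.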
>
> > + return limited;
> > +}
> > +
> > +/*
> > + * Validate tasks and basic parameters for yield deboost operation.
> > + * Performs comprehensive safety checks including feature enablement,
> > + * NULL pointer validation, task state verification, and same-rq requirement.
> > + * Returns false with appropriate debug logging if any validation fails,
> > + * ensuring only safe and meaningful yield operations proceed.
> > + */
> > +static bool __maybe_unused yield_deboost_validate_tasks(struct rq *rq, struct task_struct *p_target,
> > + struct task_struct **p_yielding_out,
> > + struct sched_entity **se_y_out,
> > + struct sched_entity **se_t_out)
> > +{
> > + struct task_struct *p_yielding;
> > + struct sched_entity *se_y, *se_t;
> > + u64 now_ns;
> > +
> > + if (!sysctl_sched_vcpu_debooster_enabled)
> > + return false;
> > +
> > + if (!rq || !p_target)
> > + return false;
> > +
> > + now_ns = rq->clock;
>
> Brief look at Patch 5 suggests we are under the rq_lock so might
> as well use the rq_clock(rq) helper. Also, you have to do a
> update_rq_clock() since it isn't done until yield_task_fair().
Good catch. Since yield_to() holds rq_lock but doesn't call
update_rq_clock() before invoking yield_to_task(), I need to call
update_rq_clock(rq) at the start of yield_to_deboost() and use
rq_clock(rq) instead of direct rq->clock access. This ensures the
clock is current before rate limiting checks.
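
The intended ordering (refresh the clock first, then read it through the accessor) can be modelled in userspace as below. This is only a sketch: `struct rq_clk`, `hw_time_ns`, and the `_model` functions are hypothetical stand-ins for the kernel's struct rq, its time source, `update_rq_clock()` and `rq_clock()`.

```c
#include <stdint.h>

/* Hypothetical userspace model of the cached runqueue clock. */
struct rq_clk {
	uint64_t clock;      /* cached timestamp, playing the role of rq->clock */
	uint64_t hw_time_ns; /* stand-in for the underlying time source */
};

/* Models update_rq_clock(): bring the cached timestamp up to date. */
static void update_rq_clock_model(struct rq_clk *rq)
{
	rq->clock = rq->hw_time_ns;
}

/* Models rq_clock(): read the cached timestamp through an accessor. */
static uint64_t rq_clock_model(const struct rq_clk *rq)
{
	return rq->clock;
}

/* Ordering proposed above: update first, then read via the helper. */
static uint64_t yield_deboost_now_ns(struct rq_clk *rq)
{
	update_rq_clock_model(rq);
	return rq_clock_model(rq);
}
```

Without the update step, the rate limiter would compare against whatever timestamp was cached by the last clock update on that rq.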
>
> > +
> > + if (yield_deboost_rate_limit(rq, now_ns))
> > + return false;
> > +
> > + p_yielding = rq->curr;
> > + if (!p_yielding || p_yielding == p_target ||
> > + p_target->sched_class != &fair_sched_class ||
> > + p_yielding->sched_class != &fair_sched_class)
> > + return false;
>
> yield_to() in syscall.c has already checked for the sched
> class matching under double_rq_lock. That cannot change by the
> time we are here.
Correct. The sched_class checks are redundant since yield_to() already
validates curr->sched_class == p->sched_class under double_rq_lock(),
and sched_class cannot change while holding the lock. Will remove.
>
> > +
> > + se_y = &p_yielding->se;
> > + se_t = &p_target->se;
> > +
> > + if (!se_t || !se_y || !se_t->on_rq || !se_y->on_rq)
> > + return false;
> > +
> > + if (task_rq(p_yielding) != rq || task_rq(p_target) != rq)
>
> yield_to() has already checked for this under double_rq_lock()
> so this too should be unnecessary.
Right. yield_to() already ensures both tasks are on their expected run
queues under double_rq_lock(), so the task_rq(p_yielding) != rq ||
task_rq(p_target) != rq check is redundant. Will remove.
>
> > + return false;
> > +
> > + *p_yielding_out = p_yielding;
> > + *se_y_out = se_y;
> > + *se_t_out = se_t;
>
> Why do we need these pointers? Can't the caller simply do:
>
> if (!yield_deboost_validate_tasks(rq, target))
> return;
>
> p_yielding = rq->donor;
> se_y_out = &p_yielding->se;
> se_t = &target->se;
You're right, the output parameters are unnecessary. The caller can
derive them directly:

    p_yielding = rq->donor (accounting for proxy exec)
    se_y = &p_yielding->se
    se_t = &target->se

I'll simplify yield_deboost_validate_tasks() to just return bool and
let the caller obtain these pointers.
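
A rough userspace sketch of that shape follows. All struct types here are hypothetical models of the kernel ones, `yield_deboost_yielding_se()` is an invented caller used only for illustration, and the set of checks kept is my assumption (the feature toggle and rate limit are omitted for brevity).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal models of the kernel structures. */
struct sched_entity_model { bool on_rq; };
struct task_model { struct sched_entity_model se; };
struct rq_model {
	struct task_model *curr;  /* execution context */
	struct task_model *donor; /* scheduling (vruntime) context */
};

/* Validation only reports success or failure; no output parameters. */
static bool yield_deboost_validate_tasks(struct rq_model *rq,
					 struct task_model *p_target)
{
	struct task_model *p_yielding = rq->donor;

	if (!p_yielding || p_yielding == p_target)
		return false;

	/* Both entities must be queued for the deboost to be meaningful. */
	return p_yielding->se.on_rq && p_target->se.on_rq;
}

/* Hypothetical caller: derives the yielding entity itself after validation. */
static struct sched_entity_model *
yield_deboost_yielding_se(struct rq_model *rq, struct task_model *target)
{
	if (!yield_deboost_validate_tasks(rq, target))
		return NULL;

	return &rq->donor->se; /* se_t would likewise be &target->se */
}
```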
>
> That reminds me - now that we have proxy execution, you need
> to re-evaluate the usage of rq->curr (running context) vs
> rq->donor (vruntime context) when looking at all this.
Good catch. Since we're manipulating vruntime/deadline/vlag, I should
use rq->donor (scheduling context) instead of rq->curr (execution
context). In the yield_to() path, curr should equal donor (the
yielding task is running), but using donor makes the vruntime
semantics clearer and consistent with
update_curr_fair()/check_preempt_wakeup_fair().
Regards,
Wanpeng