Message-ID: <CANDhNCpqjiUi4iKKKXH8PLtUKe2JTJdt8W=7CQ6VJi409+RaEw@mail.gmail.com>
Date: Thu, 4 Jan 2024 19:42:34 -0800
From: John Stultz <jstultz@...gle.com>
To: Metin Kaya <metin.kaya@....com>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelaf@...gle.com>, 
	Qais Yousef <qyousef@...gle.com>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, 
	Youssef Esmat <youssefesmat@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Daniel Bristot de Oliveira <bristot@...hat.com>, Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, 
	Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Thomas Gleixner <tglx@...utronix.de>, kernel-team@...roid.com
Subject: Re: [PATCH v7 23/23] sched: Fix rt/dl load balancing via chain level balance

On Fri, Dec 22, 2023 at 6:51 AM Metin Kaya <metin.kaya@....com> wrote:
> On 20/12/2023 12:18 am, John Stultz wrote:
> > +     /*
> > +      * Chain leads off the rq, we're free to push it anywhere.
> > +      *
> > +      * One wrinkle with relying on find_exec_ctx is that when the chain
> > +      * leads to a task currently migrating to rq, we see the chain as
> > +      * pushable & push everything prior to the migrating task. Even if
> > +      * we checked explicitly for this case, we could still race with a
> > +      * migration after the check.
> > +      * This shouldn't permanently produce a bad state though, as proxy()
>
>                                                         find_proxy_task()

Fixed.

> > +#ifdef CONFIG_SCHED_PROXY_EXEC
> >   static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *rq,
> > -                                       struct rq *later)
> > +                                       struct rq *later, bool *retry)
> > +{
> > +     if (!dl_task(task) || is_migration_disabled(task))
> > +             return false;
> > +
> > +     if (rq != this_rq()) {
> > +             struct task_struct *next_task = pick_next_pushable_dl_task(rq);
> > +
> > +             if (next_task == task) {
>
> Nit: We can `return false;` if next_task != task and save one level of
> indentation.

Ah, good point. Fixed.
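For reference, the reshuffled branch ends up looking roughly like this
(just a sketch of the shape; the rest of the function is unchanged):

	if (rq != this_rq()) {
		struct task_struct *next_task = pick_next_pushable_dl_task(rq);
		struct task_struct *exec_ctx;

		/* Bail out early so the interesting path reads flat. */
		if (next_task != task)
			return false;

		exec_ctx = find_exec_ctx(rq, next_task);
		*retry = (exec_ctx && !cpumask_test_cpu(later->cpu,
							&exec_ctx->cpus_mask));
	}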

> > +                     struct task_struct *exec_ctx;
> > +
> > +                     exec_ctx = find_exec_ctx(rq, next_task);
> > +                     *retry = (exec_ctx && !cpumask_test_cpu(later->cpu,
> > +                                                            &exec_ctx->cpus_mask));
> > +             } else {
> > +                     return false;
> > +             }
> > +     } else {
> > +             int pushable = task_is_pushable(rq, task, later->cpu);
> > +
> > +             *retry = pushable == -1;
> > +             if (!pushable)
> > +                     return false;
>
> `return pushable;` can replace the above 2 lines.
> The same for rt_revalidate_rq_state().

Hrm. It does save lines, but I worry it makes the code a touch harder
to read. I might hold off on this for the moment unless someone else
pushes for it.
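For comparison, if I'm reading the suggestion right, it would turn:

	*retry = pushable == -1;
	if (!pushable)
		return false;

into:

	*retry = pushable == -1;
	return pushable;

which relies on the implicit int-to-bool conversion (so the -1 "retry"
case still reads as pushable) and on the fall-through path of the
function returning true.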

> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index fabb19891e95..d5ce95dc5c09 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
...
> > +#ifdef CONFIG_SCHED_PROXY_EXEC
> > +static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *rq,
> > +                                       struct rq *lowest, bool *retry)
>
> This function can be consolidated with dl_revalidate_rq_state() as you
> noted in the previous patch, although rt_revalidate_rq_state() has a few
> extra comments.

Yeah. I need to stare at it a bit to figure out the best name for the
common chunk. I'd also like to figure out a better way to do the retry
handling, as it feels messy.
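Very roughly, I'd imagine the shared chunk looking something like the
below (purely a hypothetical sketch: the name and the class-specific
hooks are placeholders, and it assumes both callers want the same
fall-through behavior):

	/* Placeholder name; dl/rt pass their own class check and picker. */
	static inline bool revalidate_rq_state(struct task_struct *task,
					       struct rq *rq, struct rq *target,
					       bool *retry,
					       bool (*class_task)(struct task_struct *),
					       struct task_struct *(*pick_pushable)(struct rq *))
	{
		/* Releasing the rq lock means we must re-check pushability. */
		if (!class_task(task) || is_migration_disabled(task))
			return false;

		if (rq != this_rq()) {
			struct task_struct *next_task = pick_pushable(rq);
			struct task_struct *exec_ctx;

			if (next_task != task)
				return false;

			exec_ctx = find_exec_ctx(rq, next_task);
			*retry = (exec_ctx &&
				  !cpumask_test_cpu(target->cpu,
						    &exec_ctx->cpus_mask));
		} else {
			int pushable = task_is_pushable(rq, task, target->cpu);

			*retry = pushable == -1;
			if (!pushable)
				return false;
		}
		return true;
	}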

> > +{
> > +     /*
> > +      * Releasing the rq lock means we need to re-check pushability.
> > +      * Some scenarios:
> > +      * 1) If a migration from another CPU sent a task/chain to rq
> > +      *    that made task newly unpushable by completing a chain
> > +      *    from task to rq->curr, then we need to bail out and push something
> > +      *    else.
> > +      * 2) If our chain led off this CPU or to a dequeued task, the last waiter
> > +      *    on this CPU might have acquired the lock and woken (or even migrated
> > +      *    & run, handed off the lock it held, etc...). This can invalidate the
> > +      *    result of find_lowest_rq() if our chain previously ended in a blocked
> > +      *    task whose affinity we could ignore, but now ends in an unblocked
> > +      *    task that can't run on lowest_rq.
> > +      * 3) Race described at https://lore.kernel.org/all/1523536384-26781-2-git-send-email-huawei.libin@huawei.com/
> > +      *
> > +      * Notes on these:
> > +      * - Scenario #2 is properly handled by rerunning find_lowest_rq
> > +      * - Scenario #1 requires that we fail
> > +      * - Scenario #3 can AFAICT only occur when rq is not this_rq(). And the
> > +      *   suggested fix is not universally correct now that push_cpu_stop() can
> > +      *   call this function.
> > +      */
> > +     if (!rt_task(task) || is_migration_disabled(task)) {
> > +             return false;
> > +     } else if (rq != this_rq()) {
>
> Nit: `else` can be dropped as in dl_revalidate_rq_state().

Ack. Done.
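So that branch now starts like this (sketch):

	if (!rt_task(task) || is_migration_disabled(task))
		return false;

	if (rq != this_rq()) {
		...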

Thanks again for all the feedback!
-john
