linux-kernel - Re: [PATCH v5 16/19] sched: Fix proxy/current (push,pull)ability

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCqtG9fWX7cZ4ZpCkPOKJDOrRBELnAxiuLn+WK1fUpV=1g@mail.gmail.com>
Date:   Tue, 22 Aug 2023 09:19:53 -0700
From:   John Stultz <jstultz@...gle.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Joel Fernandes <joelaf@...gle.com>,
        Qais Yousef <qyousef@...gle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Zimuzo Ezeozue <zezeozue@...gle.com>,
        Youssef Esmat <youssefesmat@...gle.com>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>, kernel-team@...roid.com,
        "Connor O'Brien" <connoro@...gle.com>
Subject: Re: [PATCH v5 16/19] sched: Fix proxy/current (push,pull)ability

On Tue, Aug 22, 2023 at 8:20 AM Dietmar Eggemann
<dietmar.eggemann@....com> wrote:
>
> On 19/08/2023 08:08, John Stultz wrote:
> > From: Valentin Schneider <valentin.schneider@....com>
>
> [...]
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bee7082b294f..e8065fc5c894 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6656,6 +6656,21 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> >       raw_spin_unlock(&mutex->wait_lock);
> >       return ret;
> >  }
> > +
> > +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *next)
> > +{
> > +     /*
> > +      * pick_next_task() calls set_next_task() on the selected task
> > +      * at some point, which ensures it is not push/pullable.
> > +      * However, the selected task *and* the ,mutex owner form an
> > +      * atomic pair wrt push/pull.
> > +      *
> > +      * Make sure owner is not pushable. Unfortunately we can only
> > +      * deal with that by means of a dequeue/enqueue cycle. :-/
> > +      */
> > +     dequeue_task(rq, next, DEQUEUE_NOCLOCK | DEQUEUE_SAVE);
> > +     enqueue_task(rq, next, ENQUEUE_NOCLOCK | ENQUEUE_RESTORE);
> > +}
> >  #else /* PROXY_EXEC */
> >  static struct task_struct *
> >  proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> > @@ -6663,6 +6678,8 @@ proxy(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
> >       BUG(); // This should never be called in the !PROXY case
> >       return next;
> >  }
> > +
> > +static inline void proxy_tag_curr(struct rq *rq, struct task_struct *next) { }
> >  #endif /* PROXY_EXEC */
> >
> >  /*
> > @@ -6711,6 +6728,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> >       unsigned long prev_state;
> >       struct rq_flags rf;
> >       struct rq *rq;
> > +     bool proxied;
> >       int cpu;
> >
> >       cpu = smp_processor_id();
> > @@ -6760,6 +6778,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> >               switch_count = &prev->nvcsw;
> >       }
> >
> > +     proxied = (rq_selected(rq) != prev);
>
> Looks like proxied isn't used here. (*)

Ah. Looks like I should have split that off into a later change.
Thanks for pointing that out.

> >  pick_again:
> >       next = pick_next_task(rq, rq_selected(rq), &rf);
> >       rq_set_selected(rq, next);
> > @@ -6786,6 +6805,10 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> >                * changes to task_struct made by pick_next_task().
> >                */
> >               RCU_INIT_POINTER(rq->curr, next);
> > +
> > +             if (unlikely(!task_current_selected(rq, next)))
> > +                     proxy_tag_curr(rq, next);
> > +
> >               /*
> >                * The membarrier system call requires each architecture
> >                * to have a full memory barrier after updating
> > @@ -6810,6 +6833,10 @@ static void __sched notrace __schedule(unsigned int sched_mode)
> >               /* Also unlocks the rq: */
> >               rq = context_switch(rq, prev, next, &rf);
> >       } else {
> > +             /* In case next was already curr but just got blocked_donor*/
> > +             if (unlikely(!task_current_selected(rq, next)))
> > +                     proxy_tag_curr(rq, next);
>
> (*) v4 had:
>
> +          /* In case next was already curr but just got blocked_donor*/
> +             if (unlikely(!proxied && next->blocked_donor))
>

That gets re-added later in the series when we add the blocked_donor tracking:
https://github.com/johnstultz-work/linux-dev/commit/00d33fb7d94bba6d230a833d775f83f4d90e4661#diff-cc1a82129952a910fdc4292448c2a097a2ba538bebefcf3c06381e45639ae73eR7115

> > +
> >               rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
> >
> >               rq_unpin_lock(rq, &rf);
>
> I miss changes in enqueue_task_rt() and put_prev_task_rt() related to
> 'affinity of blocked tasks doesn't matter' from v4.

So, yeah. As I was focusing on the correctness of the proxy migration
and return paths, I realized having the push/pull logic migrate
blocked tasks independently really doesn't help things (as proxy will
push them back).

So in those call paths, we now skip trying to migrate blocked tasks,
leaving the core proxy logic to be the only one to move blocked tasks,
and the __sched check to be the one to return them to a runnable cpu
when they are unblocked.

thanks
-john