Message-ID: <CAB8ipk9Mye4t=wuSwF-xqJJ36BUxfW2wqb1k_6BCVrKqc4hTFg@mail.gmail.com>
Date: Thu, 9 Nov 2023 14:38:19 +0800
From: Xuewen Yan <xuewen.yan94@...il.com>
To: John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Joel Fernandes <joelaf@...gle.com>,
Qais Yousef <qyousef@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Zimuzo Ezeozue <zezeozue@...gle.com>,
Youssef Esmat <youssefesmat@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>, kernel-team@...roid.com,
Qais Yousef <qyousef@...alina.io>
Subject: Re: [PATCH v6 18/20] sched: Handle blocked-waiter migration (and
return migration)
On Thu, Nov 9, 2023 at 2:08 PM John Stultz <jstultz@...gle.com> wrote:
>
> On Wed, Nov 8, 2023 at 9:32 PM Xuewen Yan <xuewen.yan94@...il.com> wrote:
> > I understand what you mean here. But I have some other worries:
> > consider a big.LITTLE cpu topology with EAS in effect.
> > If the owner is a "small task" running on a small core, and the
> > blocked task is a "big task", the blocked task will be migrated
> > directly to the small core. And because the task is still on a rq,
> > on wakeup it would skip select_task_rq() and be placed directly on
> > the small core. As a result, the big task's performance may decrease.
> > For the same reason, a small task may be placed on a big core, and
> > there may be a regression in power consumption.
> >
> ...
> > > +static inline bool proxy_return_migration(struct rq *rq, struct rq_flags *rf,
> > > +					   struct task_struct *next)
> > > +{
> > > +	if (!sched_proxy_exec())
> > > +		return false;
> > > +
> > > +	if (next->blocked_on && next->blocked_on_waking) {
> > > +		if (!is_cpu_allowed(next, cpu_of(rq))) {
> >
> >
> > Based on the above reasons, could this be changed to the following?
> >
> > 	/* When EAS is enabled, we want the task to select a cpu again */
> > 	if (sched_energy_enabled() || !is_cpu_allowed(next, cpu_of(rq)))
>
> Hey! Thanks so much for the feedback and review!
>
> That is a good point: this would cause a misplacement on the lock
> handoff. Though I fret that running through the return-migration
> lock juggling here for every blocked_on wakeup would further hurt
> performance as well.
>
> I'm currently trying to see if I can extend the blocked_on_waking flag
> to keep more state (BLOCKED, WAKING, RUNNABLE) so that we can move the
> return migration back to the try_to_wake_up() call path, while
> keeping the task from suddenly becoming runnable on wakeup while it is
> on the wrong runqueue. This would avoid the lock juggling, as we'd
> already hold the pi_lock. Though I'm a little hesitant, as doing the
> deactivate()/select_task_rq()/activate() steps from ttwu might muddle
> up the careful logic around the on_rq/ttwu_runnable checks (I
> definitely had issues in that area with earlier versions of the patch).
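If I understand correctly, you mean something like the rough sketch
below? (The BO_* names and the helper are just my guesses, nothing
from the series:)

	/* Illustrative sketch only; names are guesses, not from the series */
	enum blocked_on_state {
		BO_BLOCKED,	/* still blocked on the mutex */
		BO_WAKING,	/* wakeup seen, but maybe on the wrong rq */
		BO_RUNNABLE,	/* return migration done, really runnable */
	};

	/* Called from try_to_wake_up() with p->pi_lock already held: */
	static void proxy_handle_blocked_wakeup(struct task_struct *p)
	{
		if (p->blocked_on_state != BO_BLOCKED)
			return;
		p->blocked_on_state = BO_WAKING;
		/*
		 * Return migration would go here:
		 * deactivate_task() / select_task_rq() / activate_task(),
		 * then set BO_RUNNABLE so pick_next_task() can run it.
		 */
		p->blocked_on_state = BO_RUNNABLE;
	}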
I also think it is better to put the return migration back into the
try_to_wake_up() call path.
At mutex_unlock() time, could we deactivate the blocked task before
adding it to the wake_q?
In that case, it could follow the normal try_to_wake_up() path. But
then the trace_sched_blocked_reason tracepoint may no longer be
needed?
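Something like this rough sketch is what I have in mind (the helper
name is made up, and the mutex internals are elided):

	/*
	 * Illustrative only: deactivate the blocked waiter while holding
	 * its rq lock, so that the later try_to_wake_up() goes through
	 * select_task_rq() as usual.
	 */
	static void proxy_dequeue_before_wake(struct task_struct *waiter)
	{
		struct rq_flags rf;
		struct rq *rq = task_rq_lock(waiter, &rf);

		if (task_on_rq_queued(waiter))
			deactivate_task(rq, waiter, DEQUEUE_SLEEP);
		task_rq_unlock(rq, waiter, &rf);
		/* ...and then the unlock path does wake_q_add(&wake_q, waiter) */
	}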
>
> > In addition, I also thought that since the blocked task is no longer
> > dequeued, this will definitely cause the load on the CPU to increase.
> > Perhaps we need to evaluate the impact of this on power consumption.
> >
>
> Yeah. I've got that still as a todo in the cover letter:
> * CFS load balancing. Blocked tasks may carry forward load (PELT)
>   to the lock owner's CPU, so the CPU may look like it is overloaded.
>
> If you have any thoughts there for a preferred approach, I'd be happy to hear.
Okay, I'm still studying these patches carefully, and I will test
them later. If I find other problems, I will be happy to share them.
Thanks!
>
> thanks
> -john
BR
---
xuewen