linux-kernel - Re: [PATCH] sched: allow newidle balancing to bail out of load

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABk29NtX7EZsAqrT8vXd6tgWe2HPRNPM=cWxFSSxBtW1MjFqOA@mail.gmail.com>
Date:   Thu, 9 Jun 2022 12:40:08 -0700
From:   Josh Don <joshdon@...gle.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: allow newidle balancing to bail out of load_balance

Thanks Vincent,

On Thu, Jun 9, 2022 at 6:42 AM Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@...gle.com> wrote:
> >
> > While doing newidle load balancing, it is possible for new tasks to
> > arrive, such as with pending wakeups. newidle_balance() already accounts
> > for this by exiting the sched_domain load_balance() iteration if it
> > detects these cases. This is very important for minimizing wakeup
> > latency.
> >
> > However, if we are already in load_balance(), we may stay there for a
> > while before returning back to newidle_balance(). This is most
> > exacerbated if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A
> > very straightforward workaround to this is to adjust should_we_balance()
> > to bail out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> > detected.
>
> This one is close to the other tests and I wonder if it should be
> better placed before taking the busiest rq lock and detaching some
> tasks.
>
> Beside your use case where all other threads can't move in local cpu
> and load_balance() loops and clears other cpus, most of the time is
> probably spent in fbg() and fbq() so there are more chance that a task
> woke in this meantime and I imagine that it becomes useless to take
> lock and move tasks from another cpu if the local cpu is no more newly
> idle.
>
> Have you tried other places in load_balance() and does this one
> provide the lowest wakeup latency ?
>
> That being said, the current patch makes sense.

I tested with another check after fbg/fbq and there wasn't any
noticeable improvement to observed wakeup latency (not totally
unexpected, since it only helps for wakeups that come during fbg/fbq).
However, I don't think there's any harm in having that extra check in
the CPU_NEWLY_IDLE case; might as well avoid bouncing the rq lock if
we can. fbq+fbg are together taking ~3-4us per iteration in my repro.

If there are no objections I can send a v2 with the added delta:

@@ -9906,6 +9906,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                goto out_balanced;
        }

+       /*
+        * fbg/fbq can take a while. In the newly idle case, recheck whether
+        * we should continue with balancing, since it is possible that a
+        * task woke up in the interim.
+        */
+       if (env.idle == CPU_NEWLY_IDLE && !should_we_balance(&env)) {
+               *continue_balancing = 0;
+               goto out_balanced;
+       }
+
        BUG_ON(busiest == env.dst_rq);

        schedstat_add(sd->lb_imbalance[idle], env.imbalance);