[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtDNzMa_vEr69eUXQBoc_5M978w=m+nykVG40gamz0YBBw@mail.gmail.com>
Date:   Thu, 29 Oct 2020 14:58:22 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Tao Zhou <ouwen210@...mail.com>
Subject: Re: [PATCH v2] sched/fair: prefer prev cpu in asymmetric wakeup path
On Thu, 29 Oct 2020 at 12:16, Valentin Schneider
<valentin.schneider@....com> wrote:
>
>
> Hi Vincent,
>
> On 28/10/20 17:44, Vincent Guittot wrote:
> > During fast wakeup path, scheduler always check whether local or prev cpus
> > are good candidates for the task before looking for other cpus in the
> > domain. With
> >   commit b7a331615d25 ("sched/fair: Add asymmetric CPU capacity wakeup scan")
> > the heterogenous system gains a dedicated path but doesn't try to reuse
> > prev cpu whenever possible. If the previous cpu is idle and belong to the
> > LLC domain, we should check it 1st before looking for another cpu because
> > it stays one of the best candidate and this also stabilizes task placement
> > on the system.
> >
> > This change aligns asymmetric path behavior with symmetric one and reduces
> > cases where the task migrates across all cpus of the sd_asym_cpucapacity
> > domains at wakeup.
> >
> > This change does not impact normal EAS mode but only the overloaded case or
> > when EAS is not used.
> >
> > - On hikey960 with performance governor (EAS disable)
> >
> > ./perf bench sched pipe -T -l 50000
> >              mainline           w/ patch
> > # migrations   999364                  0
> > ops/sec        149313(+/-0.28%)   182587(+/- 0.40) +22%
> >
> > - On hikey with performance governor
> >
> > ./perf bench sched pipe -T -l 50000
> >              mainline           w/ patch
> > # migrations        0                  0
> > ops/sec         47721(+/-0.76%)    47899(+/- 0.56) +0.4%
> >
> > According to test on hikey, the patch doesn't impact symmetric system
> > compared to current implementation (only tested on arm64)
> >
> > Also read the uclamped value of task's utilization at most twice instead
> > instead each time we compare task's utilization with cpu's capacity.
> >
> > Fixes: b7a331615d25 ("sched/fair: Add asymmetric CPU capacity wakeup scan")
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>
> Other than the below, I quite like this!
>
> > ---
> > Changes in v2:
> > - merge asymmetric and symmetric path instead of duplicating tests on target,
> >   prev and other special cases.
> >
> > - factorize call to uclamp_task_util(p) and use fits_capacity(). This could
> >   explain part of the additionnal improvement compared to v1 (+22% instead of
> >   +17% on v1).
> >
> > - Keep using LLC instead of asym domain for early check of target, prev and
> >   recent_used_cpu to ensure cache sharing between the task. This doesn't
> >   change anything for dynamiQ but will ensure same cache for legacy big.LITTLE
> >   and also simply the changes.
> >
>
> On legacy big.LITTLE systems, sd_asym_cpucapacity spans all CPUs, so we
> would iterate over those in select_idle_capacity() anyway - the policy
> we've been going for is that capacity fitness trumps cache use.
I agree on that but I haven't been able to convince myself that adding
the complexity for the 3 shortcuts (target, prev and recent_used) will
give any benefit.
For example, the problem that I raised with perf bench sched pipe, is
solved on legacy bg.LITTLE with this version because the 2 threads
ends up on the same cache domain
>
> This does require the system to have a decent interconnect, cache snooping
> & co, but that is IMO a requirement of any sane asymmetric system.
>
> To put words into code, this is the kind of check I would see:
>
>   if (static_branch_unlikely(&sched_asym_cpucapacity))
>         return fits_capacity(task_util, capacity_of(cpu));
>   else
>         return cpus_share_cache(cpu, other);
>
> > - don't check capacity for the per-cpu kthread UC because the assumption is
> >   that the wakee queued work for the per-cpu kthread that is now complete and
> >   the task was already on this cpu.
> >
> > - On an asymmetric system where an exclusive cpuset defines a symmetric island,
> >   task's load is synced and tested although it's not needed. But taking care of
> >   this special case by testing if sd_asym_cpucapacity is not null impacts by
> >   more than 4% the performance of default sched_asym_cpucapacity path.
> >
> > - The huge increase of the number of migration for hikey960 mainline comes from
> >   teh fact that the ftrace buffer was overloaded by events in the tests done
> >   with v1.
Powered by blists - more mailing lists
 
