Message-ID: <7dd00a98d6454d5e92a7d9b936d1aa1c@hisilicon.com>
Date: Wed, 26 May 2021 21:38:19 +0000
From: "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"bsegall@...gle.com" <bsegall@...gle.com>,
"mgorman@...e.de" <mgorman@...e.de>,
"valentin.schneider@....com" <valentin.schneider@....com>,
"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"bristot@...hat.com" <bristot@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"guodong.xu@...aro.org" <guodong.xu@...aro.org>,
yangyicong <yangyicong@...wei.com>,
tangchengchang <tangchengchang@...wei.com>,
Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee
are already in same LLC
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@...radead.org]
> Sent: Thursday, May 27, 2021 12:16 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> Cc: vincent.guittot@...aro.org; mingo@...hat.com; dietmar.eggemann@....com;
> rostedt@...dmis.org; bsegall@...gle.com; mgorman@...e.de;
> valentin.schneider@....com; juri.lelli@...hat.com; bristot@...hat.com;
> linux-kernel@...r.kernel.org; guodong.xu@...aro.org; yangyicong
> <yangyicong@...wei.com>; tangchengchang <tangchengchang@...wei.com>;
> Linuxarm <linuxarm@...wei.com>
> Subject: Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee
> are already in same LLC
>
>
> $subject is weird; sched/fair: is the right tag, and then start with a
> capital letter.
>
> On Wed, May 26, 2021 at 09:10:57PM +1200, Barry Song wrote:
> > when waker and wakee are already in the same LLC, it is pointless to worry
> > about the competition caused by pulling wakee to waker's LLC domain.
>
> But there's more than LLC.
I suppose the other concerns might be about the "idle" and "load" state
of the waker's cpu and the wakee's prev_cpu. Even with wake_wide()
bypassed here, wake_affine() can still select the wakee's prev_cpu
rather than pulling the wakee to the waker. So bypassing wake_wide()
doesn't mean we will always pull.

static int wake_affine(struct sched_domain *sd, struct task_struct *p,
		       int this_cpu, int prev_cpu, int sync)
{
	int target = nr_cpumask_bits;

	if (sched_feat(WA_IDLE))
		target = wake_affine_idle(this_cpu, prev_cpu, sync);

	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

	if (target == nr_cpumask_bits)
		return prev_cpu;
	..
	return target;
}

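To make the fallback explicit, here is a minimal user-space sketch of that control flow. It is not kernel code: NO_TARGET and model_wake_affine() are hypothetical names, and the results of the two heuristics are passed in as plain integers instead of being computed from runqueue state.

```c
/* Hypothetical stand-in for nr_cpumask_bits, meaning "no CPU chosen". */
#define NO_TARGET 1024

/*
 * Model of the wake_affine() decision order quoted above.
 * idle_pick:   what the WA_IDLE heuristic would return (a CPU or NO_TARGET)
 * weight_pick: what the WA_WEIGHT heuristic would return (a CPU or NO_TARGET)
 * prev_cpu:    the wakee's previous CPU, used as the fallback
 */
static int model_wake_affine(int idle_pick, int weight_pick, int prev_cpu)
{
	int target = idle_pick;

	/* The weight heuristic is consulted only if the idle one had no opinion. */
	if (target == NO_TARGET)
		target = weight_pick;

	/* Neither heuristic picked a CPU: stay on the wakee's prev_cpu. */
	if (target == NO_TARGET)
		return prev_cpu;

	return target;
}
```

With the waker on cpu 0 and prev_cpu 5, model_wake_affine(NO_TARGET, NO_TARGET, 5) gives 5, so the wakee is not pulled even though the affine path was taken.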
Furthermore, select_idle_sibling() can also pick wakee's prev_cpu
if it is idle:

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	...
	/*
	 * If the previous CPU is cache affine and idle, don't be stupid:
	 */
	if (prev != target && cpus_share_cache(prev, target) &&
	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
	    asym_fits_capacity(task_util, prev))
		return prev;
	...
}

Apart from those, could you please give me some hints about what other
concerns you have?
>
> > Signed-off-by: Barry Song <song.bao.hua@...ilicon.com>
> > ---
> > kernel/sched/fair.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3248e24a90b0..cfb1bd47acc3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> > new_cpu = prev_cpu;
> > }
> >
> > - want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> > + /*
> > + * we use wake_wide to make smarter pull and avoid cruel
> > + * competition because of jam-packed tasks in waker's LLC
> > + * domain. But if waker and wakee have been already in
> > + * same LLC domain, it seems it is pointless to depend
> > + * on wake_wide
> > + */
> > + want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
> > + cpumask_test_cpu(cpu, p->cpus_ptr);
> > }
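
To spell out what the changed condition does, here is a small user-space sketch (not kernel code) comparing the old and new expressions. The function names and boolean parameters are hypothetical: share_llc stands for cpus_share_cache(cpu, prev_cpu), wide for wake_wide(p), and cpu_allowed for cpumask_test_cpu(cpu, p->cpus_ptr).

```c
#include <stdbool.h>

/* Condition before the patch: affine wakeup only when wake_wide() says no. */
static bool want_affine_old(bool share_llc, bool wide, bool cpu_allowed)
{
	(void)share_llc;	/* the old condition ignores LLC sharing */
	return !wide && cpu_allowed;
}

/* Condition after the patch: wake_wide() is ignored inside a shared LLC. */
static bool want_affine_new(bool share_llc, bool wide, bool cpu_allowed)
{
	return (share_llc || !wide) && cpu_allowed;
}
```

The two differ in exactly one case: share_llc, wide and cpu_allowed all true. There the old condition is false and the new one is true.
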
>
> And no supportive numbers...
Sorry for the confusion.
I actually posted some supporting numbers in the thread below, from
which this patch was derived:
https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/
When I tried to give Dietmar some pgbench data in that thread, I found
that on kunpeng920, with the software running in one die/NUMA node whose
24 cores share the LLC, disabling wake_wide() gave the best pgbench
result.

                llc_as_factor          don't_use_wake_wide
Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*

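For reference, the percentage column is the relative change of the second configuration against the first. A minimal sketch of that arithmetic, with pct_change() being a hypothetical helper name and the inputs taken from the table:

```c
/*
 * Relative change of 'value' against 'baseline', in percent.
 * Positive means 'value' is higher (better for a throughput metric).
 */
static double pct_change(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

For the 48-client row, pct_change(42249.29, 48329.00) is roughly 14.39, and for the 1-client row pct_change(10869.27, 10723.08) is roughly -1.34.
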
The test was run with mmtests (https://github.com/gormanm/mmtests)
using:
./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag
Commit "sched: Implement smarter wake-affine logic"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
says pgbench benefits from wake_wide(), but I've actually
seen the opposite result when the waker and wakee are already
in one LLC.
I'm not sure whether this is specific to kunpeng920; perhaps
I need to run the same test on some x86 machines.
Thanks
Barry