linux-kernel - RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee are already in same LLC

Message-ID: <7dd00a98d6454d5e92a7d9b936d1aa1c@hisilicon.com>
Date:   Wed, 26 May 2021 21:38:19 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "bsegall@...gle.com" <bsegall@...gle.com>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "valentin.schneider@....com" <valentin.schneider@....com>,
        "juri.lelli@...hat.com" <juri.lelli@...hat.com>,
        "bristot@...hat.com" <bristot@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "guodong.xu@...aro.org" <guodong.xu@...aro.org>,
        yangyicong <yangyicong@...wei.com>,
        tangchengchang <tangchengchang@...wei.com>,
        Linuxarm <linuxarm@...wei.com>
Subject: RE: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee
 are already in same LLC



> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@...radead.org]
> Sent: Thursday, May 27, 2021 12:16 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> Cc: vincent.guittot@...aro.org; mingo@...hat.com; dietmar.eggemann@....com;
> rostedt@...dmis.org; bsegall@...gle.com; mgorman@...e.de;
> valentin.schneider@....com; juri.lelli@...hat.com; bristot@...hat.com;
> linux-kernel@...r.kernel.org; guodong.xu@...aro.org; yangyicong
> <yangyicong@...wei.com>; tangchengchang <tangchengchang@...wei.com>;
> Linuxarm <linuxarm@...wei.com>
> Subject: Re: [PATCH] sched: fair: don't depend on wake_wide if waker and wakee
> are already in same LLC
> 
> 
> $subject is weird; sched/fair: is the right tag, and then start with a
> capital letter.
> 
> On Wed, May 26, 2021 at 09:10:57PM +1200, Barry Song wrote:
> > when waker and wakee are already in the same LLC, it is pointless to worry
> > about the competition caused by pulling wakee to waker's LLC domain.
> 
> But there's more than LLC.

I suppose the other concerns are about the "idle" state and "load" of
the waker's CPU and the wakee's prev_cpu. Even if we disable
wake_wide() here, wake_affine() still has a chance to select the
wakee's prev_cpu rather than pulling the wakee to the waker. So
disabling wake_wide() doesn't mean we will always pull.

static int wake_affine(struct sched_domain *sd, struct task_struct *p,
		       int this_cpu, int prev_cpu, int sync)
{
	int target = nr_cpumask_bits;

	if (sched_feat(WA_IDLE))
		target = wake_affine_idle(this_cpu, prev_cpu, sync);

	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

	if (target == nr_cpumask_bits)
		return prev_cpu;

	..
	return target;
}

Furthermore, select_idle_sibling() can also pick wakee's prev_cpu
if it is idle:

static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
	...

	/*
	 * If the previous CPU is cache affine and idle, don't be stupid:
	 */
	if (prev != target && cpus_share_cache(prev, target) &&
	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
	    asym_fits_capacity(task_util, prev))
		return prev;
	...
}

Apart from those, could you please give me a hint about what other
concerns you have?
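
For reference, the change to want_affine in the patch can be reduced to
a pure boolean condition. The following is only a user-space sketch, not
kernel code: the function names and the three boolean parameters are
hypothetical stand-ins for cpus_share_cache(cpu, prev_cpu), wake_wide(p)
and cpumask_test_cpu(cpu, p->cpus_ptr).

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins (not kernel APIs):
 *   share_llc         ~ cpus_share_cache(cpu, prev_cpu)
 *   wide              ~ wake_wide(p)
 *   waker_cpu_allowed ~ cpumask_test_cpu(cpu, p->cpus_ptr)
 */

/* Condition before the patch: wake_wide() alone can veto an affine wakeup. */
static inline bool want_affine_orig(bool wide, bool waker_cpu_allowed)
{
	return !wide && waker_cpu_allowed;
}

/* Condition after the patch: a shared LLC bypasses the wake_wide() result. */
static inline bool want_affine_patched(bool share_llc, bool wide,
				       bool waker_cpu_allowed)
{
	return (share_llc || !wide) && waker_cpu_allowed;
}
```

The two conditions differ only when share_llc and wide are both true and
the waker's CPU is allowed: the original yields false, the patched one
yields true. Even then, wake_affine() and select_idle_sibling() may
still end up choosing prev_cpu.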

> 
> > Signed-off-by: Barry Song <song.bao.hua@...ilicon.com>
> > ---
> >  kernel/sched/fair.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 3248e24a90b0..cfb1bd47acc3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6795,7 +6795,15 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu,
> int wake_flags)
> >  			new_cpu = prev_cpu;
> >  		}
> >
> > -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> > +		/*
> > +		 * we use wake_wide to make smarter pull and avoid cruel
> > +		 * competition because of jam-packed tasks in waker's LLC
> > +		 * domain. But if waker and wakee have been already in
> > +		 * same LLC domain, it seems it is pointless to depend
> > +		 * on wake_wide
> > +		 */
> > +		want_affine = (cpus_share_cache(cpu, prev_cpu) || !wake_wide(p)) &&
> > +				cpumask_test_cpu(cpu, p->cpus_ptr);
> >  	}
> 
> And no supportive numbers...

Sorry for the confusion.

I actually put some supporting numbers in the thread below, from which
this patch was derived:
https://lore.kernel.org/lkml/bbc339cef87e4009b6d56ee37e202daf@hisilicon.com/

When I tried to give Dietmar some pgbench data in that thread,
I found that on Kunpeng 920, with the software running within one
die/NUMA node whose 24 cores share an LLC, disabling wake_wide()
gave the best pgbench result.

                llc_as_factor          don't_use_wake_wide
Hmean     1     10869.27 (   0.00%)    10723.08 *  -1.34%*
Hmean     8     19580.59 (   0.00%)    19469.34 *  -0.57%*
Hmean     12    29643.56 (   0.00%)    29520.16 *  -0.42%*
Hmean     24    43194.47 (   0.00%)    43774.78 *   1.34%*
Hmean     32    40163.23 (   0.00%)    40742.93 *   1.44%*
Hmean     48    42249.29 (   0.00%)    48329.00 *  14.39%*

The test was run with mmtests (https://github.com/gormanm/mmtests)
using:
./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag
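
The percentage column in the table above appears consistent with the
relative change of the don't_use_wake_wide result against the
llc_as_factor baseline. A minimal sketch of that arithmetic, assuming
this is how the column was derived (pct_change is a hypothetical helper
name, not part of mmtests):

```c
/* Relative change of `patched` versus `baseline`, in percent. */
static inline double pct_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}
```

For example, pct_change(42249.29, 48329.00) for the 48-client row gives
roughly 14.39, and pct_change(10869.27, 10723.08) for the 1-client row
gives roughly -1.34.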

Commit "sched: Implement smarter wake-affine logic"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
says pgbench can benefit from wake_wide(), but I have actually
seen the opposite result when waker and wakee are already
in one LLC.

I am not sure whether this is specific to Kunpeng 920; perhaps
I need to run the same test on some x86 machines.

Thanks
Barry