linux-kernel - Re: [PATCH 1/5] sched/fair: ignore SIS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YxhH7cB+OIMAB0dM@chenyu5-mobl1>
Date:   Wed, 7 Sep 2022 15:27:41 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
CC:     Abel Wu <wuyun.abel@...edance.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...e.de>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Josh Don <joshdon@...gle.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

Hi Mel,
On 2022-09-06 at 10:57:17 +0100, Mel Gorman wrote:
> On Mon, Sep 05, 2022 at 10:40:00PM +0800, Abel Wu wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 6089251a4720..59b27a2ef465 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > >   		if (sd_share) {
> > >   			/* because !--nr is the condition to stop scan */
> > >   			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> > > -			/* overloaded LLC is unlikely to have idle cpu/core */
> > > -			if (nr == 1)
> > > -				return -1;
> > > +
> > > +			/*
> > > +			 * Non-overloaded case: Scan full domain if there is
> > > +			 * 	an idle core. Otherwise, scan for an idle
> > > +			 * 	CPU based on nr_idle_scan
> > > +			 * Overloaded case: Unlikely to have an idle CPU but
> > > +			 * 	conduct a limited scan if there is potentially
> > > +			 * 	an idle core.
> > > +			 */
> > > +			if (nr > 1) {
> > > +				if (has_idle_core)
> > > +					nr = sd->span_weight;
> > > +			} else {
> > > +				if (!has_idle_core)
> > > +					return -1;
> > > +				nr = 2;
> > > +			}
> > >   		}
> > >   	}
> > >   	for_each_cpu_wrap(cpu, cpus, target + 1) {
> > > +		if (!--nr)
> > > +			break;
> > > +
> > >   		if (has_idle_core) {
> > >   			i = select_idle_core(p, cpu, cpus, &idle_cpu);
> > >   			if ((unsigned int)i < nr_cpumask_bits)
> > >   				return i;
> > >   		} else {
> > > -			if (!--nr)
> > > -				return -1;
> > >   			idle_cpu = __select_idle_cpu(cpu, p);
> > >   			if ((unsigned int)idle_cpu < nr_cpumask_bits)
> > >   				break;
> > 
> > I spent last few days testing this, with 3 variations (assume
> > has_idle_core):
> > 
> >  a) full or limited (2cores) scan when !nr_idle_scan
> >  b) whether clear sds->has_idle_core when partial scan failed
> >  c) scale scan depth with load or not
> > 
> > some observations:
> > 
> >  1) It seems always bad if not clear sds->has_idle_core when
> >     partial scan fails. It is due to over partially scanned
> >     but still can not find an idle core. (Following ones are
> >     based on clearing has_idle_core even in partial scans.)
> > 
> 
> Ok, that's rational. There will be corner cases where there was no idle
> CPU near the target when there is an idle core far away but it would be
> counter to the purpose of SIS_UTIL to care about that corner case.
> 
> >  2) Unconditionally full scan when has_idle_core is not good
> >     for netperf_{udp,tcp} and tbench4. It is probably because
> >     the SIS success rate of these workloads is already high
> >     enough (netperf ~= 100%, tbench4 ~= 50%, compared to that
> >     hackbench ~= 3.5%) which negate a lot of the benefit full
> >     scan brings.
> > 
> 
> That's also rational. For a single client/server on netperf, it's expected
> that the SIS success rate is high and scanning is minimal. As the client
> and server are sharing data on localhost and somewhat synchronous, it may
> even partially benefit from SMT sharing.
>
Maybe off-topic, since we monitor the success rate and also other metrics
for each optimization in SIS path, is it possible to merge your statistics
patch [1] into upstream so we don't need to rebase in the future(although
it is targeting kernel development)?

Link: https://lore.kernel.org/lkml/20210726102247.21437-2-mgorman@techsingularity.net/
[cut]


thanks,
Chenyu