Message-ID: <fa6a22d3-6dbf-0ffc-088b-68b3c241f2d0@bytedance.com>
Date: Wed, 7 Sep 2022 15:52:32 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Josh Don <joshdon@...gle.com>, Chen Yu <yu.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core
On 9/6/22 5:57 PM, Mel Gorman wrote:
> On Mon, Sep 05, 2022 at 10:40:00PM +0800, Abel Wu wrote:
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 6089251a4720..59b27a2ef465 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>  		if (sd_share) {
>>>  			/* because !--nr is the condition to stop scan */
>>>  			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
>>> -			/* overloaded LLC is unlikely to have idle cpu/core */
>>> -			if (nr == 1)
>>> -				return -1;
>>> +
>>> +			/*
>>> +			 * Non-overloaded case: Scan full domain if there is
>>> +			 * an idle core. Otherwise, scan for an idle
>>> +			 * CPU based on nr_idle_scan
>>> +			 * Overloaded case: Unlikely to have an idle CPU but
>>> +			 * conduct a limited scan if there is potentially
>>> +			 * an idle core.
>>> +			 */
>>> +			if (nr > 1) {
>>> +				if (has_idle_core)
>>> +					nr = sd->span_weight;
>>> +			} else {
>>> +				if (!has_idle_core)
>>> +					return -1;
>>> +				nr = 2;
>>> +			}
>>>  		}
>>>  	}
>>>  
>>>  	for_each_cpu_wrap(cpu, cpus, target + 1) {
>>> +		if (!--nr)
>>> +			break;
>>> +
>>>  		if (has_idle_core) {
>>>  			i = select_idle_core(p, cpu, cpus, &idle_cpu);
>>>  			if ((unsigned int)i < nr_cpumask_bits)
>>>  				return i;
>>>  		} else {
>>> -			if (!--nr)
>>> -				return -1;
>>>  			idle_cpu = __select_idle_cpu(cpu, p);
>>>  			if ((unsigned int)idle_cpu < nr_cpumask_bits)
>>>  				break;
>>
>> I spent the last few days testing this, with 3 variations (assuming
>> has_idle_core):
>>
>> a) full or limited (2 cores) scan when !nr_idle_scan
>> b) whether to clear sds->has_idle_core when a partial scan fails
>> c) whether to scale scan depth with load
>>
>> some observations:
>>
>> 1) It seems always bad if we do not clear sds->has_idle_core when
>>    a partial scan fails, because we repeatedly pay for partial
>>    scans that still cannot find an idle core. (The following
>>    results are based on clearing has_idle_core even in partial
>>    scans.)
>>
>
> Ok, that's rational. There will be corner cases where there was no idle
> CPU near the target when there is an idle core far away but it would be
> counter to the purpose of SIS_UTIL to care about that corner case.
Yes, and this corner case (which may actually become the common case
at times) will continue to exist as long as we scan in a linear fashion
with a limited depth (SIS_UTIL/SIS_PROP/...), especially as LLCs keep
getting larger nowadays.
>
>> 2) Unconditionally doing a full scan when has_idle_core is set is
>>    not good for netperf_{udp,tcp} and tbench4. It is probably
>>    because the SIS success rate of these workloads is already high
>>    enough (netperf ~= 100%, tbench4 ~= 50%, compared to hackbench
>>    ~= 3.5%), which negates a lot of the benefit a full scan brings.
>>
>
> That's also rational. For a single client/server on netperf, it's expected
> that the SIS success rate is high and scanning is minimal. As the client
> and server are sharing data on localhost and somewhat synchronous, it may
> even partially benefit from SMT sharing.
>
> So the basic approach would be "always clear sds->has_idle_core" + "limit
> scan even when has_idle_core is true", right?
Yes, exactly the same as what you suggested in the first place.
>
> If so, stick a changelog on it and resend!
I will include this in the SIS_FILTER patchset.
>
>> 3) Scaling scan depth with load seems good for the hackbench
>>    socket tests, and neutral in the pipe tests. I think this is
>>    just the case you mentioned before: under fast wake-up
>>    workloads, has_idle_core becomes less reliable, so a full scan
>>    won't always win.
>>
>
> My recommendation is leave this out for now but assuming the rest of
> the patches get picked up, consider posting it for the next major kernel
> version (i.e. separate the basic and clever approaches by one major kernel
> version). By separating them, there is less chance of a false positive
> bisection pointing to the wrong patch. Any regression will not be perfectly
> reproducible, so the chances of a false positive bisection are relatively high.
Makes sense. I will just include the basic part first.
Thanks,
Abel