Message-ID: <fa6a22d3-6dbf-0ffc-088b-68b3c241f2d0@bytedance.com>
Date: Wed, 7 Sep 2022 15:52:32 +0800
From: Abel Wu <wuyun.abel@...edance.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mel Gorman <mgorman@...e.de>,
Vincent Guittot <vincent.guittot@...aro.org>,
Josh Don <joshdon@...gle.com>, Chen Yu <yu.c.chen@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core
On 9/6/22 5:57 PM, Mel Gorman wrote:
> On Mon, Sep 05, 2022 at 10:40:00PM +0800, Abel Wu wrote:
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 6089251a4720..59b27a2ef465 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>  		if (sd_share) {
>>>  			/* because !--nr is the condition to stop scan */
>>>  			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
>>> -			/* overloaded LLC is unlikely to have idle cpu/core */
>>> -			if (nr == 1)
>>> -				return -1;
>>> +
>>> +			/*
>>> +			 * Non-overloaded case: Scan full domain if there is
>>> +			 * an idle core. Otherwise, scan for an idle
>>> +			 * CPU based on nr_idle_scan
>>> +			 * Overloaded case: Unlikely to have an idle CPU but
>>> +			 * conduct a limited scan if there is potentially
>>> +			 * an idle core.
>>> +			 */
>>> +			if (nr > 1) {
>>> +				if (has_idle_core)
>>> +					nr = sd->span_weight;
>>> +			} else {
>>> +				if (!has_idle_core)
>>> +					return -1;
>>> +				nr = 2;
>>> +			}
>>>  		}
>>>  	}
>>>  
>>>  	for_each_cpu_wrap(cpu, cpus, target + 1) {
>>> +		if (!--nr)
>>> +			break;
>>> +
>>>  		if (has_idle_core) {
>>>  			i = select_idle_core(p, cpu, cpus, &idle_cpu);
>>>  			if ((unsigned int)i < nr_cpumask_bits)
>>>  				return i;
>>>  		} else {
>>> -			if (!--nr)
>>> -				return -1;
>>>  			idle_cpu = __select_idle_cpu(cpu, p);
>>>  			if ((unsigned int)idle_cpu < nr_cpumask_bits)
>>>  				break;
>>
>> I spent the last few days testing this, with 3 variations (assuming
>> has_idle_core):
>>
>> a) full or limited (2 cores) scan when !nr_idle_scan
>> b) whether to clear sds->has_idle_core when a partial scan fails
>> c) whether to scale scan depth with load
>>
>> some observations:
>>
>> 1) It seems always bad if we do not clear sds->has_idle_core when
>>    a partial scan fails, because we repeatedly pay for partial
>>    scans that still cannot find an idle core. (The following
>>    results are based on clearing has_idle_core even in partial
>>    scans.)
>>
>
> Ok, that's rational. There will be corner cases where there was no idle
> CPU near the target when there is an idle core far away but it would be
> counter to the purpose of SIS_UTIL to care about that corner case.
Yes, and this corner case (which may actually become the common case
at times) will continue to exist as long as we scan in a linear fashion
with a limited depth (SIS_UTIL/SIS_PROP/...), especially as LLCs keep
getting larger nowadays.
>
>> 2) Unconditionally doing a full scan when has_idle_core is set is
>>    not good for netperf_{udp,tcp} and tbench4. It is probably
>>    because the SIS success rate of these workloads is already high
>>    enough (netperf ~= 100%, tbench4 ~= 50%, compared to hackbench
>>    ~= 3.5%), which negates a lot of the benefit a full scan brings.
>>
>
> That's also rational. For a single client/server on netperf, it's expected
> that the SIS success rate is high and scanning is minimal. As the client
> and server are sharing data on localhost and somewhat synchronous, it may
> even partially benefit from SMT sharing.
>
> So the basic approach would be "always clear sds->has_idle_core" + "limit
> scan even when has_idle_core is true", right?
Yes, exactly the same as what you suggested in the first place.
>
> If so, stick a changelog on it and resend!
I will include this in the SIS_FILTER patchset.
>
>> 3) Scaling scan depth with load seems good for the hackbench
>>    socket tests, and neutral in the pipe tests. I think this is
>>    just the case you mentioned before: under fast wake-up
>>    workloads, has_idle_core becomes less reliable, so a full scan
>>    won't always win.
>>
>
> My recommendation is leave this out for now but assuming the rest of
> the patches get picked up, consider posting it for the next major kernel
> version (i.e. separate the basic and clever approaches by one major kernel
> version). By separating them, there is less chance of a false positive
> bisection pointing to the wrong patch. Any regression will not be perfectly
> reproducible, so the chances of a false positive bisection are relatively high.
Makes sense. I will just include the basic part first.
Thanks,
Abel