Message-ID: <dce80b8e-8a2c-d498-bdf9-0ca3ca340ce7@amd.com>
Date: Thu, 29 Sep 2022 22:49:24 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Tim Chen <tim.c.chen@...el.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Juri Lelli <juri.lelli@...hat.com>,
Rik van Riel <riel@...riel.com>,
Aaron Lu <aaron.lu@...el.com>,
Abel Wu <wuyun.abel@...edance.com>,
Yicong Yang <yangyicong@...ilicon.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Ingo Molnar <mingo@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Choose the CPU where short task is
running during wake up
Hello Chenyu,
Thank you for looking into this issue.
On 9/29/2022 10:55 AM, Chen Yu wrote:
> Hi Prateek,
> [..snip..]
>
>>> kernel/sched/fair.c | 31 ++++++++++++++++++++++++++++++-
>>> 1 file changed, 30 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 914096c5b1ae..7519ab5b911c 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
>>> return 1;
>>> }
>>>
>>> +/*
>>> + * If a task switches in and then voluntarily relinquishes the
>>> + * CPU quickly, it is regarded as a short running task.
>>> + * sysctl_sched_min_granularity is chosen as the threshold,
>>> + * as this value is the minimal slice if there are too many
>>> + * runnable tasks, see __sched_period().
>>> + */
>>> +static int is_short_task(struct task_struct *p)
>>> +{
>>> + return (p->se.sum_exec_runtime <=
>>> + (p->nvcsw * sysctl_sched_min_granularity));
>>> +}
>>> +
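Just to restate the heuristic for other readers: p->nvcsw counts
voluntary context switches, so the check above treats a task as
"short" when its average runtime between voluntary switches is at
most sysctl_sched_min_granularity. For example, taking a
hypothetical value of 3ms for sysctl_sched_min_granularity (the
actual value scales with the CPU count), a task that has yielded
100 times qualifies as short only if its total runtime so far is
at most 300ms, i.e. it runs for at most 3ms on average before
yielding.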
>>> /*
>>> * The purpose of wake_affine() is to quickly determine on which CPU we can run
>>> * soonest. For the purpose of speed we only consider the waking and previous
>>> @@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>>> if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>>> return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>>>
>>> - if (sync && cpu_rq(this_cpu)->nr_running == 1)
>>> + if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
>>> + is_short_task(cpu_curr(this_cpu)))
>>
>> This change seems to optimize for affine wakeups, which benefits
>> tasks with a producer-consumer pattern, but is not ideal for
>> Stream. Currently the logic will end up doing an affine wakeup
>> even if the sync flag is not set:
>>
>> stream-4135 [029] d..2. 353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
>> stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
>> stream-4135 [029] d..2. 353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
>> <idle>-0 [030] dNh2. 353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030
>>
>> I believe the sync flag should be taken into consideration when
>> going for an affine wakeup. Also, the short-running check could
>> come at the end, after the available_idle_cpu() check on
>> prev_cpu.
>>
> We can move the short-running check after the prev_cpu check. If we
> add the sync flag check, would it shrink the coverage of this change?
I've run some tests where I moved the short-running check towards
the end of wake_affine_idle() and also incorporated Tim's
suggestion in wake_affine_idle(). I've shared the results in a
parallel thread.
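Roughly, the shape I'm suggesting is the following (a sketch
against the current wake_affine_idle() only, untested as written;
the sync gating of the short-task check is the part up for
discussion):

static int
wake_affine_idle(int this_cpu, int prev_cpu, int sync)
{
	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;

	if (sync && cpu_rq(this_cpu)->nr_running == 1)
		return this_cpu;

	if (available_idle_cpu(prev_cpu))
		return prev_cpu;

	/*
	 * Check the short-task hint last, and only for sync wakeups,
	 * so a non-sync wakee (e.g. Stream) is not pulled onto a
	 * busy waker CPU.
	 */
	if (sync && is_short_task(cpu_curr(this_cpu)))
		return this_cpu;

	return nr_cpumask_bits;
}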
> I found that only a limited set of scenarios enables the sync flag,
> and we want to make the short-running check a generic optimization.
> But yes, we can test with/without the sync flag constraint to see
> which one gives better data.
>>> return this_cpu;
>>>
>>> if (available_idle_cpu(prev_cpu))
>>> @@ -6434,6 +6448,21 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>> /* overloaded LLC is unlikely to have idle cpu/core */
>>> if (nr == 1)
>>> return -1;
>>> +
>>> + /*
>>> + * If nr is smaller than 60% of llc_weight, it
>>> + * indicates that the util_avg% is higher than 50%.
>>> + * This is calculated by SIS_UTIL in
>>> + * update_idle_cpu_scan(). The 50% util_avg indicates
>>> + * a half-busy LLC domain. System busier than this
>>> + * level could lower its bar to choose a compromised
>>> + * "idle" CPU. If the waker on target CPU is a short
>>> + * task and the wakee is also a short task, pick
>>> + * target directly.
>>> + */
>>> + if (!has_idle_core && (5 * nr < 3 * sd->span_weight) &&
>>> + is_short_task(p) && is_short_task(cpu_curr(target)))
>>> + return target;
>>
>> The pileup seen in hackbench could also be a result of an early
>> bailout here for smaller LLCs, but I don't have any data to
>> substantiate that claim currently.
>>
>>> }
>>> }
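For reference, the cutoff above works out as follows:
5 * nr < 3 * sd->span_weight is just nr < 0.6 * span_weight, so on
a 16-CPU LLC (the Zen3 LLC size in my tests) this early pick of
target starts to kick in once SIS_UTIL has clamped the scan depth
to nr <= 9.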
>>>
>> Please let me know if you need any more data from the test
>> system for any of the benchmarks covered or if you would like
>> me to run any other benchmark on the test system.
> Thank you for your testing. I'll enable SNC to divide the LLC domain
> into smaller ones to see if the issue can be reproduced on my
> platform too, and then I'll update my findings.
Thank you for testing with SNC enabled. It should bring the LLC size
closer to that of the Zen3 system I've tested on.
>
> thanks,
> Chenyu
--
Thanks and Regards,
Prateek