Message-ID: <2c50baa4-beef-54b9-74fe-1cbf6e8f8dbd@amd.com>
Date: Thu, 29 Sep 2022 23:04:50 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Honglei Wang <wanghonglei@...ichuxing.com>,
Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Tim Chen <tim.c.chen@...el.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Juri Lelli <juri.lelli@...hat.com>,
Rik van Riel <riel@...riel.com>,
Aaron Lu <aaron.lu@...el.com>,
Abel Wu <wuyun.abel@...edance.com>,
Yicong Yang <yangyicong@...ilicon.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
Ingo Molnar <mingo@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Choose the CPU where short task is
running during wake up
Hello Honglei,
Thank you for looking into this.
On 9/29/2022 12:29 PM, Honglei Wang wrote:
>
> [..snip..]
>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 914096c5b1ae..7519ab5b911c 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
>>>>  	return 1;
>>>>  }
>>>>
>>>> +/*
>>>> + * If a task switches in and then voluntarily relinquishes the
>>>> + * CPU quickly, it is regarded as a short running task.
>>>> + * sysctl_sched_min_granularity is chosen as the threshold,
>>>> + * as this value is the minimal slice if there are too many
>>>> + * runnable tasks, see __sched_period().
>>>> + */
>>>> +static int is_short_task(struct task_struct *p)
>>>> +{
>>>> +	return (p->se.sum_exec_runtime <=
>>>> +		(p->nvcsw * sysctl_sched_min_granularity));
>>>> +}
>>>> +
>>>>  /*
>>>>   * The purpose of wake_affine() is to quickly determine on which CPU we can run
>>>>   * soonest. For the purpose of speed we only consider the waking and previous
>>>> @@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>>>>  	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>>>>  		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>>>>
>>>> -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
>>>> +	if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
>>>> +	    is_short_task(cpu_curr(this_cpu)))
>
> It seems this somewhat breaks the idle (or soon-to-be-idle) purpose of wake_affine_idle() here. Maybe we could do something like this?
>
> if ((sync || is_short_task(cpu_curr(this_cpu))) && cpu_rq(this_cpu)->nr_running == 1)
I believe this will still cause performance degradation on split-LLC
systems for Stream-like workloads. Based on the logs below, we can
have a situation as follows:
stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
Here sync is 0, but is_short_task() may return 1 and the current
rq->nr_running is 1. This will lead to two Stream threads being
placed on the same LLC during wakeup, which will cause cache
contention and performance degradation.
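
To make this concrete, below is a minimal userspace sketch (not kernel
code; picks_this_cpu() is a stand-in name and the plain booleans replace
the rq/task_struct lookups) that models the proposed condition with the
values from the trace above. With sync == 0 it still selects this_cpu as
long as the current task is short and nr_running is 1:

#include <stdio.h>
#include <stdbool.h>

/*
 * Stand-in for the proposed check in wake_affine_idle():
 *     (sync || is_short_task(cpu_curr(this_cpu))) &&
 *     cpu_rq(this_cpu)->nr_running == 1
 */
static bool picks_this_cpu(bool sync, bool curr_is_short, int nr_running)
{
	return (sync || curr_is_short) && nr_running == 1;
}

int main(void)
{
	/* Values from the trace: sync(0), rq->nr_running(1), is_short_task(1) */
	printf("affine wakeup taken: %d\n",
	       picks_this_cpu(false, true, 1));
	return 0;
}
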
>
> Thanks,
> Honglei
>
>>>
>>> This change seems to optimize for affine wakeups, which benefits
>>> tasks with a producer-consumer pattern but is not ideal for Stream.
>>> Currently, the logic will end up doing an affine wakeup even if the
>>> sync flag is not set:
>>>
>>> stream-4135 [029] d..2. 353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
>>> stream-4135 [029] d..2. 353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
>>> stream-4135 [029] d..2. 353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
>>> <idle>-0 [030] dNh2. 353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030
This is the exact situation observed during our testing.
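
For comparison, the same style of stand-alone sketch for the condition in
the RFC patch, (sync && nr_running == 1) || is_short_task(curr), shows the
same outcome: with sync == 0 the is_short_task() term alone takes the
affine path, which matches the migration from CPU 82 to CPU 30 in the
trace (again illustrative userspace code only, not the kernel logic itself):

#include <stdio.h>
#include <stdbool.h>

/*
 * Stand-in for the RFC patch's check in wake_affine_idle():
 *     (sync && cpu_rq(this_cpu)->nr_running == 1) ||
 *     is_short_task(cpu_curr(this_cpu))
 */
static bool picks_this_cpu(bool sync, bool curr_is_short, int nr_running)
{
	return (sync && nr_running == 1) || curr_is_short;
}

int main(void)
{
	/* Values from the trace: sync(0), rq->nr_running(1), is_short_task(1) */
	printf("affine wakeup taken: %d\n",
	       picks_this_cpu(false, true, 1));
	return 0;
}
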
>>>
>>> [..snip..]
>>>
--
Thanks and Regards,
Prateek