lists.openwall.net - Open Source and information security mailing list archives
Date: Thu, 29 Sep 2022 22:49:24 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chen Yu <yu.c.chen@...el.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Tim Chen <tim.c.chen@...el.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Juri Lelli <juri.lelli@...hat.com>,
	Rik van Riel <riel@...riel.com>,
	Aaron Lu <aaron.lu@...el.com>,
	Abel Wu <wuyun.abel@...edance.com>,
	Yicong Yang <yangyicong@...ilicon.com>,
	"Gautham R . Shenoy" <gautham.shenoy@....com>,
	Ingo Molnar <mingo@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Daniel Bristot de Oliveira <bristot@...hat.com>,
	Valentin Schneider <vschneid@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched/fair: Choose the CPU where short task is running during wake up

Hello Chenyu,

Thank you for looking into this issue.

On 9/29/2022 10:55 AM, Chen Yu wrote:
> Hi Prateek,
>
[..snip..]
>
>>>  kernel/sched/fair.c | 31 ++++++++++++++++++++++++++++++-
>>>  1 file changed, 30 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 914096c5b1ae..7519ab5b911c 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6020,6 +6020,19 @@ static int wake_wide(struct task_struct *p)
>>>  	return 1;
>>>  }
>>>
>>> +/*
>>> + * If a task switches in and then voluntarily relinquishes the
>>> + * CPU quickly, it is regarded as a short running task.
>>> + * sysctl_sched_min_granularity is chosen as the threshold,
>>> + * as this value is the minimal slice if there are too many
>>> + * runnable tasks, see __sched_period().
>>> + */
>>> +static int is_short_task(struct task_struct *p)
>>> +{
>>> +	return (p->se.sum_exec_runtime <=
>>> +		(p->nvcsw * sysctl_sched_min_granularity));
>>> +}
>>> +
>>>  /*
>>>   * The purpose of wake_affine() is to quickly determine on which CPU we can run
>>>   * soonest. For the purpose of speed we only consider the waking and previous
>>> @@ -6050,7 +6063,8 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>>>  	if (available_idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>>>  		return available_idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>>>
>>> -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
>>> +	if ((sync && cpu_rq(this_cpu)->nr_running == 1) ||
>>> +	    is_short_task(cpu_curr(this_cpu)))
>>
>> This change seems to optimize for affine wakeups, which benefit
>> tasks with a producer-consumer pattern, but it is not ideal for
>> Stream. Currently the logic will do an affine wakeup even if the
>> sync flag is not set:
>>
>> stream-4135  [029] d..2.  353.580953: sched_waking: comm=stream pid=4129 prio=120 target_cpu=082
>> stream-4135  [029] d..2.  353.580957: select_task_rq_fair: wake_affine_idle: Select this_cpu: sync(0) rq->nr_running(1) is_short_task(1)
>> stream-4135  [029] d..2.  353.580960: sched_migrate_task: comm=stream pid=4129 prio=120 orig_cpu=82 dest_cpu=30
>> <idle>-0     [030] dNh2.  353.580993: sched_wakeup: comm=stream pid=4129 prio=120 target_cpu=030
>>
>> I believe the sync flag should be taken into consideration when
>> going for an affine wakeup. Also, the short-running check could
>> come at the end, after checking whether prev_cpu is an
>> available_idle_cpu.
>>
> We can move the short running check after the prev_cpu check. If we
> add the sync flag check, would it shrink the coverage of this change?

I've run some tests where I moved the short-running check towards the
end of wake_affine_idle() and also incorporated Tim's suggestion in
wake_affine_idle(). I've shared the results in a parallel thread.

> Since I found that only limited scenarios would enable the sync flag,
> and we want to make the short running check a generic optimization.
> But yes, we can test with/without the sync flag constraint to see
> which one gives better data.
>>>  		return this_cpu;
>>>
>>>  	if (available_idle_cpu(prev_cpu))
>>> @@ -6434,6 +6448,21 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>>>  			/* overloaded LLC is unlikely to have idle cpu/core */
>>>  			if (nr == 1)
>>>  				return -1;
>>> +
>>> +			/*
>>> +			 * If nr is smaller than 60% of llc_weight, it
>>> +			 * indicates that the util_avg% is higher than 50%.
>>> +			 * This is calculated by SIS_UTIL in
>>> +			 * update_idle_cpu_scan(). The 50% util_avg indicates
>>> +			 * a half-busy LLC domain. System busier than this
>>> +			 * level could lower its bar to choose a compromised
>>> +			 * "idle" CPU. If the waker on target CPU is a short
>>> +			 * task and the wakee is also a short task, pick
>>> +			 * target directly.
>>> +			 */
>>> +			if (!has_idle_core && (5 * nr < 3 * sd->span_weight) &&
>>> +			    is_short_task(p) && is_short_task(cpu_curr(target)))
>>> +				return target;
>>
>> The pileup seen in hackbench could also be a result of an early
>> bailout here for smaller LLCs, but I currently don't have any data
>> to substantiate that claim.
>>
>>>  		}
>>>  	}
>>>
>> Please let me know if you need any more data from the test system
>> for any of the benchmarks covered, or if you would like me to run
>> any other benchmark on the test system.
> Thank you for your testing. I'll enable SNC to divide the LLC domain
> into smaller ones and see if the issue can be reproduced on my
> platform too, then I'll update my findings.

Thank you! Testing with SNC enabled should bring the LLC size closer
to the Zen3 system I've tested on.

>
> thanks,
> Chenyu

--
Thanks and Regards,
Prateek