Date: Tue, 21 Nov 2023 09:58:19 +0100
From: Pierre Gondois <pierre.gondois@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Daniel Bristot de Oliveira <bristot@...hat.com>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Use all little CPUs for CPU-bound workload

Hello Vincent,

On 11/17/23 15:17, Vincent Guittot wrote:
> Hi Pierre,
>
> On Fri, 10 Nov 2023 at 13:59, Pierre Gondois <pierre.gondois@....com> wrote:
>>
>> Running n CPU-bound tasks on an n-CPU platform with asymmetric CPU
>> capacity might result in a task placement where two tasks run on a
>> big CPU and none on a little CPU. This placement could be made more
>> optimal by using all CPUs.
>>
>> Testing platform:
>> Juno-r2:
>> - 2 big CPUs (1-2), maximum capacity of 1024
>> - 4 little CPUs (0,3-5), maximum capacity of 383
>>
>> Testing workload ([1]):
>> Spawn 6 CPU-bound tasks. During the first 100ms (step 1), each task
>> is affine to a CPU, except for:
>> - one little CPU, which is left idle.
>> - one big CPU, which has 2 tasks affine.
>> After the 100ms (step 2), remove the cpumask affinity.
>>
>> Before patch:
>> During step 2, the load balancer running from the idle CPU tags sched
>> domains as:
>> - little CPUs: 'group_has_spare'. Indeed, 3 CPU-bound tasks run on a
>>   4-CPU sched-domain, and the idle CPU provides enough spare
>>   capacity.
>> - big CPUs: 'group_overloaded'. Indeed, 3 tasks run on a 2-CPU
>>   sched-domain, so the following path is used:
>>     group_is_overloaded()
>>     \-if (sgs->sum_nr_running <= sgs->group_weight) return true;
>
> This reminds me of a similar discussion with Qais:
> https://lore.kernel.org/lkml/20230716014125.139577-1-qyousef@layalina.io/

Yes indeed, this is exactly the same case, and the same backstory actually.

>
>>
>> The following path, which would change the migration type to
>> 'migrate_task', is not taken:
>>     calculate_imbalance()
>>     \-if (env->idle != CPU_NOT_IDLE && env->imbalance == 0)
>> as the local group has some spare capacity, so the imbalance
>> is not 0.
>>
>> The migration type requested is 'migrate_util' and the busiest
>> runqueue is the big CPU's runqueue having 2 tasks (each having a
>> utilization of 512). The idle little CPU cannot pull one of these
>> tasks as its capacity is too small for the task. The following path
>> is used:
>>     detach_tasks()
>>     \-case migrate_util:
>>       \-if (util > env->imbalance) goto next;
>>
>> After patch:
>> When the local group has spare capacity and the busiest group is at
>> least tagged as 'group_fully_busy', if the local group has more CPUs
>
> the busiest group is more than 'group_fully_busy'
>
>> than CFS tasks and the busiest group more CFS tasks than CPUs,
>> request a 'migrate_task' type migration.
>>
>> Improvement:
>> Running the testing workload [1] with the step 2 representing
>> a ~10s load for a big CPU:
>> Before patch: ~19.3s
>> After patch: ~18s (-6.7%)
>>
>> The issue only happens at the DIE level on platforms able to have
>> 'migrate_util' migration types, i.e. not on DynamIQ systems, where
>> SD_SHARE_PKG_RESOURCES is set.
>>
>> Signed-off-by: Pierre Gondois <pierre.gondois@....com>
>> ---
>>  kernel/sched/fair.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index df348aa55d3c..5a215c96d420 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10495,6 +10495,23 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>>  			env->imbalance = max(local->group_capacity, local->group_util) -
>>  					 local->group_util;
>>
>> +			/*
>> +			 * On an asymmetric system with CPU-bound tasks, a
>> +			 * migrate_util balance might not be able to migrate a
>> +			 * task from a big to a little CPU, leaving a little
>> +			 * CPU unused.
>> +			 * If local has an empty CPU and busiest is overloaded,
>
> group_has_spare doesn't mean that the local group has an empty CPU, but
> that one or more CPUs might be idle some of the time, which may not be
> the case when the load balance happens.
>
>> +			 * balance one task with a migrate_task migration type
>> +			 * instead.
>> +			 */
>> +			if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
>> +			    local->sum_nr_running < local->group_weight &&
>> +			    busiest->sum_nr_running > busiest->group_weight) {
>> +				env->migration_type = migrate_task;
>> +				env->imbalance = 1;
>
> I wonder if this is too aggressive. We can have cases where
> (local->sum_nr_running < local->group_weight) at the time of the load
> balance because one CPU can be briefly idle, and you will migrate the
> task that will then compete with another one on a little core. So

Ok right.

> maybe you should do something similar to the migrate_load case in
> detach_tasks(), like:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fc8e9ced6aa8..3a04fa0f1eae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8977,7 +8977,7 @@ static int detach_tasks(struct lb_env *env)
>                 case migrate_util:
>                         util = task_util_est(p);
>
> -                       if (util > env->imbalance)
> +                       if (shr_bound(util, env->sd->nr_balance_failed) > env->imbalance)
>                                 goto next;
>
>                         env->imbalance -= util;
> --
>
> This should cover more intermediate cases and would benefit more
> topologies and cases.

Your change also solves the issue. I'll try to see if this might raise
other issues.

Thanks for the suggestion,
Regards,
Pierre

>
>> +				return;
>> +			}
>> +
>>  			/*
>>  			 * In some cases, the group's utilization is max or even
>>  			 * higher than capacity because of migrations but the
>> --
>> 2.25.1
>>
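
Putting rough numbers on Vincent's suggestion: with a ~512-utilization task
on a big CPU and a little group whose spare capacity is smaller than that,
the plain 'util > env->imbalance' check rejects the pull every time, while
the shr_bound() variant halves the compared utilization for each failed
balance attempt until the task can be detached. The sketch below is a
userspace approximation of that arithmetic, not kernel code: shr_bound() is
re-implemented locally after the helper in kernel/sched/sched.h, and the
imbalance value of 383 is only an assumed stand-in for the little group's
spare capacity at balance time.

#include <stdio.h>

/*
 * Local stand-in for the kernel's shr_bound() helper
 * (kernel/sched/sched.h): shift right, with the shift clamped so it
 * stays defined for the width of the value.
 */
static unsigned long shr_bound(unsigned long val, unsigned int shift)
{
	unsigned int max_shift = 8 * sizeof(val) - 1;

	return val >> (shift < max_shift ? shift : max_shift);
}

int main(void)
{
	/* Figure from the thread: each CPU-bound task has a utilization of ~512. */
	unsigned long task_util = 512;
	/*
	 * Assumed placeholder for env->imbalance under migrate_util, i.e.
	 * the little group's spare capacity at balance time. The real value
	 * depends on the platform and on the instant the balance runs.
	 */
	unsigned long imbalance = 383;
	unsigned int failed;

	for (failed = 0; failed < 4; failed++) {
		unsigned long scaled = shr_bound(task_util, failed);

		printf("nr_balance_failed=%u: util %lu -> %lu => %s\n",
		       failed, task_util, scaled,
		       scaled > imbalance ? "goto next (no pull)"
					  : "task can be detached");
	}
	return 0;
}

With these numbers the pull is rejected on the first attempt and accepted
from nr_balance_failed = 1 onwards, which is the "more intermediate cases"
behaviour Vincent describes.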
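
The workload script [1] itself is not included in the message. As a rough
illustration only, a CPU-bound reproducer following the description above
(6 busy tasks, 100 ms of per-CPU affinity that leaves little CPU 0 idle and
doubles up big CPU 1, then affinity removed) could be sketched like this;
the CPU numbering is the Juno-r2 layout quoted above, and the actual script
and its timing measurement may differ.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

#define NR_TASKS	6
#define NR_CPUS		6

/*
 * Step 1 placement, following the description above: little CPUs are
 * {0, 3, 4, 5} and big CPUs are {1, 2}; CPU0 is left idle and CPU1
 * gets two tasks.
 */
static const int step1_cpu[NR_TASKS] = { 1, 1, 2, 3, 4, 5 };

static volatile int stop;

static void *burn(void *arg)
{
	(void)arg;
	while (!stop)			/* CPU-bound busy loop */
		;
	return NULL;
}

int main(void)
{
	pthread_t tasks[NR_TASKS];
	cpu_set_t set;
	int i;

	for (i = 0; i < NR_TASKS; i++) {
		pthread_create(&tasks[i], NULL, burn, NULL);

		/* Step 1: pin each task on its assigned CPU. */
		CPU_ZERO(&set);
		CPU_SET(step1_cpu[i], &set);
		pthread_setaffinity_np(tasks[i], sizeof(set), &set);
	}

	usleep(100 * 1000);		/* step 1: 100 ms */

	/* Step 2: remove the cpumask affinity, let the load balancer act. */
	CPU_ZERO(&set);
	for (i = 0; i < NR_CPUS; i++)
		CPU_SET(i, &set);
	for (i = 0; i < NR_TASKS; i++)
		pthread_setaffinity_np(tasks[i], sizeof(set), &set);

	sleep(12);			/* roughly covers the ~10s step 2 load */
	stop = 1;

	for (i = 0; i < NR_TASKS; i++)
		pthread_join(tasks[i], NULL);
	return 0;
}

Build with something like 'gcc -O2 -pthread repro.c'. The reported figures
(~19.3s before vs ~18s after the patch) presumably time a fixed amount of
per-task work rather than a fixed duration, so this sketch only reproduces
the placement scenario, not the benchmark itself.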