Message-Id: <20250717062036.432243-1-adamli@os.amperecomputing.com>
Date: Thu, 17 Jul 2025 06:20:30 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: mingo@...hat.com,
peterz@...radead.org,
juri.lelli@...hat.com,
vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com,
rostedt@...dmis.org,
bsegall@...gle.com,
mgorman@...e.de,
vschneid@...hat.com,
cl@...ux.com,
linux-kernel@...r.kernel.org,
patches@...erecomputing.com,
shkaushik@...erecomputing.com,
Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork
Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPUs 0-7 in the first
group, but only on CPUs 8-11 in the second group. CPUs 12-15 are always
idle:

  { 0  1  2  3  4  5  6  7 }  { 8  9 10 11 12 13 14 15 }
    *  *  *  *  *  *  *  *      *  *  *  *
When looking for a dst group for a newly forked thread,
update_sg_wakeup_stats() often reports that the second group has more
idle CPUs than the first group, so the scheduler considers the second
group less busy. It then picks the least busy CPU among CPUs 8-11. As a
result CPUs 8-11 can be crowded with newly forked threads while CPUs 0-7
sit idle.
The first patch, 'Only update stats of allowed CPUs when looking for dst
group', *alone* fixes this imbalance. With it, performance improves
significantly for workloads that fork frequently and are affined to only
part of the CPUs in a sched group.
The second patch also makes sense in this scenario: if the group weight
includes CPUs the task cannot use, group classification can be
incorrect.
Peter mentioned [1] that the second patch might also apply to
update_sg_lb_stats(). The third patch therefore counts the group weight
from 'env->cpus' (active CPUs), since group classification can likewise
be incorrect if the group weight includes inactive CPUs.
Peter also mentioned that update_sg_wakeup_stats() and
update_sg_lb_stats() are so similar that they might be unified. The RFC
patches 4-6 refactor the two functions, moving the common logic into a
new function, update_sg_stats().
I tested with the SPECjbb workload on an arm64 server. The patch set
introduces no observable performance change there, but the test cannot
cover every code path. Please review.
v2:
Follow Peter's suggestions:
1) Apply the second patch to update_sg_lb_stats().
2) Refactor and unify update_sg_wakeup_stats() and update_sg_lb_stats().
v1:
https://lore.kernel.org/lkml/20250701024549.40166-1-adamli@os.amperecomputing.com/
links:
[1]: https://lore.kernel.org/lkml/20250704091758.GG2001818@noisy.programming.kicks-ass.net/
Adam Li (6):
sched/fair: Only update stats for allowed CPUs when looking for dst
group
sched/fair: Only count group weight for allowed CPUs when looking for
dst group
sched/fair: Only count group weight for CPUs doing load balance when
looking for src group
sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL
pointers
sched/fair: Introduce update_sg_stats()
sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()
kernel/sched/fair.c | 274 ++++++++++++++++++++++++--------------------
1 file changed, 148 insertions(+), 126 deletions(-)
--
2.34.1