Message-Id: <20250717062036.432243-1-adamli@os.amperecomputing.com>
Date: Thu, 17 Jul 2025 06:20:30 +0000
From: Adam Li <adamli@...amperecomputing.com>
To: mingo@...hat.com,
peterz@...radead.org,
juri.lelli@...hat.com,
vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com,
rostedt@...dmis.org,
bsegall@...gle.com,
mgorman@...e.de,
vschneid@...hat.com,
cl@...ux.com,
linux-kernel@...r.kernel.org,
patches@...erecomputing.com,
shkaushik@...erecomputing.com,
Adam Li <adamli@...amperecomputing.com>
Subject: [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork
Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPUs 0-7 in the first
group, but only on CPUs 8-11 in the second group. CPUs 12-15 are always
idle:

  { 0  1  2  3  4  5  6  7 }  { 8  9 10 11 12 13 14 15 }
    *  *  *  *  *  *  *  *      *  *  *  *
When looking for a dst group for a newly forked thread,
update_sg_wakeup_stats() often reports that the second group has more
idle CPUs than the first group, so the scheduler considers the second
group less busy. It then picks the least busy CPU among CPUs 8-11. As a
result CPUs 8-11 can be crowded with newly forked threads while CPUs 0-7
sit idle.
The first patch, 'Only update stats of allowed CPUs when looking for dst
group', *alone* fixes this imbalance. With it, performance improves
significantly for workloads that fork frequently and are affined to only
part of the CPUs in a sched group.
The second patch also makes sense in this scenario: if the group weight
includes CPUs the task cannot use, group classification can be
incorrect.
Peter mentioned [1] that the second patch might also apply to
update_sg_lb_stats(). The third patch therefore counts the group weight
from 'env->cpus' (active CPUs), since group classification can likewise
be incorrect if the group weight includes inactive CPUs.
Peter also mentioned that update_sg_wakeup_stats() and
update_sg_lb_stats() are so similar that they might be unified. The RFC
patches 4-6 refactor the two functions, moving the common logic into a
new function, update_sg_stats().
I tested with the SPECjbb workload on an arm64 server. The patch set
introduces no observable performance change there, but the test cannot
cover every code path. Please review.
v2:
Follow Peter's suggestions:
1) Apply the second patch to update_sg_lb_stats().
2) Refactor and unify update_sg_wakeup_stats() and update_sg_lb_stats().
v1:
https://lore.kernel.org/lkml/20250701024549.40166-1-adamli@os.amperecomputing.com/
links:
[1]: https://lore.kernel.org/lkml/20250704091758.GG2001818@noisy.programming.kicks-ass.net/
Adam Li (6):
sched/fair: Only update stats for allowed CPUs when looking for dst
group
sched/fair: Only count group weight for allowed CPUs when looking for
dst group
sched/fair: Only count group weight for CPUs doing load balance when
looking for src group
sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL
pointers
sched/fair: Introduce update_sg_stats()
sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()
kernel/sched/fair.c | 274 ++++++++++++++++++++++++--------------------
1 file changed, 148 insertions(+), 126 deletions(-)
--
2.34.1