linux-kernel - [PATCH 0/4] sched: Fix cluster scheduling in the presence of asymmetric capacity


Message-Id: <20250627-rneri-fix-cas-clusters-v1-0-121ffb50bbc7@linux.intel.com>
Date: Fri, 27 Jun 2025 14:45:26 -0700
From: Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
 Juri Lelli <juri.lelli@...hat.com>, 
 Vincent Guittot <vincent.guittot@...aro.org>, 
 Dietmar Eggemann <dietmar.eggemann@....com>, 
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, 
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, 
 Tim C Chen <tim.c.chen@...ux.intel.com>, Barry Song <baohua@...nel.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Len Brown <lenb@...nel.org>, 
 ricardo.neri@...el.com, linux-kernel@...r.kernel.org, 
 Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>
Subject: [PATCH 0/4] sched: Fix cluster scheduling in the presence of
 asymmetric capacity

Cluster scheduling balances load among clusters of CPUs that share a
resource, such as a mid-level cache [1]. It was broken on Intel hybrid
processors that use asymmetric packing of tasks; Tim fixed that [2]. It is
broken again when cluster scheduling is combined with asymmetric CPU
capacity.

The diagram below shows a processor with big (B) and small (s) CPUs. The
small CPUs are grouped in clusters, each sharing a mid-level (L2) cache.
This topology is common in Intel hybrid processors.

         ------   ------
         | B  |   | B  |   -----------------   -----------------
         |    |   |    |   | s | s | s | s |   | s | s | s | s |
         ------   ------   -----------------   -----------------
         | L2 |   | L2 |   |      L2       |   |       L2      |
         -------------------------------------------------------
         |                          L3                         |
         -------------------------------------------------------

On a partially busy system (one with idle CPUs; busy CPUs have one task
each), scheduling for asymmetric capacity ensures that misfit tasks are
placed on the big CPUs. The remaining tasks, misfit or not, run on the
small CPUs. If CONFIG_SCHED_CLUSTER is enabled, these remaining tasks
should be evenly spread between the two small-CPU clusters.

This does not happen today because various checks in the load balancer
prevent a small CPU in one cluster from pulling tasks from another:

  * A bug in update_sd_pick_busiest() causes it to skip the capacity check
    and prefer a fully_busy big CPU (which a small destination CPU cannot
    help) over a has_spare small-CPU cluster (which it can).

  * Accounting misfit load in a group is pointless if the destination CPU
    is also a small CPU. Moreover, update_sd_pick_busiest() will not pick
    such a group as busiest anyway.

  * Once a busiest group has been identified, sched_balance_find_src_rq()
    will refuse to migrate tasks to CPUs of equal capacity.

  * The SD_PREFER_SIBLING flag is removed from scheduling domains with
    asymmetric capacity.

I address these issues in this series. Details are in the changelog of each
patch.

I tested these patches on an Alder Lake system with Hyper-Threading
disabled. I also tested with CONFIG_SCHED_CLUSTER=n to ensure that
processors without clusters continue to work.

[1]. https://lore.kernel.org/r/20210924085104.44806-1-21cnbao@gmail.com/
[2]. https://lore.kernel.org/r/cover.1688770494.git.tim.c.chen@linux.intel.com/

---
Ricardo Neri (4):
      sched/fair: Always skip fully_busy higher-capacity groups for load balance
      sched/fair: Ignore misfit load if the destination CPU cannot help
      sched/fair: Allow load balancing between CPUs of equal capacity
      sched/topology: Keep SD_PREFER_SIBLING for domains with clusters

 kernel/sched/fair.c     | 27 +++++++++++++++------------
 kernel/sched/topology.c | 11 +++++++++--
 2 files changed, 24 insertions(+), 14 deletions(-)
---
base-commit: e51a38e71974982abb3f2f16141763a1511f7a3f
change-id: 20250620-rneri-fix-cas-clusters-bb4287d1e152

Best regards,
-- 
Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>