linux-kernel - Re: [PATCH RESEND 0/3] Represent cluster topology and enable load balance between clusters

Message-ID: <CAGsJ_4wvLw=US1ddJr=Jrim1vs-F2hpcQ29LQyqDENd7Fk=ssA@mail.gmail.com>
Date:   Sat, 2 Oct 2021 20:09:58 +1300
From:   Barry Song <21cnbao@...il.com>
To:     Tim Chen <tim.c.chen@...ux.intel.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Borislav Petkov <bp@...en8.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Ben Segall <bsegall@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Guodong Xu <guodong.xu@...aro.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        "Cc: Len Brown" <lenb@...nel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
        LAK <linux-arm-kernel@...ts.infradead.org>,
        Linuxarm <linuxarm@...wei.com>,
        Mark Rutland <mark.rutland@....com>,
        Mel Gorman <mgorman@...e.de>, msys.mizuma@...il.com,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Steven Rostedt <rostedt@...dmis.org>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Sudeep Holla <sudeep.holla@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Will Deacon <will@...nel.org>, x86 <x86@...nel.org>,
        yangyicong <yangyicong@...wei.com>
Subject: Re: [PATCH RESEND 0/3] Represent cluster topology and enable load
 balance between clusters

On Sat, Oct 2, 2021 at 12:22 PM Tim Chen <tim.c.chen@...ux.intel.com> wrote:
>
> On Fri, 2021-10-01 at 16:57 +0200, Peter Zijlstra wrote:
> > On Fri, Oct 01, 2021 at 12:39:56PM +0200, Vincent Guittot wrote:
> > > Hi Barry,
> > >
> > > On Fri, 1 Oct 2021 at 12:32, Barry Song <21cnbao@...il.com> wrote:
> > > > Hi Vincent, Dietmar, Peter, Ingo,
> > > > Do you have any comment on this first series which exposes cluster
> > > > topology of ARM64 kunpeng 920 & x86 Jacobsville and supports load
> > > > balance only for the 1st stage?
> > > > I will be very grateful for your comments so that things can move
> > > > forward in the right direction. I think Tim also looks forward to
> > > > bringing up cluster support in Jacobsville.
> > >
> > > This patchset makes sense to me and the addition of a new scheduling
> > > level to better reflect the HW topology goes in the right direction.
> >
> > So I had a look, dreading the select-idle-sibling changes, and was
> > pleasantly surprised they're gone :-)

Thanks, Peter and Vincent for reviewing.

My tiny scheduler team is still working hard on the select-idle-sibling
changes. They will be sent as a separate series, as an improvement on top
of this one. I promise the wake-affine series won't be that scary when you
see it next time :-)

> >
> > As is, this does indeed look like something mergable without too much
> > hassle.
> >
> > The one question I have is, do we want default y?
>
> I also agree that default y is preferable.

Thanks, Tim, for your comments.
I am OK with making it default "Y" for x86, given better documentation as below:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd27b1cdac34..940eb1fe0abb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1002,12 +1002,17 @@ config NR_CPUS
          to the kernel image.

 config SCHED_CLUSTER
-       bool "Cluster scheduler support"
-       default n
+       def_bool y
+       prompt "Cluster scheduler support"
        help
         Cluster scheduler support improves the CPU scheduler's decision
-        making when dealing with machines that have clusters of CPUs
-        sharing L2 cache. If unsure say N here.
+        making when dealing with machines that have clusters of CPUs.
+        Cluster usually means a couple of CPUs which are placed closely
+        by sharing mid-level caches, last-level cache tags or internal
+        busses. For example, on x86 Jacobsville, each 4 CPUs share one
+        L2 cache. This feature isn't a universal win because it can bring
+        a cost of slightly increased overhead in some places. If unsure
+        say N here.

This also aligns well with SCHED_MC and SCHED_SMT in arch/x86/Kconfig:
config SCHED_MC
    def_bool y
    prompt "Multi-core scheduler support"

config SCHED_SMT
    def_bool y if SMP

But ARM64 follows a different tradition: arch/arm64/Kconfig defines
SCHED_MC and SCHED_SMT as below:
config SCHED_MC
    bool "Multi-core scheduler support"
    help
      ...

config SCHED_SMT
    bool "SMT scheduler support"
    help
      ...

I don't want to be the odd man out :-) So for ARM64, I vote for keeping
the Kconfig file as is. I am planning to modify arch/arm64/configs/defconfig
in the second patchset (select-idle-sibling) by adding
CONFIG_SCHED_CLUSTER=y
because load balance plus the wake-affine changes seem to make the cluster
scheduler a much wider win on Kunpeng 920, while load balance alone can
sometimes hurt. So I don't mind keeping "N" for a while on the ARM64
platform.
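To make the help text's definition a bit more concrete: with SCHED_CLUSTER, each CPU gets one more sched domain (CLS) between SMT and MC, and the MC level then balances between clusters instead of between individual CPUs. Below is a small Python model of that nesting, not kernel code. It assumes contiguous CPU numbering (SMT siblings adjacent, clusters as aligned runs of `cpus_per_cluster` CPUs, MC spanning the whole die); `domain_spans` and `mc_groups` are hypothetical names, and real kernels derive these spans from firmware and CPU-reported cache topology rather than from arithmetic.

```python
from typing import Dict, FrozenSet, List


def _span(cpu: int, width: int) -> FrozenSet[int]:
    """CPUs in the aligned block of `width` CPUs that contains `cpu`."""
    first = (cpu // width) * width
    return frozenset(range(first, first + width))


def domain_spans(cpu: int, threads_per_core: int, cpus_per_cluster: int,
                 cpus_per_die: int) -> Dict[str, FrozenSet[int]]:
    """Span of each topology level seen by `cpu` (hypothetical model)."""
    if cpus_per_cluster % threads_per_core or cpus_per_die % cpus_per_cluster:
        raise ValueError("each level must be a whole multiple of its child")
    spans: Dict[str, FrozenSet[int]] = {}
    if threads_per_core > 1:                      # no SMT level without SMT
        spans["SMT"] = _span(cpu, threads_per_core)
    spans["CLS"] = _span(cpu, cpus_per_cluster)   # CPUs sharing L2 / LLC tags
    spans["MC"] = _span(cpu, cpus_per_die)        # the whole die
    return spans


def mc_groups(cpu: int, threads_per_core: int, cpus_per_cluster: int,
              cpus_per_die: int) -> List[FrozenSet[int]]:
    """Groups the MC level balances between: the child CLS spans."""
    mc = domain_spans(cpu, threads_per_core, cpus_per_cluster,
                      cpus_per_die)["MC"]
    clusters = {_span(c, cpus_per_cluster) for c in mc}
    return sorted(clusters, key=min)
```

For a Jacobsville-like layout (no SMT, four CPUs per L2) on a hypothetical 16-CPU die, `domain_spans(5, 1, 4, 16)` places CPU 5 in the cluster {4, 5, 6, 7}, and `mc_groups(5, 1, 4, 16)` yields four 4-CPU groups.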

>
> >
> > The one nit I have is the Kconfig text, I'm not really sure that's
> > clarifying what a cluster is.
>
> Do you have a preference of a different name other than cluster?
> Or simply better documentation on what a cluster is for ARM64
> and x86 in Kconfig?

Anyway, naming is really hard. "Cluster" doesn't seem a bad name for ARM
SoCs: besides Kunpeng, some other ARM SoCs also use this name in their
specifications, for example Neoverse N1, Phytium, etc.

Shall we keep the same name for x86 and ARM and just refine the documentation
as below? Does the text below explain what a "cluster" is better?

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7e4651a1aaf4..86821e83b935 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -993,8 +993,13 @@ config SCHED_CLUSTER
        bool "Cluster scheduler support"
        help
          Cluster scheduler support improves the CPU scheduler's decision
-         making when dealing with machines that have clusters(sharing internal
-         bus or sharing LLC cache tag). If unsure say N here.
+         making when dealing with machines that have clusters of CPUs.
+         Cluster usually means a couple of CPUs which are placed closely
+         by sharing mid-level caches, last-level cache tags or internal
+         busses. For example, on Hisilicon Kunpeng920, each 4 CPUs share
+         LLC cache tags. This feature isn't a universal win because it
+         can bring a cost of slightly increased overhead in some places.
+         If unsure say N here.

 config SCHED_SMT
        bool "SMT scheduler support"
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd27b1cdac34..940eb1fe0abb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1002,12 +1002,17 @@ config NR_CPUS
          to the kernel image.

 config SCHED_CLUSTER
-       bool "Cluster scheduler support"
-       default n
+       def_bool y
+       prompt "Cluster scheduler support"
        help
         Cluster scheduler support improves the CPU scheduler's decision
-        making when dealing with machines that have clusters of CPUs
-        sharing L2 cache. If unsure say N here.
+        making when dealing with machines that have clusters of CPUs.
+        Cluster usually means a couple of CPUs which are placed closely
+        by sharing mid-level caches, last-level cache tags or internal
+        busses. For example, on x86 Jacobsville, each 4 CPUs share one
+        L2 cache. This feature isn't a universal win because it can bring
+        a cost of slightly increased overhead in some places. If unsure
+        say N here.

 config SCHED_SMT
        def_bool y if SMP
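The help text describes a cluster as CPUs sharing mid-level caches, LLC tags or internal busses. Once the cluster topology is represented, user space should be able to check which CPUs the kernel groups together. A minimal Python sketch, assuming the per-CPU sysfs file `/sys/devices/system/cpu/cpuN/topology/cluster_cpus_list`, which reports cluster siblings in the kernel's usual CPU-list format (e.g. "0-3"); on a kernel without that file the helper simply returns an empty set. `parse_cpu_list` and `clusters_from_sysfs` are hypothetical helper names.

```python
import glob
import os
from typing import FrozenSet, Set


def parse_cpu_list(text: str) -> FrozenSet[int]:
    """Parse a sysfs CPU list such as "0-3,8-11" into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue                      # empty file or trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def clusters_from_sysfs(root: str = "/sys/devices/system/cpu") -> Set[FrozenSet[int]]:
    """Distinct clusters reported by the kernel; empty if not exposed."""
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "cluster_cpus_list")
    clusters: Set[FrozenSet[int]] = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            clusters.add(parse_cpu_list(f.read()))
    return clusters
```

Per the help text, on Kunpeng 920 one would expect `clusters_from_sysfs()` to return groups of four CPUs, and on Jacobsville groups of four CPUs sharing one L2.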


>
> Thanks.
>
> Tim
>

Thanks
barry