linux-kernel - Re: [PATCH v3 15/16] arch_topology: Set cluster identifier in each core/thread from /cpu-map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtB8iPzEXipsJqNtd9-aJMKx-FAaiGMzOg58HgRQuo39iA@mail.gmail.com>
Date:   Fri, 10 Jun 2022 12:08:44 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Sudeep Holla <sudeep.holla@....com>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>,
        linux-kernel@...r.kernel.org, Atish Patra <atishp@...shpatra.org>,
        Atish Patra <atishp@...osinc.com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Qing Wang <wangqing@...o.com>,
        linux-arm-kernel@...ts.infradead.org,
        linux-riscv@...ts.infradead.org, Rob Herring <robh+dt@...nel.org>
Subject: Re: [PATCH v3 15/16] arch_topology: Set cluster identifier in each
 core/thread from /cpu-map

On Mon, 6 Jun 2022 at 12:22, Sudeep Holla <sudeep.holla@....com> wrote:
>
> On Fri, Jun 03, 2022 at 02:30:04PM +0200, Dietmar Eggemann wrote:
> > On 25/05/2022 10:14, Sudeep Holla wrote:
> > > Let us set the cluster identifier as parsed from the device tree
> > > cluster nodes within /cpu-map.
> > >
> > > We don't support nesting of clusters yet as there are no real hardware
> > > to support clusters of clusters.
> > >
> > > Signed-off-by: Sudeep Holla <sudeep.holla@....com>
> > > ---
> > >  drivers/base/arch_topology.c | 13 ++++++++-----
> > >  1 file changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > > index b8f0d72908c8..5f4f148a7769 100644
> > > --- a/drivers/base/arch_topology.c
> > > +++ b/drivers/base/arch_topology.c
> > > @@ -492,7 +492,7 @@ static int __init get_cpu_for_node(struct device_node *node)
> > >  }
> > >
> > >  static int __init parse_core(struct device_node *core, int package_id,
> > > -                        int core_id)
> > > +                        int cluster_id, int core_id)
> > >  {
> > >     char name[20];
> > >     bool leaf = true;
> > > @@ -508,6 +508,7 @@ static int __init parse_core(struct device_node *core, int package_id,
> > >                     cpu = get_cpu_for_node(t);
> > >                     if (cpu >= 0) {
> > >                             cpu_topology[cpu].package_id = package_id;
> > > +                           cpu_topology[cpu].cluster_id = cluster_id;
> > >                             cpu_topology[cpu].core_id = core_id;
> > >                             cpu_topology[cpu].thread_id = i;
> > >                     } else if (cpu != -ENODEV) {
> > > @@ -529,6 +530,7 @@ static int __init parse_core(struct device_node *core, int package_id,
> > >             }
> > >
> > >             cpu_topology[cpu].package_id = package_id;
> > > +           cpu_topology[cpu].cluster_id = cluster_id;
> >
> > I'm still not convinced that this is the right thing to do. Let's take
> > the juno board as an example here. And I guess our assumption should be
> > that we want to make CONFIG_SCHED_CLUSTER a default option, like
> > CONFIG_SCHED_MC is. Simply to avoid a unmanageable zoo of config-option
> > combinations.
> >
>
> Agreed on the config part.
>
> > (1) Scheduler Domains (SDs) w/o CONFIG_SCHED_CLUSTER:
> >
> > MC  <-- !!!
> > DIE
> >
> > (2) SDs w/ CONFIG_SCHED_CLUSTER:
> >
> > CLS <-- !!!
> > DIE
> >
>
> Yes I have seen this.
>
> > In (2) MC gets degenerated in sd_parent_degenerate() since CLS and MC
> > cpumasks are equal and MC does not have any additional flags compared to
> > CLS.
> > I'm not convinced that we can change the degeneration rules without
> > destroying other scenarios of the scheduler so that here MC stays and
> > CLS gets removed instead.
> >
>
> Why ? Are you suggesting that we shouldn't present the hardware cluster
> to the topology because of the above reason ? If so, sorry that is not a
> valid reason. We could add login to return NULL or appropriate value
> needed in cpu_clustergroup_mask id it matches MC level mask if we can't
> deal that in generic scheduler code. But the topology code can't be
> compromised for that reason as it is user visible.

I tend to agree with Dietmar. The legacy use of cluster node in DT
refers to the dynamiQ or legacy b.L cluster which is also aligned to
the LLC and the MC scheduling level. The new cluster level that has
been introduced recently does not target this level but some
intermediate levels either inside like for the kupeng920 or the v9
complex or outside like for the ampere altra. So I would say that
there is one cluster node level in DT that refers to the same MC/LLC
level and only an additional child/parent cluster node should be used
to fill the clustergroup_mask.

IIUC, we don't describe the dynamiQ level in ACPI which  uses cache
topology instead to define cpu_coregroup_mask whereas DT described the
dynamiQ instead of using cache topology. If you use cache topology
now, then you should skip the dynamiQ

Finally, even if CLS and MC have the same scheduling behavior for now,
they might ends up with different scheduling properties which would
mean that replacing MC level by CLS one for current SoC would become
wrong

>
> > Even though MC and CLS are doing the same right now from the perspective
> > of the scheduler, we should also see MC and not CLS under (2). CLS only
> > makes sense longer term if the scheduler also makes use of it (next to
> > MC) in the wakeup-path for instance. Especially when this happens, a
> > platform should always construct the same scheduler domain hierarchy, no
> > matter which CONFIG_SCHED_XXX options are enabled.
> >
> >
> > You can see this in update_siblings_masks()
> >
> >     if (last_level_cache_is_shared)
> >         set llc_sibling
> >
> >     if (cpuid_topo->package_id != cpu_topo->package_id)
> >         continue
> >
> >     set core_sibling
> >
> >   If llc cache and socket boundaries are congruent, llc_sibling and
> >   core_sibling are the same.
> >
> >     if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
> >         continue
> >
> >     set cluster_sibling
> >
> >   Now we potentially set clusters. Since socket=0 is by default and we
> >   use the existing juno.dts, the cluster nodes end up being congruent to
> >   the llc cache cpumasks as well.
> >
>
> Correct and I see no problems as it matches what the hardware is. So I am
> not expecting any change in any cpumasks there as they all are aligned with
> the hardware.
>
> > The problem is that we code `llc cache` and `DT cluster nodes` as the
> > same thing in juno.dts.
>
> Why is that a problem ? If so, blame hardware and deal with it as we have to
> 😄 as usual we get all sorts of topology.
>
> > `Cluster0/1` are congruent with the llc information, although they should
> > be actually `socket0/1` right now.
>
> That was complete non-sense and wrong. Boot and check in ACPI mode.
>
> > But we can't set-up a cpu-map with a `socketX` containing `coreY` directly.
> > And then we use llc_sibling and cluster_sibling in two different SD
> > cpumask functions (cpu_coregroup_mask() and cpu_clustergroup_mask()).
> >
>
> We just need to deal with that. How is that dealt today with ACPI. My
> changes are making these aligned with ACPI. If something is broken as
> per you understanding with ACPI, then that needs fixing. The topology
> presented and parsed by ACPI is correct and we are aligning DT with that.
> There is no question on that.
>
> > Remember, CONFIG_SCHED_CLUSTER was introduced in ACPI/PPTT as a cpumask
> > which is a subset of the cpumasks of CONFIG_SCHED_MC.
> >
>
> But that change also introduced cluster masks into the topology which again
> aligns with my changes.
>
> > IMHO we probably could just introduce your changes w/o setting `cpu-map
> > cluster nodes` in DT for now. We would just have to make sure that for
> > all `*.dts` affected, the `llc cache` info can take over the old role of
> > the `cluster nodes`. In this case e.g. Juno ends up with MC, DIE no
> > matter if CONFIG_SCHED_CLUSTER is set or not.
>
> Sure I can agree with that if Juno ACPI is not broken. But I am sure it is
> broken based on your argument above. If it is, that needs fixing and this
> series just gets topology parsing in both ACPI and DT aligned, nothing
> more or nothing less. In the process it may be introducing clusters, but
> if it is not dealt correctly in ACPI, then it won't be in DT too and needs
> fixing anyways.
>
> --
> Regards,
> Sudeep