lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 17 May 2022 11:57:18 +0100
From:   Sudeep Holla <sudeep.holla@....com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Qing Wang <wangqing@...o.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J . Wysocki" <rafael@...nel.org>,
        Sudeep Holla <sudeep.holla@....com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] arch_topology: Use llc_id instead of package_id

Hi Dietmar,

Thanks for the detailed response.

On Tue, May 17, 2022 at 11:14:44AM +0200, Dietmar Eggemann wrote:
> On 16/05/2022 12:35, Sudeep Holla wrote:
> > On Fri, May 13, 2022 at 02:04:29PM +0200, Dietmar Eggemann wrote:
> >> On 13/05/2022 13:03, Sudeep Holla wrote:
> >>> On Fri, May 13, 2022 at 12:42:00PM +0200, Dietmar Eggemann wrote:
> >>>> On 13/05/2022 11:03, Sudeep Holla wrote:
> >>>>> On Fri, May 13, 2022 at 10:34:00AM +0200, Dietmar Eggemann wrote:
> 
> [...]
> 
> >> cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> >> CLS
> >> MC
> >> ... I skip the NUMA levels
> >>
> >> # cat /proc/schedstat | awk '{print $1 " " $2 }' | grep ^[cd] | head -5
> >> cpu0 0
> >> domain0 00000000,00000000,0000000f <--  4 CPUs <-- cluster_id
> >> domain1 00000000,00000000,00ffffff <-- 24 CPUs
> >>
> >> If you use cluster_id for 1. level cluster then you cover MC's 24 CPUs
> > 
> > OK, the way I am looking at this from CPU topology perspective seem to be
> > different from what you are expecting here w.r.t sched_domains.
> > 
> > IMO, these cpumasks on its own must represent real CPU topology and how it
> > is used via cpu_*_mask by the scheduler domains is different.
> 
> I'm afraid that in case we want to change the mapping of scheduler
> (topology) flags (like SD_SHARE_PKG_RESOURCES) via `*_flags()` of
> default_topology[] we would have to consider all ACPI corner cases (see
> cpu_coregroup_mask()) as well.
> 
> See (1) further down.
>

Understood.

> [...]
> 
> > I see on Juno with SCHED_CLUSTER and cluster masks set, I see CLS and MC
> > domains.
> 
> Right but that's not really correct from the scheduler's POV.
>

OK

> Juno has LLC on cpumasks [0,3-5] and [1-2], not on [0-5]. So the most
> important information is the highest Sched Domain with the
> SD_SHARE_PKG_RESOURCES flag. This is the MC layer (cpu_core_flags() in
> default_topology[]). So the scheduler would think that [0-5] are sharing
> LLC.
>

Ah OK, but if LLC sibling masks are updated, cpu_coregroup_mask() takes
care of it IIUC, right ?

> You have LLC at:
> 
> cat /sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list
>                                        ^^^^^^
> 0,3-5
> 
> but the scheduler sees the highest SD_SHARE_PKG_RESOURCES on MC:
> 
> cat /sys/kernel/debug/sched/domains/cpu0/domain1/flags
>                                          ^^^^^^^
> ... SD_SHARE_PKG_RESOURCES ...
> 
> [...]
> 
> >> For one level (MC) yes, but not for 2 (MC and CLS). And cluster_id was
> >> introduces for the 2. level.
> >>
> > 
> > That sounds wrong and not what ACPI PPTT code says. My series just aligns
> > with what is done with ACPI PPTT IIUC. I need to check that again if my
> > understand differs from what is being done. But the example of Kunpeng920
> > aligns with my understanding.
> 
> (1) IMHO, as long as we don't consider cache (llc) information in DT we
> can't have the same coverage as ACPI.
>

Agreed. But we are not changing any topology masks as per sched domain
requirements as they get exposed to the userspace as is.

> Let's take an ACPI !CONFIG_NUMA Kunpeng920 as an example.
> 
> # cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> CLS
> MC
> DIE
> 
> # cat /sys/kernel/debug/sched/domains/cpu0/domain*/flags
> SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE
>   SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
>   ^^^^^^^^^^^^^^^^^^^^^^
> SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE
>   SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
>   ^^^^^^^^^^^^^^^^^^^^^^
> SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE
>   SD_PREFER_SIBLING
> 
> cat /proc/schedstat | awk '{print $1 " " $2 }' | grep ^[cd] | head -4
> cpu0 0
> domain0 00000000,00000000,0000000f
> domain1 00000000,00000000,00ffffff <-- (2)
> domain2 ffffffff,ffffffff,ffffffff
> 
> cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0
> cat /sys/devices/system/cpu/cpu0/topology/core_cpus_list
> 0
> cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
> 0-3
> cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list
> 0-47
> cat /sys/devices/system/cpu/cpu0/topology/package_cpus_list
> 0-47
> 
> The MC mask 00ffffff is not derived from any topology mask but from the
> llc (index3) mask:
> 
> cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
>                                        ^^^^^^
> 0-23 <-- (2)
>

Understood and on Juno if we get llc_siblings right, the sched domains
must be sorted correctly ?

>
> Coming back to the original request (the support of Armv9 L2 complexes
> in Android) from Qing on systems like QC SM8450:
>
>       .---------------.
> CPU   |0 1 2 3 4 5 6 7|
>       +---------------+
> uarch |l l l l m m m b| (so called tri-gear: little, medium, big)
>       +---------------+
>   L2  |   |   | | | | |
>       +---------------+
>   L3  |<--         -->|
>       +---------------+
>       |<-- cluster -->|
>       +---------------+
>       |<--   DSU   -->|
>       '---------------'
>
> This still wouldn't be possible. We know that Phantom Domain (grouping
> after uarch) is not supported in mainline but heavily used in Android
> (legacy deps).
>

Correct, unless you convince to get a suitable notion of *whatever*
phantom domains represent into the DT bindings, they don't exist.
If someone wants to support this config, they must first represent that
in the firmware so that OS can infer information from the same.

> If we say we only support L2 sharing (asymmetric here since only CPU0-3
> have it !!!) and we don't support Phantom then your approach will work
> for such systems.

Thanks, as I said what is *Phantom* domain ;) ? Can you point me to the
DT or ACPI binding for the same ? Just kidding, I know they don't exist.

Anyways, I understand your concern that llc_sibling must go with my set
of topology changes which I agree. Is that the only concern ?

--
Regards,
Sudeep

Powered by blists - more mailing lists