linux-kernel - Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230913140629.000052ad@Huawei.com>
Date:   Wed, 13 Sep 2023 14:06:29 +0100
From:   Jonathan Cameron <Jonathan.Cameron@...wei.com>
To:     Robin Murphy <robin.murphy@....com>
CC:     guojinhui <guojinhui.liam@...edance.com>,
        <catalin.marinas@....com>, <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>, <lizefan.x@...edance.com>,
        <will@...nel.org>
Subject: Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0

On Wed, 13 Sep 2023 12:23:37 +0100
Robin Murphy <robin.murphy@....com> wrote:

> On 2023-09-13 10:44, guojinhui wrote:
> >>> In EL0, it can get the register midr's value to distinguish vendor.
> >>> But it won't return real value of the register mpidr by using mrs
> >>> in EL0. The register mpidr's value is useful to obtain the cpu
> >>> topology information.  
> >>
> >> ...except there's no guarantee that the MPIDR value is anything other
> >> than a unique identifier. Proper topology information is already exposed
> >> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
> >> should be using that.
> >>
> >> Not to mention that userspace fundamentally can't guarantee it won't be
> >> migrated at just the wrong point and read the MPIDR of a different CPU
> >> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
> >> such that userspace has a stable and reliable source of information in
> >> case it needs to consider potential errata.)
> >>
> >> Thanks,
> >> Robin.
> >>
> >> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
> >> [2]
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt  
> > 
> > 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
> > the die infomation from the MPIDR value. Such as the kunpeng-920,
> > 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
> > 
> > ```
> > <DIE>.<CLUSTER>.<CORE>.<HT>
> > 
> > cpu = 0, 81080000
> > cpu = 1, 81080100
> > ...
> > cpu = 3, 81080300
> > cpu = 4, 81090000
> > ...
> > cpu = 7, 81090300
> > cpu = 8, 810a0000
> > ...
> > cpu = 11, 810a0300
> > cpu = 12, 810b0000
> > ...
> > cpu = 15, 810b0300
> > cpu = 16, 810c0000
> > ...
> > cpu = 19, 810c0300
> > cpu = 20, 810d0000
> > ...
> > cpu = 31, 810f0300
> > cpu = 32, 81180000
> > ...
> > cpu = 63, 811f0300
> > ```
> > 
> > we can get the die infomation by 0x810, 0x811.  
> 
> This is very much a platform-specific assumption, though, and once 
> you're assuming enough to be able to derive anything meaningful from a 
> raw MPIDR, you could equally derive the same thing from existing sources 
> like NUMA topology (if you know the SoC, then for sure you can know how 
> nodes relate to dies).
> 
> > 2. we can bind the task to the specific cpu to obtain the MPIDR value.  
> 
> ...unless that CPU then gets offlined, the task is forcibly migrated 
> elsewhere, and ends up obtaining the *wrong* MPIDR value :(
> 
> > 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
> > in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
> > but it doesn't provide the information of die.
> > 
> > ```
> > # ls /sys/devices/system/cpu/cpu0/topology/
> > cluster_cpus  cluster_cpus_list  cluster_id  core_cpus  core_cpus_list  core_id  core_siblings  core_siblings_list  package_cpus  package_cpus_list  physical_package_id  thread_siblings  thread_siblings_list
> > # cat /sys/devices/system/cpu/cpu0/topology/*
> > 00000000,00000000,00000000,00000003
> > 0-1
> > 616
> > 00000000,00000000,00000000,00000001
> > 0
> > 6656
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 00000000,00000000,ffffffff,ffffffff
> > 0-63
> > 0
> > 00000000,00000000,00000000,00000001
> > 0
> > 
> > # uname -r
> > 6.6.0-rc1
> > ```
> > 
> > Then I check the code which parses the cpu topology infomation from PPTT:
> > 
> > ```
> > int __init parse_acpi_topology(void)
> > {
> >          int cpu, topology_id;
> > 
> >          if (acpi_disabled)
> >                  return 0;
> > 
> >          for_each_possible_cpu(cpu) {
> >                  topology_id = find_acpi_cpu_topology(cpu, 0);
> >                  if (topology_id < 0)
> >                          return topology_id;
> > 
> >                  if (acpi_cpu_is_threaded(cpu)) {
> >                          cpu_topology[cpu].thread_id = topology_id;
> >                          topology_id = find_acpi_cpu_topology(cpu, 1);
> >                          cpu_topology[cpu].core_id   = topology_id;
> >                  } else {
> >                          cpu_topology[cpu].thread_id  = -1;
> >                          cpu_topology[cpu].core_id    = topology_id;
> >                  }
> >                  topology_id = find_acpi_cpu_topology_cluster(cpu);
> >                  cpu_topology[cpu].cluster_id = topology_id;
> >                  topology_id = find_acpi_cpu_topology_package(cpu);
> >                  cpu_topology[cpu].package_id = topology_id;
> >          }
> > 
> >          return 0;
> > }
> > ```
> > 
> > Actually, it just gives the infomation of thread, cluster and package
> > though the PPTT provides the dies infomation.
> > 
> > May be we can implement some code to obtain die information from PPTT?  
> 
> I guess if any additional levels of hierarchy exist between the root 
> "package" level and what we infer to be the "cluster" level, then it 
> seems reasonable to me to infer the next level above "package" to be 
> "die". Then it looks like pretty much just a case of wiring up 
> topology_die_id() through the generic topology code.

Cluster was a vague enough concept that it was safe to define
it as the layer above a core.  Die is a lot less obvious because it
has more direct physical meaning.  There are interconnect and cache
topologies where something is shared above the cluster where it
doesn't correspond to a die (in the sense of a chiplet / one
piece of silicon).  For those you'd have another level in PPTT before
you reach the one that can be thought of as a die.

What about the die is of relevance for a userspace scheduler?
Sharing of L3, NUMA proximity domain / DDR controller locations?

All that is visible either via the cache topology or the NUMA node
topology.

If there is a strong reason (beyond 'x86 has it in a system register')
for having die explicitly provided in PPTT, then file a code first ACPI
proposal to add a flags, similar to the existing one that indicates
Physical Package.  Note that a strong justification would be needed.

Jonathan



> 
> Thanks,
> Robin.
>