linux-kernel - Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ab420c93-5baa-a486-d21a-6a305d498687@arm.com>
Date:   Wed, 13 Sep 2023 19:57:23 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc:     guojinhui <guojinhui.liam@...edance.com>, catalin.marinas@....com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        lizefan.x@...edance.com, will@...nel.org
Subject: Re: [PATCH] arm64: cpufeature: Expose the real mpidr value to EL0

On 2023-09-13 14:06, Jonathan Cameron wrote:
> On Wed, 13 Sep 2023 12:23:37 +0100
> Robin Murphy <robin.murphy@....com> wrote:
> 
>> On 2023-09-13 10:44, guojinhui wrote:
>>>>> In EL0, it can get the register midr's value to distinguish vendor.
>>>>> But it won't return real value of the register mpidr by using mrs
>>>>> in EL0. The register mpidr's value is useful to obtain the cpu
>>>>> topology information.
>>>>
>>>> ...except there's no guarantee that the MPIDR value is anything other
>>>> than a unique identifier. Proper topology information is already exposed
>>>> to userspace[1], as described by ACPI PPTT or Devicetree[2]. Userspace
>>>> should be using that.
>>>>
>>>> Not to mention that userspace fundamentally can't guarantee it won't be
>>>> migrated at just the wrong point and read the MPIDR of a different CPU
>>>> anyway. (This is why the MIDRs and REVIDRs are also reported via sysfs,
>>>> such that userspace has a stable and reliable source of information in
>>>> case it needs to consider potential errata.)
>>>>
>>>> Thanks,
>>>> Robin.
>>>>
>>>> [1] https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
>>>> [2]
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt
>>>
>>> 1. If we can get the infomation of the vendor (by MIDR), i think it possible to obtain
>>> the die infomation from the MPIDR value. Such as the kunpeng-920,
>>> 4 cores per cluster, 8 clusters per die, whose MPIDR value is as follow:
>>>
>>> ```
>>> <DIE>.<CLUSTER>.<CORE>.<HT>
>>>
>>> cpu = 0, 81080000
>>> cpu = 1, 81080100
>>> ...
>>> cpu = 3, 81080300
>>> cpu = 4, 81090000
>>> ...
>>> cpu = 7, 81090300
>>> cpu = 8, 810a0000
>>> ...
>>> cpu = 11, 810a0300
>>> cpu = 12, 810b0000
>>> ...
>>> cpu = 15, 810b0300
>>> cpu = 16, 810c0000
>>> ...
>>> cpu = 19, 810c0300
>>> cpu = 20, 810d0000
>>> ...
>>> cpu = 31, 810f0300
>>> cpu = 32, 81180000
>>> ...
>>> cpu = 63, 811f0300
>>> ```
>>>
>>> we can get the die infomation by 0x810, 0x811.
>>
>> This is very much a platform-specific assumption, though, and once
>> you're assuming enough to be able to derive anything meaningful from a
>> raw MPIDR, you could equally derive the same thing from existing sources
>> like NUMA topology (if you know the SoC, then for sure you can know how
>> nodes relate to dies).
>>
>>> 2. we can bind the task to the specific cpu to obtain the MPIDR value.
>>
>> ...unless that CPU then gets offlined, the task is forcibly migrated
>> elsewhere, and ends up obtaining the *wrong* MPIDR value :(
>>
>>> 3. I have checked the sysfs interface `/sys/devices/system/cpu/cpuN/topology/*`
>>> in Ampere and kunpeng-920 with the latest linux kernel before i submit the patch,
>>> but it doesn't provide the information of die.
>>>
>>> ```
>>> # ls /sys/devices/system/cpu/cpu0/topology/
>>> cluster_cpus  cluster_cpus_list  cluster_id  core_cpus  core_cpus_list  core_id  core_siblings  core_siblings_list  package_cpus  package_cpus_list  physical_package_id  thread_siblings  thread_siblings_list
>>> # cat /sys/devices/system/cpu/cpu0/topology/*
>>> 00000000,00000000,00000000,00000003
>>> 0-1
>>> 616
>>> 00000000,00000000,00000000,00000001
>>> 0
>>> 6656
>>> 00000000,00000000,ffffffff,ffffffff
>>> 0-63
>>> 00000000,00000000,ffffffff,ffffffff
>>> 0-63
>>> 0
>>> 00000000,00000000,00000000,00000001
>>> 0
>>>
>>> # uname -r
>>> 6.6.0-rc1
>>> ```
>>>
>>> Then I check the code which parses the cpu topology infomation from PPTT:
>>>
>>> ```
>>> int __init parse_acpi_topology(void)
>>> {
>>>           int cpu, topology_id;
>>>
>>>           if (acpi_disabled)
>>>                   return 0;
>>>
>>>           for_each_possible_cpu(cpu) {
>>>                   topology_id = find_acpi_cpu_topology(cpu, 0);
>>>                   if (topology_id < 0)
>>>                           return topology_id;
>>>
>>>                   if (acpi_cpu_is_threaded(cpu)) {
>>>                           cpu_topology[cpu].thread_id = topology_id;
>>>                           topology_id = find_acpi_cpu_topology(cpu, 1);
>>>                           cpu_topology[cpu].core_id   = topology_id;
>>>                   } else {
>>>                           cpu_topology[cpu].thread_id  = -1;
>>>                           cpu_topology[cpu].core_id    = topology_id;
>>>                   }
>>>                   topology_id = find_acpi_cpu_topology_cluster(cpu);
>>>                   cpu_topology[cpu].cluster_id = topology_id;
>>>                   topology_id = find_acpi_cpu_topology_package(cpu);
>>>                   cpu_topology[cpu].package_id = topology_id;
>>>           }
>>>
>>>           return 0;
>>> }
>>> ```
>>>
>>> Actually, it just gives the infomation of thread, cluster and package
>>> though the PPTT provides the dies infomation.
>>>
>>> May be we can implement some code to obtain die information from PPTT?
>>
>> I guess if any additional levels of hierarchy exist between the root
>> "package" level and what we infer to be the "cluster" level, then it
>> seems reasonable to me to infer the next level above "package" to be
>> "die". Then it looks like pretty much just a case of wiring up
>> topology_die_id() through the generic topology code.
> 
> Cluster was a vague enough concept that it was safe to define
> it as the layer above a core.  Die is a lot less obvious because it
> has more direct physical meaning.  There are interconnect and cache
> topologies where something is shared above the cluster where it
> doesn't correspond to a die (in the sense of a chiplet / one
> piece of silicon).  For those you'd have another level in PPTT before
> you reach the one that can be thought of as a die.

Oh, silly me; I was already assuming that there may be intermediate 
levels between cluster and die, but of course you could have that while 
still *not* having multiple dies, and thus no distinct "die" level in 
PPTT at all when that's already sufficiently implied by "package", so 
indeed it couldn't work in general.

> What about the die is of relevance for a userspace scheduler?
> Sharing of L3, NUMA proximity domain / DDR controller locations?
> 
> All that is visible either via the cache topology or the NUMA node
> topology.
> 
> If there is a strong reason (beyond 'x86 has it in a system register')
> for having die explicitly provided in PPTT, then file a code first ACPI
> proposal to add a flags, similar to the existing one that indicates
> Physical Package.  Note that a strong justification would be needed.

I'd agree with that. Thanks for the clarifications!

Cheers,
Robin.