linux-kernel - Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6c05e39c-41f3-451c-b119-7b8662c1ceee@arm.com>
Date: Mon, 2 Sep 2024 09:43:54 +0200
From: Pierre Gondois <pierre.gondois@....com>
To: Yicong Yang <yangyicong@...wei.com>
Cc: yangyicong@...ilicon.com, linuxppc-dev@...ts.ozlabs.org, bp@...en8.de,
 dave.hansen@...ux.intel.com, mingo@...hat.com,
 linux-arm-kernel@...ts.infradead.org, mpe@...erman.id.au,
 peterz@...radead.org, tglx@...utronix.de, sudeep.holla@....com,
 will@...nel.org, catalin.marinas@....com, x86@...nel.org,
 linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
 gregkh@...uxfoundation.org, rafael@...nel.org, jonathan.cameron@...wei.com,
 prime.zeng@...ilicon.com, linuxarm@...wei.com, xuwei5@...wei.com,
 guohanjun@...wei.com
Subject: Re: [PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based
 system

Hello Yicong

On 8/30/24 11:35, Yicong Yang wrote:
> On 2024/8/29 20:46, Pierre Gondois wrote:
>> Hello Yicong,
>>
>> On 8/29/24 09:40, Yicong Yang wrote:
>>> Hi Pierre,
>>>
>>> On 2024/8/27 23:40, Pierre Gondois wrote:
>>>> Hello Yicong,
>>>> IIUC we are looking for the maximum number of threads a CPU can have
>>>> in the platform. But is is actually possible to have a platform with
>>>> CPUs not having the same number of threads ?
>>>>
>>>
>>> I was thinking of the heterogenous platforms, for example part of the
>>> cores have SMT and others don't (I don't have any though, but there
>>> should be some such platforms for arm64).
>>>
>>>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>>>> that the number of threads is either 1 or cpu_smt_max_threads (as
>>>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>>>> a (hypothetical) platform with:
>>>> - CPU0: 2 threads
>>>> - CPU1: 4 threads
>>>> should not be able to set the number of threads to 2, as
>>>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>>>
>>>
>>> It's a bit more complex. If we enable/disable the SMT using on/off control
>>> then we won't have this problem. We'll online all the CPUs on enabling and
>>> offline CPUs which is not primary thread and don't care about the thread
>>> number of each core.
>>>
>>> Control using thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
>>> and only enabled on powerpc. I think this interface is not enough to handle
>>> the hypothetical we assumed, since it assumes the homogenous platform that
>>> all the CPUs have the same thread number. But this should be fine for
>>> the platforms with SMT(with same thread number) and non-SMT cores, since it do
>>> indicates the real thread number of the SMT cores.
>>>
>>>> So if there is an assumption that all the CPUs have the same number of
>>>> threads, it should be sufficient to count the number of threads for one
>>>> CPU right ?
>>>>
>>>
>>> Naturally and conveniently I may think use of the threads number of CPU0 (boot
>>> cpu) in such a solution. But this will be wrong if CPU0 is non-SMT on a heterogenous
>>> platform :(
>>>
>>>> In the other (unlikely) case (i.e. CPUs can have various number of threads),
>>>> I think we should either:
>>>> - use your method and check that all the CPUs have the same number of threads
>>>> - get the number of threads for one CPU and check that all the CPUs are
>>>>      identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>>>
>>> I think this won't be simpler since we still need to iterate all the CPUs to see
>>> if they have the same hetero_id (checked how we're using this ACPI tag in
>>> arm_acpi_register_pmu_device()). We could already know if they're identical by
>>> comparing the thread number and do the update if necessary.
>>>
>>>> - have a per-cpu cpu_smt_max_threads value ?
>>>
>>> This should be unnecessary in currently stage if there's no platforms
>>> with several kind cores have different thread number like in your assumption
>>> and enable CONFIG_SMT_NUM_THREADS_DYNAMIC on such platforms. Otherwise using
>>> a global cpu_smt_max_threads to enable the SMT control should be enough.
>>> Does it make sense?
>>
>> Yes, I agree with all the things you said:
>> - the current smt/control interface cannot handle asymmetric SMT platforms
>> - I am not aware if such platform exist so far
>>
>> I think it would still be good to check all the CPUs have the same number of
>> threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC Kconfig with the
>> patch attached (and to check CPUs have the same number of threads). Feel free
>> to take what you like/ignore what is inside the attached patch, or let me know
>> if I should submit a part in a separate patchset,
>>
> 
> Checked the changes, we can make this series as the basic support and a separate
> series of yours as a further support of SMT control on arm64:
> - support thread_id on ACPI based arm64 platform
> - support partial SMT control by enable CONFIG_SMT_NUM_THREADS_DYNAMIC

I'm not sure I fully understand what you mean. I can send patches to
enable SMT_NUM_THREADS_DYNAMIC on top of a v6 of yours IIUC. I let you pick
the changes that you estimate to make more sense in your serie (if that makes
sense) ? Please let met know if that works for you (or not).

> 
> some minor comments below.
> 
>> ----------------------------
>>
>>      [RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>>          - On arm64 ACPI systems, change the thread_id assignment to have
>>        increasing values starting from 0. This is already the case for DT
>>        based systems. Doing so allows to uniformly identify the n-th
>>        thread of a given CPU.
>>      - Check that all CPUs have the same number of threads (for DT/ACPI)
>>      - Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
>>          On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
>>      for socket0 and 128 + (0,32,64,96) for socket1:
>>      $ cd /sys/devices/system/cpu/smt/
>>      $ cat ../online
>>      0-255
>>      $ echo 2 > control
>>      $ cat ../online
>>      0-63,128-191
>>      $ echo 3 > control
>>      $ cat ../online
>>      0-95,128-223
>>      $ echo on > control
>>      $ cat ../online
>>      0-255
>>
> 
> So it's a real SMT4 system, it does make sense to have this partial SMT control
> support.

Yes right

> 
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index bd3bc2f5e0ec..1d8521483065 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -239,6 +239,7 @@ config ARM64
>>          select HAVE_GENERIC_VDSO
>>          select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
>>          select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
>> +       select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
>>          select IRQ_DOMAIN
>>          select IRQ_FORCED_THREADING
>>          select KASAN_VMALLOC if KASAN
>> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
>> index 0f6ef432fb84..7dd211f81687 100644
>> --- a/arch/arm64/include/asm/topology.h
>> +++ b/arch/arm64/include/asm/topology.h
>> @@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
>>   #define arch_scale_hw_pressure topology_get_hw_pressure
>>   #define arch_update_hw_pressure        topology_update_hw_pressure
>>   
>> +#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
>> +#include <linux/cpu_smt.h>
>> +static inline bool topology_smt_thread_allowed(unsigned int cpu)
>> +{
>> +       return topology_thread_id(cpu) < cpu_smt_num_threads;
>> +}
>> +#endif
>> +
>>   #include <asm-generic/topology.h>
>>   
>>   #endif /* _ASM_ARM_TOPOLOGY_H */
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index f72e1e55b05e..a83babe19972 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
>>   {
>>          int thread_num, max_smt_thread_num = 1;
>>          struct xarray core_threads;
>> +       bool have_thread = false;
>>          int cpu, topology_id;
>> +       unsigned long i;
>>          void *entry;
>>   
>>          if (acpi_disabled)
>> @@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
>>                          return topology_id;
>>   
>>                  if (acpi_cpu_is_threaded(cpu)) {
>> +                       have_thread = true;
>> +
>>                          cpu_topology[cpu].thread_id = topology_id;
>>                          topology_id = find_acpi_cpu_topology(cpu, 1);
>>                          cpu_topology[cpu].core_id   = topology_id;
>> @@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
>>                          if (!entry) {
>>                                  xa_store(&core_threads, topology_id,
>>                                           xa_mk_value(1), GFP_KERNEL);
>> +                               cpu_topology[cpu].thread_id = 0;
>>                          } else {
>>                                  thread_num = xa_to_value(entry);
>> -                               thread_num++;
>> +                               cpu_topology[cpu].thread_id = thread_num++;
>>                                  xa_store(&core_threads, topology_id,
>>                                           xa_mk_value(thread_num), GFP_KERNEL);
>>   
>> @@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
>>                  cpu_topology[cpu].cluster_id = topology_id;
>>                  topology_id = find_acpi_cpu_topology_package(cpu);
>>                  cpu_topology[cpu].package_id = topology_id;
>> +
>> +               pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
>> +                       cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
>> +                       cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
>>          }
>>   
>> +       if (have_thread)
> 
> we could know this from max_smt_thread_num so have_thread maybe unnecessary.

Yes right, I will change that.

> 
>> +               xa_for_each(&core_threads, i, entry)
>> +                       if (xa_to_value(entry) != max_smt_thread_num)
>> +                               pr_warn("Heterogeneous SMT topology not handled");\
> 
> Wondering if we can avoid this 2nd loop. Greg express the worries of looping twice on large scale
> system in v1. Maybe we could use the hetero_id and get the necessary information in one loop, I need
> further think.

I found this comments (not sure this is what you are refering to):
- https://lore.kernel.org/linux-arm-kernel/20231011103303.00002d8f@Huawei.com/
- https://lore.kernel.org/all/20230921150333.c2zqigs3xxwcg4ln@bogus/T/#m406c4c16871ca7ae431beb20feccfb5e14498452

I don't see another way to do it right now. Also, I thing the complexity is in
O(2n), which should be better than the original O(n**2),

Regards,
Pierre

> 
> Thanks.
> 
>> +
>>          cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>   
>>          xa_destroy(&core_threads);
>> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
>> index 95513abd664f..20d7f5b72ddd 100644
>> --- a/drivers/base/arch_topology.c
>> +++ b/drivers/base/arch_topology.c
>> @@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
>>          return cpu;
>>   }
>>   
>> -static void __init update_smt_num_threads(unsigned int num_threads)
>> +static void __init update_smt_num_threads(int num_threads)
>>   {
>> -       static unsigned int max_smt_thread_num = 1;
>> +       static int max_smt_thread_num = -1;
>>   
>> -       if (num_threads > max_smt_thread_num) {
>> +       if (max_smt_thread_num < 0) {
>>                  max_smt_thread_num = num_threads;
>>                  cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>> +       } else if (num_threads != max_smt_thread_num) {
>> +               pr_warn("Heterogeneous SMT topology not handled");
>>          }
>>   }
>>   
>> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
>> index b721f360d759..afdfdc64a0a1 100644
>> --- a/include/linux/arch_topology.h
>> +++ b/include/linux/arch_topology.h
>> @@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
>>   #define topology_physical_package_id(cpu)      (cpu_topology[cpu].package_id)
>>   #define topology_cluster_id(cpu)       (cpu_topology[cpu].cluster_id)
>>   #define topology_core_id(cpu)          (cpu_topology[cpu].core_id)
>> +#define topology_thread_id(cpu)                (cpu_topology[cpu].thread_id)
>>   #define topology_core_cpumask(cpu)     (&cpu_topology[cpu].core_sibling)
>>   #define topology_sibling_cpumask(cpu)  (&cpu_topology[cpu].thread_sibling)
>>   #define topology_cluster_cpumask(cpu)  (&cpu_topology[cpu].cluster_sibling)
>>
>> ----------------------------
>>
>>
>> Regards,
>> Pierre
>>
>>>
>>> Thanks,
>>> Yicong
>>>
>>>>
>>>> Same comment for the DT patch. If there is an assumption that all CPUs have
>>>> the same number of threads, then update_smt_num_threads() could only be called
>>>> once I suppose,
>>>>
>>>> Regards,
>>>> Pierre
>>>>
>>>>
>>>> On 8/6/24 10:53, Yicong Yang wrote:
>>>>> From: Yicong Yang <yangyicong@...ilicon.com>
>>>>>
>>>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>>>> get the SMT number of each core. Instead using a temporary xarray
>>>>> to record the SMT number of each core when building the topology
>>>>> and we can know the largest SMT number in the system. Then we can
>>>>> enable the support of SMT control.
>>>>>
>>>>> Signed-off-by: Yicong Yang <yangyicong@...ilicon.com>
>>>>> ---
>>>>>     arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>>>>     1 file changed, 24 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>>>> --- a/arch/arm64/kernel/topology.c
>>>>> +++ b/arch/arm64/kernel/topology.c
>>>>> @@ -15,8 +15,10 @@
>>>>>     #include <linux/arch_topology.h>
>>>>>     #include <linux/cacheinfo.h>
>>>>>     #include <linux/cpufreq.h>
>>>>> +#include <linux/cpu_smt.h>
>>>>>     #include <linux/init.h>
>>>>>     #include <linux/percpu.h>
>>>>> +#include <linux/xarray.h>
>>>>>       #include <asm/cpu.h>
>>>>>     #include <asm/cputype.h>
>>>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>>>>      */
>>>>>     int __init parse_acpi_topology(void)
>>>>>     {
>>>>> +    int thread_num, max_smt_thread_num = 1;
>>>>> +    struct xarray core_threads;
>>>>>         int cpu, topology_id;
>>>>> +    void *entry;
>>>>>           if (acpi_disabled)
>>>>>             return 0;
>>>>>     +    xa_init(&core_threads);
>>>>> +
>>>>>         for_each_possible_cpu(cpu) {
>>>>>             topology_id = find_acpi_cpu_topology(cpu, 0);
>>>>>             if (topology_id < 0)
>>>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>>>>                 cpu_topology[cpu].thread_id = topology_id;
>>>>>                 topology_id = find_acpi_cpu_topology(cpu, 1);
>>>>>                 cpu_topology[cpu].core_id   = topology_id;
>>>>> +
>>>>> +            entry = xa_load(&core_threads, topology_id);
>>>>> +            if (!entry) {
>>>>> +                xa_store(&core_threads, topology_id,
>>>>> +                     xa_mk_value(1), GFP_KERNEL);
>>>>> +            } else {
>>>>> +                thread_num = xa_to_value(entry);
>>>>> +                thread_num++;
>>>>> +                xa_store(&core_threads, topology_id,
>>>>> +                     xa_mk_value(thread_num), GFP_KERNEL);
>>>>> +
>>>>> +                if (thread_num > max_smt_thread_num)
>>>>> +                    max_smt_thread_num = thread_num;
>>>>> +            }
>>>>>             } else {
>>>>>                 cpu_topology[cpu].thread_id  = -1;
>>>>>                 cpu_topology[cpu].core_id    = topology_id;
>>>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>>>>             cpu_topology[cpu].package_id = topology_id;
>>>>>         }
>>>>>     +    cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>>>> +
>>>>> +    xa_destroy(&core_threads);
>>>>>         return 0;
>>>>>     }
>>>>>     #endif
>>>>
>>>> .