lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0cccad0d-8632-5e3f-ba07-3e96ad5d8263@loongson.cn>
Date: Mon, 14 Oct 2024 18:01:07 +0800
From: maobibo <maobibo@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
 lixianglai@...ngson.cn, WANG Xuerui <kernel@...0n.name>
Subject: Re: [PATCH] LoongArch: Fix cpu hotplug issue



On 2024/10/14 下午5:29, Huacai Chen wrote:
> On Mon, Oct 14, 2024 at 5:12 PM maobibo <maobibo@...ngson.cn> wrote:
>>
>>
>>
>> On 2024/10/14 下午4:23, Huacai Chen wrote:
>>> On Mon, Oct 14, 2024 at 4:01 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>
>>>> Huacai,
>>>>
>>>> On 2024/10/14 下午3:39, Huacai Chen wrote:
>>>>> On Mon, Oct 14, 2024 at 3:21 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>>>
>>>>>> Huacai,
>>>>>>
>>>>>> On 2024/10/14 下午3:05, Huacai Chen wrote:
>>>>>>> Hi, Bibo,
>>>>>>>
>>>>>>> I'm a little confused, so please correct me if I'm wrong.
>>>>>>>
>>>>>>> On Mon, Oct 14, 2024 at 2:33 PM Bibo Mao <maobibo@...ngson.cn> wrote:
>>>>>>>>
>>>>>>>> On LoongArch system, there are two places to set cpu numa node. One
>>>>>>>> is in arch specified function smp_prepare_boot_cpu(), the other is
>>>>>>>> in generic function early_numa_node_init(). The latter will overwrite
>>>>>>>> the numa node information.
>>>>>>>>
>>>>>>>> However for hot-added cpu, cpu_logical_map() fails to its physical
>>>>>>>> cpuid at beginning since it is not enabled in ACPI MADT table. So
>>>>>>>> function early_cpu_to_node() also fails to get its numa node for
>>>>>>>> hot-added cpu, and generic function early_numa_node_init() will
>>>>>>>> overwrite incorrect numa node.
>>>>>>> For hot-added cpus, we will call acpi_map_cpu() -->
>>>>>>> acpi_map_cpu2node() --> set_cpuid_to_node(), and set_cpuid_to_node()
>>>>>>> operates on __cpuid_to_node[]. So I think early_cpu_to_node() should
>>>>>>> be correct?
>>>>>>
>>>>>> __cpuid_to_node[] is correct which is physical cpuid to numa node,
>>>>>> however cpu_logical_map(cpu) is not set. It fails to get physical cpuid
>>>>>> from logic cpu.
>>>>>>
>>>>>> int early_cpu_to_node(int cpu)
>>>>>> {
>>>>>>             int physid = cpu_logical_map(cpu);
>>>>>>
>>>>>> <<<<<<<<<<< Here physid is -1.
>>>>> early_cpu_to_node() is not supposed to be called after boot, and if it
>>>> Which calls early_cpu_to_node() after boot?
>>>>
>>>>> is really needed, I think a better solution is:
>>>>>
>>>>> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
>>>>> index f1a74b80f22c..998cf45fd3b7 100644
>>>>> --- a/arch/loongarch/kernel/acpi.c
>>>>> +++ b/arch/loongarch/kernel/acpi.c
>>>>> @@ -311,6 +311,8 @@ static int __ref acpi_map_cpu2node(acpi_handle
>>>>> handle, int cpu, int physid)
>>>>>
>>>>>            nid = acpi_get_node(handle);
>>>>>            if (nid != NUMA_NO_NODE) {
>>>>> +               __cpu_number_map[physid] = cpu;
>>>>> +               __cpu_logical_map[cpu] = physid;
>>>> This does not solve the problem. The above has been done in function
>>>> cpu = set_processor_mask(physid, ACPI_MADT_ENABLED);
>>>>
>>>> static int set_processor_mask(u32 id, u32 flags)
>>>> {
>>>> ...
>>>>            if (flags & ACPI_MADT_ENABLED) {
>>>>                    num_processors++;
>>>>                    set_cpu_present(cpu, true);
>>>>                    __cpu_number_map[cpuid] = cpu;
>>>>                    __cpu_logical_map[cpu] = cpuid;
>>>>            }
>>>>
>>>> The problem is that
>>>>            smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
>>>> <<<<<<<<<<<<<<<<
>>>> set_cpu_numa_node() is called in function smp_prepare_boot_cpu()
>>>>
>>>>            early_numa_node_init();
>>>>
>>>> static void __init early_numa_node_init(void)
>>>> {
>>>> #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
>>>> #ifndef cpu_to_node
>>>>            int cpu;
>>>>
>>>>            /* The early_cpu_to_node() should be ready here. */
>>>>            for_each_possible_cpu(cpu)
>>>>                    set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
>>>> <<<<<<<<<<<<<<<<
>>>> * however here early_cpu_to_node is -1, so that cpu_to_node(cpu) will
>>>> always return -1 in late. *, which causes cpu hotadd problem.
>>> Still confused. For ACPI_MADT_ENABLED cpus, everything is right after
>>> early_numa_node_init(). For !ACPI_MADT_ENABLED cpus, cpu_to_node()
>>> returns -1 after early_numa_node_init() and before hot-add, but if
>>> acpi_map_cpu() do things right, cpu_to_node() should still work well
>>> after hot-add.
>> yes, if "_PXM" information for hot-add cpu handle exist, it works well.
>>
>> However if "_PXM" information does not exist, it falls back to legacy
>> method from smp_prepare_boot_cpu(). However cpu_numa_node information is
>> overwritten with -1 by later function early_numa_node_init().
> OK, now I finally get the key point. But no _PXM should be treated as
> a BIOS bug, right?
Currently if no numa information is added in qemu command line, there 
will be no "_PXM" information for hot-added cpu. Such as for this command:
   qemu-system-loongarch64 -m 4096 -smp 
1,maxcpus=4,sockets=1,cores=4,threads=1
> 
>  From comments we can see:
> 
>                   * If possible cpus > present cpus here (e.g. some possible
>                   * cpus will be added by cpu-hotplug later), for possible but
>                   * not present cpus, early_cpu_to_node will return NUMA_NO_NODE,
>                   * and we just map them to online nodes in round-robin way.
>                   * Once hotplugged, new correct mapping will be built for them.
> 
> This means even with this patch, cpu_to_node() can return a "valid"
> node rather than NUMA_NO_NODE, but this round-robin node is still an
> incorrect node.
The round-robin node is not standard, may it is copied from x86, I do 
not know how to use it however. At least SRAT tables provides numa 
information only that there is not logical cpu allocated in SRAT table 
parsing. How about something like this?

diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
index f1a74b80f22c..bb9fdd318998 100644
--- a/arch/loongarch/kernel/acpi.c
+++ b/arch/loongarch/kernel/acpi.c
@@ -310,6 +310,12 @@ static int __ref acpi_map_cpu2node(acpi_handle 
handle, int cpu, int physid)
         int nid;

         nid = acpi_get_node(handle);
+       /*
+        * Fall back to srat numa node information if _PXM is not provided
+        */
+       if (nid != NUMA_NO_NODE)
+               nid = __cpuid_to_node[physid];
+
         if (nid != NUMA_NO_NODE) {
                 set_cpuid_to_node(physid, nid);
                 node_set(nid, numa_nodes_parsed);

Regards
Bibo Mao
> 
> Huacai
> 
>>
>> Regards
>> Bibo Mao
>>>
>>> Huacai
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>
>>>>
>>>>>                    set_cpuid_to_node(physid, nid);
>>>>>                    node_set(nid, numa_nodes_parsed);
>>>>>                    set_cpu_numa_node(cpu, nid);
>>>>>
>>>>> Huacai
>>>>>
>>>>>>
>>>>>>             if (physid < 0)
>>>>>>                     return NUMA_NO_NODE;
>>>>>>
>>>>>>             return __cpuid_to_node[physid];
>>>>>> }
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>> Here static array __cpu_to_node and api set_early_cpu_to_node()
>>>>>>>> is added, so that early_cpu_to_node is consistent with function
>>>>>>>> cpu_to_node() for hot-added cpu.
>>>>>>>>
>>>>>>>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
>>>>>>>> ---
>>>>>>>>      arch/loongarch/include/asm/numa.h |  2 ++
>>>>>>>>      arch/loongarch/kernel/numa.c      | 10 +++++++++-
>>>>>>>>      arch/loongarch/kernel/smp.c       |  1 +
>>>>>>>>      3 files changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/loongarch/include/asm/numa.h b/arch/loongarch/include/asm/numa.h
>>>>>>>> index b5f9de9f102e..e8e6fcfb006a 100644
>>>>>>>> --- a/arch/loongarch/include/asm/numa.h
>>>>>>>> +++ b/arch/loongarch/include/asm/numa.h
>>>>>>>> @@ -50,6 +50,7 @@ static inline void set_cpuid_to_node(int cpuid, s16 node)
>>>>>>>>      }
>>>>>>>>
>>>>>>>>      extern int early_cpu_to_node(int cpu);
>>>>>>>> +extern void set_early_cpu_to_node(int cpu, s16 node);
>>>>>>>>
>>>>>>>>      #else
>>>>>>>>
>>>>>>>> @@ -57,6 +58,7 @@ static inline void early_numa_add_cpu(int cpuid, s16 node)    { }
>>>>>>>>      static inline void numa_add_cpu(unsigned int cpu)              { }
>>>>>>>>      static inline void numa_remove_cpu(unsigned int cpu)           { }
>>>>>>>>      static inline void set_cpuid_to_node(int cpuid, s16 node)      { }
>>>>>>>> +static inline void set_early_cpu_to_node(int cpu, s16 node)    { }
>>>>>>>>
>>>>>>>>      static inline int early_cpu_to_node(int cpu)
>>>>>>>>      {
>>>>>>>> diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
>>>>>>>> index 84fe7f854820..62508aace644 100644
>>>>>>>> --- a/arch/loongarch/kernel/numa.c
>>>>>>>> +++ b/arch/loongarch/kernel/numa.c
>>>>>>>> @@ -34,6 +34,9 @@ static struct numa_meminfo numa_meminfo;
>>>>>>>>      cpumask_t cpus_on_node[MAX_NUMNODES];
>>>>>>>>      cpumask_t phys_cpus_on_node[MAX_NUMNODES];
>>>>>>>>      EXPORT_SYMBOL(cpus_on_node);
>>>>>>>> +static s16 __cpu_to_node[NR_CPUS] = {
>>>>>>>> +       [0 ... CONFIG_NR_CPUS - 1] = NUMA_NO_NODE
>>>>>>>> +};
>>>>>>>>
>>>>>>>>      /*
>>>>>>>>       * apicid, cpu, node mappings
>>>>>>>> @@ -117,11 +120,16 @@ int early_cpu_to_node(int cpu)
>>>>>>>>             int physid = cpu_logical_map(cpu);
>>>>>>>>
>>>>>>>>             if (physid < 0)
>>>>>>>> -               return NUMA_NO_NODE;
>>>>>>>> +               return __cpu_to_node[cpu];
>>>>>>>>
>>>>>>>>             return __cpuid_to_node[physid];
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +void set_early_cpu_to_node(int cpu, s16 node)
>>>>>>>> +{
>>>>>>>> +       __cpu_to_node[cpu] = node;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      void __init early_numa_add_cpu(int cpuid, s16 node)
>>>>>>>>      {
>>>>>>>>             int cpu = __cpu_number_map[cpuid];
>>>>>>>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
>>>>>>>> index 9afc2d8b3414..998668be858c 100644
>>>>>>>> --- a/arch/loongarch/kernel/smp.c
>>>>>>>> +++ b/arch/loongarch/kernel/smp.c
>>>>>>>> @@ -512,6 +512,7 @@ void __init smp_prepare_boot_cpu(void)
>>>>>>>>                             set_cpu_numa_node(cpu, node);
>>>>>>>>                     else {
>>>>>>>>                             set_cpu_numa_node(cpu, rr_node);
>>>>>>>> +                       set_early_cpu_to_node(cpu, rr_node);
>>>>>>>>                             rr_node = next_node_in(rr_node, node_online_map);
>>>>>>>>                     }
>>>>>>>>             }
>>>>>>>>
>>>>>>>> base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
>>>>>>>> --
>>>>>>>> 2.39.3
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ