lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <28523dc2-5fae-cf0f-0bd7-5a1978406356@loongson.cn>
Date: Thu, 17 Oct 2024 11:02:18 +0800
From: maobibo <maobibo@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>, Jianmin Lv <lvjianmin@...ngson.cn>
Cc: loongarch@...ts.linux.dev, linux-kernel@...r.kernel.org,
 lixianglai@...ngson.cn, WANG Xuerui <kernel@...0n.name>
Subject: Re: [PATCH] LoongArch: Fix cpu hotplug issue



On 2024/10/14 下午6:11, Huacai Chen wrote:
> On Mon, Oct 14, 2024 at 6:01 PM maobibo <maobibo@...ngson.cn> wrote:
>>
>>
>>
>> On 2024/10/14 下午5:29, Huacai Chen wrote:
>>> On Mon, Oct 14, 2024 at 5:12 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/10/14 下午4:23, Huacai Chen wrote:
>>>>> On Mon, Oct 14, 2024 at 4:01 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>>>
>>>>>> Huacai,
>>>>>>
>>>>>> On 2024/10/14 下午3:39, Huacai Chen wrote:
>>>>>>> On Mon, Oct 14, 2024 at 3:21 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>>>>>
>>>>>>>> Huacai,
>>>>>>>>
>>>>>>>> On 2024/10/14 下午3:05, Huacai Chen wrote:
>>>>>>>>> Hi, Bibo,
>>>>>>>>>
>>>>>>>>> I'm a little confused, so please correct me if I'm wrong.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 14, 2024 at 2:33 PM Bibo Mao <maobibo@...ngson.cn> wrote:
>>>>>>>>>>
>>>>>>>>>> On LoongArch system, there are two places to set cpu numa node. One
>>>>>>>>>> is in arch specified function smp_prepare_boot_cpu(), the other is
>>>>>>>>>> in generic function early_numa_node_init(). The latter will overwrite
>>>>>>>>>> the numa node information.
>>>>>>>>>>
>>>>>>>>>> However for hot-added cpu, cpu_logical_map() fails to its physical
>>>>>>>>>> cpuid at beginning since it is not enabled in ACPI MADT table. So
>>>>>>>>>> function early_cpu_to_node() also fails to get its numa node for
>>>>>>>>>> hot-added cpu, and generic function early_numa_node_init() will
>>>>>>>>>> overwrite incorrect numa node.
>>>>>>>>> For hot-added cpus, we will call acpi_map_cpu() -->
>>>>>>>>> acpi_map_cpu2node() --> set_cpuid_to_node(), and set_cpuid_to_node()
>>>>>>>>> operates on __cpuid_to_node[]. So I think early_cpu_to_node() should
>>>>>>>>> be correct?
>>>>>>>>
>>>>>>>> __cpuid_to_node[] is correct which is physical cpuid to numa node,
>>>>>>>> however cpu_logical_map(cpu) is not set. It fails to get physical cpuid
>>>>>>>> from logic cpu.
>>>>>>>>
>>>>>>>> int early_cpu_to_node(int cpu)
>>>>>>>> {
>>>>>>>>              int physid = cpu_logical_map(cpu);
>>>>>>>>
>>>>>>>> <<<<<<<<<<< Here physid is -1.
>>>>>>> early_cpu_to_node() is not supposed to be called after boot, and if it
>>>>>> Which calls early_cpu_to_node() after boot?
>>>>>>
>>>>>>> is really needed, I think a better solution is:
>>>>>>>
>>>>>>> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
>>>>>>> index f1a74b80f22c..998cf45fd3b7 100644
>>>>>>> --- a/arch/loongarch/kernel/acpi.c
>>>>>>> +++ b/arch/loongarch/kernel/acpi.c
>>>>>>> @@ -311,6 +311,8 @@ static int __ref acpi_map_cpu2node(acpi_handle
>>>>>>> handle, int cpu, int physid)
>>>>>>>
>>>>>>>             nid = acpi_get_node(handle);
>>>>>>>             if (nid != NUMA_NO_NODE) {
>>>>>>> +               __cpu_number_map[physid] = cpu;
>>>>>>> +               __cpu_logical_map[cpu] = physid;
>>>>>> This does not solve the problem. The above has been done in function
>>>>>> cpu = set_processor_mask(physid, ACPI_MADT_ENABLED);
>>>>>>
>>>>>> static int set_processor_mask(u32 id, u32 flags)
>>>>>> {
>>>>>> ...
>>>>>>             if (flags & ACPI_MADT_ENABLED) {
>>>>>>                     num_processors++;
>>>>>>                     set_cpu_present(cpu, true);
>>>>>>                     __cpu_number_map[cpuid] = cpu;
>>>>>>                     __cpu_logical_map[cpu] = cpuid;
>>>>>>             }
>>>>>>
>>>>>> The problem is that
>>>>>>             smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
>>>>>> <<<<<<<<<<<<<<<<
>>>>>> set_cpu_numa_node() is called in function smp_prepare_boot_cpu()
>>>>>>
>>>>>>             early_numa_node_init();
>>>>>>
>>>>>> static void __init early_numa_node_init(void)
>>>>>> {
>>>>>> #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
>>>>>> #ifndef cpu_to_node
>>>>>>             int cpu;
>>>>>>
>>>>>>             /* The early_cpu_to_node() should be ready here. */
>>>>>>             for_each_possible_cpu(cpu)
>>>>>>                     set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
>>>>>> <<<<<<<<<<<<<<<<
>>>>>> * however here early_cpu_to_node is -1, so that cpu_to_node(cpu) will
>>>>>> always return -1 in late. *, which causes cpu hotadd problem.
>>>>> Still confused. For ACPI_MADT_ENABLED cpus, everything is right after
>>>>> early_numa_node_init(). For !ACPI_MADT_ENABLED cpus, cpu_to_node()
>>>>> returns -1 after early_numa_node_init() and before hot-add, but if
>>>>> acpi_map_cpu() do things right, cpu_to_node() should still work well
>>>>> after hot-add.
>>>> yes, if "_PXM" information for hot-add cpu handle exist, it works well.
>>>>
>>>> However if "_PXM" information does not exist, it falls back to legacy
>>>> method from smp_prepare_boot_cpu(). However cpu_numa_node information is
>>>> overwritten with -1 by later function early_numa_node_init().
>>> OK, now I finally get the key point. But no _PXM should be treated as
>>> a BIOS bug, right?
>> Currently if no numa information is added in qemu command line, there
>> will be no "_PXM" information for hot-added cpu. Such as for this command:
>>     qemu-system-loongarch64 -m 4096 -smp
>> 1,maxcpus=4,sockets=1,cores=4,threads=1
>>>
>>>   From comments we can see:
>>>
>>>                    * If possible cpus > present cpus here (e.g. some possible
>>>                    * cpus will be added by cpu-hotplug later), for possible but
>>>                    * not present cpus, early_cpu_to_node will return NUMA_NO_NODE,
>>>                    * and we just map them to online nodes in round-robin way.
>>>                    * Once hotplugged, new correct mapping will be built for them.
>>>
>>> This means even with this patch, cpu_to_node() can return a "valid"
>>> node rather than NUMA_NO_NODE, but this round-robin node is still an
>>> incorrect node.
>> The round-robin node is not standard, may it is copied from x86, I do
>> not know how to use it however. At least SRAT tables provides numa
>> information only that there is not logical cpu allocated in SRAT table
>> parsing. How about something like this?
> This looks better, but maybe Jianmin has a best solution?
> 
>>
>> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
>> index f1a74b80f22c..bb9fdd318998 100644
>> --- a/arch/loongarch/kernel/acpi.c
>> +++ b/arch/loongarch/kernel/acpi.c
>> @@ -310,6 +310,12 @@ static int __ref acpi_map_cpu2node(acpi_handle
>> handle, int cpu, int physid)
>>           int nid;
>>
>>           nid = acpi_get_node(handle);
>> +       /*
>> +        * Fall back to srat numa node information if _PXM is not provided
> The comments may be wrong? I think SRAT node information is equal to _PXM.

"_PXM" information does not belong to SRAT table, it is in DSDT CPU 
device table.
For numa system "_PXM" information is required, however for uma system 
"_PXM" information is not required since there is only one memory node.

X86 linux uses SRAT table for present cpus and _PXM information in cpu 
device for hot-added cpus. For uma system, _PXM is absent and rr method 
is used for online nodes, its value is correct still since there is only 
one memory node.

Regards
Bibo Mao

>> +        */
>> +       if (nid != NUMA_NO_NODE)
>> +               nid = __cpuid_to_node[physid];
>> +
>>           if (nid != NUMA_NO_NODE) {
> Maybe this condition can be removed now?
> 
> Huacai
> 
>>                   set_cpuid_to_node(physid, nid);
>>                   node_set(nid, numa_nodes_parsed);
>>
>> Regards
>> Bibo Mao
>>>
>>> Huacai
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>>
>>>>> Huacai
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>
>>>>>>
>>>>>>>                     set_cpuid_to_node(physid, nid);
>>>>>>>                     node_set(nid, numa_nodes_parsed);
>>>>>>>                     set_cpu_numa_node(cpu, nid);
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>>              if (physid < 0)
>>>>>>>>                      return NUMA_NO_NODE;
>>>>>>>>
>>>>>>>>              return __cpuid_to_node[physid];
>>>>>>>> }
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Bibo Mao
>>>>>>>>>
>>>>>>>>> Huacai
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here static array __cpu_to_node and api set_early_cpu_to_node()
>>>>>>>>>> is added, so that early_cpu_to_node is consistent with function
>>>>>>>>>> cpu_to_node() for hot-added cpu.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
>>>>>>>>>> ---
>>>>>>>>>>       arch/loongarch/include/asm/numa.h |  2 ++
>>>>>>>>>>       arch/loongarch/kernel/numa.c      | 10 +++++++++-
>>>>>>>>>>       arch/loongarch/kernel/smp.c       |  1 +
>>>>>>>>>>       3 files changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/loongarch/include/asm/numa.h b/arch/loongarch/include/asm/numa.h
>>>>>>>>>> index b5f9de9f102e..e8e6fcfb006a 100644
>>>>>>>>>> --- a/arch/loongarch/include/asm/numa.h
>>>>>>>>>> +++ b/arch/loongarch/include/asm/numa.h
>>>>>>>>>> @@ -50,6 +50,7 @@ static inline void set_cpuid_to_node(int cpuid, s16 node)
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>>       extern int early_cpu_to_node(int cpu);
>>>>>>>>>> +extern void set_early_cpu_to_node(int cpu, s16 node);
>>>>>>>>>>
>>>>>>>>>>       #else
>>>>>>>>>>
>>>>>>>>>> @@ -57,6 +58,7 @@ static inline void early_numa_add_cpu(int cpuid, s16 node)    { }
>>>>>>>>>>       static inline void numa_add_cpu(unsigned int cpu)              { }
>>>>>>>>>>       static inline void numa_remove_cpu(unsigned int cpu)           { }
>>>>>>>>>>       static inline void set_cpuid_to_node(int cpuid, s16 node)      { }
>>>>>>>>>> +static inline void set_early_cpu_to_node(int cpu, s16 node)    { }
>>>>>>>>>>
>>>>>>>>>>       static inline int early_cpu_to_node(int cpu)
>>>>>>>>>>       {
>>>>>>>>>> diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
>>>>>>>>>> index 84fe7f854820..62508aace644 100644
>>>>>>>>>> --- a/arch/loongarch/kernel/numa.c
>>>>>>>>>> +++ b/arch/loongarch/kernel/numa.c
>>>>>>>>>> @@ -34,6 +34,9 @@ static struct numa_meminfo numa_meminfo;
>>>>>>>>>>       cpumask_t cpus_on_node[MAX_NUMNODES];
>>>>>>>>>>       cpumask_t phys_cpus_on_node[MAX_NUMNODES];
>>>>>>>>>>       EXPORT_SYMBOL(cpus_on_node);
>>>>>>>>>> +static s16 __cpu_to_node[NR_CPUS] = {
>>>>>>>>>> +       [0 ... CONFIG_NR_CPUS - 1] = NUMA_NO_NODE
>>>>>>>>>> +};
>>>>>>>>>>
>>>>>>>>>>       /*
>>>>>>>>>>        * apicid, cpu, node mappings
>>>>>>>>>> @@ -117,11 +120,16 @@ int early_cpu_to_node(int cpu)
>>>>>>>>>>              int physid = cpu_logical_map(cpu);
>>>>>>>>>>
>>>>>>>>>>              if (physid < 0)
>>>>>>>>>> -               return NUMA_NO_NODE;
>>>>>>>>>> +               return __cpu_to_node[cpu];
>>>>>>>>>>
>>>>>>>>>>              return __cpuid_to_node[physid];
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>> +void set_early_cpu_to_node(int cpu, s16 node)
>>>>>>>>>> +{
>>>>>>>>>> +       __cpu_to_node[cpu] = node;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>       void __init early_numa_add_cpu(int cpuid, s16 node)
>>>>>>>>>>       {
>>>>>>>>>>              int cpu = __cpu_number_map[cpuid];
>>>>>>>>>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
>>>>>>>>>> index 9afc2d8b3414..998668be858c 100644
>>>>>>>>>> --- a/arch/loongarch/kernel/smp.c
>>>>>>>>>> +++ b/arch/loongarch/kernel/smp.c
>>>>>>>>>> @@ -512,6 +512,7 @@ void __init smp_prepare_boot_cpu(void)
>>>>>>>>>>                              set_cpu_numa_node(cpu, node);
>>>>>>>>>>                      else {
>>>>>>>>>>                              set_cpu_numa_node(cpu, rr_node);
>>>>>>>>>> +                       set_early_cpu_to_node(cpu, rr_node);
>>>>>>>>>>                              rr_node = next_node_in(rr_node, node_online_map);
>>>>>>>>>>                      }
>>>>>>>>>>              }
>>>>>>>>>>
>>>>>>>>>> base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
>>>>>>>>>> --
>>>>>>>>>> 2.39.3
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ