lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAhV-H7+4xg3R-JJhRgVfvnj4Yy_BRBe3-2H8Jf5xwz9_oj9CQ@mail.gmail.com>
Date: Thu, 17 Oct 2024 11:56:36 +0800
From: Huacai Chen <chenhuacai@...nel.org>
To: maobibo <maobibo@...ngson.cn>
Cc: Jianmin Lv <lvjianmin@...ngson.cn>, loongarch@...ts.linux.dev, 
	linux-kernel@...r.kernel.org, lixianglai@...ngson.cn, 
	WANG Xuerui <kernel@...0n.name>
Subject: Re: [PATCH] LoongArch: Fix cpu hotplug issue

On Thu, Oct 17, 2024 at 11:02 AM maobibo <maobibo@...ngson.cn> wrote:
>
>
>
> On 2024/10/14 下午6:11, Huacai Chen wrote:
> > On Mon, Oct 14, 2024 at 6:01 PM maobibo <maobibo@...ngson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/10/14 下午5:29, Huacai Chen wrote:
> >>> On Mon, Oct 14, 2024 at 5:12 PM maobibo <maobibo@...ngson.cn> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/10/14 下午4:23, Huacai Chen wrote:
> >>>>> On Mon, Oct 14, 2024 at 4:01 PM maobibo <maobibo@...ngson.cn> wrote:
> >>>>>>
> >>>>>> Huacai,
> >>>>>>
> >>>>>> On 2024/10/14 下午3:39, Huacai Chen wrote:
> >>>>>>> On Mon, Oct 14, 2024 at 3:21 PM maobibo <maobibo@...ngson.cn> wrote:
> >>>>>>>>
> >>>>>>>> Huacai,
> >>>>>>>>
> >>>>>>>> On 2024/10/14 下午3:05, Huacai Chen wrote:
> >>>>>>>>> Hi, Bibo,
> >>>>>>>>>
> >>>>>>>>> I'm a little confused, so please correct me if I'm wrong.
> >>>>>>>>>
> >>>>>>>>> On Mon, Oct 14, 2024 at 2:33 PM Bibo Mao <maobibo@...ngson.cn> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On LoongArch system, there are two places to set cpu numa node. One
> >>>>>>>>>> is in arch specified function smp_prepare_boot_cpu(), the other is
> >>>>>>>>>> in generic function early_numa_node_init(). The latter will overwrite
> >>>>>>>>>> the numa node information.
> >>>>>>>>>>
> >>>>>>>>>> However for hot-added cpu, cpu_logical_map() fails to its physical
> >>>>>>>>>> cpuid at beginning since it is not enabled in ACPI MADT table. So
> >>>>>>>>>> function early_cpu_to_node() also fails to get its numa node for
> >>>>>>>>>> hot-added cpu, and generic function early_numa_node_init() will
> >>>>>>>>>> overwrite incorrect numa node.
> >>>>>>>>> For hot-added cpus, we will call acpi_map_cpu() -->
> >>>>>>>>> acpi_map_cpu2node() --> set_cpuid_to_node(), and set_cpuid_to_node()
> >>>>>>>>> operates on __cpuid_to_node[]. So I think early_cpu_to_node() should
> >>>>>>>>> be correct?
> >>>>>>>>
> >>>>>>>> __cpuid_to_node[] is correct which is physical cpuid to numa node,
> >>>>>>>> however cpu_logical_map(cpu) is not set. It fails to get physical cpuid
> >>>>>>>> from logic cpu.
> >>>>>>>>
> >>>>>>>> int early_cpu_to_node(int cpu)
> >>>>>>>> {
> >>>>>>>>              int physid = cpu_logical_map(cpu);
> >>>>>>>>
> >>>>>>>> <<<<<<<<<<< Here physid is -1.
> >>>>>>> early_cpu_to_node() is not supposed to be called after boot, and if it
> >>>>>> Which calls early_cpu_to_node() after boot?
> >>>>>>
> >>>>>>> is really needed, I think a better solution is:
> >>>>>>>
> >>>>>>> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
> >>>>>>> index f1a74b80f22c..998cf45fd3b7 100644
> >>>>>>> --- a/arch/loongarch/kernel/acpi.c
> >>>>>>> +++ b/arch/loongarch/kernel/acpi.c
> >>>>>>> @@ -311,6 +311,8 @@ static int __ref acpi_map_cpu2node(acpi_handle
> >>>>>>> handle, int cpu, int physid)
> >>>>>>>
> >>>>>>>             nid = acpi_get_node(handle);
> >>>>>>>             if (nid != NUMA_NO_NODE) {
> >>>>>>> +               __cpu_number_map[physid] = cpu;
> >>>>>>> +               __cpu_logical_map[cpu] = physid;
> >>>>>> This does not solve the problem. The above has been done in function
> >>>>>> cpu = set_processor_mask(physid, ACPI_MADT_ENABLED);
> >>>>>>
> >>>>>> static int set_processor_mask(u32 id, u32 flags)
> >>>>>> {
> >>>>>> ...
> >>>>>>             if (flags & ACPI_MADT_ENABLED) {
> >>>>>>                     num_processors++;
> >>>>>>                     set_cpu_present(cpu, true);
> >>>>>>                     __cpu_number_map[cpuid] = cpu;
> >>>>>>                     __cpu_logical_map[cpu] = cpuid;
> >>>>>>             }
> >>>>>>
> >>>>>> The problem is that
> >>>>>>             smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
> >>>>>> <<<<<<<<<<<<<<<<
> >>>>>> set_cpu_numa_node() is called in function smp_prepare_boot_cpu()
> >>>>>>
> >>>>>>             early_numa_node_init();
> >>>>>>
> >>>>>> static void __init early_numa_node_init(void)
> >>>>>> {
> >>>>>> #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> >>>>>> #ifndef cpu_to_node
> >>>>>>             int cpu;
> >>>>>>
> >>>>>>             /* The early_cpu_to_node() should be ready here. */
> >>>>>>             for_each_possible_cpu(cpu)
> >>>>>>                     set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
> >>>>>> <<<<<<<<<<<<<<<<
> >>>>>> * however here early_cpu_to_node is -1, so that cpu_to_node(cpu) will
> >>>>>> always return -1 in late. *, which causes cpu hotadd problem.
> >>>>> Still confused. For ACPI_MADT_ENABLED cpus, everything is right after
> >>>>> early_numa_node_init(). For !ACPI_MADT_ENABLED cpus, cpu_to_node()
> >>>>> returns -1 after early_numa_node_init() and before hot-add, but if
> >>>>> acpi_map_cpu() do things right, cpu_to_node() should still work well
> >>>>> after hot-add.
> >>>> yes, if "_PXM" information for hot-add cpu handle exist, it works well.
> >>>>
> >>>> However if "_PXM" information does not exist, it falls back to legacy
> >>>> method from smp_prepare_boot_cpu(). However cpu_numa_node information is
> >>>> overwritten with -1 by later function early_numa_node_init().
> >>> OK, now I finally get the key point. But no _PXM should be treated as
> >>> a BIOS bug, right?
> >> Currently if no numa information is added in qemu command line, there
> >> will be no "_PXM" information for hot-added cpu. Such as for this command:
> >>     qemu-system-loongarch64 -m 4096 -smp
> >> 1,maxcpus=4,sockets=1,cores=4,threads=1
> >>>
> >>>   From comments we can see:
> >>>
> >>>                    * If possible cpus > present cpus here (e.g. some possible
> >>>                    * cpus will be added by cpu-hotplug later), for possible but
> >>>                    * not present cpus, early_cpu_to_node will return NUMA_NO_NODE,
> >>>                    * and we just map them to online nodes in round-robin way.
> >>>                    * Once hotplugged, new correct mapping will be built for them.
> >>>
> >>> This means even with this patch, cpu_to_node() can return a "valid"
> >>> node rather than NUMA_NO_NODE, but this round-robin node is still an
> >>> incorrect node.
> >> The round-robin node is not standard, may it is copied from x86, I do
> >> not know how to use it however. At least SRAT tables provides numa
> >> information only that there is not logical cpu allocated in SRAT table
> >> parsing. How about something like this?
> > This looks better, but maybe Jianmin has a best solution?
> >
> >>
> >> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
> >> index f1a74b80f22c..bb9fdd318998 100644
> >> --- a/arch/loongarch/kernel/acpi.c
> >> +++ b/arch/loongarch/kernel/acpi.c
> >> @@ -310,6 +310,12 @@ static int __ref acpi_map_cpu2node(acpi_handle
> >> handle, int cpu, int physid)
> >>           int nid;
> >>
> >>           nid = acpi_get_node(handle);
> >> +       /*
> >> +        * Fall back to srat numa node information if _PXM is not provided
> > The comments may be wrong? I think SRAT node information is equal to _PXM.
>
> "_PXM" information does not belong to SRAT table, it is in DSDT CPU
> device table.
> For numa system "_PXM" information is required, however for uma system
> "_PXM" information is not required since there is only one memory node.
>
> X86 linux uses SRAT table for present cpus and _PXM information in cpu
> device for hot-added cpus. For uma system, _PXM is absent and rr method
> is used for online nodes, its value is correct still since there is only
> one memory node.
OK, then you can send this modification as V2, and we can wait for
Jianmin's comments.

Huacai

>
> Regards
> Bibo Mao
>
> >> +        */
> >> +       if (nid != NUMA_NO_NODE)
> >> +               nid = __cpuid_to_node[physid];
> >> +
> >>           if (nid != NUMA_NO_NODE) {
> > Maybe this condition can be removed now?
> >
> > Huacai
> >
> >>                   set_cpuid_to_node(physid, nid);
> >>                   node_set(nid, numa_nodes_parsed);
> >>
> >> Regards
> >> Bibo Mao
> >>>
> >>> Huacai
> >>>
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>>
> >>>>> Huacai
> >>>>>>
> >>>>>> Regards
> >>>>>> Bibo Mao
> >>>>>>
> >>>>>>
> >>>>>>>                     set_cpuid_to_node(physid, nid);
> >>>>>>>                     node_set(nid, numa_nodes_parsed);
> >>>>>>>                     set_cpu_numa_node(cpu, nid);
> >>>>>>>
> >>>>>>> Huacai
> >>>>>>>
> >>>>>>>>
> >>>>>>>>              if (physid < 0)
> >>>>>>>>                      return NUMA_NO_NODE;
> >>>>>>>>
> >>>>>>>>              return __cpuid_to_node[physid];
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Bibo Mao
> >>>>>>>>>
> >>>>>>>>> Huacai
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Here static array __cpu_to_node and api set_early_cpu_to_node()
> >>>>>>>>>> is added, so that early_cpu_to_node is consistent with function
> >>>>>>>>>> cpu_to_node() for hot-added cpu.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
> >>>>>>>>>> ---
> >>>>>>>>>>       arch/loongarch/include/asm/numa.h |  2 ++
> >>>>>>>>>>       arch/loongarch/kernel/numa.c      | 10 +++++++++-
> >>>>>>>>>>       arch/loongarch/kernel/smp.c       |  1 +
> >>>>>>>>>>       3 files changed, 12 insertions(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/arch/loongarch/include/asm/numa.h b/arch/loongarch/include/asm/numa.h
> >>>>>>>>>> index b5f9de9f102e..e8e6fcfb006a 100644
> >>>>>>>>>> --- a/arch/loongarch/include/asm/numa.h
> >>>>>>>>>> +++ b/arch/loongarch/include/asm/numa.h
> >>>>>>>>>> @@ -50,6 +50,7 @@ static inline void set_cpuid_to_node(int cpuid, s16 node)
> >>>>>>>>>>       }
> >>>>>>>>>>
> >>>>>>>>>>       extern int early_cpu_to_node(int cpu);
> >>>>>>>>>> +extern void set_early_cpu_to_node(int cpu, s16 node);
> >>>>>>>>>>
> >>>>>>>>>>       #else
> >>>>>>>>>>
> >>>>>>>>>> @@ -57,6 +58,7 @@ static inline void early_numa_add_cpu(int cpuid, s16 node)    { }
> >>>>>>>>>>       static inline void numa_add_cpu(unsigned int cpu)              { }
> >>>>>>>>>>       static inline void numa_remove_cpu(unsigned int cpu)           { }
> >>>>>>>>>>       static inline void set_cpuid_to_node(int cpuid, s16 node)      { }
> >>>>>>>>>> +static inline void set_early_cpu_to_node(int cpu, s16 node)    { }
> >>>>>>>>>>
> >>>>>>>>>>       static inline int early_cpu_to_node(int cpu)
> >>>>>>>>>>       {
> >>>>>>>>>> diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
> >>>>>>>>>> index 84fe7f854820..62508aace644 100644
> >>>>>>>>>> --- a/arch/loongarch/kernel/numa.c
> >>>>>>>>>> +++ b/arch/loongarch/kernel/numa.c
> >>>>>>>>>> @@ -34,6 +34,9 @@ static struct numa_meminfo numa_meminfo;
> >>>>>>>>>>       cpumask_t cpus_on_node[MAX_NUMNODES];
> >>>>>>>>>>       cpumask_t phys_cpus_on_node[MAX_NUMNODES];
> >>>>>>>>>>       EXPORT_SYMBOL(cpus_on_node);
> >>>>>>>>>> +static s16 __cpu_to_node[NR_CPUS] = {
> >>>>>>>>>> +       [0 ... CONFIG_NR_CPUS - 1] = NUMA_NO_NODE
> >>>>>>>>>> +};
> >>>>>>>>>>
> >>>>>>>>>>       /*
> >>>>>>>>>>        * apicid, cpu, node mappings
> >>>>>>>>>> @@ -117,11 +120,16 @@ int early_cpu_to_node(int cpu)
> >>>>>>>>>>              int physid = cpu_logical_map(cpu);
> >>>>>>>>>>
> >>>>>>>>>>              if (physid < 0)
> >>>>>>>>>> -               return NUMA_NO_NODE;
> >>>>>>>>>> +               return __cpu_to_node[cpu];
> >>>>>>>>>>
> >>>>>>>>>>              return __cpuid_to_node[physid];
> >>>>>>>>>>       }
> >>>>>>>>>>
> >>>>>>>>>> +void set_early_cpu_to_node(int cpu, s16 node)
> >>>>>>>>>> +{
> >>>>>>>>>> +       __cpu_to_node[cpu] = node;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>       void __init early_numa_add_cpu(int cpuid, s16 node)
> >>>>>>>>>>       {
> >>>>>>>>>>              int cpu = __cpu_number_map[cpuid];
> >>>>>>>>>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> >>>>>>>>>> index 9afc2d8b3414..998668be858c 100644
> >>>>>>>>>> --- a/arch/loongarch/kernel/smp.c
> >>>>>>>>>> +++ b/arch/loongarch/kernel/smp.c
> >>>>>>>>>> @@ -512,6 +512,7 @@ void __init smp_prepare_boot_cpu(void)
> >>>>>>>>>>                              set_cpu_numa_node(cpu, node);
> >>>>>>>>>>                      else {
> >>>>>>>>>>                              set_cpu_numa_node(cpu, rr_node);
> >>>>>>>>>> +                       set_early_cpu_to_node(cpu, rr_node);
> >>>>>>>>>>                              rr_node = next_node_in(rr_node, node_online_map);
> >>>>>>>>>>                      }
> >>>>>>>>>>              }
> >>>>>>>>>>
> >>>>>>>>>> base-commit: 6485cf5ea253d40d507cd71253c9568c5470cd27
> >>>>>>>>>> --
> >>>>>>>>>> 2.39.3
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ