lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAhV-H4tJJqhJcrERJRVHg_FW4OOEV-OuHkX6PTP9L_ADKgfDg@mail.gmail.com>
Date: Wed, 30 Oct 2024 16:59:56 +0800
From: Huacai Chen <chenhuacai@...nel.org>
To: maobibo <maobibo@...ngson.cn>
Cc: Jianmin Lv <lvjianmin@...ngson.cn>, loongarch@...ts.linux.dev, 
	linux-kernel@...r.kernel.org, lixianglai@...ngson.cn, 
	WANG Xuerui <kernel@...0n.name>
Subject: Re: [PATCH v2] LoongArch: Fix cpu hotplug issue

On Wed, Oct 30, 2024 at 4:48 PM maobibo <maobibo@...ngson.cn> wrote:
>
>
>
> On 2024/10/30 下午4:34, Huacai Chen wrote:
> > On Wed, Oct 30, 2024 at 4:25 PM maobibo <maobibo@...ngson.cn> wrote:
> >>
> >>
> >>
> >> On 2024/10/30 下午4:12, Huacai Chen wrote:
> >>> On Tue, Oct 29, 2024 at 7:49 PM maobibo <maobibo@...ngson.cn> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2024/10/29 下午6:36, Huacai Chen wrote:
> >>>>> On Mon, Oct 28, 2024 at 8:38 PM maobibo <maobibo@...ngson.cn> wrote:
> >>>>>>
> >>>>>> Hi Huacai,
> >>>>>>
> >>>>>> On 2024/10/22 上午9:31, Huacai Chen wrote:
> >>>>>>> On Tue, Oct 22, 2024 at 9:17 AM maobibo <maobibo@...ngson.cn> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2024/10/21 下午10:32, Huacai Chen wrote:
> >>>>>>>>> Hi, Bibo,
> >>>>>>>>>
> >>>>>>>>> This version still doesn't touch the round-robin method, but it
> >>>>>>>>> doesn't matter, I think I misunderstood something since V1...
> >>>>>>>> I do not understand why round-robin method need be modified, SRAT may be
> >>>>>>>> disabled with general function disable_srat(). Then round-robin method
> >>>>>>>> is required.
> >>>>>>> I don't mean round-robin should be modified, I mean I misunderstand round-robin.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Please correct me if I'm wrong: For cpus without ACPI_MADT_ENABLED, in
> >>>>>>>>> smp_prepare_boot_cpu() the round-robin node ids only apply to
> >>>>>>>>> cpu_to_node(), but __cpuid_to_node[] still record the right node ids.
> >>>>>>>>> early_cpu_to_node() returns NUMA_NO_NODE not because
> >>>>>>>>> __cpuid_to_node[] records NUMA_NO_NODE, but because cpu_logical_map()
> >>>>>>>>> < 0.
> >>>>>>>>>
> >>>>>>>>> If the above is correct, we don't need so complicated, because the
> >>>>>>>>> correct and simplest way is:
> >>>>>>>>> https://lore.kernel.org/loongarch/6b2b3e89-5a46-2d20-3dfb-7aae33839f49@loongson.cn/T/#m950eead5250e5992cc703bbe69622348cecfa465
> >>>>>>>>>
> >>>>>>>> It works also. Only that LoongArch kernel parsing about SRAT/MADT is
> >>>>>>>> badly. If you do not mind, I do not mind neither. It is not my duty for
> >>>>>>>> kernel side.
> >>>>>>> Yes, I don't mind, please use that simplest way.
> >>>>>> There is another problem with the simple way. eiointc reports error when
> >>>>>> cpu is online. The error message is:
> >>>>>>       Loongson-64bit Processor probed (LA464 Core)
> >>>>>>       CPU2 revision is: 0014c010 (Loongson-64bit)
> >>>>>>       FPU2 revision is: 00000001
> >>>>>>       eiointc: Error: invalid nodemap!
> >>>>>>       CPU 2 UP state irqchip/loongarch/eiointc:starting (100) failed (-1)
> >>>>>>
> >>>>>> The problem is that node_map of eiointc is problematic,
> >>>>>>
> >>>>>>
> >>>>>> static int cpu_to_eio_node(int cpu)
> >>>>>> {
> >>>>>>             return cpu_logical_map(cpu) / CORES_PER_EIO_NODE;
> >>>>>> }
> >>>>>>
> >>>>>> static int __init eiointc_init(struct eiointc_priv *priv, int parent_irq,
> >>>>>>                                    u64 node_map)
> >>>>>> {
> >>>>>>             int i;
> >>>>>>
> >>>>>>             node_map = node_map ? node_map : -1ULL;
> >>>>>>             for_each_possible_cpu(i) {
> >>>>>>                     if (node_map & (1ULL << (cpu_to_eio_node(i)))) {
> >>>>>>                             node_set(cpu_to_eio_node(i), priv->node_map);
> >>>>>>              ...
> >>>>>> The cause is that for possible not present cpu, *cpu_logical_map(cpu)*
> >>>>>> is -1, cpu_to_eio_node(i) will be equal to -1, so node_map of eiointc is
> >>>>>> problematic.
> >>>>> The error message seems from eiointc_router_init(), but it is a little
> >>>>> strange. Physical hot-add should be before logical hot-add. So
> >>>>> acpi_map_cpu() is before cpu_up(). acpi_map_cpu() calls
> >>>>> set_processor_mask() to setup logical-physical mapping, so in
> >>>>> eiointc_router_init() which is called by cpu_up(), cpu_logical_map()
> >>>>> should work well.
> >>>>>
> >>>>> Maybe in your case a whole node is hot-added? I don't think the
> >>>>> eiointc design can work with this case...
> >>>>>
> >>>>>>
> >>>>>> So cpu_logical_map(cpu) should be set during MADT parsing even if it is
> >>>>>> not enabled at beginning, it should not be set at hotplug runtime.
> >>>>> This will cause the logical cpu number be not continuous after boot.
> >>>>> Physical numbers have no requirement, but logical numbers should be
> >>>>> continuous.
> >>>> I do not understand such requirement about logical cpu should be
> >>>> continuous. You can check logical cpu allocation method on other
> >>>> architectures, or what does the requirement about logical cpu continuous
> >>>> come from.
> >>> 1, In an internal conference, it is said that non-continuous cpu
> >>> numbers make users think our processors have bugs.
> >>> 2, See prefill_possible_map(), it assumes logical numbers continuous
> >>> cpu_possible_mask and cpu_present_mask, which make it convenient for
> >>> "nr_cpus=xxx".
> >>> 3, Can you show me an example in a real machine that "processor" in
> >>> /proc/cpuinfo non-continues after boot and before soft hotplug?
> >> It is really wasting my time to discuss with you. You does not
> >> investigating implementation of other architectures, fully thinking in
> >> yourself way.
> > Totally wrong, I have implemented what you need, but you should make
> > other colleagues (not me) agree with your idea.
> > https://github.com/chenhuacai/linux/commit/d8dcf2844d5878b3ac5a42d074e781fe2ebfbae7
> So do you mean we should internal discuss inside and post outside? You
> can not decide this since you do not know. And actual code writer (lv
> jianjin) does not reply to you still :(
"Non-continuous number is unacceptable" is an internal decision, if
you want to break this decision, you should let other colleagues
agree, otherwise that is the real thing wastes your time.

>
> >
> > Imagine that the cpu_possible_mask is 0b11111111, cpu_present_mask is
> > 0b10101010 (non-continuous), how to make "nr_cpus=3" work in a simple
> > way?
> if (bitmap_weight(cpu_present_mask) >=  nr_cpus))
>     then new cpu fails to add.
No,  I means cpu_possible_mask = 0b11111111, cpu_present_mask =
0b10101010 after MADT parsing, if you need to make "nr_cpus=3" work,
you should modify them to cpu_possible_mask = 0b10101000,
cpu_present_mask = 0b10101000 before smp_init(). But it is difficult
to implement the modification.

> >
> >>
> >> Does the real machines support real cpu hotplug and memory hotplug?
> > ACPI_MADT_ENABLED is designed for virtual machines only?
> It is the HW board problem, the HW does not support cpu hotplug, neither
> memory hotplug and PCIE hotplug. HW board does not support.
So we cannot see non-continuous CPU numbers just because all HW boards
have problems that don't support CPU hotplug, even on x86?

Huacai

>
> Regards
> Bibo Mao
>
> >
> > Huacai
> >
> >>
> >> Regards
> >> Bibo Mao
> >>>
> >>>
> >>>
> >>>
> >>> Huacai
> >>>>
> >>>> Regards
> >>>> Bibo Mao
> >>>>
> >>>>>
> >>>>> Huacai
> >>>>>
> >>>>>>
> >>>>>> Regards
> >>>>>> Bibo Mao
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> Huacai
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Bibo Mao
> >>>>>>>>>
> >>>>>>>>> Huacai
> >>>>>>>>>
> >>>>>>>>> On Mon, Oct 21, 2024 at 4:04 PM Bibo Mao <maobibo@...ngson.cn> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On LoongArch system, there are two places to set cpu numa node. One
> >>>>>>>>>> is in arch specified function smp_prepare_boot_cpu(), the other is
> >>>>>>>>>> in generic function early_numa_node_init(). The latter will overwrite
> >>>>>>>>>> the numa node information.
> >>>>>>>>>>
> >>>>>>>>>> With hot-added cpu without numa information, cpu_logical_map() fails
> >>>>>>>>>> to its physical cpuid at beginning since it is not enabled in ACPI
> >>>>>>>>>> MADT table. So function early_cpu_to_node() also fails to get its
> >>>>>>>>>> numa node for hot-added cpu, and generic function
> >>>>>>>>>> early_numa_node_init() will overwrite with incorrect numa node.
> >>>>>>>>>>
> >>>>>>>>>> APIs topo_get_cpu() and topo_add_cpu() is added here, like other
> >>>>>>>>>> architectures logic cpu is allocated when parsing MADT table. When
> >>>>>>>>>> parsing SRAT table or hot-add cpu, logic cpu is acquired by searching
> >>>>>>>>>> all allocated logical cpu with matched physical id. It solves such
> >>>>>>>>>> problems such as:
> >>>>>>>>>>        1. Boot cpu is not the first entry in MADT table, the first entry
> >>>>>>>>>> will be overwritten with later boot cpu.
> >>>>>>>>>>        2. Physical cpu id not presented in MADT table is invalid, in later
> >>>>>>>>>> SRAT/hot-add cpu parsing, invalid physical cpu detected is added
> >>>>>>>>>>        3. For hot-add cpu, its logic cpu is allocated in MADT table parsing,
> >>>>>>>>>> so early_cpu_to_node() can be used for hot-add cpu and cpu_to_node()
> >>>>>>>>>> is correct for hot-add cpu.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
> >>>>>>>>>> ---
> >>>>>>>>>> v1 ... v2:
> >>>>>>>>>>        1. Like other architectures, allocate logic cpu when parsing MADT table.
> >>>>>>>>>>        2. Add invalid or duplicated physical cpuid parsing with SRAT table or
> >>>>>>>>>> hot-add cpu DSDT information.
> >>>>>>>>>> ---
> >>>>>>>>>>       arch/loongarch/include/asm/smp.h |  3 ++
> >>>>>>>>>>       arch/loongarch/kernel/acpi.c     | 24 ++++++++++------
> >>>>>>>>>>       arch/loongarch/kernel/setup.c    | 47 ++++++++++++++++++++++++++++++++
> >>>>>>>>>>       arch/loongarch/kernel/smp.c      |  9 +++---
> >>>>>>>>>>       4 files changed, 70 insertions(+), 13 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
> >>>>>>>>>> index 3383c9d24e94..c61b75937a77 100644
> >>>>>>>>>> --- a/arch/loongarch/include/asm/smp.h
> >>>>>>>>>> +++ b/arch/loongarch/include/asm/smp.h
> >>>>>>>>>> @@ -119,4 +119,7 @@ static inline void __cpu_die(unsigned int cpu)
> >>>>>>>>>>       #define cpu_logical_map(cpu)   0
> >>>>>>>>>>       #endif /* CONFIG_SMP */
> >>>>>>>>>>
> >>>>>>>>>> +int topo_add_cpu(int physid);
> >>>>>>>>>> +int topo_get_cpu(int physid);
> >>>>>>>>>> +
> >>>>>>>>>>       #endif /* __ASM_SMP_H */
> >>>>>>>>>> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
> >>>>>>>>>> index f1a74b80f22c..84d9812d5f38 100644
> >>>>>>>>>> --- a/arch/loongarch/kernel/acpi.c
> >>>>>>>>>> +++ b/arch/loongarch/kernel/acpi.c
> >>>>>>>>>> @@ -78,10 +78,10 @@ static int set_processor_mask(u32 id, u32 flags)
> >>>>>>>>>>                      return -ENODEV;
> >>>>>>>>>>
> >>>>>>>>>>              }
> >>>>>>>>>> -       if (cpuid == loongson_sysconf.boot_cpu_id)
> >>>>>>>>>> -               cpu = 0;
> >>>>>>>>>> -       else
> >>>>>>>>>> -               cpu = find_first_zero_bit(cpumask_bits(cpu_present_mask), NR_CPUS);
> >>>>>>>>>> +
> >>>>>>>>>> +       cpu = topo_add_cpu(cpuid);
> >>>>>>>>>> +       if (cpu < 0)
> >>>>>>>>>> +               return -EEXIST;
> >>>>>>>>>>
> >>>>>>>>>>              if (!cpu_enumerated)
> >>>>>>>>>>                      set_cpu_possible(cpu, true);
> >>>>>>>>>> @@ -203,8 +203,6 @@ void __init acpi_boot_table_init(void)
> >>>>>>>>>>                      goto fdt_earlycon;
> >>>>>>>>>>              }
> >>>>>>>>>>
> >>>>>>>>>> -       loongson_sysconf.boot_cpu_id = read_csr_cpuid();
> >>>>>>>>>> -
> >>>>>>>>>>              /*
> >>>>>>>>>>               * Process the Multiple APIC Description Table (MADT), if present
> >>>>>>>>>>               */
> >>>>>>>>>> @@ -257,7 +255,7 @@ void __init numa_set_distance(int from, int to, int distance)
> >>>>>>>>>>       void __init
> >>>>>>>>>>       acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
> >>>>>>>>>>       {
> >>>>>>>>>> -       int pxm, node;
> >>>>>>>>>> +       int pxm, node, cpu;
> >>>>>>>>>>
> >>>>>>>>>>              if (srat_disabled())
> >>>>>>>>>>                      return;
> >>>>>>>>>> @@ -286,6 +284,11 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
> >>>>>>>>>>                      return;
> >>>>>>>>>>              }
> >>>>>>>>>>
> >>>>>>>>>> +       cpu = topo_get_cpu(pa->apic_id);
> >>>>>>>>>> +       /* Check whether apic_id exists in MADT table */
> >>>>>>>>>> +       if (cpu < 0)
> >>>>>>>>>> +               return;
> >>>>>>>>>> +
> >>>>>>>>>>              early_numa_add_cpu(pa->apic_id, node);
> >>>>>>>>>>
> >>>>>>>>>>              set_cpuid_to_node(pa->apic_id, node);
> >>>>>>>>>> @@ -324,12 +327,17 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id, int *pcpu
> >>>>>>>>>>       {
> >>>>>>>>>>              int cpu;
> >>>>>>>>>>
> >>>>>>>>>> -       cpu = set_processor_mask(physid, ACPI_MADT_ENABLED);
> >>>>>>>>>> +       cpu = topo_get_cpu(physid);
> >>>>>>>>>> +       /* Check whether apic_id exists in MADT table */
> >>>>>>>>>>              if (cpu < 0) {
> >>>>>>>>>>                      pr_info(PREFIX "Unable to map lapic to logical cpu number\n");
> >>>>>>>>>>                      return cpu;
> >>>>>>>>>>              }
> >>>>>>>>>>
> >>>>>>>>>> +       num_processors++;
> >>>>>>>>>> +       set_cpu_present(cpu, true);
> >>>>>>>>>> +       __cpu_number_map[physid] = cpu;
> >>>>>>>>>> +       __cpu_logical_map[cpu] = physid;
> >>>>>>>>>>              acpi_map_cpu2node(handle, cpu, physid);
> >>>>>>>>>>
> >>>>>>>>>>              *pcpu = cpu;
> >>>>>>>>>> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
> >>>>>>>>>> index 00e307203ddb..649e98640076 100644
> >>>>>>>>>> --- a/arch/loongarch/kernel/setup.c
> >>>>>>>>>> +++ b/arch/loongarch/kernel/setup.c
> >>>>>>>>>> @@ -65,6 +65,8 @@ EXPORT_SYMBOL(cpu_data);
> >>>>>>>>>>
> >>>>>>>>>>       struct loongson_board_info b_info;
> >>>>>>>>>>       static const char dmi_empty_string[] = "        ";
> >>>>>>>>>> +static int possible_cpus;
> >>>>>>>>>> +static bool bsp_added;
> >>>>>>>>>>
> >>>>>>>>>>       /*
> >>>>>>>>>>        * Setup information
> >>>>>>>>>> @@ -346,10 +348,55 @@ static void __init bootcmdline_init(char **cmdline_p)
> >>>>>>>>>>              *cmdline_p = boot_command_line;
> >>>>>>>>>>       }
> >>>>>>>>>>
> >>>>>>>>>> +int topo_get_cpu(int physid)
> >>>>>>>>>> +{
> >>>>>>>>>> +       int i;
> >>>>>>>>>> +
> >>>>>>>>>> +       for (i = 0; i < possible_cpus; i++)
> >>>>>>>>>> +               if (cpu_logical_map(i) == physid)
> >>>>>>>>>> +                       break;
> >>>>>>>>>> +
> >>>>>>>>>> +       if (i == possible_cpus)
> >>>>>>>>>> +               return -ENOENT;
> >>>>>>>>>> +
> >>>>>>>>>> +       return i;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>> +int topo_add_cpu(int physid)
> >>>>>>>>>> +{
> >>>>>>>>>> +       int cpu;
> >>>>>>>>>> +
> >>>>>>>>>> +       if (!bsp_added && (physid == loongson_sysconf.boot_cpu_id)) {
> >>>>>>>>>> +               bsp_added = true;
> >>>>>>>>>> +               return 0;
> >>>>>>>>>> +       }
> >>>>>>>>>> +
> >>>>>>>>>> +       cpu = topo_get_cpu(physid);
> >>>>>>>>>> +       if (cpu >= 0) {
> >>>>>>>>>> +               pr_warn("Adding duplicated physical cpuid 0x%x\n", physid);
> >>>>>>>>>> +               return -EEXIST;
> >>>>>>>>>> +       }
> >>>>>>>>>> +
> >>>>>>>>>> +       if (possible_cpus >= nr_cpu_ids)
> >>>>>>>>>> +               return -ERANGE;
> >>>>>>>>>> +
> >>>>>>>>>> +       __cpu_logical_map[possible_cpus] = physid;
> >>>>>>>>>> +       cpu = possible_cpus++;
> >>>>>>>>>> +       return cpu;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>> +static void __init topo_init(void)
> >>>>>>>>>> +{
> >>>>>>>>>> +       loongson_sysconf.boot_cpu_id = read_csr_cpuid();
> >>>>>>>>>> +       __cpu_logical_map[0] = loongson_sysconf.boot_cpu_id;
> >>>>>>>>>> +       possible_cpus++;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>       void __init platform_init(void)
> >>>>>>>>>>       {
> >>>>>>>>>>              arch_reserve_vmcore();
> >>>>>>>>>>              arch_reserve_crashkernel();
> >>>>>>>>>> +       topo_init();
> >>>>>>>>>>
> >>>>>>>>>>       #ifdef CONFIG_ACPI
> >>>>>>>>>>              acpi_table_upgrade();
> >>>>>>>>>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
> >>>>>>>>>> index 9afc2d8b3414..a3f466b89179 100644
> >>>>>>>>>> --- a/arch/loongarch/kernel/smp.c
> >>>>>>>>>> +++ b/arch/loongarch/kernel/smp.c
> >>>>>>>>>> @@ -291,10 +291,9 @@ static void __init fdt_smp_setup(void)
> >>>>>>>>>>                      if (cpuid >= nr_cpu_ids)
> >>>>>>>>>>                              continue;
> >>>>>>>>>>
> >>>>>>>>>> -               if (cpuid == loongson_sysconf.boot_cpu_id)
> >>>>>>>>>> -                       cpu = 0;
> >>>>>>>>>> -               else
> >>>>>>>>>> -                       cpu = find_first_zero_bit(cpumask_bits(cpu_present_mask), NR_CPUS);
> >>>>>>>>>> +               cpu = topo_add_cpu(cpuid);
> >>>>>>>>>> +               if (cpu < 0)
> >>>>>>>>>> +                       continue;
> >>>>>>>>>>
> >>>>>>>>>>                      num_processors++;
> >>>>>>>>>>                      set_cpu_possible(cpu, true);
> >>>>>>>>>> @@ -302,7 +301,7 @@ static void __init fdt_smp_setup(void)
> >>>>>>>>>>                      __cpu_number_map[cpuid] = cpu;
> >>>>>>>>>>                      __cpu_logical_map[cpu] = cpuid;
> >>>>>>>>>>
> >>>>>>>>>> -               early_numa_add_cpu(cpu, 0);
> >>>>>>>>>> +               early_numa_add_cpu(cpuid, 0);
> >>>>>>>>>>                      set_cpuid_to_node(cpuid, 0);
> >>>>>>>>>>              }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> base-commit: 42f7652d3eb527d03665b09edac47f85fb600924
> >>>>>>>>>> --
> >>>>>>>>>> 2.39.3
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ