[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4172a465-a326-e3fa-ddba-5752a38f7237@loongson.cn>
Date: Wed, 30 Oct 2024 17:10:11 +0800
From: maobibo <maobibo@...ngson.cn>
To: Huacai Chen <chenhuacai@...nel.org>
Cc: Jianmin Lv <lvjianmin@...ngson.cn>, loongarch@...ts.linux.dev,
linux-kernel@...r.kernel.org, lixianglai@...ngson.cn,
WANG Xuerui <kernel@...0n.name>
Subject: Re: [PATCH v2] LoongArch: Fix cpu hotplug issue
On 2024/10/30 下午4:59, Huacai Chen wrote:
> On Wed, Oct 30, 2024 at 4:48 PM maobibo <maobibo@...ngson.cn> wrote:
>>
>>
>>
>> On 2024/10/30 下午4:34, Huacai Chen wrote:
>>> On Wed, Oct 30, 2024 at 4:25 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>
>>>>
>>>>
>>>> On 2024/10/30 下午4:12, Huacai Chen wrote:
>>>>> On Tue, Oct 29, 2024 at 7:49 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2024/10/29 下午6:36, Huacai Chen wrote:
>>>>>>> On Mon, Oct 28, 2024 at 8:38 PM maobibo <maobibo@...ngson.cn> wrote:
>>>>>>>>
>>>>>>>> Hi Huacai,
>>>>>>>>
>>>>>>>> On 2024/10/22 上午9:31, Huacai Chen wrote:
>>>>>>>>> On Tue, Oct 22, 2024 at 9:17 AM maobibo <maobibo@...ngson.cn> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2024/10/21 下午10:32, Huacai Chen wrote:
>>>>>>>>>>> Hi, Bibo,
>>>>>>>>>>>
>>>>>>>>>>> This version still doesn't touch the round-robin method, but it
>>>>>>>>>>> doesn't matter, I think I misunderstood something since V1...
>>>>>>>>>> I do not understand why round-robin method need be modified, SRAT may be
>>>>>>>>>> disabled with general function disable_srat(). Then round-robin method
>>>>>>>>>> is required.
>>>>>>>>> I don't mean round-robin should be modified, I mean I misunderstand round-robin.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please correct me if I'm wrong: For cpus without ACPI_MADT_ENABLED, in
>>>>>>>>>>> smp_prepare_boot_cpu() the round-robin node ids only apply to
>>>>>>>>>>> cpu_to_node(), but __cpuid_to_node[] still record the right node ids.
>>>>>>>>>>> early_cpu_to_node() returns NUMA_NO_NODE not because
>>>>>>>>>>> __cpuid_to_node[] records NUMA_NO_NODE, but because cpu_logical_map()
>>>>>>>>>>> < 0.
>>>>>>>>>>>
>>>>>>>>>>> If the above is correct, we don't need so complicated, because the
>>>>>>>>>>> correct and simplest way is:
>>>>>>>>>>> https://lore.kernel.org/loongarch/6b2b3e89-5a46-2d20-3dfb-7aae33839f49@loongson.cn/T/#m950eead5250e5992cc703bbe69622348cecfa465
>>>>>>>>>>>
>>>>>>>>>> It works also. Only that LoongArch kernel parsing about SRAT/MADT is
>>>>>>>>>> badly. If you do not mind, I do not mind neither. It is not my duty for
>>>>>>>>>> kernel side.
>>>>>>>>> Yes, I don't mind, please use that simplest way.
>>>>>>>> There is another problem with the simple way. eiointc reports error when
>>>>>>>> cpu is online. The error message is:
>>>>>>>> Loongson-64bit Processor probed (LA464 Core)
>>>>>>>> CPU2 revision is: 0014c010 (Loongson-64bit)
>>>>>>>> FPU2 revision is: 00000001
>>>>>>>> eiointc: Error: invalid nodemap!
>>>>>>>> CPU 2 UP state irqchip/loongarch/eiointc:starting (100) failed (-1)
>>>>>>>>
>>>>>>>> The problem is that node_map of eiointc is problematic,
>>>>>>>>
>>>>>>>>
>>>>>>>> static int cpu_to_eio_node(int cpu)
>>>>>>>> {
>>>>>>>> return cpu_logical_map(cpu) / CORES_PER_EIO_NODE;
>>>>>>>> }
>>>>>>>>
>>>>>>>> static int __init eiointc_init(struct eiointc_priv *priv, int parent_irq,
>>>>>>>> u64 node_map)
>>>>>>>> {
>>>>>>>> int i;
>>>>>>>>
>>>>>>>> node_map = node_map ? node_map : -1ULL;
>>>>>>>> for_each_possible_cpu(i) {
>>>>>>>> if (node_map & (1ULL << (cpu_to_eio_node(i)))) {
>>>>>>>> node_set(cpu_to_eio_node(i), priv->node_map);
>>>>>>>> ...
>>>>>>>> The cause is that for possible not present cpu, *cpu_logical_map(cpu)*
>>>>>>>> is -1, cpu_to_eio_node(i) will be equal to -1, so node_map of eiointc is
>>>>>>>> problematic.
>>>>>>> The error message seems from eiointc_router_init(), but it is a little
>>>>>>> strange. Physical hot-add should be before logical hot-add. So
>>>>>>> acpi_map_cpu() is before cpu_up(). acpi_map_cpu() calls
>>>>>>> set_processor_mask() to setup logical-physical mapping, so in
>>>>>>> eiointc_router_init() which is called by cpu_up(), cpu_logical_map()
>>>>>>> should work well.
>>>>>>>
>>>>>>> Maybe in your case a whole node is hot-added? I don't think the
>>>>>>> eiointc design can work with this case...
>>>>>>>
>>>>>>>>
>>>>>>>> So cpu_logical_map(cpu) should be set during MADT parsing even if it is
>>>>>>>> not enabled at beginning, it should not be set at hotplug runtime.
>>>>>>> This will cause the logical cpu number be not continuous after boot.
>>>>>>> Physical numbers have no requirement, but logical numbers should be
>>>>>>> continuous.
>>>>>> I do not understand such requirement about logical cpu should be
>>>>>> continuous. You can check logical cpu allocation method on other
>>>>>> architectures, or what does the requirement about logical cpu continuous
>>>>>> come from.
>>>>> 1, In an internal conference, it is said that non-continuous cpu
>>>>> numbers make users think our processors have bugs.
>>>>> 2, See prefill_possible_map(), it assumes logical numbers continuous
>>>>> cpu_possible_mask and cpu_present_mask, which make it convenient for
>>>>> "nr_cpus=xxx".
>>>>> 3, Can you show me an example in a real machine that "processor" in
>>>>> /proc/cpuinfo non-continues after boot and before soft hotplug?
>>>> It is really wasting my time to discuss with you. You does not
>>>> investigating implementation of other architectures, fully thinking in
>>>> yourself way.
>>> Totally wrong, I have implemented what you need, but you should make
>>> other colleagues (not me) agree with your idea.
>>> https://github.com/chenhuacai/linux/commit/d8dcf2844d5878b3ac5a42d074e781fe2ebfbae7
>> So do you mean we should internal discuss inside and post outside? You
>> can not decide this since you do not know. And actual code writer (lv
>> jianjin) does not reply to you still :(
> "Non-continuous number is unacceptable" is an internal decision, if
> you want to break this decision, you should let other colleagues
> agree, otherwise that is the real thing wastes your time.
If so I will not touch any kernel code any more, please write the patch
to fix it to keep your maintainer position stable. Is that ok for you?
>
>>
>>>
>>> Imagine that the cpu_possible_mask is 0b11111111, cpu_present_mask is
>>> 0b10101010 (non-continuous), how to make "nr_cpus=3" work in a simple
>>> way?
>> if (bitmap_weight(cpu_present_mask) >= nr_cpus))
>> then new cpu fails to add.
> No, I means cpu_possible_mask = 0b11111111, cpu_present_mask =
> 0b10101010 after MADT parsing, if you need to make "nr_cpus=3" work,
> you should modify them to cpu_possible_mask = 0b10101000,
> cpu_present_mask = 0b10101000 before smp_init(). But it is difficult
> to implement the modification.
>
>>>
>>>>
>>>> Does the real machines support real cpu hotplug and memory hotplug?
>>> ACPI_MADT_ENABLED is designed for virtual machines only?
>> It is the HW board problem, the HW does not support cpu hotplug, neither
>> memory hotplug and PCIE hotplug. HW board does not support.
> So we cannot see non-continuous CPU numbers just because all HW boards
> have problems that don't support CPU hotplug, even on x86?
>
> Huacai
>
>>
>> Regards
>> Bibo Mao
>>
>>>
>>> Huacai
>>>
>>>>
>>>> Regards
>>>> Bibo Mao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Huacai
>>>>>>
>>>>>> Regards
>>>>>> Bibo Mao
>>>>>>
>>>>>>>
>>>>>>> Huacai
>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Bibo Mao
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Huacai
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Bibo Mao
>>>>>>>>>>>
>>>>>>>>>>> Huacai
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 21, 2024 at 4:04 PM Bibo Mao <maobibo@...ngson.cn> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On LoongArch system, there are two places to set cpu numa node. One
>>>>>>>>>>>> is in arch specified function smp_prepare_boot_cpu(), the other is
>>>>>>>>>>>> in generic function early_numa_node_init(). The latter will overwrite
>>>>>>>>>>>> the numa node information.
>>>>>>>>>>>>
>>>>>>>>>>>> With hot-added cpu without numa information, cpu_logical_map() fails
>>>>>>>>>>>> to its physical cpuid at beginning since it is not enabled in ACPI
>>>>>>>>>>>> MADT table. So function early_cpu_to_node() also fails to get its
>>>>>>>>>>>> numa node for hot-added cpu, and generic function
>>>>>>>>>>>> early_numa_node_init() will overwrite with incorrect numa node.
>>>>>>>>>>>>
>>>>>>>>>>>> APIs topo_get_cpu() and topo_add_cpu() is added here, like other
>>>>>>>>>>>> architectures logic cpu is allocated when parsing MADT table. When
>>>>>>>>>>>> parsing SRAT table or hot-add cpu, logic cpu is acquired by searching
>>>>>>>>>>>> all allocated logical cpu with matched physical id. It solves such
>>>>>>>>>>>> problems such as:
>>>>>>>>>>>> 1. Boot cpu is not the first entry in MADT table, the first entry
>>>>>>>>>>>> will be overwritten with later boot cpu.
>>>>>>>>>>>> 2. Physical cpu id not presented in MADT table is invalid, in later
>>>>>>>>>>>> SRAT/hot-add cpu parsing, invalid physical cpu detected is added
>>>>>>>>>>>> 3. For hot-add cpu, its logic cpu is allocated in MADT table parsing,
>>>>>>>>>>>> so early_cpu_to_node() can be used for hot-add cpu and cpu_to_node()
>>>>>>>>>>>> is correct for hot-add cpu.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Bibo Mao <maobibo@...ngson.cn>
>>>>>>>>>>>> ---
>>>>>>>>>>>> v1 ... v2:
>>>>>>>>>>>> 1. Like other architectures, allocate logic cpu when parsing MADT table.
>>>>>>>>>>>> 2. Add invalid or duplicated physical cpuid parsing with SRAT table or
>>>>>>>>>>>> hot-add cpu DSDT information.
>>>>>>>>>>>> ---
>>>>>>>>>>>> arch/loongarch/include/asm/smp.h | 3 ++
>>>>>>>>>>>> arch/loongarch/kernel/acpi.c | 24 ++++++++++------
>>>>>>>>>>>> arch/loongarch/kernel/setup.c | 47 ++++++++++++++++++++++++++++++++
>>>>>>>>>>>> arch/loongarch/kernel/smp.c | 9 +++---
>>>>>>>>>>>> 4 files changed, 70 insertions(+), 13 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/arch/loongarch/include/asm/smp.h b/arch/loongarch/include/asm/smp.h
>>>>>>>>>>>> index 3383c9d24e94..c61b75937a77 100644
>>>>>>>>>>>> --- a/arch/loongarch/include/asm/smp.h
>>>>>>>>>>>> +++ b/arch/loongarch/include/asm/smp.h
>>>>>>>>>>>> @@ -119,4 +119,7 @@ static inline void __cpu_die(unsigned int cpu)
>>>>>>>>>>>> #define cpu_logical_map(cpu) 0
>>>>>>>>>>>> #endif /* CONFIG_SMP */
>>>>>>>>>>>>
>>>>>>>>>>>> +int topo_add_cpu(int physid);
>>>>>>>>>>>> +int topo_get_cpu(int physid);
>>>>>>>>>>>> +
>>>>>>>>>>>> #endif /* __ASM_SMP_H */
>>>>>>>>>>>> diff --git a/arch/loongarch/kernel/acpi.c b/arch/loongarch/kernel/acpi.c
>>>>>>>>>>>> index f1a74b80f22c..84d9812d5f38 100644
>>>>>>>>>>>> --- a/arch/loongarch/kernel/acpi.c
>>>>>>>>>>>> +++ b/arch/loongarch/kernel/acpi.c
>>>>>>>>>>>> @@ -78,10 +78,10 @@ static int set_processor_mask(u32 id, u32 flags)
>>>>>>>>>>>> return -ENODEV;
>>>>>>>>>>>>
>>>>>>>>>>>> }
>>>>>>>>>>>> - if (cpuid == loongson_sysconf.boot_cpu_id)
>>>>>>>>>>>> - cpu = 0;
>>>>>>>>>>>> - else
>>>>>>>>>>>> - cpu = find_first_zero_bit(cpumask_bits(cpu_present_mask), NR_CPUS);
>>>>>>>>>>>> +
>>>>>>>>>>>> + cpu = topo_add_cpu(cpuid);
>>>>>>>>>>>> + if (cpu < 0)
>>>>>>>>>>>> + return -EEXIST;
>>>>>>>>>>>>
>>>>>>>>>>>> if (!cpu_enumerated)
>>>>>>>>>>>> set_cpu_possible(cpu, true);
>>>>>>>>>>>> @@ -203,8 +203,6 @@ void __init acpi_boot_table_init(void)
>>>>>>>>>>>> goto fdt_earlycon;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> - loongson_sysconf.boot_cpu_id = read_csr_cpuid();
>>>>>>>>>>>> -
>>>>>>>>>>>> /*
>>>>>>>>>>>> * Process the Multiple APIC Description Table (MADT), if present
>>>>>>>>>>>> */
>>>>>>>>>>>> @@ -257,7 +255,7 @@ void __init numa_set_distance(int from, int to, int distance)
>>>>>>>>>>>> void __init
>>>>>>>>>>>> acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
>>>>>>>>>>>> {
>>>>>>>>>>>> - int pxm, node;
>>>>>>>>>>>> + int pxm, node, cpu;
>>>>>>>>>>>>
>>>>>>>>>>>> if (srat_disabled())
>>>>>>>>>>>> return;
>>>>>>>>>>>> @@ -286,6 +284,11 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
>>>>>>>>>>>> return;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> + cpu = topo_get_cpu(pa->apic_id);
>>>>>>>>>>>> + /* Check whether apic_id exists in MADT table */
>>>>>>>>>>>> + if (cpu < 0)
>>>>>>>>>>>> + return;
>>>>>>>>>>>> +
>>>>>>>>>>>> early_numa_add_cpu(pa->apic_id, node);
>>>>>>>>>>>>
>>>>>>>>>>>> set_cpuid_to_node(pa->apic_id, node);
>>>>>>>>>>>> @@ -324,12 +327,17 @@ int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id, int *pcpu
>>>>>>>>>>>> {
>>>>>>>>>>>> int cpu;
>>>>>>>>>>>>
>>>>>>>>>>>> - cpu = set_processor_mask(physid, ACPI_MADT_ENABLED);
>>>>>>>>>>>> + cpu = topo_get_cpu(physid);
>>>>>>>>>>>> + /* Check whether apic_id exists in MADT table */
>>>>>>>>>>>> if (cpu < 0) {
>>>>>>>>>>>> pr_info(PREFIX "Unable to map lapic to logical cpu number\n");
>>>>>>>>>>>> return cpu;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> + num_processors++;
>>>>>>>>>>>> + set_cpu_present(cpu, true);
>>>>>>>>>>>> + __cpu_number_map[physid] = cpu;
>>>>>>>>>>>> + __cpu_logical_map[cpu] = physid;
>>>>>>>>>>>> acpi_map_cpu2node(handle, cpu, physid);
>>>>>>>>>>>>
>>>>>>>>>>>> *pcpu = cpu;
>>>>>>>>>>>> diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
>>>>>>>>>>>> index 00e307203ddb..649e98640076 100644
>>>>>>>>>>>> --- a/arch/loongarch/kernel/setup.c
>>>>>>>>>>>> +++ b/arch/loongarch/kernel/setup.c
>>>>>>>>>>>> @@ -65,6 +65,8 @@ EXPORT_SYMBOL(cpu_data);
>>>>>>>>>>>>
>>>>>>>>>>>> struct loongson_board_info b_info;
>>>>>>>>>>>> static const char dmi_empty_string[] = " ";
>>>>>>>>>>>> +static int possible_cpus;
>>>>>>>>>>>> +static bool bsp_added;
>>>>>>>>>>>>
>>>>>>>>>>>> /*
>>>>>>>>>>>> * Setup information
>>>>>>>>>>>> @@ -346,10 +348,55 @@ static void __init bootcmdline_init(char **cmdline_p)
>>>>>>>>>>>> *cmdline_p = boot_command_line;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> +int topo_get_cpu(int physid)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + int i;
>>>>>>>>>>>> +
>>>>>>>>>>>> + for (i = 0; i < possible_cpus; i++)
>>>>>>>>>>>> + if (cpu_logical_map(i) == physid)
>>>>>>>>>>>> + break;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (i == possible_cpus)
>>>>>>>>>>>> + return -ENOENT;
>>>>>>>>>>>> +
>>>>>>>>>>>> + return i;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +int topo_add_cpu(int physid)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + int cpu;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (!bsp_added && (physid == loongson_sysconf.boot_cpu_id)) {
>>>>>>>>>>>> + bsp_added = true;
>>>>>>>>>>>> + return 0;
>>>>>>>>>>>> + }
>>>>>>>>>>>> +
>>>>>>>>>>>> + cpu = topo_get_cpu(physid);
>>>>>>>>>>>> + if (cpu >= 0) {
>>>>>>>>>>>> + pr_warn("Adding duplicated physical cpuid 0x%x\n", physid);
>>>>>>>>>>>> + return -EEXIST;
>>>>>>>>>>>> + }
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (possible_cpus >= nr_cpu_ids)
>>>>>>>>>>>> + return -ERANGE;
>>>>>>>>>>>> +
>>>>>>>>>>>> + __cpu_logical_map[possible_cpus] = physid;
>>>>>>>>>>>> + cpu = possible_cpus++;
>>>>>>>>>>>> + return cpu;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void __init topo_init(void)
>>>>>>>>>>>> +{
>>>>>>>>>>>> + loongson_sysconf.boot_cpu_id = read_csr_cpuid();
>>>>>>>>>>>> + __cpu_logical_map[0] = loongson_sysconf.boot_cpu_id;
>>>>>>>>>>>> + possible_cpus++;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> void __init platform_init(void)
>>>>>>>>>>>> {
>>>>>>>>>>>> arch_reserve_vmcore();
>>>>>>>>>>>> arch_reserve_crashkernel();
>>>>>>>>>>>> + topo_init();
>>>>>>>>>>>>
>>>>>>>>>>>> #ifdef CONFIG_ACPI
>>>>>>>>>>>> acpi_table_upgrade();
>>>>>>>>>>>> diff --git a/arch/loongarch/kernel/smp.c b/arch/loongarch/kernel/smp.c
>>>>>>>>>>>> index 9afc2d8b3414..a3f466b89179 100644
>>>>>>>>>>>> --- a/arch/loongarch/kernel/smp.c
>>>>>>>>>>>> +++ b/arch/loongarch/kernel/smp.c
>>>>>>>>>>>> @@ -291,10 +291,9 @@ static void __init fdt_smp_setup(void)
>>>>>>>>>>>> if (cpuid >= nr_cpu_ids)
>>>>>>>>>>>> continue;
>>>>>>>>>>>>
>>>>>>>>>>>> - if (cpuid == loongson_sysconf.boot_cpu_id)
>>>>>>>>>>>> - cpu = 0;
>>>>>>>>>>>> - else
>>>>>>>>>>>> - cpu = find_first_zero_bit(cpumask_bits(cpu_present_mask), NR_CPUS);
>>>>>>>>>>>> + cpu = topo_add_cpu(cpuid);
>>>>>>>>>>>> + if (cpu < 0)
>>>>>>>>>>>> + continue;
>>>>>>>>>>>>
>>>>>>>>>>>> num_processors++;
>>>>>>>>>>>> set_cpu_possible(cpu, true);
>>>>>>>>>>>> @@ -302,7 +301,7 @@ static void __init fdt_smp_setup(void)
>>>>>>>>>>>> __cpu_number_map[cpuid] = cpu;
>>>>>>>>>>>> __cpu_logical_map[cpu] = cpuid;
>>>>>>>>>>>>
>>>>>>>>>>>> - early_numa_add_cpu(cpu, 0);
>>>>>>>>>>>> + early_numa_add_cpu(cpuid, 0);
>>>>>>>>>>>> set_cpuid_to_node(cpuid, 0);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> base-commit: 42f7652d3eb527d03665b09edac47f85fb600924
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.39.3
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>
Powered by blists - more mailing lists