Date: Fri, 3 Mar 2017 16:32:28 +0800
From: Dou Liyang <douly.fnst@...fujitsu.com>
To: <mingo@...nel.org>, <tglx@...utronix.de>, <hpa@...or.com>,
<rjw@...ysocki.net>, <lenb@...nel.org>, <xiaolong.ye@...el.com>,
<guzheng1@...wei.com>, <izumi.taku@...fujitsu.com>
CC: <x86@...nel.org>, <linux-acpi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 0/5] Do repair works for the mapping of cpuid <-> nodeid
Hi All,
My simple test result:
On our box: Fujitsu PQ2000 with 1 node for hot-plug.
Before the patchset:
+-------------------------------------+
|                                     |
| NUMA node0 CPU: 0-23,256-279        +------+
| NUMA node1 CPU: 24-47,280-303       |      |
|                                     |      |
+-------------------------------------+      |
                                          Hot-plug
+-------------------------------------+      |
|                                     |      |
| NUMA Node0: 0-23, 256-279           <------+
| NUMA Node1: 24-47, 280-303          |
| NUMA Node2: 64-69, 72-77, 80-85, 88-93...
| NUMA Node3: 96-101, 104-109, 112-117,...
|                                     |      |
+-------------------------------------+      |
                                         Hot-remove
+-------------------------------------+      |
|                                     |      |
| NUMA node0 CPU: 0-23,256-279        |      |
| NUMA node1 CPU: 24-47,280-303       <------+
|                                     |
|                                     |
+-------------------------------------+
After the patchset:
+-------------------------------------+
|                                     |
| NUMA node0 CPU: 0-23,48-71          +------+
| NUMA node1 CPU: 24-47,72-95         |      |
|                                     |      |
+-------------------------------------+      |
                                          Hot-plug
+-------------------------------------+      |
|                                     |      |
| NUMA node0 CPU: 0-23,48-71          <------+
| NUMA node1 CPU: 24-47,72-95         |
| NUMA node2 CPU: 96-143              +------+
| NUMA node3 CPU: 144-191             |      |
|                                     |      |
+-------------------------------------+      |
                                         Hot-remove
+-------------------------------------+      |
|                                     |      |
| NUMA node0 CPU: 0-23,48-71          |      |
| NUMA node1 CPU: 24-47,72-95         <------+
|                                     |
|                                     |
+-------------------------------------+
I also tested some cases in VMs with QEMU.
When I get more nodes, I will test the whole
function.
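The difference between the two results can also be checked mechanically. Below is a minimal sketch, assuming the per-node CPU lists are available as strings in the same "0-23,256-279" form shown in the diagrams above (on Linux they could be read from /sys/devices/system/node/nodeN/cpulist). The helper names parse_cpulist() and is_contiguous_across_nodes() are hypothetical and only for illustration.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "0-23,256-279" into a sorted list of CPU IDs."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def is_contiguous_across_nodes(node_cpulists):
    """Return True if the CPU IDs of all nodes together are exactly 0..N-1."""
    all_cpus = sorted(cpu for text in node_cpulists for cpu in parse_cpulist(text))
    return all_cpus == list(range(len(all_cpus)))


# CPU lists copied from the diagrams above (two boot nodes before the
# patchset, four nodes after hot-plug with the patchset applied).
before = ["0-23,256-279", "24-47,280-303"]
after = ["0-23,48-71", "24-47,72-95", "96-143", "144-191"]

print(is_contiguous_across_nodes(before))  # False
print(is_contiguous_across_nodes(after))   # True
```

The check only looks for holes in the logical CPU ID space; it says nothing about whether each CPU is attached to the right node.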
Thanks,
Liyang.
At 03/03/2017 04:02 PM, Dou Liyang wrote:
> [Summary]:
>
> 1, Revert two commits
> 2, Fix the order of Logical CPU IDs
> 3, Move the validation of processor IDs to hot-plug time.
>
> The mapping of "cpuid <-> nodeid" is established at boot time via ACPI
> tables to keep associations of workqueues and other node-related items
> consistent across CPU hotplug, as follows:
>
> Step 1. Fix the "Logical CPU ID <-> Processor ID/UID" mapping using the MADT:
> We generate the logical CPU IDs in the order of the Local APIC/x2APIC IDs
> and get the mapping of Processor ID/UID <-> Local APIC ID directly from
> the MADT. So, we get the mapping of
> *Processor ID/UID <-> Local APIC ID <-> Logical CPU ID*
>
> Step 2. Fix the "Processor ID/UID <-> Node ID (_PXM)" mapping using the DSDT:
> The mapping of "Processor ID/UID <-> Node ID (_PXM)" is ready-made in
> each entity. We just use it directly.
>
> But ACPI tables are unreliable, and failures with that boot-time mapping
> have been reported on machines where the ACPI tables and the physical
> information retrieved at actual hotplug time are inconsistent. We have
> already found two bugs:
>
> 1. Duplicated Processor IDs in DSDT.
> It has been fixed by commits:
> '8e089eaa1999 ("acpi: Provide mechanism to validate processors
> in the ACPI tables")' and 'fd74da217df7 ("acpi: Validate processor id
> when mapping the processor")'
>
> 2. The _PXM in DSDT is inconsistent with the one in MADT.
> It may cause the bug, which is shown in:
> https://lkml.org/lkml/2017/2/12/200
>
> And one symptom appears on some specific boxes:
>
> 1. The logical CPU IDs are discontiguous, such as:
> Node2: 64-69, 72-77, 80-85, 88-93,...
>
> More strange things may happen in the future. Rather than fixing each
> one as it appears, we should solve this problem at the source so that
> such problems do not happen again and again.
>
> A simple and easy way to do that:
>
> 1. Do step 1 when the CPU's enabled flag is set
> 2. Do step 2 at hot-plug time, not at boot time, where it did some
> useless work.
>
> This also keeps the mapping of "cpuid <-> nodeid" fixed and avoids
> excessive use of the ACPI tables.
>
> Change log:
> v2 -> v3: 1. rewrite the changelogs:
> use the changelogs that Thomas Gleixner <tglx@...utronix.de>
> rewrote for patches 1, 2, 4 and 5.
> 2. s/duplicate_processor_id()/acpi_duplicate_processor_id()/
> on Thomas Gleixner <tglx@...utronix.de>'s advice.
> 3. modify the error handling in acpi_processor_ids_walk()
> on Thomas Gleixner <tglx@...utronix.de>'s advice.
> 4. add a new patch for restoring the order of CPU IDs
>
> v1 -> v2: 1. fix some comments.
> 2. add the verification of duplicate processor id.
>
> Dou Liyang (5):
> Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when booting"
> Revert "x86/acpi: Enable MADT APIs to return disabled apicids"
> x86/acpi: Restore the order of CPU IDs
> acpi/processor: Implement DEVICE operator for processor enumeration
> acpi/processor: Check for duplicate processor ids at hotplug time
>
> arch/x86/kernel/acpi/boot.c | 9 ++-
> arch/x86/kernel/apic/apic.c | 26 +++------
> drivers/acpi/acpi_processor.c | 57 +++++++++++++-----
> drivers/acpi/bus.c | 1 -
> drivers/acpi/processor_core.c | 133 +++++++-----------------------------------
> include/linux/acpi.h | 5 +-
> 6 files changed, 79 insertions(+), 152 deletions(-)
>
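The two-step scheme in the quoted cover letter can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel implementation: MadtEntry and CpuNodeMap are hypothetical names, logical CPU IDs are assumed to be handed out in the order the entries are listed, and node binding is tracked only for hot-added CPUs. It shows logical IDs being allocated at boot only for entries flagged enabled, with the duplicate-ID check and the node lookup deferred to hot-plug time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MadtEntry:
    """Toy stand-in for one MADT Local APIC/x2APIC entry (hypothetical type)."""
    processor_uid: int
    apic_id: int
    enabled: bool


class CpuNodeMap:
    """Toy model of the cpuid <-> nodeid bookkeeping; not kernel code."""

    def __init__(self):
        self.logical_by_apic = {}   # Local APIC ID -> logical CPU ID
        self.node_by_logical = {}   # logical CPU ID -> node ID
        self.seen_uids = set()      # processor UIDs already hot-added

    def _allocate(self, entry):
        # Hand out the next free logical CPU ID, once per APIC ID.
        if entry.apic_id not in self.logical_by_apic:
            self.logical_by_apic[entry.apic_id] = len(self.logical_by_apic)
        return self.logical_by_apic[entry.apic_id]

    def boot(self, madt_entries):
        """Step 1: only entries flagged enabled get logical CPU IDs at boot."""
        for entry in madt_entries:
            if entry.enabled:
                self._allocate(entry)

    def hotplug(self, entry, pxm):
        """Step 2: validate the processor UID and bind the node at hot-plug time."""
        if entry.processor_uid in self.seen_uids:
            raise ValueError(f"duplicate processor UID {entry.processor_uid}")
        self.seen_uids.add(entry.processor_uid)
        cpu = self._allocate(entry)
        self.node_by_logical[cpu] = pxm
        return cpu


# Made-up table: two CPUs present at boot, two slots reserved for hot-plug.
madt = [
    MadtEntry(processor_uid=0, apic_id=0, enabled=True),
    MadtEntry(processor_uid=1, apic_id=2, enabled=True),
    MadtEntry(processor_uid=2, apic_id=64, enabled=False),
    MadtEntry(processor_uid=3, apic_id=66, enabled=False),
]

cpu_map = CpuNodeMap()
cpu_map.boot(madt)
print(cpu_map.logical_by_apic)          # {0: 0, 2: 1}
print(cpu_map.hotplug(madt[2], pxm=1))  # 2
```

Because disabled entries do not reserve logical IDs at boot, the hot-added CPU takes the next free ID, which is why the IDs stay contiguous in this model.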