[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4B5731E2.4040207@linux.intel.com>
Date: Thu, 21 Jan 2010 00:40:02 +0800
From: Haicheng Li <haicheng.li@...ux.intel.com>
To: David Rientjes <rientjes@...gle.com>
CC: Yinghai Lu <yinghai@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes
detected by ACPI.
David Rientjes wrote:
> On Tue, 19 Jan 2010, Haicheng Li wrote:
>> David, per my understanding, your concern should be like, with this fix, if
>> 3rd or 4th entry of Node0 has no address range, then Node0 won't be recoverd
>> with oldnode and won't be cleared in nodes_parsed. But how is it handled by
>> old code?
>>
>
> It's not evident with your machine because you do not have two SRAT
> entries for the same node id, one without ACPI_SRAT_MEM_HOT_PLUGGABLE and
> other with ACPI_SRAT_MEM_HOT_PLUGGABLE.
>
> The old code would preserve the address range for the former in oldnode
> and then reset its data in the struct bootnode since nodes_parsed has a
> bit set for that node. That's needed by later code that I've mentioned:
> acpi_get_nodes(), specifically, which breaks with your patch in addition
> to nodes_cover_memory() and e820_register_active_regions().
>
> Only when the previous oldnode entry does not have a valid address range,
> meaning it is [0, 0), does the bit get cleared in nodes_parsed.
Understood, the old code is meant to make nodes_parsed _NEVER_ include the node whose memory regions
are all hotpluggable.
>> - it recovers node with oldnode as long as current entry is HOT_PLUGGABLE. so
>> it handles the recover issue. but I think following patch can simply fix it as
>> well.
>>
>
> If it's not ACPI_SRAT_MEM_HOT_PLUGGABLE, we know the address range is
> already valid given the sanity checks that it has successfully passed
> through in acpi_numa_memory_affinity_init(), so we require no further
> checking. However, your patch will not reset the previous address range
> when a ACPI_SRAT_MEM_HOT_PLUGGABLE entry is found for the same address
> range and you're leaving the bit set in nodes_parsed.
I see. the precondition is that nodes_parsed should not include such hotpluggable node, then such
data of hotpluggable mem should be kept in nodes_add[] other than in nodes[].
>>> cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
>>> should be set in nodes_parsed if its corresponding node does not have a
>>> valid address range.
>> For a node has _NOT_ either CPU or Memory like Node1, cpu_nodes_parsed cannot
>> handle it.
>>
>
> It most certainly can since its sole purpose is to include memoryless
> nodes in node_possible_map. It has no other use case that would break as
> the result of adding hotpluggable nodes, hence the reason I suggested
> renaming it no_mem_nodes_parsed.
Yeah, so the key point is who should keep hotpluggable nodes, cpu_nodes_parsed or nodes_parsed?
Actually now I agree with you on this, let cpu_nodes_parsed keep hotpluggable nodes since it won't
break any old code. Originally my patch wanna let nodes_parsed keep hotpluggable nodes, which would
make things complex.
but name "no_mem_nodes_parsed" seems convoluted too because (from code logic) this nodemask is
usually based on CPU/APIC Affinity Structure.
How about rename cpu_nodes_parsed as "rest_nodes_parsed" (comparing with "mem_nodes_parsed), since
it handles
- nodes with CPU on
- nodes with hotpluggable memory region
?
>>> We have a reasonable expectation that nodes_parsed represents memory nodes
>>> given its use for e820_register_active_regions() and nodes_cover_memory() as
>>> well as acpi_get_nodes() for NUMA emulation, for example, which would be
>>> broken with this patch. See dc0985519.
>>>
>> either nodes_cover_memory() or e820_register_active_regions() or
>> acpi_get_nodes(), they all have node-addr-range check code, if the
>> node-addr-range is invalid, they won't be harmed.
>>
>
> Wrong, acpi_get_nodes() does not have such a check it only iterates over
> nodes_parsed. In other words, you'd be starting a new requirement for
> nodes_parsed with your patch: it would now be necessary to check for a
> valid (non-zero) address range for each set bit. Instead, I'm suggesting
> the nodes_parsed represents only nodes with valid memory, which is a
> reasonable expectation given the semantics of both it and cpu_nodes_parsed
> to handle their memoryless counterparts.
agreed. In term of this, using nodes_parsed to represent only nodes with valid memory can make
things simple.
> In other words, the following should easily fix the issue without breaking
> the existing logic that preserves the old address range for node ids that
> have SRAT entries both with and without ACPI_SRAT_MEM_HOT_PLUGGABLE.
> Could you give it a try?
of course, it fixes the issue because node_possible_map now includes hotpluggable node, and then
nr_node_ids becomes equal to maximum of possible nodes on the motherboard;).
let's add more changes to fix naming issue as well since it's too confusing for people to understand
the code logic. how about below patch?
---
arch/x86/mm/srat_64.c | 39 +++++++++++++++++++++++++--------------
1 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..aebbdd4 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -27,8 +27,17 @@ int acpi_numa __initdata;
static struct acpi_table_slit *acpi_slit;
-static nodemask_t nodes_parsed __initdata;
-static nodemask_t cpu_nodes_parsed __initdata;
+/* mem_nodes_parsed:
+ * - nodes with memory on
+ *
+ * rest_nodes_parsed:
+ * - nodes with CPU on
+ * - nodes with hotpluggable memory region
+ *
+ * We union these two nodemasks to get node_possible_map.
+ */
+static nodemask_t mem_nodes_parsed __initdata;
+static nodemask_t rest_nodes_parsed __initdata;
static struct bootnode nodes[MAX_NUMNODES] __initdata;
static struct bootnode nodes_add[MAX_NUMNODES];
@@ -134,7 +143,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)
apic_id = pa->apic_id;
apicid_to_node[apic_id] = node;
- node_set(node, cpu_nodes_parsed);
+ node_set(node, rest_nodes_parsed);
acpi_numa = 1;
printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
pxm, apic_id, node);
@@ -168,7 +177,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
else
apic_id = pa->apic_id;
apicid_to_node[apic_id] = node;
- node_set(node, cpu_nodes_parsed);
+ node_set(node, rest_nodes_parsed);
acpi_numa = 1;
printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
pxm, apic_id, node);
@@ -229,9 +238,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
}
- if (changed)
+ if (changed) {
+ node_set(node, rest_nodes_parsed);
printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
nd->start, nd->end);
+ }
}
/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
@@ -278,7 +289,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
}
nd = &nodes[node];
oldnode = *nd;
- if (!node_test_and_set(node, nodes_parsed)) {
+ if (!node_test_and_set(node, mem_nodes_parsed)) {
nd->start = start;
nd->end = end;
} else {
@@ -296,7 +307,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
/* restore nodes[node] */
*nd = oldnode;
if ((nd->start | nd->end) == 0)
- node_clear(node, nodes_parsed);
+ node_clear(node, mem_nodes_parsed);
}
node_memblk_range[num_node_memblks].start = start;
@@ -313,7 +324,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
unsigned long pxmram, e820ram;
pxmram = 0;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, mem_nodes_parsed) {
unsigned long s = nodes[i].start >> PAGE_SHIFT;
unsigned long e = nodes[i].end >> PAGE_SHIFT;
pxmram += e - s;
@@ -341,7 +352,7 @@ int __init acpi_get_nodes(struct bootnode *physnodes)
int i;
int ret = 0;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, mem_nodes_parsed) {
physnodes[ret].start = nodes[i].start;
physnodes[ret].end = nodes[i].end;
ret++;
@@ -370,7 +381,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
return -1;
}
- for_each_node_mask(i, nodes_parsed)
+ for_each_node_mask(i, mem_nodes_parsed)
e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
nodes[i].end >> PAGE_SHIFT);
/* for out of order entries in SRAT */
@@ -381,7 +392,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
}
/* Account for nodes with cpus and no memory */
- nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
+ nodes_or(node_possible_map, mem_nodes_parsed, rest_nodes_parsed);
/* Finally register nodes */
for_each_node_mask(i, node_possible_map)
@@ -416,7 +427,7 @@ static int __init find_node_by_addr(unsigned long addr)
int ret = NUMA_NO_NODE;
int i;
- for_each_node_mask(i, nodes_parsed) {
+ for_each_node_mask(i, mem_nodes_parsed) {
/*
* Find the real node that this emulated node appears on. For
* the sake of simplicity, we only use a real node's starting
@@ -466,10 +477,10 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
memcpy(apicid_to_node, fake_apicid_to_node, sizeof(apicid_to_node));
- nodes_clear(nodes_parsed);
+ nodes_clear(mem_nodes_parsed);
for (i = 0; i < num_nodes; i++)
if (fake_nodes[i].start != fake_nodes[i].end)
- node_set(i, nodes_parsed);
+ node_set(i, mem_nodes_parsed);
}
static int null_slit_node_compare(int a, int b)
--
1.5.4.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists