lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 21 Jan 2010 00:40:02 +0800
From:	Haicheng Li <haicheng.li@...ux.intel.com>
To:	David Rientjes <rientjes@...gle.com>
CC:	Yinghai Lu <yinghai@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
	Andi Kleen <andi@...stfloor.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mm/srat_64.c: nodes_parsed should include all nodes
 detected by ACPI.

David Rientjes wrote:
> On Tue, 19 Jan 2010, Haicheng Li wrote:
>> David, per my understanding, your concern should be like, with this fix, if
>> 3rd or 4th entry of Node0 has no address range, then Node0 won't be recoverd
>> with oldnode and won't be cleared in nodes_parsed. But how is it handled by
>> old code?
>>
>
> It's not evident with your machine because you do not have two SRAT
> entries for the same node id, one without ACPI_SRAT_MEM_HOT_PLUGGABLE and
> other with ACPI_SRAT_MEM_HOT_PLUGGABLE.
>
> The old code would preserve the address range for the former in oldnode
> and then reset its data in the struct bootnode since nodes_parsed has a
> bit set for that node.  That's needed by later code that I've mentioned:
> acpi_get_nodes(), specifically, which breaks with your patch in addition
> to nodes_cover_memory() and e820_register_active_regions().
>
> Only when the previous oldnode entry does not have a valid address range,
> meaning it is [0, 0), does the bit get cleared in nodes_parsed.

Understood, the old code is meant to make nodes_parsed _NEVER_ include the node whose memory regions
are all hotpluggable.

>> - it recovers node with oldnode as long as current entry is HOT_PLUGGABLE. so
>> it handles the recover issue. but I think following patch can simply fix it as
>> well.
>>
>
> If it's not ACPI_SRAT_MEM_HOT_PLUGGABLE, we know the address range is
> already valid given the sanity checks that it has successfully passed
> through in acpi_numa_memory_affinity_init(), so we require no further
> checking.  However, your patch will not reset the previous address range
> when a ACPI_SRAT_MEM_HOT_PLUGGABLE entry is found for the same address
> range and you're leaving the bit set in nodes_parsed.

I see. the precondition is that nodes_parsed should not include such hotpluggable node, then such
data of hotpluggable mem should be kept in nodes_add[] other than in nodes[].

>>> cpu_nodes_parsed handles nodes without memory, there's no reason why a bit
>>> should be set in nodes_parsed if its corresponding node does not have a
>>> valid address range.
>> For a node has _NOT_ either CPU or Memory like Node1, cpu_nodes_parsed cannot
>> handle it.
>>
>
> It most certainly can since its sole purpose is to include memoryless
> nodes in node_possible_map.  It has no other use case that would break as
> the result of adding hotpluggable nodes, hence the reason I suggested
> renaming it no_mem_nodes_parsed.

Yeah, so the key point is who should keep hotpluggable nodes, cpu_nodes_parsed or nodes_parsed?
Actually now I agree with you on this, let cpu_nodes_parsed keep hotpluggable nodes since it won't
break any old code. Originally my patch wanna let nodes_parsed keep hotpluggable nodes, which would
make things complex.

but name "no_mem_nodes_parsed" seems convoluted too because (from code logic) this nodemask is
usually based on CPU/APIC Affinity Structure.
How about rename cpu_nodes_parsed as "rest_nodes_parsed" (comparing with "mem_nodes_parsed), since
it handles
   - nodes with CPU on
   - nodes with hotpluggable memory region
?

>>> We have a reasonable expectation that nodes_parsed represents memory nodes
>>> given its use for e820_register_active_regions() and nodes_cover_memory() as
>>> well as acpi_get_nodes() for NUMA emulation, for example, which would be
>>> broken with this patch.  See dc0985519.
>>>
>> either nodes_cover_memory() or e820_register_active_regions() or
>> acpi_get_nodes(), they all have node-addr-range check code, if the
>> node-addr-range is invalid, they won't be harmed.
>>
>
> Wrong, acpi_get_nodes() does not have such a check it only iterates over
> nodes_parsed.  In other words, you'd be starting a new requirement for
> nodes_parsed with your patch: it would now be necessary to check for a
> valid (non-zero) address range for each set bit.  Instead, I'm suggesting
> the nodes_parsed represents only nodes with valid memory, which is a
> reasonable expectation given the semantics of both it and cpu_nodes_parsed
> to handle their memoryless counterparts.

agreed. In term of this, using nodes_parsed to represent only nodes with valid memory can make
things simple.

> In other words, the following should easily fix the issue without breaking
> the existing logic that preserves the old address range for node ids that
> have SRAT entries both with and without ACPI_SRAT_MEM_HOT_PLUGGABLE.
> Could you give it a try?

of course, it fixes the issue because node_possible_map now includes hotpluggable node, and then
nr_node_ids becomes equal to maximum of possible nodes on the motherboard;).

let's add more changes to fix naming issue as well since it's too confusing for people to understand
the code logic. how about below patch?
---
  arch/x86/mm/srat_64.c |   39 +++++++++++++++++++++++++--------------
  1 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/srat_64.c b/arch/x86/mm/srat_64.c
index a271241..aebbdd4 100644
--- a/arch/x86/mm/srat_64.c
+++ b/arch/x86/mm/srat_64.c
@@ -27,8 +27,17 @@ int acpi_numa __initdata;

  static struct acpi_table_slit *acpi_slit;

-static nodemask_t nodes_parsed __initdata;
-static nodemask_t cpu_nodes_parsed __initdata;
+/* mem_nodes_parsed:
+ *  - nodes with memory on
+ *
+ * rest_nodes_parsed:
+ *  - nodes with CPU on
+ *  - nodes with hotpluggable memory region
+ *
+ * We union these two nodemasks to get node_possible_map.
+ */
+static nodemask_t mem_nodes_parsed __initdata;
+static nodemask_t rest_nodes_parsed __initdata;
  static struct bootnode nodes[MAX_NUMNODES] __initdata;
  static struct bootnode nodes_add[MAX_NUMNODES];

@@ -134,7 +143,7 @@ acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa)

  	apic_id = pa->apic_id;
  	apicid_to_node[apic_id] = node;
-	node_set(node, cpu_nodes_parsed);
+	node_set(node, rest_nodes_parsed);
  	acpi_numa = 1;
  	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%04x -> Node %u\n",
  	       pxm, apic_id, node);
@@ -168,7 +177,7 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa)
  	else
  		apic_id = pa->apic_id;
  	apicid_to_node[apic_id] = node;
-	node_set(node, cpu_nodes_parsed);
+	node_set(node, rest_nodes_parsed);
  	acpi_numa = 1;
  	printk(KERN_INFO "SRAT: PXM %u -> APIC 0x%02x -> Node %u\n",
  	       pxm, apic_id, node);
@@ -229,9 +238,11 @@ update_nodes_add(int node, unsigned long start, unsigned long end)
  			printk(KERN_ERR "SRAT: Hotplug zone not continuous. Partly ignored\n");
  	}

-	if (changed)
+	if (changed) {
+		node_set(node, rest_nodes_parsed);
  		printk(KERN_INFO "SRAT: hot plug zone found %Lx - %Lx\n",
  				 nd->start, nd->end);
+	}
  }

  /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
@@ -278,7 +289,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
  	}
  	nd = &nodes[node];
  	oldnode = *nd;
-	if (!node_test_and_set(node, nodes_parsed)) {
+	if (!node_test_and_set(node, mem_nodes_parsed)) {
  		nd->start = start;
  		nd->end = end;
  	} else {
@@ -296,7 +307,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
  		/* restore nodes[node] */
  		*nd = oldnode;
  		if ((nd->start | nd->end) == 0)
-			node_clear(node, nodes_parsed);
+			node_clear(node, mem_nodes_parsed);
  	}

  	node_memblk_range[num_node_memblks].start = start;
@@ -313,7 +324,7 @@ static int __init nodes_cover_memory(const struct bootnode *nodes)
  	unsigned long pxmram, e820ram;

  	pxmram = 0;
-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
  		unsigned long s = nodes[i].start >> PAGE_SHIFT;
  		unsigned long e = nodes[i].end >> PAGE_SHIFT;
  		pxmram += e - s;
@@ -341,7 +352,7 @@ int __init acpi_get_nodes(struct bootnode *physnodes)
  	int i;
  	int ret = 0;

-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
  		physnodes[ret].start = nodes[i].start;
  		physnodes[ret].end = nodes[i].end;
  		ret++;
@@ -370,7 +381,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
  		return -1;
  	}

-	for_each_node_mask(i, nodes_parsed)
+	for_each_node_mask(i, mem_nodes_parsed)
  		e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
  						nodes[i].end >> PAGE_SHIFT);
  	/* for out of order entries in SRAT */
@@ -381,7 +392,7 @@ int __init acpi_scan_nodes(unsigned long start, unsigned long end)
  	}

  	/* Account for nodes with cpus and no memory */
-	nodes_or(node_possible_map, nodes_parsed, cpu_nodes_parsed);
+	nodes_or(node_possible_map, mem_nodes_parsed, rest_nodes_parsed);

  	/* Finally register nodes */
  	for_each_node_mask(i, node_possible_map)
@@ -416,7 +427,7 @@ static int __init find_node_by_addr(unsigned long addr)
  	int ret = NUMA_NO_NODE;
  	int i;

-	for_each_node_mask(i, nodes_parsed) {
+	for_each_node_mask(i, mem_nodes_parsed) {
  		/*
  		 * Find the real node that this emulated node appears on.  For
  		 * the sake of simplicity, we only use a real node's starting
@@ -466,10 +477,10 @@ void __init acpi_fake_nodes(const struct bootnode *fake_nodes, int num_nodes)
  		__acpi_map_pxm_to_node(fake_node_to_pxm_map[i], i);
  	memcpy(apicid_to_node, fake_apicid_to_node, sizeof(apicid_to_node));

-	nodes_clear(nodes_parsed);
+	nodes_clear(mem_nodes_parsed);
  	for (i = 0; i < num_nodes; i++)
  		if (fake_nodes[i].start != fake_nodes[i].end)
-			node_set(i, nodes_parsed);
+			node_set(i, mem_nodes_parsed);
  }

  static int null_slit_node_compare(int a, int b)
-- 
1.5.4.4


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ