Date:	Thu, 16 May 2013 19:50:21 +0800
From:	Tang Chen <tangchen@...fujitsu.com>
To:	yinghai@...nel.org, tglx@...utronix.de, mingo@...hat.com,
	hpa@...or.com, penberg@...nel.org, jacob.shin@....com,
	akpm@...ux-foundation.org, isimatu.yasuaki@...fujitsu.com
Cc:	x86@...nel.org, linux-kernel@...r.kernel.org
Subject: [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.

The following patch set allocates pagetables on the local node:
https://lkml.org/lkml/2013/4/11/829

Doing this will break memory hot-remove.

Before removing memory, the kernel offlines it. If offlining fails,
the memory cannot be removed. The pagetables are in use by the
kernel, so the memory holding them cannot be offlined and therefore
cannot be removed.

Of course, we could free the pagetable pages, since the pagetables of
the memory being removed are no longer needed. But offlining memory
does not mean removing it. If users only want to offline memory, the
pagetables must not be freed.

The minimum unit of memory online/offline is a block. By default,
one block contains one section, which by default is 128MB. It is
possible that half of a block holds pagetables and the other half
is movable memory.
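
To make the sizes concrete, here is a minimal user-space sketch of the
block arithmetic. It assumes 4KB pages and the default 128MB section
mentioned above; the sketch_* names are hypothetical and are not
kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 128MB sections (the default above) and 4KB pages. */
#define SKETCH_SECTION_SIZE	(128ULL << 20)
#define SKETCH_PAGE_SIZE	(4ULL << 10)

/* Number of pages in one memory block made of `sections` sections. */
static uint64_t sketch_pages_per_block(unsigned int sections)
{
	assert(sections > 0);
	return sections * (SKETCH_SECTION_SIZE / SKETCH_PAGE_SIZE);
}

/* Index of the memory block that contains physical address `addr`. */
static uint64_t sketch_block_of(uint64_t addr, unsigned int sections)
{
	assert(sections > 0);
	return addr / (sections * SKETCH_SECTION_SIZE);
}
```

With one section per block, a single pagetable page anywhere in a
block ties up the offlining of all the other pages in that block.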

When we offline this kind of block, the status of the block is
uncertain. We cannot simply free the pagetables in this block because
they may be used by other online blocks. But during memory hot-remove,
a failure to offline any block breaks the hot-remove logic.
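
The failure mode can be modelled roughly as follows. This is only an
illustrative user-space sketch, not the kernel's offline path: the
struct and the sketch_* functions are hypothetical, and "migration"
is reduced to clearing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block summary; not a kernel structure. */
struct sketch_block {
	unsigned long movable_pages;	/* can be migrated away */
	unsigned long pagetable_pages;	/* used by the kernel, cannot move */
	bool online;
};

/* Offline one block: fail if pagetable pages pin it, else migrate. */
static bool sketch_offline_block(struct sketch_block *b)
{
	assert(b != NULL);
	if (b->pagetable_pages > 0)
		return false;
	b->movable_pages = 0;
	b->online = false;
	return true;
}

/* Hot-remove needs every block in the range to go offline first. */
static bool sketch_hot_remove(struct sketch_block *blocks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!sketch_offline_block(&blocks[i]))
			return false;
	return true;
}
```

In this model a single block that holds pagetables is enough to make
the whole hot-remove attempt fail, even when every other block can be
offlined.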


In order to fix it, we have three solutions:

1. Reserve the whole block (128MB) so that nobody else can use the
   rest of it, and skip it when offlining memory. When all the other
   blocks are offlined, free the pagetables and remove all the memory.

   But we lose some memory this way. 128MB is a lot to waste.


2. Keep this block online. Although the offline operation fails, it is
   OK to remove memory.

   But the offline operation will always fail. And generally speaking,
   offlining can fail for many reasons, so it is difficult to tell
   whether it is OK to remove memory. So we don't suggest this way.


3. Migrate user pages and offline the block. Offlining memory won't
   stop the kernel from using the pagetables stored in it, so it will
   be OK.

   But this changes the semantics of "offline". I'm not sure if we
   can do it this way.


So until we fix this problem, I think we should not allocate pagetables
on the local node when CONFIG_MEMORY_HOTREMOVE is enabled, and restore
local allocation once we agree on a direction and fix the problem.
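
The intended range selection can be sketched in user space like this.
It is a simplified model under stated assumptions: the config option
is represented by a runtime flag instead of the preprocessor, the
struct and sketch_* names are hypothetical, and a false return stands
in for the panic() in alloc_low_pages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a {min,max}_pfn_mapped pair. */
struct sketch_pfn_range {
	unsigned long min_pfn;
	unsigned long max_pfn;
};

/*
 * Choose where pagetable pages come from. The local range is used
 * only when hot-remove is not configured and the range is non-empty;
 * otherwise fall back to the low range. Returns false when the low
 * range is empty, where the kernel would panic.
 */
static bool sketch_pick_range(bool hotremove_enabled,
			      struct sketch_pfn_range local,
			      struct sketch_pfn_range low,
			      struct sketch_pfn_range *out)
{
	assert(out != NULL);
	if (!hotremove_enabled && local.max_pfn > local.min_pfn) {
		*out = local;
		return true;
	}
	if (low.min_pfn >= low.max_pfn)
		return false;
	*out = low;
	return true;
}
```

With the flag set, pagetables always land in the low range, so no
block on a hot-removable node ends up pinned by them.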

This patch is based on
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

Any other solution for this problem is welcome.


Signed-off-by: Tang Chen <tangchen@...fujitsu.com>
---
 arch/x86/mm/init.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8d0007a..8cd8a2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -55,18 +55,23 @@ __ref void *alloc_low_pages(unsigned int num)
 
 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
-		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+#ifndef CONFIG_MEMORY_HOTREMOVE
+		if (local_max_pfn_mapped > local_min_pfn_mapped) {
+			ret = memblock_find_in_range(
+					local_min_pfn_mapped << PAGE_SHIFT,
+					local_max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE * num , PAGE_SIZE);
+		} else
+#endif
+		{
 			if (low_min_pfn_mapped >= low_max_pfn_mapped)
 				panic("alloc_low_page: ran out of memory");
 			ret = memblock_find_in_range(
 					low_min_pfn_mapped << PAGE_SHIFT,
 					low_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
-		} else
-			ret = memblock_find_in_range(
-					local_min_pfn_mapped << PAGE_SHIFT,
-					local_max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE * num , PAGE_SIZE);
+		}
+
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
 		memblock_reserve(ret, PAGE_SIZE * num);
@@ -443,6 +448,11 @@ void __init init_mem_mapping(unsigned long begin, unsigned long end)
 		if (new_mapped_ram_size > mapped_ram_size)
 			step_size <<= STEP_SIZE_SHIFT;
 		mapped_ram_size += new_mapped_ram_size;
+
+		if (is_low) {
+			low_min_pfn_mapped = local_min_pfn_mapped;
+			low_max_pfn_mapped = local_max_pfn_mapped;
+		}
 	}
 
 	if (real_end < end) {
@@ -450,11 +460,6 @@ void __init init_mem_mapping(unsigned long begin, unsigned long end)
 		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
 			local_max_pfn_mapped = end >> PAGE_SHIFT;
 	}
-
-	if (is_low) {
-		low_min_pfn_mapped = local_min_pfn_mapped;
-		low_max_pfn_mapped = local_max_pfn_mapped;
-	}
 }
 
 #ifndef CONFIG_NUMA
-- 
1.7.1
