lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 6 Apr 2021 22:28:43 +0200
From:   Oscar Salvador <osalvador@...e.de>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     David Hildenbrand <david@...hat.com>,
        Michal Hocko <mhocko@...nel.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Pavel Tatashin <pasha.tatashin@...een.com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added
 memory range

On Tue, Apr 06, 2021 at 01:11:11PM +0200, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
> 
> This has some disadvantages:
>  a) an existing memory is consumed for that purpose
>     (eg: ~2MB per 128MB memory section on x86_64)
>  b) if the whole node is movable then we have off-node struct pages
>     which has performance drawbacks.
>  c) It might be there are no PMD_ALIGNED chunks so memmap array gets
>     populated with base pages.
> 
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
> 
> Vmemap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section and
> map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully.
> 
> Implementation wise we will reuse vmem_altmap infrastructure to override
> the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on memory_block structure gaining
> a new field which specifies the number of vmemmap_pages at the beginning.
> This patch also introduces the following functions:
> 
>  - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> 		       calls kasan_add_zero_shadow() or the vmemmap range and marks
> 		       online as many sections as vmemmap pages fully span.
>  - vmemmap_adjust_pages: Accounts/substract vmemmap_pages to node and zone
> 			 present_pages
>  - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
> 
> The new function memory_block_online() calls vmemmap_init_space() before
> doing the actual online_pages(). Should online_pages() fail, we clean up
> by calling vmemmap_adjust_pages() and vmemmap_deinit_space().
> 
> On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to calling
> offline_pages(), because offline_pages() performs the tearing-down of kthreads
> and the rebuilding of the zonelists if the node/zone become empty.
> If offline_pages() fails, we account back vmemmap pages by vmemmap_adjust_pages().
> If it succeeds, we call vmemmap_deinit_space().
> 
> Hot-remove:
> 
>  We need to be careful when removing memory, as adding and
>  removing memory needs to be done with the same granularity.
>  To check that this assumption is not violated, we check the
>  memory range we want to remove and if a) any memory block has
>  vmemmap pages and b) the range spans more than a single memory
>  block, we scream out loud and refuse to proceed.
> 
>  If all is good and the range was using memmap on memory (aka vmemmap pages),
>  we construct an altmap structure so free_hugepage_table does the right
>  thing and calls vmem_altmap_free instead of free_pagetable.
> 
> Signed-off-by: Oscar Salvador <osalvador@...e.de>

Heh, it seems I spaced out today.

We need a few things on top:

- In case !CONFIG_MEMORY_HOTREMOVE, we still need vmemmap_deinit_space
  as we call it from memory_block_online() too in case online_pages()
  fails. So we need to move it out of #CONFIG_MEMORY_HOTREMOVE, with
  the others.
- If vmemmap pages fully spans sections, we need to mark those sections
  as online, since online_pages() will not do it for us.
  In vmemmap_deinit_space, we need to mark them back offline.
  Since vmemmap_deinit_space might get called from
  memory_block_online(), we need to move the offline_mem_sections()
  out of #CONFIG_MEMORY_HOTREMOVE code.

So, we would need the following on top:

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cc3dbc0abf02..c7669d2accfd 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -111,6 +111,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern void vmemmap_adjust_pages(unsigned long pfn, long nr_pages);
 extern int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages,
 			      int nid, int online_type);
+extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
 			int online_type, int nid);
 extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
@@ -317,7 +318,6 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}

 #ifdef CONFIG_MEMORY_HOTREMOVE

-extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 747e1c21d8e2..76f4ca5ed230 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1383,10 +1383,8 @@ static inline int online_section_nr(unsigned long nr)

 #ifdef CONFIG_MEMORY_HOTPLUG
 void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
-#ifdef CONFIG_MEMORY_HOTREMOVE
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
 #endif
-#endif

 static inline struct mem_section *__pfn_to_section(unsigned long pfn)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 43f0daf922e6..68f11751cd82 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -885,6 +885,25 @@ int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages, int nid,
 	return ret;
 }

+void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages)
+{
+	unsigned long end_pfn = pfn + nr_pages;
+        /*
+	 * The pages associated with this vmemmap have been offlined, so
+	 * we can reset its state here in case we have page_init_poison.
+	 */
+	remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn, nr_pages);
+	kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
+
+	/*
+	 * It might be that the vmemmap_pages fully span sections. If that is
+	 * the case, mark those sections offline here as otherwise they will be
+	 * left online.
+	 */
+	if (nr_pages >= PAGES_PER_SECTION)
+		offline_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION));
+}
+
 int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 		       int online_type, int nid)
 {
@@ -1672,16 +1691,6 @@ static int count_system_ram_pages_cb(unsigned long start_pfn,
 	return 0;
 }

-void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages)
-{
-	/*
-	 * The pages associated with this vmemmap have been offlined, so
-	 * we can reset its state here in case we have page_init_poison.
-	 */
-	remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn, nr_pages);
-	kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
-}
-
 int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 {
 	const unsigned long end_pfn = start_pfn + nr_pages;
diff --git a/mm/sparse.c b/mm/sparse.c
index 7bd23f9d6cef..8e96cf00536b 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -623,7 +623,6 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 	}
 }

-#ifdef CONFIG_MEMORY_HOTREMOVE
 /* Mark all memory sections within the pfn range as offline */
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -644,7 +643,6 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 		ms->section_mem_map &= ~SECTION_IS_ONLINE;
 	}
 }
-#endif

 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 static struct page * __meminit populate_section_memmap(unsigned long pfn,


-- 
Oscar Salvador
SUSE L3

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ