Message-Id: <20201019182853.7467-1-gpiccoli@canonical.com>
Date: Mon, 19 Oct 2020 15:28:53 -0300
From: "Guilherme G. Piccoli" <gpiccoli@...onical.com>
To: linux-mm@...ck.org
Cc: kernel-hardening@...ts.openwall.com,
linux-hardening@...r.kernel.org,
linux-security-module@...r.kernel.org, gpiccoli@...onical.com,
kernel@...ccoli.net, cascardo@...onical.com,
Alexander Potapenko <glider@...gle.com>,
James Morris <jamorris@...ux.microsoft.com>,
Kees Cook <keescook@...omium.org>,
Michal Hocko <mhocko@...e.cz>,
Mike Kravetz <mike.kravetz@...cle.com>
Subject: [PATCH] mm, hugetlb: Avoid double clearing for hugetlb pages
Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1
boot options") introduced the option of clearing/initializing kernel
pages at allocation time - this can be enabled either with a boot
parameter or a Kconfig setting. The change was intended as a kernel hardening measure.
Although the performance degradation of "init_on_alloc" is considered
low, there is one case in which it is noticeable and may impact system
latency: when hugetlb pages are allocated.
Those pages are meant to be used by userspace *only* (unlike THP,
for example). At hugetlb allocation time, their component pages go
through the initialization/clearing process in
prep_new_page()
kernel_init_free_pages()
and, when used in userspace mappings, the hugetlb pages are cleared _again_;
I've verified that in practice by running the kernel selftest[0] for
hugetlb mapping - the pages go through clear_huge_page() on page
fault [ see hugetlb_no_page() ].
This patch proposes a way to prevent this resource waste by skipping
the page initialization/clearing if the page is a component of a hugetlb
page (even if "init_on_alloc" or the respective Kconfig option is set).
The performance improvement measured in [1] demonstrates that it is
effective and brings the hugetlb allocation time to the same level as
with "init_on_alloc" disabled. Although we used sysctl to allocate
hugetlb pages in our tests, the same delay happens at early boot time
when hugetlb parameters are set on the kernel cmdline (and "init_on_alloc"
is set).
[0] tools/testing/selftests/vm/map_hugetlb.c
[1] Test results - all tests were executed on a pristine 5.9+ kernel, from
2020-10-19, at commit 7cf726a594353010. A virtual machine with 96G of
total memory was used; the test consists of allocating 64G of 2M hugetlb
pages. Results below:
* Without this patch, init_on_alloc=1
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 97892212 kB
Hugetlb: 0 kB
$ time echo 32768 > /proc/sys/vm/nr_hugepages
real 0m24.189s
user 0m0.000s
sys 0m24.184s
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 30784732 kB
Hugetlb: 67108864 kB
* Without this patch, init_on_alloc=0
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 97892752 kB
Hugetlb: 0 kB
$ time echo 32768 > /proc/sys/vm/nr_hugepages
real 0m0.316s
user 0m0.000s
sys 0m0.316s
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 30783628 kB
Hugetlb: 67108864 kB
* WITH this patch, init_on_alloc=1
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 97891952 kB
Hugetlb: 0 kB
$ time echo 32768 > /proc/sys/vm/nr_hugepages
real 0m0.209s
user 0m0.000s
sys 0m0.209s
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 30782964 kB
Hugetlb: 67108864 kB
* WITH this patch, init_on_alloc=0
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 97892620 kB
Hugetlb: 0 kB
$ time echo 32768 > /proc/sys/vm/nr_hugepages
real 0m0.206s
user 0m0.000s
sys 0m0.206s
$ cat /proc/meminfo |grep "MemA\|Hugetlb"
MemAvailable: 30783804 kB
Hugetlb: 67108864 kB
Signed-off-by: Guilherme G. Piccoli <gpiccoli@...onical.com>
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@...onical.com>
Cc: Alexander Potapenko <glider@...gle.com>
Cc: James Morris <jamorris@...ux.microsoft.com>
Cc: Kees Cook <keescook@...omium.org>
Cc: Michal Hocko <mhocko@...e.cz>
Cc: Mike Kravetz <mike.kravetz@...cle.com>
---
Hi everybody, thanks in advance for the review/comments. I'd like to
point out two things related to the implementation:
1) I understand that adding GFP flags is not really welcomed by the
mm community; I've considered passing that as a function parameter, but
that would be a hacky mess, so I decided to add the flag, since this
seems to be a fair use of the flag mechanism (to control actions on pages).
If anybody has a better/simpler suggestion to implement this, I'm all
ears - thanks!
2) The checkpatch script gave me the following error, but I decided to
ignore it in order to maintain the same format present in the file:
ERROR: space required after that close brace '}'
#171: FILE: include/trace/events/mmflags.h:52:
include/linux/gfp.h | 14 +++++++++-----
include/linux/mm.h | 2 +-
include/trace/events/mmflags.h | 3 ++-
mm/hugetlb.c | 7 +++++++
tools/perf/builtin-kmem.c | 1 +
5 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c603237e006c..c03909f8e7b6 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -39,8 +39,9 @@ struct vm_area_struct;
#define ___GFP_HARDWALL 0x100000u
#define ___GFP_THISNODE 0x200000u
#define ___GFP_ACCOUNT 0x400000u
+#define ___GFP_NOINIT_ON_ALLOC 0x800000u
#ifdef CONFIG_LOCKDEP
-#define ___GFP_NOLOCKDEP 0x800000u
+#define ___GFP_NOLOCKDEP 0x1000000u
#else
#define ___GFP_NOLOCKDEP 0
#endif
@@ -215,16 +216,19 @@ struct vm_area_struct;
* %__GFP_COMP address compound page metadata.
*
* %__GFP_ZERO returns a zeroed page on success.
+ *
+ * %__GFP_NOINIT_ON_ALLOC avoids double-clearing of userspace-only pages (e.g. HugeTLB)
*/
-#define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN)
-#define __GFP_COMP ((__force gfp_t)___GFP_COMP)
-#define __GFP_ZERO ((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN)
+#define __GFP_COMP ((__force gfp_t)___GFP_COMP)
+#define __GFP_ZERO ((__force gfp_t)___GFP_ZERO)
+#define __GFP_NOINIT_ON_ALLOC ((__force gfp_t)___GFP_NOINIT_ON_ALLOC)
/* Disable lockdep for GFP context tracking */
#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
/* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP))
+#define __GFP_BITS_SHIFT (24 + IS_ENABLED(CONFIG_LOCKDEP))
#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
/**
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe70aaf..7fa60d22a90a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2882,7 +2882,7 @@ DECLARE_STATIC_KEY_FALSE(init_on_alloc);
static inline bool want_init_on_alloc(gfp_t flags)
{
if (static_branch_unlikely(&init_on_alloc) &&
- !page_poisoning_enabled())
+ !page_poisoning_enabled() && !(flags & __GFP_NOINIT_ON_ALLOC))
return true;
return flags & __GFP_ZERO;
}
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 67018d367b9f..89b0c0ddcc52 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -48,7 +48,8 @@
{(unsigned long)__GFP_WRITE, "__GFP_WRITE"}, \
{(unsigned long)__GFP_RECLAIM, "__GFP_RECLAIM"}, \
{(unsigned long)__GFP_DIRECT_RECLAIM, "__GFP_DIRECT_RECLAIM"},\
- {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"}\
+ {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"},\
+ {(unsigned long)__GFP_NOINIT_ON_ALLOC, "__GFP_NOINIT_ON_ALLOC"}\
#define show_gfp_flags(flags) \
(flags) ? __print_flags(flags, "|", \
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fe76f8fd5a73..c60a6726b0be 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1742,6 +1742,13 @@ static struct page *alloc_fresh_huge_page(struct hstate *h,
{
struct page *page;
+	/*
+	 * Signal the following page allocations to skip clearing at
+	 * allocation time - since HugeTLB pages are *only* used as
+	 * userspace pages, they will be cleared anyway before use.
+	 */
+ gfp_mask |= __GFP_NOINIT_ON_ALLOC;
+
if (hstate_is_gigantic(h))
page = alloc_gigantic_page(h, gfp_mask, nid, nmask);
else
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index a50dae2c4ae9..bef90d8bb7f6 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -660,6 +660,7 @@ static const struct {
{ "__GFP_RECLAIM", "R" },
{ "__GFP_DIRECT_RECLAIM", "DR" },
{ "__GFP_KSWAPD_RECLAIM", "KR" },
+ { "__GFP_NOINIT_ON_ALLOC", "NIA" },
};
static size_t max_gfp_len;
--
2.28.0