Message-ID: <20150825095638.GA24750@gmail.com>
Date:	Tue, 25 Aug 2015 11:56:38 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	George Spelvin <linux@...izon.com>
Cc:	dave@...1.net, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linux@...musvillemoes.dk, peterz@...radead.org, riel@...hat.com,
	rientjes@...gle.com, torvalds@...ux-foundation.org
Subject: [PATCH 3/3 v6] mm/vmalloc: Cache the vmalloc memory info


* George Spelvin <linux@...izon.com> wrote:

> (I hope I'm not annoying you by bikeshedding this too much, although I
> think this is improving.)

[ I don't mind, although I wish other, more critical parts of the kernel got this
  much attention as well ;-) ]

> Anyway, suggested changes for v6 (sigh...):
> 
> First: you do a second read of vmap_info_gen to optimize out the copy
> of vmalloc_info if it's easily seen as pointless, but given how small
> vmalloc_info is (two words!), I'd be inclined to omit that optimization.
> 
> Copy always, *then* see if it's worth keeping.  Smaller code, faster
> fast path, and barely noticeable on the slow path.

Ok, done.

> Second, and this is up to you, I'd be inclined to go fully non-blocking and
> only spin_trylock().  If that fails, just skip the cache update.

So I'm not sure about this one: we have no guarantee of the order in which
updaters reach the spinlock, and we want the 'freshest' updater to do the
update. The trylock might cause us to drop the 'freshest' update erroneously,
so this change would introduce a 'stale data' bug I think.

> Third, ANSI C rules allow a compiler to assume that signed integer
> overflow does not occur.  That means that gcc is allowed to optimize
> "if (x - y > 0)" to "if (x > y)".

That's annoying ...

> Given that gcc has annoyed us by using this optimization in other
> contexts, it might be safer to make them unsigned (which is required to
> wrap properly) and cast to integer after subtraction.

Ok, done.
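
For reference, here is a standalone userspace sketch (not part of the patch)
of the wraparound-safe comparison George describes: keep the counters
unsigned so the subtraction is defined to wrap modulo 2^32, and cast only the
difference to int for the "is newer" test. This is the same idiom the v6
patch below uses in its (int)(gen - vmap_info_cache_gen) > 0 check.

#include <stdio.h>

/*
 * Wraparound-safe "is gen newer than cache_gen?" test.
 *
 * Unsigned subtraction is defined to wrap modulo 2^32, so compute the
 * difference first and only then reinterpret it as a signed int.  The
 * naive signed form, (int)gen - (int)cache_gen > 0, relies on signed
 * overflow, which the compiler may assume cannot happen and optimize
 * into a plain gen > cache_gen, giving the wrong answer across a wrap.
 */
static int gen_after(unsigned int gen, unsigned int cache_gen)
{
	return (int)(gen - cache_gen) > 0;
}

int main(void)
{
	printf("%d\n", gen_after(11u, 10u));		/* 1: newer   */
	printf("%d\n", gen_after(5u, 0xfffffff0u));	/* 1: wrapped */
	printf("%d\n", gen_after(10u, 11u));		/* 0: stale   */
	return 0;
}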

> Basically, the following (untested, but pretty damn simple):

I've attached v6, which applies your first and last suggestions but not the
trylock one.

I also removed _ONCE() accesses from the places that didn't need them.

I added your Reviewed-by optimistically, hopefully saving a v7 submission ;-)

Lightly tested.

Thanks,

	Ingo

==============================>
From 8364822f9cff9da9f9858f0ca1f1ddc5bd3ad3a1 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...nel.org>
Date: Sat, 22 Aug 2015 12:28:01 +0200
Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info

Linus reported that for certain workloads such as 'make test' in the
Git build, glibc's qsort will read /proc/meminfo for every process
created (by way of get_phys_pages()), which causes the Git build
to generate a surprising amount of kernel overhead.

A fair chunk of the overhead is due to get_vmalloc_info(), which
walks a potentially long list to gather its statistics.

Modify Linus's jiffies-based patch to use generation counters
to cache the vmalloc info: vmap_unlock() increases the generation
counter, and get_vmalloc_info() reads it and compares it against
a cached generation counter.

Also use a spinlock to make sure we always print a consistent
set of vmalloc statistics, FWIW.

Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
Reviewed-by: George Spelvin <linux@...izon.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Rik van Riel <riel@...hat.com>
Cc: linux-mm@...ck.org
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
 mm/vmalloc.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 74 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 605138083880..a0a4274a7be9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREEING	0x02
 #define VM_VM_AREA	0x04
 
-static DEFINE_SPINLOCK(vmap_area_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+
+#ifdef CONFIG_PROC_FS
+/*
+ * A seqlock and two generation counters for a simple cache of the
+ * vmalloc allocation statistics info printed in /proc/meminfo.
+ *
+ * ( The assumption of the optimization is that it's read frequently, but
+ *   modified infrequently. )
+ */
+static DEFINE_SPINLOCK(vmap_info_lock);
+static unsigned int vmap_info_gen = 1;
+static unsigned int vmap_info_cache_gen;
+static struct vmalloc_info vmap_info_cache;
+#endif
 
 static inline void vmap_lock(void)
 {
@@ -285,6 +299,9 @@ static inline void vmap_lock(void)
 
 static inline void vmap_unlock(void)
 {
+#ifdef CONFIG_PROC_FS
+	WRITE_ONCE(vmap_info_gen, vmap_info_gen+1);
+#endif
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void)
 }
 module_init(proc_vmalloc_init);
 
-void get_vmalloc_info(struct vmalloc_info *vmi)
+static void calc_vmalloc_info(struct vmalloc_info *vmi)
 {
 	struct vmap_area *va;
 	unsigned long free_area_size;
@@ -2746,5 +2763,59 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 out:
 	rcu_read_unlock();
 }
-#endif
 
+/*
+ * Return a consistent snapshot of the current vmalloc allocation
+ * statistics, for /proc/meminfo:
+ */
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	unsigned int cache_gen, gen;
+
+	/*
+	 * The common case is that the cache is valid, so first
+	 * read it, then check its validity.
+	 *
+	 * The two read barriers make sure that we read
+	 * 'cache_gen', 'vmap_info_cache' and 'gen' in
+	 * precisely that order:
+	 */
+	cache_gen = vmap_info_cache_gen;
+	smp_rmb();
+	*vmi = vmap_info_cache;
+	smp_rmb();
+	gen = vmap_info_gen;
+
+	/*
+	 * If the generation counter of the cache matches that of
+	 * the vmalloc generation counter then return the cache:
+	 */
+	if (cache_gen == gen)
+		return;
+
+	/* Make sure 'gen' is read before the vmalloc info: */
+	smp_rmb();
+	calc_vmalloc_info(vmi);
+
+	/*
+	 * All updates to vmap_info_cache_gen go through this spinlock,
+	 * so when the cache got invalidated, we'll only mark it valid
+	 * again if we first fully write the new vmap_info_cache.
+	 *
+	 * This ensures that partial results won't be used and that the
+	 * vmalloc info belonging to the freshest update is used:
+	 */
+	spin_lock(&vmap_info_lock);
+	if ((int)(gen-vmap_info_cache_gen) > 0) {
+		vmap_info_cache = *vmi;
+		/*
+		 * Make sure the new cached data is visible before
+		 * the generation counter update:
+		 */
+		smp_wmb();
+		vmap_info_cache_gen = gen;
+	}
+	spin_unlock(&vmap_info_lock);
+}
+
+#endif /* CONFIG_PROC_FS */
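
For context on the workload described in the changelog: on glibc,
get_phys_pages() (a GNU extension declared in <sys/sysinfo.h>) was answered
at the time by parsing /proc/meminfo (newer glibc versions may use
sysinfo(2) instead), and reading /proc/meminfo is what runs
get_vmalloc_info() on the kernel side. Below is a minimal, purely
illustrative userspace sketch (not part of the patch, filename arbitrary)
to observe that access; the exact open/openat syscall shown by strace
depends on the glibc version.

/*
 * Build: gcc -O2 meminfo_probe.c -o meminfo_probe
 * Run:   strace -e trace=open,openat ./meminfo_probe
 * to see whether get_phys_pages() opens /proc/meminfo on this glibc.
 */
#include <stdio.h>
#include <sys/sysinfo.h>	/* get_phys_pages(): GNU extension */

int main(void)
{
	long pages = get_phys_pages();

	printf("physical pages: %ld\n", pages);
	return 0;
}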