linux-kernel - [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150824073422.GC13082@gmail.com>
Date:	Mon, 24 Aug 2015 09:34:22 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	George Spelvin <linux@...izon.com>
Cc:	dave@...1.net, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	linux@...musvillemoes.dk, peterz@...radead.org, riel@...hat.com,
	rientjes@...gle.com, torvalds@...ux-foundation.org
Subject: [PATCH 3/3 v4] mm/vmalloc: Cache the vmalloc memory info


* George Spelvin <linux@...izon.com> wrote:

> First, an actual, albeit minor, bug: initializing both vmap_info_gen
> and vmap_info_cache_gen to 0 marks the cache as valid, which it's not.

Ha! :-) Fixed.

> vmap_info_gen should be initialized to 1 to force an initial
> cache update.

Yeah.

> Second, I don't see why you need a 64-bit counter.  Seqlocks consider
> 32 bits (31 bits, actually, the lsbit means "update in progress") quite
> a strong enough guarantee.

Just out of general paranoia - but you are right, and this would lower the 
overhead on 32-bit SMP platforms a bit, plus it avoids 64-bit word tearing 
artifacts on 32 bit platforms as well.

I modified it to u32.

> Third, it seems as though vmap_info_cache_gen is basically a duplicate
> of vmap_info_lock.sequence.  It should be possible to make one variable
> serve both purposes.

Correct, I alluded to that in my description:

> > Note that there's an even simpler variant possible I think: we could use just 
> > the two generation counters and barriers to remove the seqlock.

> You just need a kludge to handle the case of multiple vamp_info updates
> between cache updates.
> 
> There are two simple ones:
> 
> 1) Avoid bumping vmap_info_gen unnecessarily.  In vmap_unlock(), do
> 	vmap_info_gen = (vmap_info_lock.sequence | 1) + 1;
> 2) - Make vmap_info_gen a seqcount_t
>    - In vmap_unlock(), do write_seqcount_barrier(&vmap_info_gen)
>    - In get_vmalloc_info, inside the seqlock critical section, do
>      vmap_info_lock.seqcount.sequence = vmap_info_gen.sequence - 1;
>      (Using the vmap_info_gen.sequence read while validating the
>      cache in the first place.)
> 
> I should try to write an actual patch illustrating this.

So I think something like the patch below is even simpler than trying to kludge 
generation counter semantics into seqcounts.

I used two generation counters and a spinlock. The fast path is completely 
lockless and lightweight on modern SMP platforms. (where smp_rmb() is a no-op or 
very cheap.)

There's not even a seqlock retry loop, instead an invalid cache causes us to fall 
back to the old behavior - and the freshest result is guaranteed to end up in the 
cache.

The linecount got a bit larger: but half of it is comments.

Note that the generation counters are signed integers so that this comparison can 
be done:

+       if (gen-vmap_info_cache_gen > 0) {

Thanks,

	Ingo

======================>
>From 1a4c168a22cc302282cbd1bf503ecfc4dc52b74f Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...nel.org>
Date: Sat, 22 Aug 2015 12:28:01 +0200
Subject: [PATCH] mm/vmalloc: Cache the vmalloc memory info

Linus reported that for scripting-intense workloads such as the
Git build, glibc's qsort will read /proc/meminfo for every process
created (by way of get_phys_pages()), which causes the Git build
to generate a surprising amount of kernel overhead.

A fair chunk of the overhead is due to get_vmalloc_info() - which
walks a potentially long list to do its statistics.

Modify Linus's jiffies based patch to use generation counters
to cache the vmalloc info: vmap_unlock() increases the generation
counter, and the get_vmalloc_info() reads it and compares it
against a cached generation counter.

Also use a seqlock to make sure we always print a consistent
set of vmalloc statistics.

Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Rik van Riel <riel@...hat.com>
Cc: linux-mm@...ck.org
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
 mm/vmalloc.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 80 insertions(+), 3 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 605138083880..23df06ebb48a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -276,7 +276,21 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 #define VM_LAZY_FREEING	0x02
 #define VM_VM_AREA	0x04
 
-static DEFINE_SPINLOCK(vmap_area_lock);
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
+
+#ifdef CONFIG_PROC_FS
+/*
+ * A seqlock and two generation counters for a simple cache of the
+ * vmalloc allocation statistics info printed in /proc/meminfo.
+ *
+ * ( The assumption of the optimization is that it's read frequently, but
+ *   modified infrequently. )
+ */
+static DEFINE_SPINLOCK(vmap_info_lock);
+static int vmap_info_gen = 1;
+static int vmap_info_cache_gen;
+static struct vmalloc_info vmap_info_cache;
+#endif
 
 static inline void vmap_lock(void)
 {
@@ -285,6 +299,9 @@ static inline void vmap_lock(void)
 
 static inline void vmap_unlock(void)
 {
+#ifdef CONFIG_PROC_FS
+	WRITE_ONCE(vmap_info_gen, vmap_info_gen+1);
+#endif
 	spin_unlock(&vmap_area_lock);
 }
 
@@ -2699,7 +2716,7 @@ static int __init proc_vmalloc_init(void)
 }
 module_init(proc_vmalloc_init);
 
-void get_vmalloc_info(struct vmalloc_info *vmi)
+static void calc_vmalloc_info(struct vmalloc_info *vmi)
 {
 	struct vmap_area *va;
 	unsigned long free_area_size;
@@ -2746,5 +2763,65 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 out:
 	rcu_read_unlock();
 }
-#endif
 
+/*
+ * Return a consistent snapshot of the current vmalloc allocation
+ * statistics, for /proc/meminfo:
+ */
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	int gen = READ_ONCE(vmap_info_gen);
+
+	/*
+	 * If the generation counter of the cache matches that of
+	 * the vmalloc generation counter then return the cache:
+	 */
+	if (READ_ONCE(vmap_info_cache_gen) == gen) {
+		int gen_after;
+
+		/*
+		 * The two read barriers make sure that we read
+		 * 'gen', 'vmap_info_cache' and 'gen_after' in
+		 * precisely that order:
+		 */
+		smp_rmb();
+		*vmi = vmap_info_cache;
+
+		smp_rmb();
+		gen_after = READ_ONCE(vmap_info_gen);
+
+		/* The cache is still valid: */
+		if (gen == gen_after)
+			return;
+
+		/* Ok, the cache got invalidated just now, regenerate it */
+		gen = gen_after;
+	}
+
+	/* Make sure 'gen' is read before the vmalloc info */
+	smp_rmb();
+
+	calc_vmalloc_info(vmi);
+
+	/*
+	 * All updates to vmap_info_cache_gen go through this spinlock,
+	 * so when the cache got invalidated, we'll only mark it valid
+	 * again if we first fully write the new vmap_info_cache.
+	 *
+	 * This ensures that partial results won't be used.
+	 */
+	spin_lock(&vmap_info_lock);
+	if (gen-vmap_info_cache_gen > 0) {
+		vmap_info_cache = *vmi;
+		/*
+		 * Make sure the new cached data is visible before
+		 * the generation counter update:
+		 */
+		smp_wmb();
+
+		WRITE_ONCE(vmap_info_cache_gen, gen);
+	}
+	spin_unlock(&vmap_info_lock);
+}
+
+#endif /* CONFIG_PROC_FS */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/