Date:	Fri, 12 Oct 2012 03:11:59 -0700
From:	Anton Vorontsov <anton.vorontsov@...aro.org>
To:	Pekka Enberg <penberg@...nel.org>
Cc:	Mel Gorman <mgorman@...e.de>,
	Leonid Moiseichuk <leonid.moiseichuk@...ia.com>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Minchan Kim <minchan@...nel.org>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	John Stultz <john.stultz@...aro.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, linaro-kernel@...ts.linaro.org,
	patches@...aro.org, kernel-team@...roid.com
Subject: [PATCH 3/3] mm: vmevent: Sum per cpu pagesets stats asynchronously

Currently vmevent relies on the global page state stats, which are
updated once per stat_interval (1 second) or when the per CPU pageset
stats hit their threshold.
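
For context, the per CPU deltas are folded into the global counters
roughly like this (a simplified sketch modelled on
__mod_zone_page_state() in mm/vmstat.c; the real code uses the
__this_cpu_*() accessors and differs in detail):

/*
 * Simplified sketch of the existing update path; not the exact
 * mainline code.
 */
static void sketch_mod_zone_page_state(struct zone *zone,
				       enum zone_stat_item item, int delta)
{
	struct per_cpu_pageset *pcp = this_cpu_ptr(zone->pageset);
	s8 *p = pcp->vm_stat_diff + item;
	long x = delta + *p;

	if (x > pcp->stat_threshold || x < -pcp->stat_threshold) {
		/* Fold the accumulated per CPU delta into vm_stat. */
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	*p = x;
}

The stat_interval timer path (vmstat_update() ->
refresh_cpu_vm_stats()) folds the same deltas periodically.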

We can sum the per CPU vm_stat_diff values asynchronously: the result
may be slightly inconsistent, but overall this should improve accuracy.
With the previous scheme we always saw the worst case (i.e. we had to
wait for a threshold to be hit or for the 1 second delay to expire),
whereas now the error averages out.

The idea is very similar to zone_page_state_snapshot().
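
For reference, zone_page_state_snapshot() performs the same kind of
per-zone summation (roughly as it appears in include/linux/vmstat.h
around this kernel version):

static inline unsigned long zone_page_state_snapshot(struct zone *zone,
					enum zone_stat_item item)
{
	long x = atomic_long_read(&zone->vm_stat[item]);

#ifdef CONFIG_SMP
	int cpu;
	for_each_online_cpu(cpu)
		x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];

	if (x < 0)
		x = 0;
#endif
	return x;
}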

Note that this might put more pressure on CPU caches, so we only do it
when userland explicitly asks for accuracy; and since we gather the
stats outside of any fast path, it should be OK in general.
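
Purely for illustration, a userland consumer might request the accurate
path like this. This is a hypothetical sketch: vmevent_fd(), the
VMEVENT_* names and the exact meaning of value2 come from earlier
patches in this series, so treat every name below as an assumption
rather than an established API.

#include <unistd.h>

void setup_watch(void)
{
	struct vmevent_config config = {
		.size             = sizeof(config),
		.counter          = 1,
		.sample_period_ns = 1000ULL * 1000 * 1000,	/* 1 second */
		.attrs            = {
			{
				.type  = VMEVENT_ATTR_NR_FREE_PAGES,
				.state = VMEVENT_ATTR_STATE_VALUE_LT,
				/* Notify when free memory drops below ~10% of RAM. */
				.value  = sysconf(_SC_PHYS_PAGES) / 10,
				/* Nonzero: sum vm_stat_diff for better accuracy
				 * (this patch only checks for nonzero). */
				.value2 = 1,
			},
		},
	};

	int fd = vmevent_fd(&config);	/* new syscall; no libc wrapper */
	(void)fd;
}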

Signed-off-by: Anton Vorontsov <anton.vorontsov@...aro.org>
---
 mm/vmevent.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/mm/vmevent.c b/mm/vmevent.c
index 8113bda..a059bed 100644
--- a/mm/vmevent.c
+++ b/mm/vmevent.c
@@ -52,10 +52,51 @@ static u64 vmevent_attr_swap_pages(struct vmevent_watch *watch,
 #endif
 }
 
+/*
+ * In the worst case, this is inaccurate by
+ *
+ *	±(pcp->stat_threshold * zones * online_cpus)
+ *
+ * For, say, a 4-core 2GB setup that would be ~350 KB of worst-case
+ * inaccuracy, but to reach it, all CPUs would have to keep allocating
+ * (or freeing) pages from all the zones at the same time, and all their
+ * current vm_stat_diff values would need to be pretty close to
+ * pcp->stat_threshold.
+ *
+ * The larger the system, the more inaccurate vm_stat is (but at the same
+ * time, on large systems we care much less about small chunks of memory).
+ * When more predictable behaviour is needed, userland can set the desired
+ * accuracy via attr->value2.
+ */
+static ulong vmevent_global_page_state(struct vmevent_attr *attr,
+				       enum zone_stat_item si)
+{
+	ulong global = global_page_state(si);
+#ifdef CONFIG_SMP
+	struct zone *zone;
+
+	if (!attr->value2)
+		return global;
+
+	for_each_populated_zone(zone) {
+		uint cpu;
+
+		for_each_online_cpu(cpu) {
+			struct per_cpu_pageset *pcp;
+
+			pcp = per_cpu_ptr(zone->pageset, cpu);
+
+			global += pcp->vm_stat_diff[si];
+		}
+	}
+#endif
+	return global;
+}
+
 static u64 vmevent_attr_free_pages(struct vmevent_watch *watch,
 				   struct vmevent_attr *attr)
 {
-	return global_page_state(NR_FREE_PAGES);
+	return vmevent_global_page_state(attr, NR_FREE_PAGES);
 }
 
 static u64 vmevent_attr_avail_pages(struct vmevent_watch *watch,
-- 
1.7.12.1
