Message-Id: <1440489266-31127-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com>
Date:	Tue, 25 Aug 2015 13:24:26 +0530
From:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
To:	<davem@...emloft.net>, <kuznet@....inr.ac.ru>, <jmorris@...ei.org>,
	<yoshfuji@...ux-ipv6.org>, <kaber@...sh.net>
Cc:	<jiri@...nulli.us>, <edumazet@...gle.com>,
	<hannes@...essinduktion.org>, <tom@...bertland.com>,
	<azhou@...ira.com>, <ebiederm@...ssion.com>,
	<ipm@...rality.org.uk>, <nicolas.dichtel@...nd.com>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<raghavendra.kt@...ux.vnet.ibm.com>, <anton@....ibm.com>,
	<nacc@...ux.vnet.ibm.com>, <srikar@...ux.vnet.ibm.com>
Subject: [PATCH RFC 2/2] net: Optimize snmp stat aggregation by walking all the percpu data at once

Docker container creation time increased linearly from around 1.6 sec to
7.5 sec (at 1000 containers), and perf data showed 50% overhead in
snmp_fold_field.

Reason: currently __snmp6_fill_stats64 calls snmp_fold_field, which walks
through the per-cpu data of a single item (repeated for around 90 items).

Idea: this patch aggregates the statistics by going through all the
items of each cpu sequentially, which reduces cache misses.
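
The difference between the two access patterns can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not the
kernel code: the flat array layout (one contiguous block of counters per
cpu) and the demo_* names are hypothetical, and the real percpu accessors
and syncp handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: mib[c * nitems + i] is counter i of cpu c, so each
 * cpu owns one contiguous block of nitems counters.
 */

/* Item-major walk: for each item, visit every cpu's block (old pattern). */
void demo_fill_item_major(const uint64_t *mib, size_t ncpus, size_t nitems,
                          uint64_t *out)
{
    assert(nitems >= 1);
    out[0] = nitems;                    /* slot 0 carries the item count */
    for (size_t i = 1; i < nitems; i++) {
        uint64_t sum = 0;
        for (size_t c = 0; c < ncpus; c++)
            sum += mib[c * nitems + i]; /* hops to another cpu's block */
        out[i] = sum;
    }
}

/* Cpu-major walk: read each cpu's block sequentially once (new pattern). */
void demo_fill_cpu_major(const uint64_t *mib, size_t ncpus, size_t nitems,
                         uint64_t *out)
{
    assert(nitems >= 1);
    out[0] = nitems;
    for (size_t i = 1; i < nitems; i++)
        out[i] = 0;                     /* accumulators must start at zero */
    for (size_t c = 0; c < ncpus; c++)
        for (size_t i = 1; i < nitems; i++)
            out[i] += mib[c * nitems + i];
}
```

Both functions produce the same totals; only the order in which memory is
touched differs. The cpu-major walk needs zeroed accumulators before the
outer loop, which is the role tmp plays in this patch.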

Docker creation got faster by more than 2x after the patch.

Result:
                       Before           After
Docker creation time   6.836s           3.357s
cache miss             2.7%             1.38%

perf before:
    50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
     9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
     3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
     2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock

perf after:
    10.56%  swapper          [kernel.kallsyms]     [k] snooze_loop
     8.72%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
     7.59%  docker           [kernel.kallsyms]     [k] veth_stats_one
     3.65%  swapper          [kernel.kallsyms]     [k] _raw_spin_lock

Signed-off-by: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
---
 net/ipv6/addrconf.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..2ec905f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4624,16 +4624,24 @@ static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
 }
 
 static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
-				      int items, int bytes, size_t syncpoff)
+					int items, int bytes, size_t syncpoff)
 {
-	int i;
+	int i, c;
+	u64 *tmp;
 	int pad = bytes - sizeof(u64) * items;
 	BUG_ON(pad < 0);
 
+	tmp = kcalloc(items, sizeof(u64), GFP_KERNEL);
+
 	/* Use put_unaligned() because stats may not be aligned for u64. */
 	put_unaligned(items, &stats[0]);
+
+	for_each_possible_cpu(c)
+		for (i = 1; i < items; i++)
+			tmp[i] += snmp_get_cpu_field64(mib, c, i, syncpoff);
+
 	for (i = 1; i < items; i++)
-		put_unaligned(snmp_fold_field64(mib, i, syncpoff), &stats[i]);
+		put_unaligned(tmp[i], &stats[i]);
 
 	memset(&stats[items], 0, pad);
 }
-- 
1.7.11.7
