netdev - [RFC PATCH] net: frag limit checks need to use percpu_counter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <150417481955.28907.15567119824187929000.stgit@firesoul>
Date:   Thu, 31 Aug 2017 12:20:19 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     liujian56@...wei.com
Cc:     netdev@...r.kernel.org, Florian Westphal <fw@...len.de>,
        Jesper Dangaard Brouer <brouer@...hat.com>
Subject: [RFC PATCH] net: frag limit checks need to use
 percpu_counter_compare

To: Liujian can you please test this patch?
 I want to understand if using __percpu_counter_compare() solves
 the problem correctness wise (even-though this will be slower
 than using a simple atomic_t on your big system).

Fix bug in fragmentation codes use of the percpu_counter API, that
cause issues on systems with many CPUs.

The frag_mem_limit() just reads the global counter (fbc->count),
without considering other CPUs can have upto batch size (130K) that
haven't been subtracted yet.  Due to the 3MBytes lower thresh limit,
this become dangerous at >=24 CPUs (3*1024*1024/130000=24).

The __percpu_counter_compare() does the right thing, and takes into
account the number of (online) CPUs and batch size, to account for
this and call __percpu_counter_sum() when needed.

On systems with many CPUs this will unfortunately always result in the
heavier fully locked __percpu_counter_sum() which touch the
per_cpu_ptr of all (online) CPUs.

On systems with a smaller number of CPUs this solution is also not
optimal, because __percpu_counter_compare()/__percpu_counter_sum()
doesn't help synchronize the global counter.
 Florian Westphal have an idea of adding some counter sync points,
which should help address this issue.
---
 include/net/inet_frag.h  |   16 ++++++++++++++--
 net/ipv4/inet_fragment.c |    6 +++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 6fdcd2427776..b586e320783d 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -147,9 +147,21 @@ static inline bool inet_frag_evicting(struct inet_frag_queue *q)
  */
 static unsigned int frag_percpu_counter_batch = 130000;
 
-static inline int frag_mem_limit(struct netns_frags *nf)
+static inline bool frag_mem_over_limit(struct netns_frags *nf, int thresh)
 {
-	return percpu_counter_read(&nf->mem);
+	/* When reading counter here, __percpu_counter_compare() call
+	 * will invoke __percpu_counter_sum() when needed.  Which
+	 * depend on num_online_cpus()*batch size, as each CPU can
+	 * potentential can hold a batch count.
+	 *
+	 * With many CPUs this heavier sum operation will
+	 * unfortunately always occur.
+	 */
+	if (__percpu_counter_compare(&nf->mem, thresh,
+				     frag_percpu_counter_batch) > 0)
+		return true;
+	else
+		return false;
 }
 
 static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 96e95e83cc61..ee2cf56900e6 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -120,7 +120,7 @@ static void inet_frag_secret_rebuild(struct inet_frags *f)
 static bool inet_fragq_should_evict(const struct inet_frag_queue *q)
 {
 	return q->net->low_thresh == 0 ||
-	       frag_mem_limit(q->net) >= q->net->low_thresh;
+		frag_mem_over_limit(q->net, q->net->low_thresh);
 }
 
 static unsigned int
@@ -355,7 +355,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 {
 	struct inet_frag_queue *q;
 
-	if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
+	if (!nf->high_thresh || frag_mem_over_limit(nf, nf->high_thresh)) {
 		inet_frag_schedule_worker(f);
 		return NULL;
 	}
@@ -396,7 +396,7 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 	struct inet_frag_queue *q;
 	int depth = 0;
 
-	if (frag_mem_limit(nf) > nf->low_thresh)
+	if (frag_mem_over_limit(nf, nf->low_thresh))
 		inet_frag_schedule_worker(f);
 
 	hash &= (INETFRAGS_HASHSZ - 1);