lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 04 Dec 2012 14:30:46 +0100
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Eric Dumazet <eric.dumazet@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Florian Westphal <fw@...len.de>
Cc:	Jesper Dangaard Brouer <brouer@...hat.com>, netdev@...r.kernel.org,
	Thomas Graf <tgraf@...g.ch>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Cong Wang <amwang@...hat.com>,
	Herbert Xu <herbert@...dor.hengli.com.au>
Subject: [net-next PATCH V3-evictor] net: frag evictor,
	avoid killing warm frag queues

The fragmentation evictor system have a very unfortunate eviction
system for killing fragment, when the system is put under pressure.

If packets are coming in too fast, the evictor code kills "warm"
fragments too quickly.  Resulting in close to zero throughput, as
fragments are killed before they have a chance to complete

This is related to the bad interaction with the LRU (Least Recently
Used) list.  Under load the LRU list sort-of changes meaning/behavior.
When the LRU head is very new/warm, then the head is most likely the
one with most fragments and the tail (latest used or added element)
with least.

Solved by, introducing a creation "jiffie" timestamp (creation_ts).
If the element is tried evicted in same jiffie, then perform tail drop
on the LRU list instead.

Signed-off-by: Jesper Dangaard Brouer <jbrouer@...hat.com>

---
V2:
 - Drop the INET_FRAG_FIRST_IN idea for detecting dropped "head" packets

V3:
 - Move the tail drop, from inet_frag_alloc() to inet_frag_evictor()
   This will be close to the same semantics, but at a higher cost.


 include/net/inet_frag.h  |    1 +
 net/ipv4/inet_fragment.c |   12 ++++++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 32786a0..7b897b2 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -24,6 +24,7 @@ struct inet_frag_queue {
 	ktime_t			stamp;
 	int			len;        /* total length of orig datagram */
 	int			meat;
+	u32			creation_ts;/* jiffies when queue was created*/
 	__u8			last_in;    /* first/last segment arrived? */
 
 #define INET_FRAG_COMPLETE	4
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 4750d2b..d8bf59b 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -178,6 +178,16 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force)
 
 		q = list_first_entry(&nf->lru_list,
 				struct inet_frag_queue, lru_list);
+
+		/* When head of LRU is very new/warm, then the head is
+		 * most likely the one with most fragments and the
+		 * tail with least, thus drop tail
+		 */
+		if (!force && q->creation_ts == (u32) jiffies) {
+			q = list_entry(&nf->lru_list.prev,
+				struct inet_frag_queue, lru_list);
+		}
+
 		atomic_inc(&q->refcnt);
 		read_unlock(&f->lock);
 
@@ -243,11 +253,13 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf,
 		struct inet_frags *f, void *arg)
 {
 	struct inet_frag_queue *q;
+	// Note: We could also perform the tail drop here
 
 	q = kzalloc(f->qsize, GFP_ATOMIC);
 	if (q == NULL)
 		return NULL;
 
+	q->creation_ts = (u32) jiffies;
 	q->net = nf;
 	f->constructor(q, arg);
 	atomic_add(f->qsize, &nf->mem);

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ