netdev - RE: [net-next PATCH V3-evictor] net: frag evictor,avoid killing warm frag queues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 4 Dec 2012 14:32:36 -0000
From:	"David Laight" <David.Laight@...LAB.COM>
To:	"Jesper Dangaard Brouer" <brouer@...hat.com>,
	"Eric Dumazet" <eric.dumazet@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	"Florian Westphal" <fw@...len.de>
Cc:	<netdev@...r.kernel.org>, "Thomas Graf" <tgraf@...g.ch>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	"Cong Wang" <amwang@...hat.com>,
	"Herbert Xu" <herbert@...dor.hengli.com.au>
Subject: RE: [net-next PATCH V3-evictor] net: frag evictor,avoid killing warm frag queues

> The fragmentation evictor system have a very unfortunate eviction
> system for killing fragment, when the system is put under pressure.
> 
> If packets are coming in too fast, the evictor code kills "warm"
> fragments too quickly.  Resulting in close to zero throughput, as
> fragments are killed before they have a chance to complete
> 
> This is related to the bad interaction with the LRU (Least Recently
> Used) list.  Under load the LRU list sort-of changes meaning/behavior.
> When the LRU head is very new/warm, then the head is most likely the
> one with most fragments and the tail (latest used or added element)
> with least.
> 
> Solved by, introducing a creation "jiffie" timestamp (creation_ts).
> If the element is tried evicted in same jiffie, then perform tail drop
> on the LRU list instead.

I'm not at all sure a 'same tick' test is actually sensible.

Some quick 'ball park' maths:
10Mbps is about 1kB per jiffie (1kHz tick).
So at 10Mbps you'll see a frame every 1.5 jiffies.
So if the source of any fragment stream is some remote ADSL
connection (rather than a local Ge LAN connection) you'll
never manage to assemble the entire datagram.
Anyone connecting from dialup will fail miserably.

This means you need to keep some initial fragments for much
longer than 1 jiffie before discarding them.

Anything with more than one in-sequence fragment is a good
candidate for keeping (especially if the inter-fragment time
is being monitored).

Anything with out of sequence fragments is a very good choice
for discard.

But yes, discarding some 'new' fragments is very likely necessary
in order to manage to actually receive anything.

Sometimes I've wondered whether discarding some transmits would
help when the rx side is congested (and v.v.). I've never tried
modelling that one with any traffic flows! But forcing the remote
to timeout might reduce the overall traffic, trouble is the
discards would have to be carefully chosen!
(I was thinking about this mainly with regard to routers that
are having issues feeding outbound data into a narrow pipe.)

	David