Message-ID: <1353810665.2590.4774.camel@edumazet-glaptop>
Date: Sat, 24 Nov 2012 18:31:05 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Florian Westphal <fw@...len.de>, netdev@...r.kernel.org,
Pablo Neira Ayuso <pablo@...filter.org>,
Thomas Graf <tgraf@...g.ch>, Cong Wang <amwang@...hat.com>,
Patrick McHardy <kaber@...sh.net>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Herbert Xu <herbert@...dor.hengli.com.au>
Subject: Re: [RFC net-next PATCH V1 0/9] net: fragmentation performance
scalability on NUMA/SMP systems
On Fri, 2012-11-23 at 14:08 +0100, Jesper Dangaard Brouer wrote:
> This patchset implements significant performance improvements for
> fragmentation handling in the kernel, with a focus on NUMA and SMP
> based systems.
>
> Review:
>
> Please review these patches. I have on purpose added comments in the
> code with the "//" comments style. These comments are to be removed
> before applying. They serve as questions to you, the reviewer.
>
> The fragmentation code today:
>
> The fragmentation code "protects" kernel resources, by implementing
> some memory resource limitation code. This is centered around a
> global readers-writer lock, and (per network namespace) an atomic mem
> counter and an LRU (Least-Recently-Used) list. (Separate global
> variables and namespace resources are, however, kept for IPv4, IPv6
> and Netfilter reassembly.)
>
> The code tries to keep the memory usage between a high and low
> threshold (see: /proc/sys/net/ipv4/ipfrag_{high,low}_thresh). The
> "evictor" code cleans up fragments, when the high threshold is
> exceeded, and stops only, when the low threshold is reached.
>
> The scalability problem:
>
> Having a global/central variable for a resource limit is obviously a
> scalability issue on SMP systems, and even amplified on a NUMA based
> system.
>
But ... what practical workload even uses fragments?
Sure, netperf -t UDP_STREAM uses frags, but it's a benchmark.
The only heavy user was NFS in the days when it ran over UDP, a very
long time ago.
A single lost fragment means the whole packet is lost.
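
To put a rough number on that: a datagram split into n fragments survives only if all n arrive, so with a per-fragment loss probability p it is lost with probability 1 - (1 - p)^n. The sketch below is only an illustration. It assumes fragments are lost independently (real losses are often bursty), plain IPv4 with a 20-byte header and no options, and UDP as the transport. The function names are made up for this example.

```python
import math


def fragments_needed(udp_payload_len: int, mtu: int = 1500, ip_header: int = 20) -> int:
    """Number of IPv4 fragments for one UDP datagram.

    Assumes an IPv4 header without options and an 8-byte UDP header.
    Every fragment except the last must carry a multiple of 8 bytes.
    """
    if udp_payload_len < 0:
        raise ValueError("udp_payload_len must be >= 0")
    udp_header = 8
    per_fragment = (mtu - ip_header) // 8 * 8  # 1480 bytes for a 1500-byte MTU
    return max(1, math.ceil((udp_payload_len + udp_header) / per_fragment))


def datagram_loss_probability(fragment_loss: float, n_fragments: int) -> float:
    """Probability that the whole datagram is lost.

    Assumes each fragment is dropped independently with probability
    fragment_loss; this is a simplification, not a measured model.
    """
    if not 0.0 <= fragment_loss <= 1.0:
        raise ValueError("fragment_loss must be in [0, 1]")
    if n_fragments < 1:
        raise ValueError("n_fragments must be >= 1")
    # The datagram is reassembled only if every fragment arrives.
    return 1.0 - (1.0 - fragment_loss) ** n_fragments


if __name__ == "__main__":
    n = fragments_needed(65507)  # largest UDP payload over IPv4
    print(n, datagram_loss_probability(0.01, n))
```

Under these assumptions, even a small per-fragment loss rate is amplified considerably once a datagram spans dozens of fragments.
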
Another problem with fragments is the lack of 4-tuple hashing, as only
the first frag contains the dst/src ports.
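
A rough illustration of why that hurts: since later fragments carry no ports, anything that spreads packets by flow hash has to key fragmented traffic on addresses and protocol alone, otherwise fragments of one datagram could be steered to different queues. The sketch below is a toy model and not how any particular NIC or the kernel implements it. The Packet type, flow_key() and pick_queue() are hypothetical names, and crc32 merely stands in for a real flow hash.

```python
import zlib
from typing import NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src: str
    dst: str
    proto: int
    frag_offset: int       # in 8-byte units; 0 for the first fragment
    more_fragments: bool   # IPv4 MF flag
    sport: Optional[int]   # None when this fragment carries no L4 header
    dport: Optional[int]


def is_fragment(p: Packet) -> bool:
    # A packet is part of a fragmented datagram if it has a non-zero
    # offset or the MF flag set.
    return p.frag_offset != 0 or p.more_fragments


def flow_key(p: Packet) -> Tuple:
    if is_fragment(p):
        # Ports are only in the first fragment, so use none of them:
        # every fragment of the datagram must map to the same key.
        return (p.src, p.dst, p.proto)
    return (p.src, p.dst, p.proto, p.sport, p.dport)


def pick_queue(p: Packet, n_queues: int) -> int:
    # crc32 is a stand-in hash chosen for determinism, nothing more.
    return zlib.crc32(repr(flow_key(p)).encode()) % n_queues


if __name__ == "__main__":
    first = Packet("10.0.0.1", "10.0.0.2", 17, 0, True, 5000, 6000)
    later = Packet("10.0.0.1", "10.0.0.2", 17, 185, False, None, None)
    print(pick_queue(first, 8), pick_queue(later, 8))  # same queue
```

The side effect in this model is that all fragmented flows between the same two hosts collapse onto a single key, and therefore onto a single queue, no matter how many distinct port pairs are in use.
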
Also there is the sysctl_ipfrag_max_dist issue...
Hint: many NICs provide TSO (TCP segmentation offload), but none provide
UFO (UDP fragmentation offload), probably because there is no demand for it.