Message-ID: <1354269846.11754.381.camel@localhost>
Date: Fri, 30 Nov 2012 11:04:06 +0100
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: David Miller <davem@...emloft.net>, fw@...len.de,
netdev@...r.kernel.org, pablo@...filter.org, tgraf@...g.ch,
amwang@...hat.com, kaber@...sh.net, paulmck@...ux.vnet.ibm.com,
herbert@...dor.hengli.com.au
Subject: Re: [net-next PATCH V2 1/9] net: frag evictor, avoid killing warm
frag queues
On Thu, 2012-11-29 at 15:01 -0800, Eric Dumazet wrote:
> On Thu, 2012-11-29 at 23:17 +0100, Jesper Dangaard Brouer wrote:
>
> > For example lets give a threshold of 2000 MBytes:
> >
> > [root@...gon ~]# sysctl -w net/ipv4/ipfrag_high_thresh=$(((1024**2*2000)))
> > net.ipv4.ipfrag_high_thresh = 2097152000
> >
> > [root@...gon ~]# sysctl -w net/ipv4/ipfrag_low_thresh=$(((1024**2*2000)-655350))
> > net.ipv4.ipfrag_low_thresh = 2096496650
> >
> > 4x10 Netperf adjusted output:
> > Socket  Message  Elapsed      Messages
> > Size    Size     Time         Okay Errors   Throughput
> > bytes   bytes    secs            #      #   10^6bits/sec
> >
> > 229376   65507   20.00      298685      0    7826.35
> > 212992           20.00          27              0.71
> >
> > 229376   65507   20.00      366668      0    9607.71
> > 212992           20.00          13              0.34
> >
> > 229376   65507   20.00      254790      0    6676.20
> > 212992           20.00          14              0.37
> >
> > 229376   65507   20.00      309293      0    8104.33
> > 212992           20.00          15              0.39
> >
> > Can we agree that the current evictor strategy is broken?
>
> Not really, you drop packets because of another limit.
Then tell me which limit?
And notice the result is the same for a 200 MBytes threshold.
As I wrote *just* above the section you quoted:
On Thu, 2012-11-29 at 23:17 +0100, Jesper Dangaard Brouer wrote:
> [...] Thus, we must drop packets, or else the NIC will do it for
> us... for fragments we need do this "dropping" more intelligent.
So, I think it is the NIC dropping packets, in this case... what do you
claim?
I still claim that the current evictor strategy is broken!
We need to drop fragments more intelligently in software. As DaveM
correctly states, the code/algorithm needs to take some "probability
of fulfillment" into account, which is actually what my evictor
code implements. (I don't claim it's perfect, as it currently has
fairness/fair-queue issues. I have a plan for fixing that, but let's not
clutter up this answer.)
So, let me instead show, with tests, that the evictor strategy is
broken, while keeping the original default thresh settings:
# grep . /proc/sys/net/ipv4/ipfrag_*_thresh
/proc/sys/net/ipv4/ipfrag_high_thresh:262144
/proc/sys/net/ipv4/ipfrag_low_thresh:196608
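To put these defaults in perspective, here is a rough back-of-the-envelope
sketch of how many 65507-byte netperf datagrams fit under each threshold.
It assumes a 1500-byte MTU and a 20-byte IPv4 header without options
(neither is stated in this thread), and it counts payload bytes only. The
kernel accounts fragment memory by skb truesize, which is larger, so the
real numbers can only be lower.

```python
import math

HIGH_THRESH = 262144   # bytes, ipfrag_high_thresh default shown above
LOW_THRESH = 196608    # bytes, ipfrag_low_thresh default shown above
UDP_PAYLOAD = 65507    # netperf message size used in these tests

MTU = 1500             # assumption: standard Ethernet MTU
IP_HEADER = 20         # assumption: IPv4 header without options
UDP_HEADER = 8

# Bytes that must be carried across all fragments of one datagram.
ip_payload = UDP_PAYLOAD + UDP_HEADER

# Fragment data must be a multiple of 8 bytes (except in the last fragment).
per_fragment = (MTU - IP_HEADER) // 8 * 8

fragments = math.ceil(ip_payload / per_fragment)

# Whole datagrams that fit under each threshold, payload bytes only.
datagrams_high = HIGH_THRESH // ip_payload
datagrams_low = LOW_THRESH // ip_payload

print("fragments per datagram:", fragments)            # 45
print("datagrams under high_thresh:", datagrams_high)  # 4
print("datagrams under low_thresh:", datagrams_low)    # 3
```

Under these assumptions only a handful of in-flight datagrams fit before
the evictor starts running, which is worth keeping in mind when several
flows share the same fragment memory.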
Test purpose: I will demonstrate, on a single 10G link, that starting
several (N) netperf UDP fragmentation flows hurts performance, and
then claim this is caused by the bad evictor strategy.
Test setup:
- Disable Ethernet flow control
- netperf packet size 65507
- Run netserver on one NUMA node
- Start netperf clients against a NIC on the other NUMA node
- (The NUMA imbalance helps the effect occur at lower N)
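
For anyone wanting to try something similar, the client side of this setup
could be scripted roughly as below. This is only a sketch, not the script
used for these measurements: the helper names and the 20 second run length
are assumptions (the run length is borrowed from the netperf output quoted
earlier), and it does not handle NUMA placement or flow control, which
have to be set up separately.

```python
import subprocess

def netperf_cmd(host, seconds=20, msg_size=65507):
    """Build one netperf UDP_STREAM command line (hypothetical helper)."""
    return ["netperf", "-H", host, "-t", "UDP_STREAM",
            "-l", str(seconds), "--", "-m", str(msg_size)]

def run_flows(host, n):
    """Start n netperf clients in parallel and return their raw output."""
    procs = [subprocess.Popen(netperf_cmd(host),
                              stdout=subprocess.PIPE, text=True)
             for _ in range(n)]
    # communicate() waits for each client to finish and collects stdout.
    return [p.communicate()[0] for p in procs]
```

Calling run_flows("192.0.2.1", 3) would start three concurrent flows
against a netserver at that (placeholder) address.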
Result: N=1 8040 Mbit/s
Result: N=2 9584 Mbit/s (4739+4845)
Result: N=3 4055 Mbit/s (1436+1371+1248)
Result: N=4 2247 Mbit/s (1538+29+54+626)
Result: N=5 879 Mbit/s (78+152+226+125+298)
Result: N=6 293 Mbit/s (85+55+32+57+46+18)
Result: N=7 354 Mbit/s (70+47+33+80+20+72+32)
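
The per-flow numbers in parentheses can be cross-checked and summarised
with a few lines of Python. The sketch below only re-adds the figures
listed above. For N=1 no per-flow breakdown is given, so the single total
is used as-is.

```python
# Per-flow throughput in Mbit/s, copied from the results above.
results = {
    1: [8040],
    2: [4739, 4845],
    3: [1436, 1371, 1248],
    4: [1538, 29, 54, 626],
    5: [78, 152, 226, 125, 298],
    6: [85, 55, 32, 57, 46, 18],
    7: [70, 47, 33, 80, 20, 72, 32],
}

# Aggregate throughput per N, and the best aggregate across all runs.
totals = {n: sum(flows) for n, flows in results.items()}
best = max(totals.values())

for n, flows in results.items():
    share = 100.0 * totals[n] / best
    print(f"N={n}: total {totals[n]:5d} Mbit/s ({share:5.1f}% of best), "
          f"slowest flow {min(flows)}, fastest flow {max(flows)}")
```

The sums match the totals reported above, and the aggregate peaks at N=2
before collapsing as more flows are added.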
Can we, now, agree that the current evictor strategy is broken?!?
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer