Message-ID: <1317966007.3457.47.camel@edumazet-laptop>
Date: Fri, 07 Oct 2011 07:40:07 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: starlight@...nacle.cx
Cc: linux-kernel@...r.kernel.org, netdev <netdev@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Christoph Lameter <cl@...two.org>, Willy Tarreau <w@....eu>,
Ingo Molnar <mingo@...e.hu>,
Stephen Hemminger <stephen.hemminger@...tta.com>,
Benjamin LaHaise <bcrl@...ck.org>,
Joe Perches <joe@...ches.com>,
Chetan Loke <Chetan.Loke@...scout.com>,
Con Kolivas <conman@...ivas.org>,
Serge Belyshev <belyshev@...ni.sinp.msu.ru>
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
On Thursday, 06 October 2011 at 23:27 -0400, starlight@...nacle.cx wrote:
> After writing the last post, the large
> difference in IRQ rate between the older
> and newer kernels caught my eye.
>
> I wonder if the hugely lower rate in the older
> kernels reflects a more agile shifting
> into and out of NAPI mode by the network
> bottom-half.
>
> In this test the sending system
> pulses data out on millisecond boundaries
> due to the behavior of nsleep(), which
> is used to establish the playback pace.
>
> If the older kernels are switching to NAPI
> for much of the surge and then switching out
> once the pulse falls off, it might
> conceivably result in much better latency
> and overall performance.
>
> All tests were run with Intel 82571
> network interfaces and the 'e1000e'
> device driver. Some used the driver
> packaged with the kernel, some used
> the Intel driver compiled from the source
> found on sourceforge.net. Never could
> detect any difference between the two.
>
> Since data in the production environment
> also tends to arrive in bursts, I don't find
> the pulsing playback behavior a detriment.
>
That's exactly the opposite: your old kernel is not fast enough to
enter/exit NAPI on every incoming frame.

Instead of one IRQ per incoming frame, you get fewer interrupts:
a NAPI run processes more than one frame.

Now increase your incoming rate, and you'll discover that a new kernel
is able to process more frames without losses.
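To compare IRQ rates between two kernels, the per-device counters in /proc/interrupts can be sampled twice and divided by the interval. The sketch below is only an illustration: the function names irq_total and irq_rate are made up for this example, and the device name "eth0" used in the usage note is an assumption about the test machine.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Sum the per-CPU interrupt counters of every line that mentions a name.
# $1 = text to match (for example an interface name)
# $2 = file to read (defaults to /proc/interrupts)
irq_total() {
    awk -v name="$1" '
        index($0, name) {
            # Field 1 is the IRQ label ("58:"). The numeric fields that
            # follow are per-CPU counts; stop at the first non-number.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        END { print sum + 0 }
    ' "${2:-/proc/interrupts}"
}

# Integer interrupts per second between two samples.
# $1 = first sample, $2 = second sample, $3 = seconds between them
irq_rate() {
    echo $(( ($2 - $1) / $3 ))
}
```

Usage, assuming the NIC shows up as eth0: `a=$(irq_total eth0); sleep 10; b=$(irq_total eth0); irq_rate "$a" "$b" 10`. Dividing the frame rate by this figure gives a rough idea of how many frames each NAPI run handles.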
About your thread model:

You have one thread that reads the incoming frames and distributes them
to several queues based on some flow parameters. Then you wake up a
second thread.

This kind of model is very expensive and triggers a lot of false sharing.

New kernels are able to perform this fanout in kernel land.

You really should take a look at Documentation/networking/scaling.txt
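One of the mechanisms described in that file is RPS, which is enabled per receive queue by writing a hexadecimal CPU bitmap to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. The sketch below only builds the bitmap and prints the command instead of running it. The helper names cpu_mask and rps_command are invented for this example, and "eth0", queue 0 and CPUs 0-3 are assumptions, not values from this thread.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tool.

# Print the hexadecimal bitmap that has one bit set per listed CPU number.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Print (do not execute) the command that would enable RPS on one queue.
# $1 = device, $2 = rx queue index, remaining arguments = CPU numbers
rps_command() {
    dev=$1
    queue=$2
    shift 2
    echo "echo $(cpu_mask "$@") > /sys/class/net/$dev/queues/rx-$queue/rps_cpus"
}

# Assumed example values: device eth0, rx queue 0, CPUs 0 to 3.
rps_command eth0 0 0 1 2 3
```

Running the printed command needs root and a kernel that has RPS. Whether it helps for this workload would have to be measured.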
[ Another way of doing this fanout is to use some iptables rules:
check the following commit changelog for an idea ]
commit e8648a1fdb54da1f683784b36a17aa65ea56e931
Author: Eric Dumazet <eric.dumazet@...il.com>
Date: Fri Jul 23 12:59:36 2010 +0200
netfilter: add xt_cpu match
In some situations a CPU match permits a better spreading of
connections, or selects targets only for a given cpu.

With Receive Packet Steering or a multiqueue NIC and appropriate IRQ
affinities, we can distribute traffic on available cpus, per session.
(all RX packets for a given flow are handled by a given cpu)

Some legacy applications not being SMP friendly, one way to scale a
server is to run multiple copies of them.

Instead of randomly choosing an instance, we can use the cpu number as a
key so that the softirq handler for a whole instance runs on a single
cpu, maximizing cache effects in TCP/UDP stacks.

Using NAT for example, a four-way machine might run four copies of the
server application, using a separate listening port for each instance,
but still presenting a unique external port:
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 0 \
    -j REDIRECT --to-port 8080
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 1 \
    -j REDIRECT --to-port 8081
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 2 \
    -j REDIRECT --to-port 8082
iptables -t nat -A PREROUTING -p tcp --dport 80 -m cpu --cpu 3 \
    -j REDIRECT --to-port 8083
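The four rules above follow one pattern, so for a different CPU count they can be generated. This is a minimal sketch that prints the rules for review rather than applying them. The function name gen_cpu_redirect_rules is invented for this example, and the convention that instance N listens on base port + N is an assumption taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper, not part of iptables.

# Print one REDIRECT rule per CPU.
# $1 = number of CPUs, $2 = external port, $3 = first local port
gen_cpu_redirect_rules() {
    ncpus=$1
    ext_port=$2
    base_port=$3
    cpu=0
    while [ "$cpu" -lt "$ncpus" ]; do
        # Assumption: instance N listens on base_port + N.
        echo "iptables -t nat -A PREROUTING -p tcp --dport $ext_port -m cpu --cpu $cpu -j REDIRECT --to-port $((base_port + cpu))"
        cpu=$((cpu + 1))
    done
}

# Reproduces the four rules of the changelog example.
gen_cpu_redirect_rules 4 80 8080
```

Once the output looks right it can be piped to a root shell. The rules only spread load well if IRQ affinities or packet steering already pin each flow to one cpu, as the changelog notes.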