Message-Id: <6.2.5.6.2.20111007143050.039bd578@binnacle.cx>
Date: Fri, 07 Oct 2011 14:37:47 -0400
From: starlight@...nacle.cx
To: chetan loke <loke.chetan@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
linux-kernel@...r.kernel.org, netdev <netdev@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Christoph Lameter <cl@...two.org>, Willy Tarreau <w@....eu>,
Ingo Molnar <mingo@...e.hu>,
Stephen Hemminger <stephen.hemminger@...tta.com>,
Benjamin LaHaise <bcrl@...ck.org>,
Joe Perches <joe@...ches.com>, lokechetan@...il.com,
Con Kolivas <conman@...ivas.org>,
Serge Belyshev <belyshev@...ni.sinp.msu.ru>
Subject: Re: big picture UDP/IP performance question re 2.6.18 -> 2.6.32
At 02:09 PM 10/7/2011 -0400, chetan loke wrote:
>I'm a little confused. Seems like there are
>conflicting goals. If you want to bypass the
>kernel-protocol-stack then you have the following
>options: a) kernel af_packet. This is where we
>would get a chance to test all the kernel features
>etc.
Perhaps I haven't been sufficiently clear.
The "packet socket" mode I referred to in the
earlier post uses AF_PACKET/PF_PACKET sockets,
as in

    socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
Have run it in both normal and memory-mapped
modes. MMAP mode is slightly more expensive
due to the cache pressure from the additional
copy. On the 6174 MMAP seems a smidgen better
in certain tests, but in the end the read()
and mapped approaches perform effectively
identically--and generally match the cost of
UDP sockets almost exactly.
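
For reference, a minimal sketch of the
mapped-ring setup as tested here (TPACKET_V1
layout, the kernel default; ring sizes are
arbitrary, and error checks and interface
binding are omitted):

    #include <sys/socket.h>
    #include <sys/mman.h>
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>

    static struct tpacket_req req;
    static char *ring;

    int rx_ring_open(void)
    {
        int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        req.tp_block_size = 4096;   /* one page per block    */
        req.tp_block_nr   = 64;
        req.tp_frame_size = 2048;   /* two frames per block  */
        req.tp_frame_nr   = 128;    /* frames/block * blocks */
        setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req);

        ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        return fd;
    }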
>b) Use non-commodity(?) NICs (from vendors
>you mentioned), where the NIC might have some
>on-board memory (a cushion) so it can absorb
>spikes and also smooth out excessive
>PCI transactions for bursty, small-payload
>traffic (as in 64-byte frames). But wait: when
>you use the libs provided by these vendors,
>their driver (especially the Rx path) is not
>working in inline mode the way the NIC drivers
>in case a) above do. This driver with a special
>Rx path exists purely for managing your mmap'd
>queues, so of course it's going to be faster
>than the traditional inline drivers. In this
>partial-inline mode, the adapter might i) batch
>the packets and ii) send a single notification
>to the host side. With that single event you
>are now processing 1+ packets.
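
Right--and the mapped ring gives the host side
the same effect: one poll() wakeup, then drain
every frame the kernel has flagged ready before
sleeping again. A sketch against the setup
above (handle_frame() is a hypothetical
consumer, not a real API):

    #include <poll.h>

    void rx_drain_once(int fd)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        static unsigned idx;       /* ring slot, persists across calls */

        poll(&pfd, 1, -1);         /* one kernel notification...      */
        for (;;) {                 /* ...then walk every ready frame  */
            struct tpacket_hdr *hdr = (struct tpacket_hdr *)
                (ring + idx * req.tp_frame_size);
            if (!(hdr->tp_status & TP_STATUS_USER))
                break;             /* ring drained, back to sleep     */
            /* handle_frame() stands in for the app's consumer */
            handle_frame((char *)hdr + hdr->tp_mac, hdr->tp_snaplen);
            hdr->tp_status = TP_STATUS_KERNEL;  /* return slot */
            idx = (idx + 1) % req.tp_frame_nr;
        }
    }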
Kernel bypass is probably the best answer for
what we do. The problem has been the lack of
maturity in the vendors' driver software.
Looks like it's reaching a point where they
cover our use case. As mentioned earlier,
Solarflare could not match the Intel
82599 + ixgbe for this app last year. Was a
disaster. Myricom is focused on UDP (better
for us), but only added multi-core IRQ
doorbell wakeups in recent months. Previously
one had to accept all IRQs on a single core or
poll, neither of which works for us.
>You got it. In the case of Tilera there are two
>modes. tile-cpu in device mode: beats most of
>the non-COTS NICs. It runs Linux on the adapter
>side. Imagine having the flexibility/power to
>program the ASIC using your favorite OS. It's
>orgasmic. So go for it! tile-cpu in host mode:
>yes, it could be a game changer.
We almost went for the 1st gen Tile64 outboard
NIC approach, but were concerned about whether
they would survive--still are. Intel has
crushed more than a few competitors along
the way. If Google or Facebook buys into the
Tile-Gx it becomes a safe choice overnight.