Message-ID: <2ea8c29a-25bb-c632-9622-3a8b123ce32c@fb.com>
Date: Tue, 14 Nov 2017 07:50:48 +0800
From: Alexei Starovoitov <ast@...com>
To: Björn Töpel <bjorn.topel@...il.com>,
"Karlsson, Magnus" <magnus.karlsson@...el.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
<michael.lundkvist@...csson.com>, <ravineet.singh@...csson.com>,
Daniel Borkmann <daniel@...earbox.net>,
Netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Tushar Dave <tushar.n.dave@...cle.com>,
<eric.dumazet@...il.com>
CC: Björn Töpel <bjorn.topel@...el.com>,
<jesse.brandeburg@...el.com>, <anjali.singhai@...el.com>,
<rami.rosen@...el.com>, <jeffrey.b.shaw@...el.com>,
<ferruh.yigit@...el.com>, <qi.z.zhang@...el.com>,
<davem@...emloft.net>
Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support
On 11/13/17 9:07 PM, Björn Töpel wrote:
> 2017-10-31 13:41 GMT+01:00 Björn Töpel <bjorn.topel@...il.com>:
>> From: Björn Töpel <bjorn.topel@...el.com>
>>
> [...]
>>
>> We'll do a presentation on AF_PACKET V4 in NetDev 2.2 [1] Seoul,
>> Korea, and our paper with complete benchmarks will be released shortly
>> on the NetDev 2.2 site.
>>
>
> We're back in the saddle after an excellent netdevconf week. Kudos to
> the organizers; we had a blast! Thanks for all the constructive
> feedback.
>
> I'll summarize the major points that we'll address in the next RFC
> below.
>
> * Instead of extending AF_PACKET with yet another version, introduce a
> new address/packet family. As for naming, we had some suggestions:
> AF_CAPTURE, AF_CHANNEL, AF_XDP and AF_ZEROCOPY. We'll go for
> AF_ZEROCOPY, unless there are strong opinions against it.
>
> * No explicit zerocopy enablement. Use the zerocopy path if
> supported; if not, fall back to the skb path for netdevs that
> don't support the required ndos. Further, we'll have the zerocopy
> behavior for the skb path as well, meaning that an AF_ZEROCOPY
> socket will consume the skb, and we'll honor skb->queue_mapping,
> so that we only consume the packets for the enabled queue.
>
> * Limit the scope of the first patchset to Rx only, and introduce Tx
> in a separate patchset.
All sounds good to me except the above bit.
I don't remember people suggesting to split it this way.
What's the value of it without Tx?
> * Minimize the size of the i40e zerocopy patches, by moving the driver
> specific code to separate patches.
>
> * Do not introduce a new XDP action XDP_PASS_TO_KERNEL, instead use
> XDP redirect map call with ingress flag.
>
> * Extend the XDP redirect to support explicit allocator/destructor
> functions. Right now, XDP redirect assumes that the page allocator
> was used, and the XDP redirect cleanup path is decreasing the page
> count of the XDP buffer. This assumption breaks for the zerocopy
> case.
>
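To make the last point above concrete, here is a minimal sketch of what "explicit destructor" could mean for the redirect cleanup path. It is not taken from the patch set: every name in it (sketch_buf, sketch_free_fn, sketch_zc_free, sketch_redirect_cleanup) is hypothetical, and the plain integer page_refs only stands in for a real page reference count. The assumption is that a buffer carries an optional release callback, and the cleanup path drops a page reference only when no callback is set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in the kernel or the RFC. */
struct sketch_buf;

/* Release callback supplied by whoever allocated the buffer. */
typedef void (*sketch_free_fn)(struct sketch_buf *buf);

struct sketch_buf {
	void *data;
	int page_refs;          /* stand-in for the page reference count */
	int returned_to_pool;   /* set when a zerocopy buffer goes back to its owner */
	sketch_free_fn free_fn; /* NULL means the page allocator was used */
};

/* Destructor for a user-space-provided (zerocopy) buffer: hand it back
 * to its owner instead of touching any page count. */
static void sketch_zc_free(struct sketch_buf *buf)
{
	buf->returned_to_pool = 1;
}

/* Cleanup after a redirect: use the explicit destructor when present,
 * otherwise keep the current behavior of dropping one page reference. */
static void sketch_redirect_cleanup(struct sketch_buf *buf)
{
	assert(buf != NULL);

	if (buf->free_fn)
		buf->free_fn(buf);
	else
		buf->page_refs--;
}
```

With this shape, a page-allocator buffer behaves as today, while a zerocopy buffer is returned through its own callback and its page count is left alone.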
>
> Björn
>
>
>> We based this patch set on net-next commit e1ea2f9856b7 ("Merge
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
>>
>> Please focus your review on:
>>
>> * The V4 user space interface
>> * PACKET_ZEROCOPY and its semantics
>> * Packet array interface
>> * XDP semantics when executing in zero-copy mode (user space passed
>> buffers)
>> * XDP_PASS_TO_KERNEL semantics
>>
>> To do:
>>
>> * Investigate the user-space ring structure’s performance problems
>> * Continue the XDP integration into packet arrays
>> * Optimize performance
>> * SKB <-> V4 conversions in tp4a_populate & tp4a_flush
>> * Packet buffer is unnecessarily pinned for virtual devices
>> * Support shared packet buffers
>> * Unify V4 and SKB receive path in I40E driver
>> * Support for packets spanning multiple frames
>> * Disassociate the packet array implementation from the V4 queue
>> structure
>>
>> We would really like to thank the reviewers of the limited
>> distribution RFC for all their comments that have helped improve the
>> interfaces and the code significantly: Alexei Starovoitov, Alexander
>> Duyck, Jesper Dangaard Brouer, and John Fastabend. The internal team
>> at Intel that has been helping out reviewing code, writing tests, and
>> sanity checking our ideas: Rami Rosen, Jeff Shaw, Ferruh Yigit, and Qi
>> Zhang, your participation has really helped.
>>
>> Thanks: Björn and Magnus
>>
>> [1] https://www.netdevconf.org/2.2/
>>
>> Björn Töpel (7):
>> packet: introduce AF_PACKET V4 userspace API
>> packet: implement PACKET_MEMREG setsockopt
>> packet: enable AF_PACKET V4 rings
>> packet: wire up zerocopy for AF_PACKET V4
>> i40e: AF_PACKET V4 ndo_tp4_zerocopy Rx support
>> i40e: AF_PACKET V4 ndo_tp4_zerocopy Tx support
>> samples/tpacket4: added tpbench
>>
>> Magnus Karlsson (7):
>> packet: enable Rx for AF_PACKET V4
>> packet: enable Tx support for AF_PACKET V4
>> netdevice: add AF_PACKET V4 zerocopy ops
>> veth: added support for PACKET_ZEROCOPY
>> samples/tpacket4: added veth support
>> i40e: added XDP support for TP4 enabled queue pairs
>> xdp: introducing XDP_PASS_TO_KERNEL for PACKET_ZEROCOPY use
>>
>> drivers/net/ethernet/intel/i40e/i40e.h | 3 +
>> drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 9 +
>> drivers/net/ethernet/intel/i40e/i40e_main.c | 837 ++++++++++++-
>> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 582 ++++++++-
>> drivers/net/ethernet/intel/i40e/i40e_txrx.h | 38 +
>> drivers/net/veth.c | 174 +++
>> include/linux/netdevice.h | 16 +
>> include/linux/tpacket4.h | 1502 ++++++++++++++++++++++++
>> include/uapi/linux/bpf.h | 1 +
>> include/uapi/linux/if_packet.h | 65 +-
>> net/packet/af_packet.c | 1252 +++++++++++++++++---
>> net/packet/internal.h | 9 +
>> samples/tpacket4/Makefile | 12 +
>> samples/tpacket4/bench_all.sh | 28 +
>> samples/tpacket4/tpbench.c | 1390 ++++++++++++++++++++++
>> 15 files changed, 5674 insertions(+), 244 deletions(-)
>> create mode 100644 include/linux/tpacket4.h
>> create mode 100644 samples/tpacket4/Makefile
>> create mode 100755 samples/tpacket4/bench_all.sh
>> create mode 100644 samples/tpacket4/tpbench.c
>>
>> --
>> 2.11.0
>>