Message-ID: <CAJ+HfNh5DWsT6uT9nvzPeUp=XFip5meDammfTXMdd4b6wDqqeQ@mail.gmail.com>
Date: Tue, 14 Nov 2017 06:33:59 +0100
From: Björn Töpel <bjorn.topel@...il.com>
To: Alexei Starovoitov <ast@...com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
michael.lundkvist@...csson.com, ravineet.singh@...csson.com,
Daniel Borkmann <daniel@...earbox.net>,
Netdev <netdev@...r.kernel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Tushar Dave <tushar.n.dave@...cle.com>, eric.dumazet@...il.com,
Björn Töpel <bjorn.topel@...el.com>,
jesse.brandeburg@...el.com, anjali.singhai@...el.com,
rami.rosen@...el.com, jeffrey.b.shaw@...el.com,
ferruh.yigit@...el.com, qi.z.zhang@...el.com, davem@...emloft.net
Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support
2017-11-14 0:50 GMT+01:00 Alexei Starovoitov <ast@...com>:
> On 11/13/17 9:07 PM, Björn Töpel wrote:
>>
>> 2017-10-31 13:41 GMT+01:00 Björn Töpel <bjorn.topel@...il.com>:
>>>
>>> From: Björn Töpel <bjorn.topel@...el.com>
>>>
>> [...]
>>>
>>>
>>> We'll give a presentation on AF_PACKET V4 at NetDev 2.2 [1] in Seoul,
>>> Korea, and our paper with complete benchmarks will be released shortly
>>> on the NetDev 2.2 site.
>>>
>>
>> We're back in the saddle after an excellent netdevconf week. Kudos to
>> the organizers; we had a blast! Thanks for all the constructive
>> feedback.
>>
>> Below, I'll summarize the major points that we'll address in the next
>> RFC.
>>
>> * Instead of extending AF_PACKET with yet another version, introduce a
>> new address/packet family. As for naming, we had some suggestions:
>> AF_CAPTURE, AF_CHANNEL, AF_XDP and AF_ZEROCOPY. We'll go for
>> AF_ZEROCOPY, unless there are strong opinions against it.
>>
>> * No explicit zerocopy enablement. Use the zerocopy path if
>> supported; if not, fall back to the skb path for netdevs that
>> don't support the required ndos. Further, we'll have the zerocopy
>> behavior for the skb path as well, meaning that an AF_ZEROCOPY
>> socket will consume the skb and we'll honor skb->queue_mapping,
>> so that we only consume the packets for the enabled queue.
>>
>> * Limit the scope of the first patchset to Rx only, and introduce Tx
>> in a separate patchset.
>
>
> all sounds good to me except above bit.
> I don't remember people suggesting to split it this way.
> What's the value of it without tx?
>
We definitely need Tx for our use-cases! Let me rephrase: the idea was
to make the initial patch set without the Tx *driver*-specific code,
and add that at a later point, e.g. using ndo_xdp_xmit/flush. So
AF_ZEROCOPY, the socket parts, would still have Tx support.

@John Did I recall that correctly?

>> * Minimize the size of the i40e zerocopy patches, by moving the driver
>> specific code to separate patches.
>>
>> * Do not introduce a new XDP action XDP_PASS_TO_KERNEL, instead use
>> XDP redirect map call with ingress flag.
>>
>> * Extend the XDP redirect to support explicit allocator/destructor
>> functions. Right now, XDP redirect assumes that the page allocator
>> was used, and the XDP redirect cleanup path is decreasing the page
>> count of the XDP buffer. This assumption breaks for the zerocopy
>> case.
>>
>>
>> Björn
>>
>>
>>> We based this patch set on net-next commit e1ea2f9856b7 ("Merge
>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
>>>
>>> Please focus your review on:
>>>
>>> * The V4 user space interface
>>> * PACKET_ZEROCOPY and its semantics
>>> * Packet array interface
>>> * XDP semantics when executing in zero-copy mode (user space passed
>>> buffers)
>>> * XDP_PASS_TO_KERNEL semantics
>>>
>>> To do:
>>>
>>> * Investigate the user-space ring structure’s performance problems
>>> * Continue the XDP integration into packet arrays
>>> * Optimize performance
>>> * SKB <-> V4 conversions in tp4a_populate & tp4a_flush
>>> * Packet buffer is unnecessarily pinned for virtual devices
>>> * Support shared packet buffers
>>> * Unify V4 and SKB receive path in I40E driver
>>> * Support for packets spanning multiple frames
>>> * Disassociate the packet array implementation from the V4 queue
>>> structure
>>>
>>> We would really like to thank the reviewers of the limited
>>> distribution RFC for all their comments that have helped improve the
>>> interfaces and the code significantly: Alexei Starovoitov, Alexander
>>> Duyck, Jesper Dangaard Brouer, and John Fastabend. Thanks also to the
>>> internal team at Intel that has been helping out by reviewing code,
>>> writing tests, and sanity checking our ideas: Rami Rosen, Jeff Shaw,
>>> Ferruh Yigit, and Qi Zhang. Your participation has really helped.
>>>
>>> Thanks: Björn and Magnus
>>>
>>> [1] https://www.netdevconf.org/2.2/
>>>
>>>
>>> Björn Töpel (7):
>>> packet: introduce AF_PACKET V4 userspace API
>>> packet: implement PACKET_MEMREG setsockopt
>>> packet: enable AF_PACKET V4 rings
>>> packet: wire up zerocopy for AF_PACKET V4
>>> i40e: AF_PACKET V4 ndo_tp4_zerocopy Rx support
>>> i40e: AF_PACKET V4 ndo_tp4_zerocopy Tx support
>>> samples/tpacket4: added tpbench
>>>
>>> Magnus Karlsson (7):
>>> packet: enable Rx for AF_PACKET V4
>>> packet: enable Tx support for AF_PACKET V4
>>> netdevice: add AF_PACKET V4 zerocopy ops
>>> veth: added support for PACKET_ZEROCOPY
>>> samples/tpacket4: added veth support
>>> i40e: added XDP support for TP4 enabled queue pairs
>>> xdp: introducing XDP_PASS_TO_KERNEL for PACKET_ZEROCOPY use
>>>
>>> drivers/net/ethernet/intel/i40e/i40e.h | 3 +
>>> drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 9 +
>>> drivers/net/ethernet/intel/i40e/i40e_main.c | 837 ++++++++++++-
>>> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 582 ++++++++-
>>> drivers/net/ethernet/intel/i40e/i40e_txrx.h | 38 +
>>> drivers/net/veth.c | 174 +++
>>> include/linux/netdevice.h | 16 +
>>> include/linux/tpacket4.h | 1502 ++++++++++++++++++++++++
>>> include/uapi/linux/bpf.h | 1 +
>>> include/uapi/linux/if_packet.h | 65 +-
>>> net/packet/af_packet.c | 1252 +++++++++++++++++---
>>> net/packet/internal.h | 9 +
>>> samples/tpacket4/Makefile | 12 +
>>> samples/tpacket4/bench_all.sh | 28 +
>>> samples/tpacket4/tpbench.c | 1390 ++++++++++++++++++++++
>>> 15 files changed, 5674 insertions(+), 244 deletions(-)
>>> create mode 100644 include/linux/tpacket4.h
>>> create mode 100644 samples/tpacket4/Makefile
>>> create mode 100755 samples/tpacket4/bench_all.sh
>>> create mode 100644 samples/tpacket4/tpbench.c
>>>
>>> --
>>> 2.11.0
>>>
>