Message-ID: <CAJ+HfNgu+RE6G3=4dnyfaEZGxcw-NB1dAZKNgrZm0TFKooVdKQ@mail.gmail.com>
Date: Tue, 10 Apr 2018 08:47:08 +0200
From: Björn Töpel <bjorn.topel@...il.com>
To: William Tu <u9012063@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Björn Töpel <bjorn.topel@...el.com>,
michael.lundkvist@...csson.com,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
Anjali Singhai Jain <anjali.singhai@...el.com>,
"Zhang, Qi Z" <qi.z.zhang@...el.com>, ravineet.singh@...csson.com
Subject: Re: [RFC PATCH v2 00/14] Introducing AF_XDP support
2018-04-09 23:51 GMT+02:00 William Tu <u9012063@...il.com>:
> On Tue, Mar 27, 2018 at 9:59 AM, Björn Töpel <bjorn.topel@...il.com> wrote:
>> From: Björn Töpel <bjorn.topel@...el.com>
>>
>> This RFC introduces a new address family called AF_XDP that is
>> optimized for high performance packet processing and, in upcoming
>> patch sets, zero-copy semantics. In this v2 version, we have removed
>> all zero-copy related code in order to make it smaller, simpler and
>> hopefully more review friendly. This RFC only supports copy-mode for
>> the generic XDP path (XDP_SKB) for both RX and TX and copy-mode for RX
>> using the XDP_DRV path. Zero-copy support requires XDP and driver
>> changes that Jesper Dangaard Brouer is working on. Some of his work is
>> already on the mailing list for review. We will publish our zero-copy
>> support for RX and TX on top of his patch sets at a later point in
>> time.
>>
>> An AF_XDP socket (XSK) is created with the normal socket()
>> syscall. Associated with each XSK are two queues: the RX queue and the
>> TX queue. A socket can receive packets on the RX queue and it can send
>> packets on the TX queue. These queues are registered and sized with
>> the setsockopts XDP_RX_QUEUE and XDP_TX_QUEUE, respectively. It is
>> mandatory to have at least one of these queues for each socket. In
>> contrast to AF_PACKET V2/V3 these descriptor queues are separated from
>> packet buffers. An RX or TX descriptor points to a data buffer in a
>> memory area called a UMEM. RX and TX can share the same UMEM so that a
>> packet does not have to be copied between RX and TX. Moreover, if a
>> packet needs to be kept for a while due to a possible retransmit, the
>> descriptor that points to that packet can be changed to point to
>> another and reused right away. This again avoids copying data.
>>
>> This new dedicated packet buffer area is called a UMEM. It consists of
>> a number of equally sized frames and each frame has a unique frame
>> id. A descriptor in one of the queues refers to a frame by its
>> frame id. User space allocates memory for this
>> UMEM using whatever means it feels is most appropriate (malloc, mmap,
>> huge pages, etc). This memory area is then registered with the kernel
>> using the new setsockopt XDP_UMEM_REG. The UMEM also has two queues:
>> the FILL queue and the COMPLETION queue. The fill queue is used by the
>> application to send down frame ids for the kernel to fill in with RX
>> packet data. References to these frames will then appear in the RX
>> queue of the XSK once they have been received. The completion queue,
>> on the other hand, contains frame ids that the kernel has transmitted
>> completely and can now be used again by user space, for either TX or
>> RX. Thus, the frame ids appearing in the completion queue are ids that
>> were previously transmitted using the TX queue. In summary, the RX and
>> FILL queues are used for the RX path and the TX and COMPLETION queues
>> are used for the TX path.
>>
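To make the frame id idea concrete, here is a minimal sketch of how a
descriptor could be resolved to a data pointer in user space. The struct
layouts and the names umem_model, desc_model and desc_to_data() are made up
for illustration and are not the definitions used in the patches; the only
thing taken from the description above is that the UMEM is an array of
equally sized frames indexed by frame id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; not the struct from the patches. */
struct umem_model {
	uint8_t *area;        /* start of the user-allocated packet buffer area */
	uint32_t frame_size;  /* every frame has the same size */
	uint32_t nframes;     /* valid frame ids are 0 .. nframes - 1 */
};

/* Hypothetical descriptor: names and fields are assumptions. */
struct desc_model {
	uint32_t frame_id;    /* which frame holds the packet */
	uint32_t offset;      /* start of packet data within the frame */
	uint32_t len;         /* packet length in bytes */
};

/* Return a pointer to the packet data, or NULL if the descriptor is bad. */
static uint8_t *desc_to_data(const struct umem_model *u,
			     const struct desc_model *d)
{
	assert(u->area != NULL && u->frame_size > 0);

	if (d->frame_id >= u->nframes)
		return NULL;  /* frame id outside the UMEM */
	if (d->offset > u->frame_size || d->len > u->frame_size - d->offset)
		return NULL;  /* packet would run past the end of the frame */
	return u->area + (size_t)d->frame_id * u->frame_size + d->offset;
}
```

Because a descriptor only carries an id, an offset and a length, moving a
packet between queues never requires touching the frame contents.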
> Can we register a UMEM to multiple device's queue?
>
No, one UMEM, one netdev queue in this RFC. That being said, there's
nothing stopping a user from creating an additional UMEM, say UMEM',
pointing to the same memory as UMEM, but bound to another
netdev/queue. Note that the user space application has to make sure
that the buffer handling is sane (user/kernel frame ownership).

We used to allow sharing a UMEM between unrelated sockets, but after
the introduction of the UMEM queues (fill/completion) that's no longer
the case. For the zero-copy scenario, having to manage multiple
DMA mappings per UMEM was a bit of a mess, so we went for the simpler
(current) solution with one UMEM per netdev/queue.
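
As a rough illustration of the user/kernel frame ownership bookkeeping an
application would need when two UMEMs point at the same memory, here is a
minimal sketch. It assumes two registrations, called A and B here, and a
small fixed frame count. NFRAMES, frame_owner, frame_give() and frame_take()
are hypothetical names and are not part of the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFRAMES 8  /* hypothetical number of frames in the shared area */

enum frame_owner {
	OWNER_USER,      /* application may use it or hand it to a UMEM */
	OWNER_KERNEL_A,  /* posted to the fill or TX queue of the first UMEM */
	OWNER_KERNEL_B,  /* posted to the fill or TX queue of the second UMEM */
};

static enum frame_owner owner[NFRAMES];  /* zero-initialised: all OWNER_USER */

/* Call before posting a frame id to a fill or TX queue. */
static bool frame_give(uint32_t id, enum frame_owner to)
{
	assert(to != OWNER_USER);
	if (id >= NFRAMES || owner[id] != OWNER_USER)
		return false;  /* unknown frame, or the kernel still holds it */
	owner[id] = to;
	return true;
}

/* Call after seeing a frame id on an RX or completion queue. */
static bool frame_take(uint32_t id, enum frame_owner from)
{
	if (id >= NFRAMES || from == OWNER_USER || owner[id] != from)
		return false;  /* frame was not handed out through this UMEM */
	owner[id] = OWNER_USER;
	return true;
}
```

The point of the table is that a frame handed to the kernel through one
UMEM cannot be posted through the other until it has come back to user space.
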
> So far the l2fwd sample code is sending/receiving from the same
> queue. I'm thinking about forwarding packets from one device to another.
> Now I'm copying packets from one device's RX desc to another device's TX
> completion queue. But this introduces one extra copy.
>
So you've set up two identical UMEMs? Then you can just forward the
incoming Rx descriptor to the other netdev's Tx queue. Note that you
only need to copy the descriptor, not the actual frame data.
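
A minimal sketch of that descriptor forwarding, using plain in-process
rings as a stand-in for the real Rx and Tx queues. Everything here
(ring_model, desc_model, ring_put(), ring_get(), forward(), RING_SIZE) is a
hypothetical model and not the queue layout from the patches; in particular
it leaves out the memory barriers a ring shared with the kernel would need.
It assumes both UMEMs cover the same memory with the same frame size, so a
frame id means the same thing on both sides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4  /* hypothetical; must be a power of two */

struct desc_model {      /* hypothetical descriptor layout */
	uint32_t frame_id;
	uint32_t len;
};

struct ring_model {      /* plain in-process ring, not shared with a kernel */
	struct desc_model d[RING_SIZE];
	uint32_t head;       /* next slot to produce into */
	uint32_t tail;       /* next slot to consume from */
};

static bool ring_put(struct ring_model *r, struct desc_model v)
{
	if (r->head - r->tail == RING_SIZE)
		return false;  /* full */
	r->d[r->head++ & (RING_SIZE - 1)] = v;
	return true;
}

static bool ring_get(struct ring_model *r, struct desc_model *v)
{
	if (r->head == r->tail)
		return false;  /* empty */
	*v = r->d[r->tail++ & (RING_SIZE - 1)];
	return true;
}

/* Move up to budget descriptors from rx to tx; frame data is untouched. */
static unsigned int forward(struct ring_model *rx, struct ring_model *tx,
			    unsigned int budget)
{
	unsigned int n = 0;
	struct desc_model d;

	/* Check for Tx space first so no Rx descriptor is consumed and lost. */
	while (n < budget && tx->head - tx->tail < RING_SIZE &&
	       ring_get(rx, &d)) {
		bool ok = ring_put(tx, d);  /* cannot fail: space checked above */
		assert(ok);
		(void)ok;
		n++;
	}
	return n;
}
```

Once the other netdev has sent the frame, its id shows up on that UMEM's
completion queue, and the application can post it to the fill queue of the
receiving side again.
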
> One thing I can do is call the bpf_redirect helper function, but
> sometimes I still need to process the packet in userspace.
>
> I like this work!
> Thanks a lot.
Happy to hear that, and thanks a bunch for trying it out. Keep that
feedback coming!
Björn
> William