[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALDO+SYvxrnqKy=7P2c_agWDJEn2spwNQr0ynE2-JTfWPAaEEg@mail.gmail.com>
Date: Tue, 10 Apr 2018 07:14:18 -0700
From: William Tu <u9012063@...il.com>
To: Björn Töpel <bjorn.topel@...il.com>
Cc: "Karlsson, Magnus" <magnus.karlsson@...el.com>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Alexander Duyck <alexander.duyck@...il.com>,
John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <ast@...com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Björn Töpel <bjorn.topel@...el.com>,
michael.lundkvist@...csson.com,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
Anjali Singhai Jain <anjali.singhai@...el.com>,
"Zhang, Qi Z" <qi.z.zhang@...el.com>, ravineet.singh@...csson.com
Subject: Re: [RFC PATCH v2 00/14] Introducing AF_XDP support
On Mon, Apr 9, 2018 at 11:47 PM, Björn Töpel <bjorn.topel@...il.com> wrote:
> 2018-04-09 23:51 GMT+02:00 William Tu <u9012063@...il.com>:
>> On Tue, Mar 27, 2018 at 9:59 AM, Björn Töpel <bjorn.topel@...il.com> wrote:
>>> From: Björn Töpel <bjorn.topel@...el.com>
>>>
>>> This RFC introduces a new address family called AF_XDP that is
>>> optimized for high performance packet processing and, in upcoming
>>> patch sets, zero-copy semantics. In this v2 version, we have removed
>>> all zero-copy related code in order to make it smaller, simpler and
>>> hopefully more review friendly. This RFC only supports copy-mode for
>>> the generic XDP path (XDP_SKB) for both RX and TX and copy-mode for RX
>>> using the XDP_DRV path. Zero-copy support requires XDP and driver
>>> changes that Jesper Dangaard Brouer is working on. Some of his work is
>>> already on the mailing list for review. We will publish our zero-copy
>>> support for RX and TX on top of his patch sets at a later point in
>>> time.
>>>
>>> An AF_XDP socket (XSK) is created with the normal socket()
>>> syscall. Associated with each XSK are two queues: the RX queue and the
>>> TX queue. A socket can receive packets on the RX queue and it can send
>>> packets on the TX queue. These queues are registered and sized with
>>> the setsockopts XDP_RX_QUEUE and XDP_TX_QUEUE, respectively. It is
>>> mandatory to have at least one of these queues for each socket. In
>>> contrast to AF_PACKET V2/V3 these descriptor queues are separated from
>>> packet buffers. An RX or TX descriptor points to a data buffer in a
>>> memory area called a UMEM. RX and TX can share the same UMEM so that a
>>> packet does not have to be copied between RX and TX. Moreover, if a
>>> packet needs to be kept for a while due to a possible retransmit, the
>>> descriptor that points to that packet can be changed to point to
>>> another and reused right away. This again avoids copying data.
>>>
>>> This new dedicated packet buffer area is called a UMEM. It consists of
>>> a number of equally size frames and each frame has a unique frame
>>> id. A descriptor in one of the queues references a frame by
>>> referencing its frame id. The user space allocates memory for this
>>> UMEM using whatever means it feels is most appropriate (malloc, mmap,
>>> huge pages, etc). This memory area is then registered with the kernel
>>> using the new setsockopt XDP_UMEM_REG. The UMEM also has two queues:
>>> the FILL queue and the COMPLETION queue. The fill queue is used by the
>>> application to send down frame ids for the kernel to fill in with RX
>>> packet data. References to these frames will then appear in the RX
>>> queue of the XSK once they have been received. The completion queue,
>>> on the other hand, contains frame ids that the kernel has transmitted
>>> completely and can now be used again by user space, for either TX or
>>> RX. Thus, the frame ids appearing in the completion queue are ids that
>>> were previously transmitted using the TX queue. In summary, the RX and
>>> FILL queues are used for the RX path and the TX and COMPLETION queues
>>> are used for the TX path.
>>>
>> Can we register a UMEM to multiple device's queue?
>>
>
> No, one UMEM, one netdev queue in this RFC. That being said, there's
> nothing stopping a user from creating an additional UMEM, say UMEM',
> pointing to the same memory as UMEM, but bound to another
> netdev/queue. Note that the user space application has to make sure
> that the buffer handling is sane (user/kernel frame ownership).
>
> We used to allow to share UMEM between unrelated sockets, but after
> the introduction of the UMEM queues (fill/completion) that's no the
> case any more. For the zero-copy scenario, having to manage multiple
> DMA mappings per UMEM was a bit of a mess, so we went for the simpler
> (current) solution with one UMEM per netdev/queue.
>
>> So far the l2fwd sample code is sending/receiving from the same
>> queue. I'm thinking about forwarding packets from one device to another.
>> Now I'm copying packets from one device's RX desc to another device's TX
>> completion queue. But this introduces one extra copy.
>>
>
> So you've setup two identical UMEMs? Then you can just forward the
> incoming Rx descriptor to the other netdev's Tx queue. Note, that you
> only need to copy the descriptor, not the actual frame data.
>
Thanks!
I will give it a try, I guess you're saying I can do below:
int sfd1; // for device1
int sfd2; // for device2
...
// create 2 umem
umem1 = calloc(1, sizeof(*umem));
umem2 = calloc(1, sizeof(*umem));
// allocate 1 shared buffer, 1 xdp_umem_reg
posix_memalign(&bufs, ...)
mr.addr = (__u64)bufs; // shared for umem1,2
...
// umem reg the same mr
setsockopt(sfd1, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr))
setsockopt(sfd2, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr))
// setup fill, completion, mmap for sfd1 and sfd2
...
Since both device can put frame data in 'bufs', I only need to copy
the descs between 2 umem1 and umem2. Am I understand correct?
Regards,
William
Powered by blists - more mailing lists