[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <68FE1809-84AE-4A38-8D01-EED0252460EC@gmail.com>
Date: Thu, 18 Apr 2019 10:55:02 -0700
From: "Jonathan Lemon" <jonathan.lemon@...il.com>
To: "Björn Töpel" <bjorn.topel@...il.com>
Cc: Netdev <netdev@...r.kernel.org>,
"Karlsson, Magnus" <magnus.karlsson@...el.com>, kernel-team@...com,
"Jesper Dangaard Brouer" <brouer@...hat.com>,
maciej.fijalkowski@...el.com
Subject: Re: [PATCH RFC] xdp: Support zero-copy XDP_TX from AF_XDP sockets.
On 18 Apr 2019, at 3:10, Björn Töpel wrote:
> On Wed, 17 Apr 2019 at 21:58, Jonathan Lemon
> <jonathan.lemon@...il.com> wrote:
>>
>> When the XDP program attached to a zero-copy AF_XDP socket returns
>> XDP_TX,
>> queue the umem frame on the XDP TX ring, and arrange for it to be
>> released
>> via the ZCA free routine, which should place it back onto the reuseq.
>>
>
> There are a bunch of compiler errors, but I'll leave them out from the
> comments!
Yes, sigh - generated from the wrong tree. I did compile it, honest!
>> +static int i40e_xmit_rcvd_zc(struct i40e_ring *rx_ring, struct
>> xdp_buff *xdp)
>> +{
>> + struct i40e_ring *xdp_ring;
>> + struct i40e_tx_desc *tx_desc;
>> + struct i40e_tx_buffer *tx_bi;
>> + struct xdp_frame *xdpf;
>> + dma_addr_t dma;
>> +
>> + xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
>> +
>> + if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
>> + xdp_ring->tx_stats.tx_busy++;
>> + return I40E_XDP_CONSUMED;
>> + }
>> + xpdf = convert_to_xdp_frame_keep_zc(xdp);
>> + if (unlikely(!xpdf)
>> + return I40E_XDP_CONSUMED;
>> + xpdf->handle = xdp->handle;
>> +
>> + dma = xdp_umem_get_dma(rx_ring->xsk_umem, xdp->handle);
>> + tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
>> + tx_bi->bytecount = xpdf->len;
>> + tx_bi->gso_segs = 1;
>> + tx_bi->xdpf = xdpf;
>> + tx_bi->tx_flags = I40E_TX_FLAGS_ZC_FRAME;
>> +
>> + tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
>> + tx_desc->buffer_addr = cpu_to_le64(dma);
>> + tx_desc->cmd_type_offset_bsz =
>> build_ctob(I40E_TX_DESC_CMD_ICRC |
>> +
>> I40E_TX_DESC_CMD_EOP,
>> + 0, xpdf->len, 0);
>> + smp_wmb();
>> +
>> + xdp_ring->next_to_use++;
>> + if (xdp_ring->next_to_use == xdp_ring->count)
>> + xdp_ring->next_to_use = 0;
>> +
>> + tx_bi->next_to_watch = tx_desc;
>> +
>> + return I40E_XDP_TX;
>> +}
>
> What you're basically doing here is a AF_XDP Tx, but triggered from
> the Rx path, and instead of completion (after the packet is sent) to
> the completion ring, it's recycled to the Rx HW ring (via zca_free). I
> like the idea but we need more plumbing first. Let me expand;
> Unfortunately, the current recycle mechanism requires that at the
> point of recycling, there has to be space in Rx ring. In the XDP_TX
> case, there's no completion ring, and we cannot guarantee that there's
> space on Rx ring (since Rx and Tx are asynchronous).
Yes - this is why it's an RFC. Your current design keeps the RX reuse
context in the driver itself, instead of using the reuse queue provided
by umem. I believe the driver only uses it when the ring is torn down,
so this needs to be addressed.
Since we're transmitting something that was just received, on TX
completion
the buffer should logically be placed back into the fill queue, but with
the
current SPSC mechanism, this isn't possible. Hence the ReuseQ. I think
this
should be transparently be made part of the FQ mechanism, so the RQ is
always
checked before using the FQ. I believe xsk_umem_peek_addr_rq() was added
for this purpose, but should really be the main API.
> IOW, Rx recycling
> can currently *only* be done in an Rx context.
I was assuming that the Tx was part of the same VSI, so the Rx ring
would
be accessible from the napi_poll() routine.
> What I would like to do, is moving i40e-xsk to Jesper's page-pool,
> instead of the existing recycle mechanism. Then we could return the
> descriptor to the pool, if the Rx ring doesn't have space for the
> completed/sent buffer.
Hm, not sure how this would help? The xdp_return() goes to the ZCA free
routine for zero-copy buffers, bypassing the page pool, doesn't it?
> TL;DR version: Passing zc-frames in XDP_TX cannot be done properly
> until the Rx recycle mechanism is more robust. :-(
Agree. So i40e_zca_free() would look more like:
if (space in rx_bi)
/* locally cache buffer */
i40e_reuse_rx_buffer_zc(rx, buf);
else
xsk_umem_fq_reuse(rx->umem, buf)
and the the two alloc_rx_buffers_*_zc() would need to be adjusted?
--
Jonathan
Powered by blists - more mailing lists