lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 Apr 2019 10:55:02 -0700
From:   "Jonathan Lemon" <jonathan.lemon@...il.com>
To:     "Björn Töpel" <bjorn.topel@...il.com>
Cc:     Netdev <netdev@...r.kernel.org>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>, kernel-team@...com,
        "Jesper Dangaard Brouer" <brouer@...hat.com>,
        maciej.fijalkowski@...el.com
Subject: Re: [PATCH RFC] xdp: Support zero-copy XDP_TX from AF_XDP sockets.

On 18 Apr 2019, at 3:10, Björn Töpel wrote:

> On Wed, 17 Apr 2019 at 21:58, Jonathan Lemon 
> <jonathan.lemon@...il.com> wrote:
>>
>> When the XDP program attached to a zero-copy AF_XDP socket returns 
>> XDP_TX,
>> queue the umem frame on the XDP TX ring, and arrange for it to be 
>> released
>> via the ZCA free routine, which should place it back onto the reuseq.
>>
>
> There are a bunch of compiler errors, but I'll leave them out from the 
> comments!

Yes, sigh - generated from the wrong tree. I did compile it, honest!


>> +static int i40e_xmit_rcvd_zc(struct i40e_ring *rx_ring, struct 
>> xdp_buff *xdp)
>> +{
>> +       struct i40e_ring *xdp_ring;
>> +       struct i40e_tx_desc *tx_desc;
>> +       struct i40e_tx_buffer *tx_bi;
>> +       struct xdp_frame *xdpf;
>> +       dma_addr_t dma;
>> +
>> +       xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
>> +
>> +       if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
>> +               xdp_ring->tx_stats.tx_busy++;
>> +               return I40E_XDP_CONSUMED;
>> +       }
>> +       xpdf = convert_to_xdp_frame_keep_zc(xdp);
>> +       if (unlikely(!xpdf)
>> +               return I40E_XDP_CONSUMED;
>> +       xpdf->handle = xdp->handle;
>> +
>> +       dma = xdp_umem_get_dma(rx_ring->xsk_umem, xdp->handle);
>> +       tx_bi = &xdp_ring->tx_bi[xdp_ring->next_to_use];
>> +       tx_bi->bytecount = xpdf->len;
>> +       tx_bi->gso_segs = 1;
>> +       tx_bi->xdpf = xdpf;
>> +       tx_bi->tx_flags = I40E_TX_FLAGS_ZC_FRAME;
>> +
>> +       tx_desc = I40E_TX_DESC(xdp_ring, xdp_ring->next_to_use);
>> +       tx_desc->buffer_addr = cpu_to_le64(dma);
>> +       tx_desc->cmd_type_offset_bsz = 
>> build_ctob(I40E_TX_DESC_CMD_ICRC |
>> +                                                 
>> I40E_TX_DESC_CMD_EOP,
>> +                                                 0, xpdf->len, 0);
>> +       smp_wmb();
>> +
>> +       xdp_ring->next_to_use++;
>> +       if (xdp_ring->next_to_use == xdp_ring->count)
>> +               xdp_ring->next_to_use = 0;
>> +
>> +       tx_bi->next_to_watch = tx_desc;
>> +
>> +       return I40E_XDP_TX;
>> +}
>
> What you're basically doing here is a AF_XDP Tx, but triggered from
> the Rx path, and instead of completion (after the packet is sent) to
> the completion ring, it's recycled to the Rx HW ring (via zca_free). I
> like the idea but we need more plumbing first. Let me expand;
> Unfortunately, the current recycle mechanism requires that at the
> point of recycling, there has to be space in Rx ring. In the XDP_TX
> case, there's no completion ring, and we cannot guarantee that there's
> space on Rx ring (since Rx and Tx are asynchronous).

Yes - this is why it's an RFC. Your current design keeps the RX reuse
context in the driver itself, instead of using the reuse queue provided
by umem. I believe the driver only uses it when the ring is torn down,
so this needs to be addressed.

Since we're transmitting something that was just received, on TX 
completion
the buffer should logically be placed back into the fill queue, but with 
the
current SPSC mechanism, this isn't possible. Hence the ReuseQ. I think 
this
should be transparently be made part of the FQ mechanism, so the RQ is 
always
checked before using the FQ. I believe xsk_umem_peek_addr_rq() was added
for this purpose, but should really be the main API.


> IOW, Rx recycling
> can currently *only* be done in an Rx context.

I was assuming that the Tx was part of the same VSI, so the Rx ring 
would
be accessible from the napi_poll() routine.


> What I would like to do, is moving i40e-xsk to Jesper's page-pool,
> instead of the existing recycle mechanism. Then we could return the
> descriptor to the pool, if the Rx ring doesn't have space for the
> completed/sent buffer.

Hm, not sure how this would help? The xdp_return() goes to the ZCA free
routine for zero-copy buffers, bypassing the page pool, doesn't it?


> TL;DR version: Passing zc-frames in XDP_TX cannot be done properly
> until the Rx recycle mechanism is more robust. :-(

Agree. So i40e_zca_free() would look more like:

   if (space in rx_bi)
     /* locally cache buffer */
     i40e_reuse_rx_buffer_zc(rx, buf);
   else
     xsk_umem_fq_reuse(rx->umem, buf)

and the the two alloc_rx_buffers_*_zc() would need to be adjusted?
-- 
Jonathan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ