Message-ID: <3986f106-051d-46c8-8ec3-82558f670253@gmail.com>
Date: Thu, 21 Dec 2023 19:36:27 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: David Wei <dw@...idwei.uk>, io-uring@...r.kernel.org,
netdev@...r.kernel.org
Cc: Jens Axboe <axboe@...nel.dk>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jesper Dangaard Brouer
<hawk@...nel.org>, David Ahern <dsahern@...nel.org>,
Mina Almasry <almasrymina@...gle.com>
Subject: Re: [RFC PATCH v3 13/20] io_uring: implement pp memory provider for
zc rx
On 12/19/23 21:03, David Wei wrote:
> From: Pavel Begunkov <asml.silence@...il.com>
>
> We're adding a new pp memory provider to implement io_uring zerocopy
> receive. It'll be "registered" in pp and used in later patches.
>
> The typical life cycle of a buffer goes as follows: first it's allocated
> to a driver with the initial refcount set to 1. The driver fills it
> with data, puts it into an skb and passes it down the stack, where it
> gets queued up to a socket. Later, a zc io_uring request will receive
> data from the socket from a task context. At that point io_uring will
> tell userspace that this buffer has some data by posting an appropriate
> completion. It'll also elevate the refcount by IO_ZC_RX_UREF, so the
> buffer is not recycled while userspace is reading the data. When
> userspace is done with the buffer it should return it back to io_uring
> by adding an entry to the buffer refill ring. When necessary, io_uring
> will poll the refill ring, compare references including IO_ZC_RX_UREF
> and reuse the buffer.
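For illustration, the userspace reference scheme described above presumably
boils down to something like the sketch below; only io_zc_rx_put_buf_uref()
shows up in the hunks further down, so the refcount field and the helper
bodies are assumptions.

/* sketch only: assumes struct io_zc_rx_buf carries an atomic_t refcount
 * and that IO_ZC_RX_UREF is a bias well above any kernel-side refs
 */
#define IO_ZC_RX_UREF		0x10000

static void io_zc_rx_get_buf_uref(struct io_zc_rx_buf *buf)
{
	/* posted to userspace: bias the count so the buffer is not
	 * recycled while userspace is still reading from it
	 */
	atomic_add(IO_ZC_RX_UREF, &buf->refcount);
}

static bool io_zc_rx_put_buf_uref(struct io_zc_rx_buf *buf)
{
	if (atomic_read(&buf->refcount) < IO_ZC_RX_UREF)
		return false;
	/* true once the last outstanding reference is gone and the
	 * buffer can be reused
	 */
	return atomic_sub_and_test(IO_ZC_RX_UREF, &buf->refcount);
}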
>
> Initially, all buffers are placed in a spinlock-protected ->freelist.
> It's a slow path stash, where buffers are considered to be unallocated
> and not exposed to the core page pool. On allocation, pp will first try
> all its caches, and then the ->alloc_pages callback if everything else
> fails.
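Roughly, that means the pool object looks something like the sketch below;
the field names are taken from the hunks further down, the exact layout is
an assumption.

struct io_zc_rx_pool {
	struct io_zc_rx_buf	*bufs;		/* one entry per registered page */

	/* slow path stash of buffers not exposed to the core page pool */
	spinlock_t		freelist_lock;
	u32			free_count;
	u32			freelist[];	/* indices into ->bufs */
};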
>
> The hot path for io_pp_zc_alloc_pages() is to grab pages from the refill
> ring. The consumption from the ring is always done in the attached napi
> context, so no additional synchronisation is required. If that fails,
> we fall back to getting buffers from the ->freelist.
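The ring accounting on the kernel side then presumably reduces to comparing
the cached head against the tail published by userspace, e.g. (the acquire
load pairing with a release store of rq.tail on the userspace side is an
assumption):

static unsigned int io_zc_rx_rqring_entries(struct io_zc_rx_ifq *ifq)
{
	/* userspace fills rqes[] and then advances rq.tail with a
	 * release store; pair it with an acquire load here
	 */
	unsigned int tail = smp_load_acquire(&ifq->ring->rq.tail);

	return tail - ifq->cached_rq_head;
}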
>
> Note: only ->freelist is considered unallocated for the page pool, so we
> only bump pages_state_hold_cnt when allocating from there. Subsequently,
> as page_pool_return_page() and others bump the ->pages_state_release_cnt
> counter, io_pp_zc_release_page() can only use ->freelist, which is not a
> problem as it's not a hot path.
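In other words, the ->release_page callback presumably just parks buffers
back into the stash, along these lines (a sketch: the callback signature,
pp->mp_priv and the io_pp_page_to_zc_buf() helper are assumptions):

static bool io_pp_zc_release_page(struct page_pool *pp, struct page *page)
{
	struct io_zc_rx_ifq *ifq = pp->mp_priv;		/* assumption */
	struct io_zc_rx_pool *pool = ifq->pool;
	struct io_zc_rx_buf *buf = io_pp_page_to_zc_buf(page);	/* hypothetical helper */

	spin_lock_bh(&pool->freelist_lock);
	pool->freelist[pool->free_count++] = buf - pool->bufs;
	spin_unlock_bh(&pool->freelist_lock);

	/* the core page pool bumps ->pages_state_release_cnt itself; the
	 * return convention (page stays under provider control) is an
	 * assumption
	 */
	return false;
}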
>
> Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
> Signed-off-by: David Wei <dw@...idwei.uk>
> ---
...
> +static void io_zc_rx_ring_refill(struct page_pool *pp,
> + struct io_zc_rx_ifq *ifq)
> +{
> + unsigned int entries = io_zc_rx_rqring_entries(ifq);
> + unsigned int mask = ifq->rq_entries - 1;
> + struct io_zc_rx_pool *pool = ifq->pool;
> +
> + if (unlikely(!entries))
> + return;
> +
> + while (entries--) {
> + unsigned int rq_idx = ifq->cached_rq_head++ & mask;
> + struct io_uring_rbuf_rqe *rqe = &ifq->rqes[rq_idx];
> + u32 pgid = rqe->off / PAGE_SIZE;
> + struct io_zc_rx_buf *buf = &pool->bufs[pgid];
> +
> + if (!io_zc_rx_put_buf_uref(buf))
> + continue;
It's worth noting that we have to add a dma sync here, as per
discussions with the page pool folks.
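Roughly along these lines, using the plain DMA API; ifq->dev and buf->dma
standing in for the device and the buffer's mapped address are assumptions:

		/* re-sync the page for device before the driver reuses it
		 * for a fresh RX DMA
		 */
		dma_sync_single_range_for_device(ifq->dev, buf->dma, 0,
						 PAGE_SIZE, DMA_FROM_DEVICE);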
> + io_zc_add_pp_cache(pp, buf);
> + if (pp->alloc.count >= PP_ALLOC_CACHE_REFILL)
> + break;
> + }
> + smp_store_release(&ifq->ring->rq.head, ifq->cached_rq_head);
> +}
> +
> +static void io_zc_rx_refill_slow(struct page_pool *pp, struct io_zc_rx_ifq *ifq)
> +{
> + struct io_zc_rx_pool *pool = ifq->pool;
> +
> + spin_lock_bh(&pool->freelist_lock);
> + while (pool->free_count && pp->alloc.count < PP_ALLOC_CACHE_REFILL) {
> + struct io_zc_rx_buf *buf;
> + u32 pgid;
> +
> + pgid = pool->freelist[--pool->free_count];
> + buf = &pool->bufs[pgid];
> +
> + io_zc_add_pp_cache(pp, buf);
> + pp->pages_state_hold_cnt++;
> + trace_page_pool_state_hold(pp, io_zc_buf_to_pp_page(buf),
> + pp->pages_state_hold_cnt);
> + }
> + spin_unlock_bh(&pool->freelist_lock);
> +}
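For completeness, the ->alloc_pages callback described in the commit message
would then presumably try the refill ring first and only fall back to the
stash when the ring is empty, e.g. (a sketch; the mp_priv lookup and the
direct use of pp->alloc.cache are assumptions):

static struct page *io_pp_zc_alloc_pages(struct page_pool *pp, gfp_t gfp)
{
	struct io_zc_rx_ifq *ifq = pp->mp_priv;		/* assumption */

	/* hot path: consume entries userspace returned via the refill ring */
	io_zc_rx_ring_refill(pp, ifq);
	if (unlikely(!pp->alloc.count)) {
		/* slow path: pull buffers out of the spinlocked ->freelist */
		io_zc_rx_refill_slow(pp, ifq);
		if (!pp->alloc.count)
			return NULL;
	}
	return pp->alloc.cache[--pp->alloc.count];
}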
...
--
Pavel Begunkov