[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <93256513-08d8-5b15-aa98-c1e83af60b54@gmail.com>
Date: Tue, 15 Jun 2021 11:01:18 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Olivier Langlois <olivier@...llion01.com>,
Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] io_uring: store back buffer in case of failure
On 6/11/21 8:46 PM, Olivier Langlois wrote:
> the man page says the following:
>
> If succesful, the resulting CQE will have IORING_CQE_F_BUFFER set
> in the flags part of the struct, and the upper IORING_CQE_BUFFER_SHIFT
> bits will contain the ID of the selected buffers.
>
> in order to respect this contract, the buffer is stored back in case of
> an error. There are several reasons to do that:
>
> 1. doing otherwise is counter-intuitive and error-prone (I cannot think
> of a single example of a syscall failing and still require the user to
> free the allocated resources). Especially when the documention
> explicitly mention that this is the behavior to expect.
>
> 2. it is inefficient because the buffer is unneeded since there is no
> data to transfer back to the user and the buffer will need to be
> returned back to io_uring to avoid a leak.
>
> Signed-off-by: Olivier Langlois <olivier@...llion01.com>
> ---
> fs/io_uring.c | 97 +++++++++++++++++++++++++++++++++------------------
> 1 file changed, 64 insertions(+), 33 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 42380ed563c4..502d7cd81a8c 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
[...]
> +static unsigned int io_put_kbuf(struct io_kiocb *req, struct io_buffer *kbuf,
> + u16 bgid, long res, unsigned int issue_flags)
> {
> unsigned int cflags;
> + struct io_ring_ctx *ctx = req->ctx;
> + struct io_buffer *head;
> + bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>
> cflags = kbuf->bid << IORING_CQE_BUFFER_SHIFT;
> cflags |= IORING_CQE_F_BUFFER;
> req->flags &= ~REQ_F_BUFFER_SELECTED;
> - kfree(kbuf);
> +
> + /*
> + * Theoritically, res == 0 could be included as well but that would
> + * break the contract established in the man page saying that
> + * a buffer is returned if the operation is successful.
> + */
> + if (unlikely(res < 0)) {
> + io_ring_submit_lock(ctx, !force_nonblock);
io_complete_rw() is called from an IRQ context, so it can't sleep/wait.
> +
> + lockdep_assert_held(&ctx->uring_lock);
> +
> + head = xa_load(&ctx->io_buffers, bgid);
> + if (head) {
> + list_add_tail(&kbuf->list, &head->list);
> + cflags = 0;
> + } else {
> + INIT_LIST_HEAD(&kbuf->list);
> + if (!xa_insert(&ctx->io_buffers, bgid, kbuf, GFP_KERNEL))
> + cflags = 0;
> + }
> + io_ring_submit_unlock(ctx, !force_nonblock);
> + }
> + if (cflags)
> + kfree(kbuf);
> return cflags;
> }
>
[...]
> -static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
> +static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret,
> + unsigned int issue_flags)
> {
> switch (ret) {
> case -EIOCBQUEUED:
> @@ -2728,7 +2775,7 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
> ret = -EINTR;
> fallthrough;
> default:
> - kiocb->ki_complete(kiocb, ret, 0);
> + kiocb->ki_complete(kiocb, ret, issue_flags);
Don't remember what the second argument of .ki_complete is for,
but it definitely should not be used for issue_flags. E.g. because
we get two different meanings for it depending on a context --
either a result from the block layer or issue_flags.
--
Pavel Begunkov
Powered by blists - more mailing lists