Message-ID: <ec1647f3-2c37-04be-bdbd-ab78b9f07a03@gmail.com>
Date: Mon, 21 Feb 2022 18:00:17 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Dylan Yudaken <dylany@...com>, Jens Axboe <axboe@...nel.dk>,
io-uring@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw
On 2/21/22 14:16, Dylan Yudaken wrote:
> In read/write ops, preincrement f_pos when no offset is specified, and
> then attempt to fix up the position after IO completes if it completed less
> than expected. This fixes the problem where multiple queued up IO will all
> obtain the same f_pos, and so perform the same read/write.
>
> This is still not as consistent as sync r/w, as it is able to advance the
> file offset past the end of the file. It seems it would be quite a
> performance hit to work around this limitation - such as by keeping track
> of concurrent operations - and the downside does not seem to be too
> problematic.
>
> The attempt to fix up the f_pos after will at least mean that in situations
> where a single operation is run, then the position will be consistent.
>
> Co-developed-by: Jens Axboe <axboe@...nel.dk>
> Signed-off-by: Jens Axboe <axboe@...nel.dk>
> Signed-off-by: Dylan Yudaken <dylany@...com>
> ---
> fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 68 insertions(+), 13 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index abd8c739988e..a951d0754899 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
[...]
> + return false;
> }
> }
> - return is_stream ? NULL : &kiocb->ki_pos;
> + *ppos = is_stream ? NULL : &kiocb->ki_pos;
> + return false;
> +}
> +
> +static inline void
> +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)
That's a lot of inlining, I wouldn't be surprised if the compiler
will even refuse to do that.
__io_kiocb_done_pos() {
	// rest of it
}
inline io_kiocb_done_pos() {
	if (!(flags & CUR_POS))
		return;
	__io_kiocb_done_pos();
}
io_kiocb_update_pos() is huge as well
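For illustration, a minimal userspace sketch of that split, assuming made-up stand-ins for the kernel types: `struct fake_req`, `FAKE_REQ_F_CUR_POS`, `fake_done_pos()` and `fake_done_pos_slow()` are hypothetical names, not anything in fs/io_uring.c. Only the flag test is inlined at each call site; the bulky fix-up stays out of line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real io_uring types. */
#define FAKE_REQ_F_CUR_POS (1u << 0)

struct fake_req {
	unsigned int flags;
	uint64_t len;      /* bytes requested */
	int64_t file_pos;  /* models file->f_pos */
	int64_t ki_pos;    /* models kiocb->ki_pos after the IO */
};

/* Counts slow-path entries so the effect of the split can be observed. */
static int slow_path_calls;

/* Out-of-line slow path: the bulky fix-up logic lives here, emitted once. */
static __attribute__((noinline)) void
fake_done_pos_slow(struct fake_req *req, uint64_t actual)
{
	slow_path_calls++;

	if (actual >= req->len)
		return;

	/* Roll back only if the position is still where the pre-increment left it. */
	if (req->file_pos == req->ki_pos + (int64_t)(req->len - actual))
		req->file_pos = req->ki_pos;
}

/* Inline fast path: just the flag test, cheap to duplicate at every caller. */
static inline void fake_done_pos(struct fake_req *req, uint64_t actual)
{
	assert(req != NULL);

	if (!(req->flags & FAKE_REQ_F_CUR_POS))
		return;
	fake_done_pos_slow(req, actual);
}
```

A request without the flag never enters the slow path, so callers pay only for one test and a branch. `__attribute__((noinline))` is a GCC/Clang extension, used here only to make the intent explicit.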
> +{
> + u64 expected;
> +
> + if (likely(!(req->flags & REQ_F_CUR_POS)))
> + return;
> +
> + expected = req->rw.len;
> + if (actual >= expected)
> + return;
> +
> + /*
> + * It's not definitely safe to lock here, and the assumption is,
> + * that if we cannot lock the position that it will be changing,
> + * and if it will be changing - then we can't update it anyway
> + */
> + if (req->file->f_mode & FMODE_ATOMIC_POS
> + && !mutex_trylock(&req->file->f_pos_lock))
> + return;
> +
> + /*
> + * now we want to move the pointer, but only if everything is consistent
> + * with how we left it originally
> + */
> + if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
> + req->file->f_pos = kiocb->ki_pos;
I wonder, is it good enough / safe to just assign it considering that
the request was executed outside of locks? vfs_seek()?
> +
> + /* else something else messed with f_pos and we can't do anything */
> +
> + if (req->file->f_mode & FMODE_ATOMIC_POS)
> + mutex_unlock(&req->file->f_pos_lock);
> }
Do we even care about races while reading it? E.g.
pos = READ_ONCE();
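To make the race question concrete, here is a rough userspace model of "pre-increment, then roll back only if nothing else moved the position". It is not what the patch does: it swaps the mutex_trylock() for a C11 compare-and-exchange on an atomic position, which is purely an assumption for illustration (f_pos is a plain loff_t in the kernel). `struct fake_file`, `fake_reserve_pos()` and `fake_fixup_pos()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a shared file position; not kernel code. */
struct fake_file {
	_Atomic int64_t pos;
};

/*
 * Reserve [start, start + len) by advancing the shared position up front,
 * so that two queued requests never obtain the same start offset.
 * Returns the start of the reserved range.
 */
static int64_t fake_reserve_pos(struct fake_file *f, int64_t len)
{
	assert(len >= 0);
	return atomic_fetch_add(&f->pos, len);
}

/*
 * After a short transfer, pull the position back to start + actual, but only
 * if it still equals start + len, i.e. nobody else has moved it since the
 * reservation. Returns true if the position was rolled back.
 */
static bool fake_fixup_pos(struct fake_file *f, int64_t start,
			   int64_t len, int64_t actual)
{
	int64_t reserved_end = start + len;

	if (actual >= len)
		return false; /* full transfer: the reservation was exact */

	/* Read, compare and update happen as one atomic step. */
	return atomic_compare_exchange_strong(&f->pos, &reserved_end,
					      start + actual);
}
```

With a single request the position ends up consistent after a short read. With two queued requests the first fix-up fails, because the second reservation has already advanced the position, and the offset stays past the data actually transferred. That is the same limitation the commit message describes.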
>
> - ppos = io_kiocb_update_pos(req, kiocb);
> -
> ret = rw_verify_area(READ, req->file, ppos, req->result);
> if (unlikely(ret)) {
> kfree(iovec);
> + io_kiocb_done_pos(req, kiocb, 0);
Why do we update it on failure?
[...]
> - ppos = io_kiocb_update_pos(req, kiocb);
> -
> ret = rw_verify_area(WRITE, req->file, ppos, req->result);
> if (unlikely(ret))
> goto out_free;
> @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
> return ret ?: -EAGAIN;
> }
> out_free:
> + io_kiocb_done_pos(req, kiocb, 0);
Looks weird. It appears we don't need it on failure and
successes are covered by kiocb_done() / ->ki_complete
> /* it's reportedly faster than delegating the null check to kfree() */
> if (iovec)
> kfree(iovec);
--
Pavel Begunkov