Message-ID: <54b21f27-2c0d-1b3d-b35f-a88bdb766c54@linux.alibaba.com>
Date: Tue, 22 Feb 2022 15:20:44 +0800
From: Hao Xu <haoxu@...ux.alibaba.com>
To: Pavel Begunkov <asml.silence@...il.com>,
Dylan Yudaken <dylany@...com>, Jens Axboe <axboe@...nel.dk>,
io-uring@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v2 4/4] io_uring: pre-increment f_pos on rw
On 2/22/22 02:00, Pavel Begunkov wrote:
> On 2/21/22 14:16, Dylan Yudaken wrote:
>> In read/write ops, preincrement f_pos when no offset is specified, and
>> then attempt to fix up the position after IO completes if it completed
>> less than expected. This fixes the problem where multiple queued up IO
>> will all obtain the same f_pos, and so perform the same read/write.
>>
>> This is still not as consistent as sync r/w, as it is able to advance
>> the file offset past the end of the file. It seems it would be quite a
>> performance hit to work around this limitation - such as by keeping
>> track of concurrent operations - and the downside does not seem to be
>> too problematic.
>>
>> The attempt to fix up the f_pos after will at least mean that in
>> situations where a single operation is run, then the position will be
>> consistent.
>>
>> Co-developed-by: Jens Axboe <axboe@...nel.dk>
>> Signed-off-by: Jens Axboe <axboe@...nel.dk>
>> Signed-off-by: Dylan Yudaken <dylany@...com>
>> ---
>> fs/io_uring.c | 81 ++++++++++++++++++++++++++++++++++++++++++---------
>> 1 file changed, 68 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index abd8c739988e..a951d0754899 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -3066,21 +3066,71 @@ static inline void io_rw_done(struct kiocb *kiocb, ssize_t ret)
>
> [...]
>
>> + return false;
>> }
>> }
>> - return is_stream ? NULL : &kiocb->ki_pos;
>> + *ppos = is_stream ? NULL : &kiocb->ki_pos;
>> + return false;
>> +}
>> +
>> +static inline void
>> +io_kiocb_done_pos(struct io_kiocb *req, struct kiocb *kiocb, u64 actual)
>
> That's a lot of inlining, I wouldn't be surprised if the compiler
> will even refuse to do that.
>
> __io_kiocb_done_pos() {
>     // rest of it
> }
>
> inline io_kiocb_done_pos() {
>     if (!(flags & CUR_POS))
>         return;
>     __io_kiocb_done_pos();
> }
>
> io_kiocb_update_pos() is huge as well
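
For illustration, the split suggested above can be sketched in plain userspace C. This is only a rough model, not the kernel code: `struct model_req`, `MODEL_F_CUR_POS`, the `fixups` counter and both function names are hypothetical stand-ins, and `__attribute__((noinline))` assumes GCC or Clang. The inline wrapper keeps only the cheap flag test at each call site, while the bulk of the work lives in a single out-of-line function.

```c
#include <stdint.h>

/* Hypothetical stand-in for REQ_F_CUR_POS. */
#define MODEL_F_CUR_POS (1u << 0)

/* Hypothetical, heavily reduced request: just what the sketch needs. */
struct model_req {
	unsigned int flags;
	uint64_t len;     /* bytes the request asked for */
	uint64_t fixups;  /* counts how often the slow path did work */
};

/* Out-of-line slow path: emitted once, never copied into callers. */
static __attribute__((noinline)) void
model_done_pos_slow(struct model_req *req, uint64_t actual)
{
	if (actual >= req->len)
		return;
	/* The real position fix-up would go here. */
	req->fixups++;
}

/* Inline fast path: only the flag test is duplicated at each call site. */
static inline void model_done_pos(struct model_req *req, uint64_t actual)
{
	if (!(req->flags & MODEL_F_CUR_POS))
		return;
	model_done_pos_slow(req, actual);
}
```

With this shape, requests that carry an explicit offset pay for one predictable branch, and the compiler has no reason to duplicate the locking and comparison logic at every caller.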
>
>> +{
>> + u64 expected;
>> +
>> + if (likely(!(req->flags & REQ_F_CUR_POS)))
>> + return;
>> +
>> + expected = req->rw.len;
>> + if (actual >= expected)
>> + return;
>> +
>> + /*
>> + * It's not definitely safe to lock here, and the assumption is,
>> + * that if we cannot lock the position that it will be changing,
>> + * and if it will be changing - then we can't update it anyway
>> + */
>> + if (req->file->f_mode & FMODE_ATOMIC_POS
>> + && !mutex_trylock(&req->file->f_pos_lock))
>> + return;
>> +
>> + /*
>> + * now we want to move the pointer, but only if everything is consistent
>> + * with how we left it originally
>> + */
>> + if (req->file->f_pos == kiocb->ki_pos + (expected - actual))
>> + req->file->f_pos = kiocb->ki_pos;
>
> I wonder, is it good enough / safe to just assign it considering that
> the request was executed outside of locks? vfs_seek()?
>
>> +
>> + /* else something else messed with f_pos and we can't do anything */
>> +
>> + if (req->file->f_mode & FMODE_ATOMIC_POS)
>> + mutex_unlock(&req->file->f_pos_lock);
>> }
>
> Do we even care about races while reading it? E.g.
> pos = READ_ONCE();
>
>> - ppos = io_kiocb_update_pos(req, kiocb);
>> -
>> ret = rw_verify_area(READ, req->file, ppos, req->result);
>> if (unlikely(ret)) {
>> kfree(iovec);
>> + io_kiocb_done_pos(req, kiocb, 0);
>
> Why do we update it on failure?
It looks like a fallback: if the I/O did not advance the position,
file->f_pos is rolled back to its original value.
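
To make the arithmetic concrete, here is a small userspace model of the pre-increment and the fix-up check. It is a sketch under assumptions, not the patch itself: the `model_*` types and functions are hypothetical, locking is left out entirely, and it assumes ki_pos has already been advanced by the completed byte count by the time the fix-up runs.

```c
#include <stdint.h>

/* Hypothetical stand-ins for struct file and struct kiocb. */
struct model_file  { uint64_t f_pos; };
struct model_kiocb { uint64_t ki_pos; };

/* Pre-increment: the request takes the current offset and f_pos moves
 * ahead by the full requested length before any I/O happens. */
static void model_claim_pos(struct model_file *f, struct model_kiocb *k,
			    uint64_t len)
{
	k->ki_pos = f->f_pos;
	f->f_pos += len;
}

/* Assumption: the I/O path advances ki_pos by the bytes transferred. */
static void model_complete(struct model_kiocb *k, uint64_t actual)
{
	k->ki_pos += actual;
}

/* Fix-up: pull f_pos back only if nobody else has moved it since. */
static void model_done_pos(struct model_file *f, const struct model_kiocb *k,
			   uint64_t expected, uint64_t actual)
{
	if (actual >= expected)
		return;
	if (f->f_pos == k->ki_pos + (expected - actual))
		f->f_pos = k->ki_pos;
	/* else something else changed f_pos, so leave it alone */
}
```

With f_pos at 100 and a 50-byte request that fails before any I/O (actual = 0), the claim moves f_pos to 150 and the fix-up restores it to 100. If a second 50-byte request was claimed in between, f_pos is 200, the comparison fails, and the position is left untouched.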
>
> [...]
>
>> - ppos = io_kiocb_update_pos(req, kiocb);
>> -
>> ret = rw_verify_area(WRITE, req->file, ppos, req->result);
>> if (unlikely(ret))
>> goto out_free;
>> @@ -3858,6 +3912,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
>> return ret ?: -EAGAIN;
>> }
>> out_free:
>> + io_kiocb_done_pos(req, kiocb, 0);
>
> Looks weird. It appears we don't need it on failure and
> successes are covered by kiocb_done() / ->ki_complete
>
>> /* it's reportedly faster than delegating the null check to kfree() */
>> if (iovec)
>> kfree(iovec);
>