Message-Id: <FD8FD9BD-1E94-4A84-88EB-3A1531BCF556@gmail.com>
Date: Tue, 10 Aug 2021 19:33:28 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: Olivier Langlois <olivier@...llion01.com>,
Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work
> On Aug 10, 2021, at 2:32 PM, Pavel Begunkov <asml.silence@...il.com> wrote:
>
> On 8/10/21 9:28 AM, Nadav Amit wrote:
>>
>> Unfortunately, there seems to be yet another issue (unless my code
>> somehow caused it). It seems that when SQPOLL is used, there are cases
>> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
>> never goes down to zero.
>>
>> Debugging... (while also trying to make some progress with my code)
>
> It's most likely because a request has been lost (mis-refcounted).
> Let us know if you need any help. Would be great to solve it for 5.14.
> quick tips:
>
> 1) if not already, try out Jens' 5.14 branch
> git://git.kernel.dk/linux-block io_uring-5.14
>
> 2) try to characterise the io_uring use pattern. Poll requests?
> Read/write requests? Send/recv? Filesystem vs bdev vs sockets?
>
> If easily reproducible, you can match io_alloc_req() with it
> getting into io_dismantle_req();
So the actual problem is missing io_uring functionality that I need. When an I/O is queued for async completion (i.e., after returning -EIOCBQUEUED), io_uring should have a way to cancel it if needed. Otherwise such I/Os might never complete, which is what happens in my use case.
AIO has ki_cancel() for this purpose. So I presume the proper solution would be to move ki_cancel() from aio_kiocb to kiocb, so that both io_uring and AIO can use it, and then to use this infrastructure in io_uring.
But it is messy. The (few) users of kiocb_set_cancel_fn() already have a bug: they blindly assume the request came from AIO and not from io_uring. On top of that, I am not sure about some things in the AIO code. Oh boy. I'll work on an RFC.
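
To make the idea above concrete, here is a minimal userspace sketch of a cancel hook that lives in the common per-request structure rather than in an AIO-only wrapper. It is not kernel code and not the eventual RFC: struct mock_kiocb, mock_set_cancel_fn(), mock_cancel() and driver_cancel() are all hypothetical names, loosely modelled on kiocb, kiocb_set_cancel_fn() and ki_cancel().

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace mock only. This is NOT the kernel's struct kiocb; every name
 * here is hypothetical and exists purely to illustrate the proposal.
 */
struct mock_kiocb;
typedef int (mock_cancel_fn)(struct mock_kiocb *iocb);

struct mock_kiocb {
	/* Assumption: the cancel hook sits in the common structure, so the
	 * submitter (AIO-like or io_uring-like) does not matter. */
	mock_cancel_fn *ki_cancel;
	int cancelled;
};

/* What a driver would call after queueing the I/O for async completion. */
static void mock_set_cancel_fn(struct mock_kiocb *iocb, mock_cancel_fn *fn)
{
	iocb->ki_cancel = fn;
}

/* Generic cancel path: works for any submitter that owns a mock_kiocb. */
static int mock_cancel(struct mock_kiocb *iocb)
{
	if (iocb->ki_cancel == NULL)
		return -EINVAL;	/* no hook registered: nothing can be cancelled */
	return iocb->ki_cancel(iocb);
}

/* Example driver-side hook: marks the request as cancelled. */
static int driver_cancel(struct mock_kiocb *iocb)
{
	iocb->cancelled = 1;
	return 0;
}
```

In this sketch, a request that registered driver_cancel() via mock_set_cancel_fn() can be cancelled through mock_cancel() by whoever submitted it, while a request with no hook gets -EINVAL back. The real work would be in the lifetime and locking rules around the hook, which this mock ignores entirely.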