Date:   Tue, 10 Aug 2021 19:33:28 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Pavel Begunkov <asml.silence@...il.com>
Cc:     Olivier Langlois <olivier@...llion01.com>,
        Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task
 work



> On Aug 10, 2021, at 2:32 PM, Pavel Begunkov <asml.silence@...il.com> wrote:
> 
> On 8/10/21 9:28 AM, Nadav Amit wrote:
>> 
>> Unfortunately, there seems to be yet another issue (unless my code
>> somehow caused it). It seems that when SQPOLL is used, there are cases
>> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
>> never goes down to zero.
>> 
>> Debugging... (while also trying to make some progress with my code)
> 
> It's most likely because a request has been lost (mis-refcounted).
> Let us know if you need any help. Would be great to solve it for 5.14.
> quick tips: 
> 
> 1) if not already, try out Jens' 5.14 branch
> git://git.kernel.dk/linux-block io_uring-5.14
> 
> 2) try to characterise the io_uring use pattern. Poll requests?
> Read/write requests? Send/recv? Filesystem vs bdev vs sockets?
> 
> If easily reproducible, you can match io_alloc_req() with it
> getting into io_dismantle_req();

So the actual problem turns out to be io_uring functionality that I need and that is missing. When an I/O is queued for async completion (i.e., after returning -EIOCBQUEUED), there should be a way for io_uring to cancel it if needed. Otherwise such I/Os might never complete, which is what happens in my use case.

AIO has ki_cancel() for this purpose. So I presume the proper solution would be to move ki_cancel() from aio_kiocb to kiocb, so that both io_uring and AIO can use it, and then to use this infrastructure in io_uring.
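
To make the idea concrete, here is a rough userspace sketch of a cancel callback that lives in the common kiocb-like structure rather than in the AIO-specific wrapper. This is not kernel code and not a proposed patch: every name with a sketch_ prefix is hypothetical, and locking, refcounting and completion handling are all left out.

```c
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's types. */
struct sketch_kiocb;
typedef int (sketch_cancel_fn)(struct sketch_kiocb *);

/* Common I/O control block. In this sketch the cancel hook lives here,
 * so a submitter does not need to know which wrapper embeds it. */
struct sketch_kiocb {
	sketch_cancel_fn *ki_cancel;	/* NULL means "not cancellable" */
	int cancelled;			/* stands in for real cancel state */
};

/* Two independent wrappers embedding the common block (assumed layout). */
struct sketch_aio_req {
	struct sketch_kiocb rw;
	long aio_private;
};

struct sketch_uring_req {
	struct sketch_kiocb rw;
	long uring_private;
};

/* Registration only touches the common block, so it makes no assumption
 * about whether the request came from AIO or from io_uring. */
static void sketch_set_cancel_fn(struct sketch_kiocb *iocb,
				 sketch_cancel_fn *fn)
{
	iocb->ki_cancel = fn;
}

/* Generic cancel entry point usable by either submitter.
 * Returns -1 if the request has no cancel hook registered. */
static int sketch_cancel(struct sketch_kiocb *iocb)
{
	if (!iocb->ki_cancel)
		return -1;
	return iocb->ki_cancel(iocb);
}

/* Example cancel hook that a driver might register. */
static int sketch_driver_cancel(struct sketch_kiocb *iocb)
{
	iocb->cancelled = 1;
	return 0;
}
```

With this layout, a request of either wrapper type can be cancelled through &req.rw alone, which is the property the move of ki_cancel() is meant to provide.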

But it is messy. The (few) users of kiocb_set_cancel_fn() already have a bug: they blindly assume that AIO is in use rather than io_uring. I am also unsure about some things in the AIO code. Oh boy. I’ll work on an RFC.
